January 26, 2020

Paper Group ANR 1498

Multiple Learning for Regression in big data. Bayes Optimal Early Stopping Policies for Black-Box Optimization. Newswire versus Social Media for Disaster Response and Recovery. Learning to Relate from Captions and Bounding Boxes. Countering Language Drift via Visual Grounding. Occluded Face Recognition Using Low-rank Regression with Generalized Gra …

Multiple Learning for Regression in big data

Title Multiple Learning for Regression in big data
Authors Xiang Liu, Ziyang Tang, Huyunting Huang, Tonglin Zhang, Baijian Yang
Abstract Regression problems that have closed-form solutions are well understood and can be easily implemented when the dataset is small enough to be loaded entirely into RAM. Challenges arise when the data is too big to be stored in RAM to compute the closed-form solutions. Many techniques have been proposed to overcome or alleviate the memory barrier problem, but the solutions are often only locally optimal. In addition, most approaches require accessing the raw data again when updating the models. Parallel computing clusters are also expected if multiple models need to be computed simultaneously. We propose multiple learning approaches that utilize an array of sufficient statistics (SS) to address this big data challenge. This memory-oblivious approach breaks the memory barrier when computing regressions with closed-form solutions, including but not limited to linear regression, weighted linear regression, linear regression with Box-Cox transformation (Box-Cox regression) and ridge regression models. The computation and update of the SS array can be handled at the per-row or per-mini-batch level, and updating a model is as easy as matrix addition and subtraction. Furthermore, multiple SS arrays for different models can be easily computed simultaneously to obtain multiple models in one pass through the dataset. We implemented our approaches on Spark and evaluated them on simulated datasets. Results showed that our approaches can achieve closed-form solutions of multiple models at half the training time that traditional methods need for a single model.
Tasks
Published 2019-03-03
URL https://arxiv.org/abs/1903.00843v2
PDF https://arxiv.org/pdf/1903.00843v2.pdf
PWC https://paperswithcode.com/paper/multiple-learning-for-regression-in-big-data
Repo
Framework
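
The sufficient-statistics idea in the abstract above can be illustrated with a minimal sketch. It is restricted to a single feature so that it needs no linear-algebra library; the class name `SuffStats`, its fields, and the `ridge` parameter are assumptions made for illustration and are not taken from the paper or its Spark implementation.

```python
from dataclasses import dataclass


@dataclass
class SuffStats:
    """Running sums that suffice to fit y = b0 + b1 * x in closed form.

    Hypothetical illustration: one feature only, intercept not penalized.
    """
    n: float = 0.0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def add_batch(self, xs, ys):
        # One pass over a mini-batch; the raw rows can be discarded afterwards.
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y
        return self

    def __add__(self, other):
        # Merging two batches is element-wise addition of their statistics.
        return SuffStats(self.n + other.n, self.sx + other.sx,
                         self.sy + other.sy, self.sxx + other.sxx,
                         self.sxy + other.sxy)

    def __sub__(self, other):
        # Removing a batch from the model is element-wise subtraction.
        return SuffStats(self.n - other.n, self.sx - other.sx,
                         self.sy - other.sy, self.sxx - other.sxx,
                         self.sxy - other.sxy)

    def fit(self, ridge=0.0):
        # Closed-form slope and intercept from centered sums;
        # ridge > 0 shrinks the slope, ridge = 0 gives ordinary least squares.
        if self.n == 0:
            raise ValueError("no data accumulated")
        sxx_c = self.sxx - self.sx * self.sx / self.n
        sxy_c = self.sxy - self.sx * self.sy / self.n
        if sxx_c + ridge == 0:
            raise ValueError("feature has zero variance and no ridge penalty")
        slope = sxy_c / (sxx_c + ridge)
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope


# Usage: two mini-batches, merged, then two models from the same statistics.
xs = [0, 1, 2, 3, 4, 5]
ys = [2 * x + 1 for x in xs]
first = SuffStats().add_batch(xs[:3], ys[:3])
second = SuffStats().add_batch(xs[3:], ys[3:])
total = first + second
ols_model = total.fit()
ridge_model = total.fit(ridge=1.0)
```

Because both models are derived from the same five sums, fitting a second model does not require another pass over the data, which is the property the abstract emphasizes.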

Bayes Optimal Early Stopping Policies for Black-Box Optimization

Title Bayes Optimal Early Stopping Policies for Black-Box Optimization
Authors Matthew Streeter
Abstract We derive an optimal policy for adaptively restarting a randomized algorithm, based on observed features of the run-so-far, so as to minimize the expected time required for the algorithm to successfully terminate. Given a suitable Bayesian prior, this result can be used to select the optimal black-box optimization algorithm from among a large family of algorithms that includes random search, Successive Halving, and Hyperband. On CIFAR-10 and ImageNet hyperparameter tuning problems, the proposed policies offer up to a factor of 13 improvement over random search in terms of expected time to reach a given target accuracy, and up to a factor of 3 improvement over a baseline adaptive policy that terminates a run whenever its accuracy is below-median.
Tasks
Published 2019-02-21
URL http://arxiv.org/abs/1902.08285v1
PDF http://arxiv.org/pdf/1902.08285v1.pdf
PWC https://paperswithcode.com/paper/bayes-optimal-early-stopping-policies-for
Repo
Framework
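
The baseline that the abstract above compares against, a policy that terminates a run whenever its accuracy is below the median, can be sketched as follows. This is only that baseline, not the Bayes-optimal policy the paper derives; the function name `should_stop` and the representation of learning curves as lists of accuracies are assumptions for illustration.

```python
from statistics import median


def should_stop(current_accuracy, step, completed_curves):
    """Return True if a run should be terminated at `step`.

    `completed_curves` is a list of accuracy curves (one list per earlier run),
    indexed by training step. Hypothetical helper, not from the paper.
    """
    # Only compare against runs that actually reached this step.
    peers = [curve[step] for curve in completed_curves if len(curve) > step]
    if not peers:
        return False  # nothing to compare against yet, keep running
    return current_accuracy < median(peers)


# Usage: three earlier runs, each with accuracy recorded at steps 0 and 1.
curves = [[0.1, 0.5], [0.2, 0.6], [0.3, 0.7]]
stop_now = should_stop(0.55, 1, curves)
```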

Newswire versus Social Media for Disaster Response and Recovery

Title Newswire versus Social Media for Disaster Response and Recovery
Authors Rakesh Verma, Samaneh Karimi, Daniel Lee, Omprakash Gnawali, Azadeh Shakery
Abstract In a disaster situation, first responders need to quickly acquire situational awareness and prioritize response based on the need, resources available and impact. Can they do this based on digital media such as Twitter alone, or newswire alone, or some combination of the two? We examine this question in the context of the 2015 Nepal Earthquakes. Because newswire articles are longer, effective summaries can be helpful in saving time yet giving key content. We evaluate the effectiveness of several unsupervised summarization techniques in capturing key content. We propose a method to link tweets written by the public and newswire articles, so that we can compare their key characteristics: timeliness, whether tweets appear earlier than their corresponding news articles, and content. A novel idea is to view relevant tweets as a summary of the matching news article and evaluate these summaries. Whenever possible, we present both quantitative and qualitative evaluations. One of our main findings is that tweets and newswire articles provide complementary perspectives that form a holistic view of the disaster situation.
Tasks
Published 2019-06-25
URL https://arxiv.org/abs/1906.10607v1
PDF https://arxiv.org/pdf/1906.10607v1.pdf
PWC https://paperswithcode.com/paper/newswire-versus-social-media-for-disaster
Repo
Framework

Learning to Relate from Captions and Bounding Boxes

Title Learning to Relate from Captions and Bounding Boxes
Authors Sarthak Garg, Joel Ruben Antony Moniz, Anshu Aviral, Priyatham Bollimpalli
Abstract In this work, we propose a novel approach that predicts the relationships between various entities in an image in a weakly supervised manner by relying on image captions and object bounding box annotations as the sole source of supervision. Our proposed approach uses a top-down attention mechanism to align entities in captions to objects in the image, and then leverages the syntactic structure of the captions to align the relations. We use these alignments to train a relation classification network, thereby obtaining both grounded captions and dense relationships. We demonstrate the effectiveness of our model on the Visual Genome dataset by achieving a recall@50 of 15% and recall@100 of 25% on the relationships present in the image. We also show that the model successfully predicts relations that are not present in the corresponding captions.
Tasks Image Captioning, Relation Classification
Published 2019-12-01
URL https://arxiv.org/abs/1912.00311v1
PDF https://arxiv.org/pdf/1912.00311v1.pdf
PWC https://paperswithcode.com/paper/learning-to-relate-from-captions-and-bounding-1
Repo
Framework
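
The recall@50 and recall@100 figures quoted in the abstract above are instances of the recall@K metric. A minimal sketch of how such a metric is usually computed is shown below; the triple representation `(subject, predicate, object)` and the function name `recall_at_k` are assumptions, and the paper's exact evaluation protocol may differ.

```python
def recall_at_k(ranked_predictions, ground_truth, k):
    """Fraction of ground-truth relations found among the top-k predictions.

    `ranked_predictions` is ordered from most to least confident.
    Hypothetical helper for illustration only.
    """
    if not ground_truth:
        return 0.0
    top_k = set(ranked_predictions[:k])
    hits = sum(1 for relation in ground_truth if relation in top_k)
    return hits / len(ground_truth)


# Usage with made-up relation triples.
predictions = [
    ("man", "rides", "horse"),
    ("man", "wears", "hat"),
    ("horse", "on", "grass"),
]
truth = [("man", "wears", "hat"), ("dog", "near", "man")]
score = recall_at_k(predictions, truth, k=2)
```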

Countering Language Drift via Visual Grounding

Title Countering Language Drift via Visual Grounding
Authors Jason Lee, Kyunghyun Cho, Douwe Kiela
Abstract Emergent multi-agent communication protocols are very different from natural language and not easily interpretable by humans. We find that agents that were initially pretrained to produce natural language can also experience detrimental language drift: when a non-linguistic reward is used in a goal-based task, e.g. some scalar success metric, the communication protocol may easily and radically diverge from natural language. We recast translation as a multi-agent communication game and examine auxiliary training constraints for their effectiveness in mitigating language drift. We show that a combination of syntactic (language model likelihood) and semantic (visual grounding) constraints gives the best communication performance, allowing pre-trained agents to retain English syntax while learning to accurately convey the intended meaning.
Tasks Language Modelling
Published 2019-09-10
URL https://arxiv.org/abs/1909.04499v1
PDF https://arxiv.org/pdf/1909.04499v1.pdf
PWC https://paperswithcode.com/paper/countering-language-drift-via-visual
Repo
Framework

Occluded Face Recognition Using Low-rank Regression with Generalized Gradient Direction

Title Occluded Face Recognition Using Low-rank Regression with Generalized Gradient Direction
Authors Cho-Ying Wu, Jian-Jiun Ding
Abstract In this paper, a very effective method to solve the contiguous face occlusion recognition problem is proposed. It utilizes the robust image gradient direction features together with a variety of mapping functions and adopts a hierarchical sparse and low-rank regression model. This model unites the sparse representation in dictionary learning and the low-rank representation on the error term that is usually messy in the gradient domain. We call it the “weak low-rankness” optimization problem, which can be efficiently solved by the framework of Alternating Direction Method of Multipliers (ADMM). The optimum of the error term has a similar weak low-rank structure as the reference error map and the recognition performance can be enhanced by leaps and bounds using weak low-rankness optimization. Extensive experiments are conducted on real-world disguise / occlusion data and synthesized contiguous occlusion data. These experiments show that the proposed gradient direction-based hierarchical adaptive sparse and low-rank (GD-HASLR) algorithm has the best performance compared to state-of-the-art methods, including popular convolutional neural network-based methods.
Tasks Dictionary Learning, Face Recognition
Published 2019-06-06
URL https://arxiv.org/abs/1906.02429v1
PDF https://arxiv.org/pdf/1906.02429v1.pdf
PWC https://paperswithcode.com/paper/occluded-face-recognition-using-low-rank
Repo
Framework

A Variational Approach to Weakly Supervised Document-Level Multi-Aspect Sentiment Classification

Title A Variational Approach to Weakly Supervised Document-Level Multi-Aspect Sentiment Classification
Authors Ziqian Zeng, Wenxuan Zhou, Xin Liu, Yangqiu Song
Abstract In this paper, we propose a variational approach to weakly supervised document-level multi-aspect sentiment classification. Instead of using user-generated ratings or annotations provided by domain experts, we use target-opinion word pairs as “supervision.” These word pairs can be extracted by using dependency parsers and simple rules. Our objective is to predict an opinion word given a target word while our ultimate goal is to learn a sentiment polarity classifier to predict the sentiment polarity of each aspect given a document. By introducing a latent variable, i.e., the sentiment polarity, to the objective function, we can inject the sentiment polarity classifier to the objective via the variational lower bound. We can learn a sentiment polarity classifier by optimizing the lower bound. We show that our method can outperform weakly supervised baselines on TripAdvisor and BeerAdvocate datasets and can be comparable to the state-of-the-art supervised method with hundreds of labels per aspect.
Tasks Sentiment Analysis
Published 2019-04-10
URL http://arxiv.org/abs/1904.05055v1
PDF http://arxiv.org/pdf/1904.05055v1.pdf
PWC https://paperswithcode.com/paper/a-variational-approach-to-weakly-supervised
Repo
Framework

Group-wise classification approach to improve Android malicious apps detection accuracy

Title Group-wise classification approach to improve Android malicious apps detection accuracy
Authors Ashu Sharma, Sanjay K. Sahay
Abstract Among fast-growing smart devices, Android is the most popular OS, and because of their attractive features, mobility and ease of use, these devices hold sensitive information such as personal data, browsing history, shopping history, financial details, etc. Therefore, any security gap in these devices means that the information stored on or accessed through them is at high risk of being breached by malware. Malware is continuously growing and is also used for military espionage and for disrupting industry, power grids, etc. To detect malware, traditional signature matching techniques are widely used. However, such strategies cannot detect advanced Android malicious apps because malware developers use several obfuscation techniques. Hence, researchers are continuously addressing the security issues in Android-based smart devices. In this paper, using the Drebin benchmark malware dataset, we experimentally demonstrate how to improve the detection accuracy by analyzing the apps after grouping the collected data based on their permissions, achieving 97.15% overall average accuracy. Our results outperform the accuracy obtained without grouping data (79.27%, 2017), Arp et al. (94%, 2014), Annamalai et al. (84.29%, 2016), Bahman Rashidi et al. (82%, 2017) and Ali Feizollah et al. (95.5%, 2017). The analysis also shows that, among the groups, the Microphone group has the lowest detection accuracy while Calendar group apps are detected with the highest accuracy, and that for the best performance one should take 80-100 features.
Tasks
Published 2019-04-03
URL http://arxiv.org/abs/1904.02122v1
PDF http://arxiv.org/pdf/1904.02122v1.pdf
PWC https://paperswithcode.com/paper/group-wise-classification-approach-to-improve
Repo
Framework

Federated Learning for Healthcare Informatics

Title Federated Learning for Healthcare Informatics
Authors Jie Xu, Fei Wang
Abstract The recent rapid development of medical informatization and the corresponding advances in automated data collection in clinical sciences generate large volumes of healthcare data. Proper use of these big data is closely related to the improvement of the whole health system, and is of great significance to drug development, health management and public health services. However, in addition to the heterogeneous and high-dimensional data characteristics caused by a spectrum of complex data types ranging from free-text clinical notes to various medical images, the fragmented data sources and privacy concerns of healthcare data are also huge obstacles to multi-institutional healthcare informatics research. Federated learning, a mechanism of training a shared global model with a central server while keeping all the sensitive data in the local institutions where the data belong, is a new attempt to connect the scattered healthcare data sources without ignoring the privacy of data. This survey focuses on reviewing the current progress on federated learning including, but not limited to, healthcare informatics. We summarize the general solutions to the statistical challenges, system challenges and privacy issues in federated learning research for reference. With this survey, we hope to provide a useful resource for health informatics and computational research on the current progress of how to apply machine learning techniques to heterogeneous data scattered across a large number of institutions while considering the privacy concerns of sharing data.
Tasks
Published 2019-11-13
URL https://arxiv.org/abs/1911.06270v1
PDF https://arxiv.org/pdf/1911.06270v1.pdf
PWC https://paperswithcode.com/paper/federated-learning-for-healthcare-informatics
Repo
Framework

On Structured Filtering-Clustering: Global Error Bound and Optimal First-Order Algorithms

Title On Structured Filtering-Clustering: Global Error Bound and Optimal First-Order Algorithms
Authors Nhat Ho, Tianyi Lin, Michael I. Jordan
Abstract In recent years, filtering-clustering problems have been a central topic in statistics and machine learning, especially the $\ell_1$-trend filtering and $\ell_2$-convex clustering problems. In practice, such structured problems are typically solved by first-order algorithms despite the extremely ill-conditioned structures of difference operator matrices. Inspired by the desire to analyze the convergence rates of these algorithms, we show that for a large class of filtering-clustering problems, a global error bound condition is satisfied for the dual filtering-clustering problems when a certain regularization is chosen. Based on this result, we show that many first-order algorithms attain the optimal rate of convergence in different settings. In particular, we establish a generalized dual gradient ascent (GDGA) algorithmic framework with several subroutines. In the deterministic setting, when the subroutine is accelerated gradient descent (AGD), the resulting algorithm attains linear convergence. This linear convergence also holds for the finite-sum setting in which the subroutine is the Katyusha algorithm. We also demonstrate that GDGA with a stochastic gradient descent (SGD) subroutine attains the optimal rate of convergence up to a logarithmic factor, shedding light on the possibility of solving filtering-clustering problems efficiently in the online setting. Experiments conducted on $\ell_1$-trend filtering problems illustrate the favorable performance of our algorithms over other competing algorithms.
Tasks
Published 2019-04-16
URL https://arxiv.org/abs/1904.07462v2
PDF https://arxiv.org/pdf/1904.07462v2.pdf
PWC https://paperswithcode.com/paper/global-error-bounds-and-linear-convergence
Repo
Framework
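
For orientation, the $\ell_1$-trend filtering problem named in the abstract above is commonly written in the following textbook form. The symbols $y$, $\beta$, $D$, $\lambda$ and $u$ are the usual ones and are assumptions here; the paper's general filtering-clustering formulation and the regularized dual it analyzes may differ in detail.

```latex
% Primal: fit a piecewise-linear trend beta to observations y in R^n.
\[
  \min_{\beta \in \mathbb{R}^n} \; \tfrac{1}{2}\,\|y - \beta\|_2^2 + \lambda\,\|D\beta\|_1,
  \qquad (D\beta)_i = \beta_i - 2\beta_{i+1} + \beta_{i+2}, \quad i = 1,\dots,n-2 .
\]
% Dual: a box-constrained least-squares problem in u in R^{n-2}.
\[
  \min_{u \in \mathbb{R}^{n-2}} \; \tfrac{1}{2}\,\|y - D^{\top}u\|_2^2
  \quad \text{subject to} \quad \|u\|_\infty \le \lambda,
  \qquad \beta^\star = y - D^{\top}u^\star .
\]
```

The second-order difference matrix $D$ is the ill-conditioned "difference operator" the abstract refers to, which is why convergence guarantees for first-order methods on the dual are not immediate.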

“Jam Me If You Can”: Defeating Jammer with Deep Dueling Neural Network Architecture and Ambient Backscattering Augmented Communications

Title “Jam Me If You Can”: Defeating Jammer with Deep Dueling Neural Network Architecture and Ambient Backscattering Augmented Communications
Authors Nguyen Van Huynh, Diep N. Nguyen, Dinh Thai Hoang, Eryk Dutkiewicz
Abstract With conventional anti-jamming solutions like frequency hopping or spread spectrum, legitimate transceivers often tend to “escape” or “hide” themselves from jammers. These reactive anti-jamming approaches are constrained by the lack of timely knowledge of jamming attacks. Bringing together the latest advances in neural network architectures and ambient backscattering communications, this work allows wireless nodes to effectively “face” the jammer by first learning its jamming strategy, then adapting the rate or transmitting information right on the jamming signal. Specifically, to deal with unknown jamming attacks, existing work often relies on reinforcement learning algorithms, e.g., Q-learning. However, the Q-learning algorithm is notorious for its slow convergence to the optimal policy, especially when the system state and action spaces are large. This makes the Q-learning algorithm pragmatically inapplicable. To overcome this problem, we design a novel deep reinforcement learning algorithm using the recent dueling neural network architecture. Our proposed algorithm allows the transmitter to effectively learn about the jammer and attain the optimal countermeasures a thousand times faster than the conventional Q-learning algorithm. Through extensive simulation results, we show that our design (using ambient backscattering and the deep dueling neural network architecture) can improve the average throughput by up to 426% and reduce the packet loss by 24%. By augmenting devices with the ambient backscattering capability and using our algorithm, it is interesting to observe that the (successful) transmission rate increases with the jamming power. Our proposed solution can find applications in both civil (e.g., ultra-reliable and low-latency communications, or URLLC) and military scenarios (to combat both inadvertent and deliberate jamming).
Tasks Q-Learning
Published 2019-04-08
URL http://arxiv.org/abs/1904.03897v1
PDF http://arxiv.org/pdf/1904.03897v1.pdf
PWC https://paperswithcode.com/paper/jam-me-if-you-can-defeating-jammer-with-deep
Repo
Framework
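
The dueling architecture mentioned in the abstract above splits the Q-function into a state value and per-action advantages. The sketch below shows only the commonly used aggregation step, Q(s, a) = V(s) + A(s, a) − mean over actions of A(s, a), on plain numbers; it is not the paper's network, and the function names are assumptions for illustration.

```python
def dueling_q_values(state_value, advantages):
    """Combine a state value V(s) and advantages A(s, a) into Q(s, a).

    Subtracting the mean advantage keeps V and A identifiable.
    Hypothetical helper; a real agent would produce both inputs
    from two heads of a neural network.
    """
    if not advantages:
        raise ValueError("at least one action is required")
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


def greedy_action(q_values):
    # Index of the action with the largest Q-value.
    return max(range(len(q_values)), key=lambda i: q_values[i])


# Usage: one state value and three candidate actions.
q = dueling_q_values(1.0, [1.0, 2.0, 3.0])
best = greedy_action(q)
```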

Low-latency job scheduling with preemption for the development of deep learning

Title Low-latency job scheduling with preemption for the development of deep learning
Authors Hidehito Yabuuchi, Daisuke Taniwaki, Shingo Omura
Abstract One significant challenge in the job scheduling of computing clusters for the development of deep learning algorithms is the efficient scheduling of trial-and-error (TE) jobs, the type of job in which users seek to conduct small-scale experiments while monitoring their processes. Unfortunately, existing job schedulers do not feature well-balanced scheduling for a mixture of TE jobs and best-effort (BE) jobs, or they can handle the mixture only in limited situations. To fill this niche, we propose an algorithm that can significantly reduce the latency of TE jobs in versatile situations without greatly increasing the slowdown of the BE jobs. Our algorithm efficiently schedules both TE and BE jobs by selectively preempting the BE jobs that can be, when the time comes, resumed without much delay. In our simulation study with synthetic and real workloads, we were able to reduce the 95th percentile of the slowdown rates for the TE jobs in the standard FIFO strategy by 96.6%, while compromising the median of the BE slowdown rates by only 18.0% and the 95th percentile by only 23.9%.
Tasks
Published 2019-02-05
URL http://arxiv.org/abs/1902.01613v1
PDF http://arxiv.org/pdf/1902.01613v1.pdf
PWC https://paperswithcode.com/paper/low-latency-job-scheduling-with-preemption
Repo
Framework

A Comparison of Word-based and Context-based Representations for Classification Problems in Health Informatics

Title A Comparison of Word-based and Context-based Representations for Classification Problems in Health Informatics
Authors Aditya Joshi, Sarvnaz Karimi, Ross Sparks, Cecile Paris, C Raina MacIntyre
Abstract Distributed representations of text can be used as features when training a statistical classifier. These representations may be created as a composition of word vectors or as context-based sentence vectors. We compare the two kinds of representations (word versus context) for three classification problems: influenza infection classification, drug usage classification and personal health mention classification. For statistical classifiers trained for each of these problems, context-based representations based on ELMo, Universal Sentence Encoder, Neural-Net Language Model and FLAIR are better than Word2Vec, GloVe and the two adapted using the MESH ontology. There is an improvement of 2-4% in the accuracy when these context-based representations are used instead of word-based representations.
Tasks Language Modelling
Published 2019-06-13
URL https://arxiv.org/abs/1906.05468v1
PDF https://arxiv.org/pdf/1906.05468v1.pdf
PWC https://paperswithcode.com/paper/a-comparison-of-word-based-and-context-based
Repo
Framework

Disparity-preserved Deep Cross-platform Association for Cross-platform Video Recommendation

Title Disparity-preserved Deep Cross-platform Association for Cross-platform Video Recommendation
Authors Shengze Yu, Xin Wang, Wenwu Zhu, Peng Cui, Jingdong Wang
Abstract Cross-platform recommendation aims to improve recommendation accuracy by associating information from different platforms. Existing cross-platform recommendation approaches assume that all cross-platform information is mutually consistent and can be aligned. However, there remain two unsolved challenges: i) there exist inconsistencies in cross-platform association due to platform-specific disparity, and ii) data from distinct platforms may have different semantic granularities. In this paper, we propose a cross-platform association model for cross-platform video recommendation, i.e., Disparity-preserved Deep Cross-platform Association (DCA), taking platform-specific disparity and granularity difference into consideration. The proposed DCA model employs a partially-connected multi-modal autoencoder, which is capable of explicitly capturing platform-specific information, as well as utilizing nonlinear mapping functions to handle granularity differences. We then present a cross-platform video recommendation approach based on the proposed DCA model. Extensive experiments for our cross-platform recommendation framework on a real-world dataset demonstrate that the proposed DCA model significantly outperforms existing cross-platform recommendation methods in terms of various evaluation metrics.
Tasks
Published 2019-01-01
URL https://arxiv.org/abs/1901.00171v2
PDF https://arxiv.org/pdf/1901.00171v2.pdf
PWC https://paperswithcode.com/paper/disparity-preserved-deep-cross-platform
Repo
Framework

Bayesian experimental design using regularized determinantal point processes

Title Bayesian experimental design using regularized determinantal point processes
Authors Michał Dereziński, Feynman Liang, Michael W. Mahoney
Abstract In experimental design, we are given $n$ vectors in $d$ dimensions, and our goal is to select $k\ll n$ of them to perform expensive measurements, e.g., to obtain labels/responses, for a linear regression task. Many statistical criteria have been proposed for choosing the optimal design, with popular choices including A- and D-optimality. If prior knowledge is given, typically in the form of a $d\times d$ precision matrix $\mathbf A$, then all of the criteria can be extended to incorporate that information via a Bayesian framework. In this paper, we demonstrate a new fundamental connection between Bayesian experimental design and determinantal point processes, the latter being widely used for sampling diverse subsets of data. We use this connection to develop new efficient algorithms for finding $(1+\epsilon)$-approximations of optimal designs under four optimality criteria: A, C, D and V. Our algorithms can achieve this when the desired subset size $k$ is $\Omega(\frac{d_{\mathbf A}}{\epsilon} + \frac{\log 1/\epsilon}{\epsilon^2})$, where $d_{\mathbf A}\leq d$ is the $\mathbf A$-effective dimension, which can often be much smaller than $d$. Our results offer direct improvements over a number of prior works, for both Bayesian and classical experimental design, in terms of algorithm efficiency, approximation quality, and range of applicable criteria.
Tasks Point Processes
Published 2019-06-10
URL https://arxiv.org/abs/1906.04133v1
PDF https://arxiv.org/pdf/1906.04133v1.pdf
PWC https://paperswithcode.com/paper/bayesian-experimental-design-using
Repo
Framework
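
The four optimality criteria and the effective dimension named in the abstract above can be written out as follows. These are the standard Bayesian forms as best recalled, with $X \in \mathbb{R}^{n\times d}$ the matrix of all candidate vectors, $X_S$ the rows selected by a subset $S$ of size $k$, and $c$ a fixed vector for C-optimality; the normalizations are assumptions and the paper's exact definitions may differ.

```latex
% Regularized information matrix of a design S with prior precision A.
\[
  \Sigma_S = X_S^{\top} X_S + \mathbf{A}
\]
% Criteria to be minimized over subsets S with |S| = k.
\[
  f_A(S) = \operatorname{tr}\!\left(\Sigma_S^{-1}\right), \qquad
  f_C(S) = c^{\top} \Sigma_S^{-1} c, \qquad
  f_D(S) = \det\!\left(\Sigma_S\right)^{-1/d}, \qquad
  f_V(S) = \tfrac{1}{n}\operatorname{tr}\!\left(X \Sigma_S^{-1} X^{\top}\right)
\]
% A-effective dimension, which never exceeds d.
\[
  d_{\mathbf{A}} = \operatorname{tr}\!\left(X^{\top}X \,\bigl(X^{\top}X + \mathbf{A}\bigr)^{-1}\right) \le d
\]
```

A larger prior precision $\mathbf{A}$ shrinks $d_{\mathbf A}$, which under the abstract's bound lowers the subset size $k$ at which a $(1+\epsilon)$-approximation is guaranteed.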