Paper Group ANR 1498
Multiple Learning for Regression in big data. Bayes Optimal Early Stopping Policies for Black-Box Optimization. Newswire versus Social Media for Disaster Response and Recovery. Learning to Relate from Captions and Bounding Boxes. Countering Language Drift via Visual Grounding. Occluded Face Recognition Using Low-rank Regression with Generalized Gra …
Multiple Learning for Regression in big data
Title | Multiple Learning for Regression in big data |
Authors | Xiang Liu, Ziyang Tang, Huyunting Huang, Tonglin Zhang, Baijian Yang |
Abstract | Regression problems that have closed-form solutions are well understood and can be easily implemented when the dataset is small enough to be loaded entirely into RAM. Challenges arise when the data is too large to be stored in RAM for computing the closed-form solutions. Many techniques have been proposed to overcome or alleviate this memory barrier, but their solutions are often only locally optimal. In addition, most approaches require accessing the raw data again when updating the models, and parallel computing clusters are needed if multiple models must be computed simultaneously. We propose multiple learning approaches that use an array of sufficient statistics (SS) to address this big data challenge. This memory-oblivious approach breaks the memory barrier when computing regressions with closed-form solutions, including but not limited to linear regression, weighted linear regression, linear regression with Box-Cox transformation (Box-Cox regression), and ridge regression. The SS array can be computed and updated per row or per mini-batch, and updating a model is as simple as matrix addition and subtraction. Furthermore, multiple SS arrays for different models can be computed simultaneously to obtain multiple models in one pass through the dataset. We implemented our approaches on Spark and evaluated them on simulated datasets. Results show that our approaches can obtain the closed-form solutions of multiple models in half the training time that traditional methods need for a single model. |
Tasks | |
Published | 2019-03-03 |
URL | https://arxiv.org/abs/1903.00843v2 |
PDF | https://arxiv.org/pdf/1903.00843v2.pdf |
PWC | https://paperswithcode.com/paper/multiple-learning-for-regression-in-big-data |
Repo | |
Framework | |
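The sufficient-statistics idea can be illustrated with a minimal pure-Python sketch for ordinary and ridge least squares. It assumes the SS array is simply (n, X^T X, X^T y); the class `SufficientStats` and its methods are hypothetical names, not the authors' Spark implementation, and the weighted and Box-Cox variants are not shown. Mini-batches are folded in with `add`, removed with `subtract`, partitions are combined with `merge`, and the closed-form solution is recovered without revisiting raw rows.

```python
from typing import List, Sequence


class SufficientStats:
    """Running sufficient statistics (n, X^T X, X^T y) for closed-form regression.

    Hypothetical helper for illustration; rows should already contain an
    intercept column if one is wanted.
    """

    def __init__(self, d: int) -> None:
        self.d = d
        self.n = 0
        self.xtx = [[0.0] * d for _ in range(d)]
        self.xty = [0.0] * d

    def _accumulate(self, rows: Sequence[Sequence[float]], ys: Sequence[float], sign: int) -> None:
        for x, y in zip(rows, ys):
            self.n += sign
            for i in range(self.d):
                self.xty[i] += sign * x[i] * y
                for j in range(self.d):
                    self.xtx[i][j] += sign * x[i] * x[j]

    def add(self, rows, ys) -> None:
        """Fold a row or mini-batch into the statistics."""
        self._accumulate(rows, ys, +1)

    def subtract(self, rows, ys) -> None:
        """Remove a previously added mini-batch without revisiting other data."""
        self._accumulate(rows, ys, -1)

    def merge(self, other: "SufficientStats") -> None:
        """Combine statistics from another partition by plain matrix addition."""
        self.n += other.n
        for i in range(self.d):
            self.xty[i] += other.xty[i]
            for j in range(self.d):
                self.xtx[i][j] += other.xtx[i][j]

    def solve(self, ridge: float = 0.0) -> List[float]:
        """Solve (X^T X + ridge * I) beta = X^T y by Gaussian elimination.

        ridge = 0 gives ordinary least squares; ridge > 0 here also
        penalizes the intercept column.
        """
        d = self.d
        a = [self.xtx[i][:] + [self.xty[i]] for i in range(d)]  # augmented matrix copy
        for i in range(d):
            a[i][i] += ridge
        for col in range(d):
            pivot = max(range(col, d), key=lambda r: abs(a[r][col]))  # partial pivoting
            a[col], a[pivot] = a[pivot], a[col]
            for r in range(col + 1, d):
                factor = a[r][col] / a[col][col]
                for c in range(col, d + 1):
                    a[r][c] -= factor * a[col][c]
        beta = [0.0] * d
        for i in reversed(range(d)):  # back substitution
            tail = sum(a[i][j] * beta[j] for j in range(i + 1, d))
            beta[i] = (a[i][d] - tail) / a[i][i]
        return beta


# Two "partitions" of data generated by y = 1 + 2x, with an intercept column.
part_a = SufficientStats(d=2)
part_a.add([[1.0, 0.0], [1.0, 1.0]], [1.0, 3.0])
part_b = SufficientStats(d=2)
part_b.add([[1.0, 2.0], [1.0, 3.0]], [5.0, 7.0])
part_a.merge(part_b)

part_a.add([[1.0, 10.0]], [99.0])       # a bad mini-batch ...
part_a.subtract([[1.0, 10.0]], [99.0])  # ... removed again, no raw-data rescan
beta = part_a.solve()
```

Because every update adds or subtracts small d×d and d×1 arrays, refitting costs the same no matter how many rows have been seen; keeping several such objects side by side is one way to fit several models in a single pass.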
Bayes Optimal Early Stopping Policies for Black-Box Optimization
Title | Bayes Optimal Early Stopping Policies for Black-Box Optimization |
Authors | Matthew Streeter |
Abstract | We derive an optimal policy for adaptively restarting a randomized algorithm, based on observed features of the run so far, so as to minimize the expected time required for the algorithm to terminate successfully. Given a suitable Bayesian prior, this result can be used to select the optimal black-box optimization algorithm from a large family of algorithms that includes random search, Successive Halving, and Hyperband. On CIFAR-10 and ImageNet hyperparameter tuning problems, the proposed policies offer up to a 13-fold improvement over random search in expected time to reach a given target accuracy, and up to a 3-fold improvement over a baseline adaptive policy that terminates a run whenever its accuracy is below the median. |
Tasks | |
Published | 2019-02-21 |
URL | http://arxiv.org/abs/1902.08285v1 |
PDF | http://arxiv.org/pdf/1902.08285v1.pdf |
PWC | https://paperswithcode.com/paper/bayes-optimal-early-stopping-policies-for |
Repo | |
Framework | |
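The quantity being minimized, expected time until a randomized run succeeds when runs may be restarted, has a classical closed form for the simplest policy: kill every run at a fixed cutoff and restart it. The sketch below evaluates that formula on a toy empirical sample of run times and picks the best cutoff. It covers only this fixed-cutoff baseline, not the paper's Bayes-optimal policy, which additionally conditions on observed features of the run so far; the function names and data are hypothetical.

```python
from typing import Sequence, Tuple


def expected_time_with_cutoff(run_times: Sequence[float], cutoff: float) -> float:
    """Expected total time to first success if every run is restarted at `cutoff`.

    Renewal identity: E[total] = E[min(T, cutoff)] / P(T <= cutoff), with the
    distribution of T replaced by the empirical distribution of `run_times`.
    """
    n = len(run_times)
    p_success = sum(1 for t in run_times if t <= cutoff) / n
    if p_success == 0:
        return float("inf")  # no run ever finishes within the cutoff
    mean_truncated = sum(min(t, cutoff) for t in run_times) / n
    return mean_truncated / p_success


def best_cutoff(run_times: Sequence[float]) -> Tuple[float, float]:
    """Try each observed run time as a cutoff; return (cutoff, expected time)."""
    return min(
        ((c, expected_time_with_cutoff(run_times, c)) for c in sorted(set(run_times))),
        key=lambda pair: pair[1],
    )


observed = [1.0, 1.0, 10.0, 100.0]  # toy time-to-target samples
print(best_cutoff(observed))  # (1.0, 2.0)
```

With this heavy-tailed toy sample, restarting aggressively (cutoff 1) beats letting every run finish (expected time 28), which is the kind of gap adaptive stopping policies exploit.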
Newswire versus Social Media for Disaster Response and Recovery
Title | Newswire versus Social Media for Disaster Response and Recovery |
Authors | Rakesh Verma, Samaneh Karimi, Daniel Lee, Omprakash Gnawali, Azadeh Shakery |
Abstract | In a disaster situation, first responders need to quickly acquire situational awareness and prioritize their response based on need, available resources, and impact. Can they do this based on digital media such as Twitter alone, on newswire alone, or on some combination of the two? We examine this question in the context of the 2015 Nepal Earthquakes. Because newswire articles are longer, effective summaries can save time while still conveying key content. We evaluate the effectiveness of several unsupervised summarization techniques in capturing key content. We propose a method to link tweets written by the public to newswire articles so that we can compare their key characteristics: timeliness (whether tweets appear earlier than their corresponding news articles) and content. A novel idea is to view relevant tweets as a summary of the matching news article and evaluate these summaries. Whenever possible, we present both quantitative and qualitative evaluations. One of our main findings is that tweets and newswire articles provide complementary perspectives that together form a holistic view of the disaster situation. |
Tasks | |
Published | 2019-06-25 |
URL | https://arxiv.org/abs/1906.10607v1 |
PDF | https://arxiv.org/pdf/1906.10607v1.pdf |
PWC | https://paperswithcode.com/paper/newswire-versus-social-media-for-disaster |
Repo | |
Framework | |
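The abstract does not say how tweets are linked to articles. As a hypothetical stand-in, the sketch below links a tweet to the most similar article by bag-of-words cosine similarity and returns `None` when no article clears an arbitrary threshold; the tokenizer, threshold, and example texts are all assumptions for illustration.

```python
import math
import re
from collections import Counter
from typing import Dict, Optional


def bag_of_words(text: str) -> Counter:
    """Lower-cased alphanumeric token counts (deliberately simple tokenizer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def link_tweet(tweet: str, articles: Dict[str, str], threshold: float = 0.1) -> Optional[str]:
    """Return the id of the most similar article, or None if none clears `threshold`."""
    tweet_vec = bag_of_words(tweet)
    scores = {aid: cosine(tweet_vec, bag_of_words(body)) for aid, body in articles.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


articles = {
    "a1": "Earthquake death toll rises in Kathmandu as rescuers search rubble",
    "a2": "Aid agencies fly relief supplies to remote districts",
}
print(link_tweet("Death toll in Kathmandu keeps rising after the earthquake", articles))  # a1
```

Once tweets are linked, timeliness could be compared by subtracting the article's publication time from each linked tweet's timestamp.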
Learning to Relate from Captions and Bounding Boxes
Title | Learning to Relate from Captions and Bounding Boxes |
Authors | Sarthak Garg, Joel Ruben Antony Moniz, Anshu Aviral, Priyatham Bollimpalli |
Abstract | In this work, we propose a novel approach that predicts the relationships between entities in an image in a weakly supervised manner, relying on image captions and object bounding box annotations as the sole source of supervision. Our approach uses a top-down attention mechanism to align entities in captions to objects in the image, and then leverages the syntactic structure of the captions to align the relations. We use these alignments to train a relation classification network, thereby obtaining both grounded captions and dense relationships. We demonstrate the effectiveness of our model on the Visual Genome dataset, achieving a recall@50 of 15% and a recall@100 of 25% on the relationships present in the image. We also show that the model successfully predicts relations that are not present in the corresponding captions. |
Tasks | Image Captioning, Relation Classification |
Published | 2019-12-01 |
URL | https://arxiv.org/abs/1912.00311v1 |
PDF | https://arxiv.org/pdf/1912.00311v1.pdf |
PWC | https://paperswithcode.com/paper/learning-to-relate-from-captions-and-bounding-1 |
Repo | |
Framework | |
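Recall@K, the metric reported above, is the fraction of ground-truth relationship triples that appear among a model's top-K ranked predictions. A minimal sketch on hypothetical triples follows; it compares labels only, whereas Visual Genome evaluation protocols may additionally require the predicted subject and object boxes to overlap the ground-truth boxes.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def recall_at_k(ranked_predictions: List[Triple], ground_truth: Set[Triple], k: int) -> float:
    """Fraction of ground-truth triples found among the top-k predictions."""
    if not ground_truth:
        return 0.0
    top_k = set(ranked_predictions[:k])
    return len(top_k & ground_truth) / len(ground_truth)


predictions = [  # sorted by model confidence, highest first
    ("man", "riding", "horse"),
    ("man", "wearing", "hat"),
    ("horse", "on", "grass"),
    ("man", "near", "tree"),
]
truth = {
    ("man", "riding", "horse"),
    ("horse", "on", "grass"),
    ("tree", "behind", "man"),
    ("man", "has", "hand"),
}
print(recall_at_k(predictions, truth, 2), recall_at_k(predictions, truth, 4))  # 0.25 0.5
```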
Countering Language Drift via Visual Grounding
Title | Countering Language Drift via Visual Grounding |
Authors | Jason Lee, Kyunghyun Cho, Douwe Kiela |
Abstract | Emergent multi-agent communication protocols are very different from natural language and not easily interpretable by humans. We find that agents that were initially pretrained to produce natural language can also experience detrimental language drift: when a non-linguistic reward is used in a goal-based task (e.g., a scalar success metric), the communication protocol may easily and radically diverge from natural language. We recast translation as a multi-agent communication game and examine auxiliary training constraints for their effectiveness in mitigating language drift. We show that a combination of syntactic (language model likelihood) and semantic (visual grounding) constraints gives the best communication performance, allowing pretrained agents to retain English syntax while learning to convey the intended meaning accurately. |
Tasks | Language Modelling |
Published | 2019-09-10 |
URL | https://arxiv.org/abs/1909.04499v1 |
PDF | https://arxiv.org/pdf/1909.04499v1.pdf |
PWC | https://paperswithcode.com/paper/countering-language-drift-via-visual |
Repo | |
Framework | |
Occluded Face Recognition Using Low-rank Regression with Generalized Gradient Direction
Title | Occluded Face Recognition Using Low-rank Regression with Generalized Gradient Direction |
Authors | Cho-Ying Wu, Jian-Jiun Ding |
Abstract | In this paper, we propose an effective method for recognizing faces under contiguous occlusion. It uses robust image gradient direction features together with a variety of mapping functions and adopts a hierarchical sparse and low-rank regression model. This model unites the sparse representation of dictionary learning with a low-rank representation of the error term, which is usually messy in the gradient domain. We call this the “weak low-rankness” optimization problem; it can be solved efficiently within the framework of the Alternating Direction Method of Multipliers (ADMM). The optimal error term has a weak low-rank structure similar to that of the reference error map, and weak low-rankness optimization substantially improves recognition performance. Extensive experiments are conducted on real-world disguise/occlusion data and on synthesized contiguous occlusion data. These experiments show that the proposed gradient direction-based hierarchical adaptive sparse and low-rank (GD-HASLR) algorithm outperforms state-of-the-art methods, including popular convolutional neural network-based methods. |
Tasks | Dictionary Learning, Face Recognition |
Published | 2019-06-06 |
URL | https://arxiv.org/abs/1906.02429v1 |
PDF | https://arxiv.org/pdf/1906.02429v1.pdf |
PWC | https://paperswithcode.com/paper/occluded-face-recognition-using-low-rank |
Repo | |
Framework | |
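Gradient direction features replace raw intensities with the per-pixel orientation of the image gradient. The sketch below computes that orientation with clamped central differences and maps each angle to (cos, sin). The difference scheme and this particular mapping are assumptions, since the abstract mentions "a variety of mapping functions" without listing them, and the sparse/low-rank ADMM solver itself is not shown; the function names are hypothetical.

```python
import math
from typing import List, Tuple

Image = List[List[float]]


def gradient_direction(img: Image) -> Image:
    """Per-pixel gradient orientation atan2(Gy, Gx) in radians.

    Uses central differences, clamped to one-sided differences at the borders.
    Flat regions (Gx = Gy = 0) get angle 0.0, as math.atan2(0, 0) does.
    """
    h, w = len(img), len(img[0])
    phi = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            gx = img[r][min(c + 1, w - 1)] - img[r][max(c - 1, 0)]
            gy = img[min(r + 1, h - 1)][c] - img[max(r - 1, 0)][c]
            phi[r][c] = math.atan2(gy, gx)
    return phi


def direction_features(img: Image) -> List[Tuple[float, float]]:
    """Map each angle to (cos, sin) so directions near -pi and +pi stay close."""
    return [(math.cos(a), math.sin(a)) for row in gradient_direction(img) for a in row]


horizontal_ramp = [[0.0, 1.0, 2.0, 3.0] for _ in range(3)]  # brightness grows left to right
vertical_ramp = [[float(r)] * 4 for r in range(3)]         # brightness grows down the rows
```

Because only orientation is kept, multiplying an image by a positive contrast factor leaves these features unchanged, which is one reason gradient directions are considered robust to illumination.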
A Variational Approach to Weakly Supervised Document-Level Multi-Aspect Sentiment Classification
Title | A Variational Approach to Weakly Supervised Document-Level Multi-Aspect Sentiment Classification |
Authors | Ziqian Zeng, Wenxuan Zhou, Xin Liu, Yangqiu Song |
Abstract | In this paper, we propose a variational approach to weakly supervised document-level multi-aspect sentiment classification. Instead of using user-generated ratings or annotations provided by domain experts, we use target-opinion word pairs as “supervision.” These word pairs can be extracted using dependency parsers and simple rules. Our objective is to predict an opinion word given a target word, while our ultimate goal is to learn a sentiment polarity classifier that predicts the sentiment polarity of each aspect given a document. By introducing a latent variable, the sentiment polarity, into the objective function, we can inject the sentiment polarity classifier into the objective via the variational lower bound, and we learn the classifier by optimizing that lower bound. We show that our method outperforms weakly supervised baselines on the TripAdvisor and BeerAdvocate datasets and is comparable to the state-of-the-art supervised method trained with hundreds of labels per aspect. |
Tasks | Sentiment Analysis |
Published | 2019-04-10 |
URL | http://arxiv.org/abs/1904.05055v1 |
PDF | http://arxiv.org/pdf/1904.05055v1.pdf |
PWC | https://paperswithcode.com/paper/a-variational-approach-to-weakly-supervised |
Repo | |
Framework | |
Group-wise classification approach to improve Android malicious apps detection accuracy
Title | Group-wise classification approach to improve Android malicious apps detection accuracy |
Authors | Ashu Sharma, Sanjay K. Sahay |
Abstract | Among fast-growing smart devices, Android is the most popular OS, and owing to their attractive features, mobility, and ease of use, these devices hold sensitive information such as personal data, browsing history, shopping history, and financial details. Therefore, any security gap in these devices puts the information stored on or accessed through them at high risk of being breached by malware. Such malware is continuously growing and is also used for military espionage and for disrupting industry, power grids, etc. To detect malware, traditional signature-matching techniques are widely used. However, such strategies are not capable of detecting advanced Android malicious apps, because malware developers use several obfuscation techniques. Hence, researchers are continuously addressing the security issues of Android-based smart devices. In this paper, using the Drebin benchmark malware dataset, we experimentally demonstrate how to improve detection accuracy by analyzing the apps after grouping the collected data based on permissions, achieving 97.15% overall average accuracy. Our results outperform the accuracy obtained without grouping the data (79.27%, 2017) as well as the accuracies reported by Arp et al. (94%, 2014), Annamalai et al. (84.29%, 2016), Bahman Rashidi et al. (82%, 2017), and Ali Feizollah et al. (95.5%, 2017). The analysis also shows that, among the groups, the Microphone group has the lowest detection accuracy while Calendar group apps are detected with the highest accuracy, and that for the best performance one should use 80-100 features. |
Tasks | |
Published | 2019-04-03 |
URL | http://arxiv.org/abs/1904.02122v1 |
PDF | http://arxiv.org/pdf/1904.02122v1.pdf |
PWC | https://paperswithcode.com/paper/group-wise-classification-approach-to-improve |
Repo | |
Framework | |
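The grouping step can be sketched as follows, under the assumptions that each app is represented by the set of permission groups it requests, that an app joins every selected group it requests (so groups may overlap), and that a separate model is then trained per group. The `App` tuple layout, group names, and the toy `train_fn` are hypothetical; a real pipeline would plug in an actual classifier over Drebin features.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set, Tuple, TypeVar

App = Tuple[str, Set[str], int]  # (app id, permission groups requested, label: 1 = malicious)
Model = TypeVar("Model")


def group_by_permission(apps: Iterable[App], groups: Iterable[str]) -> Dict[str, List[App]]:
    """Place each app in every selected group whose permission it requests."""
    wanted = set(groups)
    grouped: Dict[str, List[App]] = defaultdict(list)
    for app in apps:
        for group in app[1] & wanted:
            grouped[group].append(app)
    return dict(grouped)


def train_per_group(
    grouped: Dict[str, List[App]], train_fn: Callable[[List[App]], Model]
) -> Dict[str, Model]:
    """Fit one model per permission group with a caller-supplied training function."""
    return {name: train_fn(members) for name, members in grouped.items()}


apps = [
    ("app1", {"MICROPHONE", "CAMERA"}, 1),
    ("app2", {"CALENDAR"}, 0),
    ("app3", {"INTERNET"}, 0),
    ("app4", {"CALENDAR", "MICROPHONE"}, 1),
]
grouped = group_by_permission(apps, ["MICROPHONE", "CALENDAR"])
# Toy stand-in for a classifier: the fraction of malicious apps in each group.
models = train_per_group(grouped, lambda members: sum(lbl for _, _, lbl in members) / len(members))
```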
Federated Learning for Healthcare Informatics
Title | Federated Learning for Healthcare Informatics |
Authors | Jie Xu, Fei Wang |
Abstract | The rapid development of medical informatization and the corresponding advances in automated data collection in clinical sciences generate large volumes of healthcare data. Proper use of these big data is closely related to the improvement of the whole health system and is of great significance to drug development, health management, and public health services. However, in addition to the heterogeneous and high-dimensional data characteristics caused by a spectrum of complex data types, ranging from free-text clinical notes to various medical images, the fragmented data sources and privacy concerns of healthcare data are also huge obstacles to multi-institutional healthcare informatics research. Federated learning, a mechanism for training a shared global model with a central server while keeping all sensitive data in the local institutions where the data belong, is a new attempt to connect scattered healthcare data sources without ignoring data privacy. This survey reviews the current progress in federated learning, including, but not limited to, healthcare informatics. We summarize the general solutions to the statistical challenges, system challenges, and privacy issues in federated learning research for reference. With this survey, we hope to provide a useful resource for health informatics and computational research on how machine learning can be performed on heterogeneous data scattered across a large number of institutions while respecting the privacy concerns of sharing data. |
Tasks | |
Published | 2019-11-13 |
URL | https://arxiv.org/abs/1911.06270v1 |
PDF | https://arxiv.org/pdf/1911.06270v1.pdf |
PWC | https://paperswithcode.com/paper/federated-learning-for-healthcare-informatics |
Repo | |
Framework | |
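The server-side aggregation at the heart of many federated learning systems can be sketched as a sample-size-weighted average of locally trained parameter vectors (the FedAvg aggregation rule); only parameters, never patient records, reach the server. The function name and the two-hospital numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def federated_average(updates: Sequence[Tuple[List[float], int]]) -> List[float]:
    """Weighted average of client parameter vectors.

    `updates` holds (parameters, n_local_samples) pairs, one per institution;
    each client's weight is its share of the total sample count.
    """
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(params[i] * n for params, n in updates) / total for i in range(dim)]


hospital_a = ([1.0, 2.0], 100)  # locally trained weights, local sample count
hospital_b = ([3.0, 6.0], 300)
print(federated_average([hospital_a, hospital_b]))  # [2.5, 5.0]
```

In a full training loop, the server would broadcast this average back to the institutions as the starting point for the next round of local training.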
On Structured Filtering-Clustering: Global Error Bound and Optimal First-Order Algorithms
Title | On Structured Filtering-Clustering: Global Error Bound and Optimal First-Order Algorithms |
Authors | Nhat Ho, Tianyi Lin, Michael I. Jordan |
Abstract | In recent years, filtering-clustering problems have been a central topic in statistics and machine learning, especially the $\ell_1$-trend filtering and $\ell_2$-convex clustering problems. In practice, such structured problems are typically solved by first-order algorithms despite the extremely ill-conditioned structure of difference operator matrices. Motivated by the desire to analyze the convergence rates of these algorithms, we show that for a large class of filtering-clustering problems, a \textit{global error bound} condition is satisfied for the dual filtering-clustering problems when a certain regularization is chosen. Based on this result, we show that many first-order algorithms attain the \textit{optimal rate of convergence} in different settings. In particular, we establish a generalized dual gradient ascent (GDGA) algorithmic framework with several subroutines. In the deterministic setting, when the subroutine is accelerated gradient descent (AGD), the resulting algorithm attains linear convergence. This linear convergence also holds in the finite-sum setting, where the subroutine is the Katyusha algorithm. We also demonstrate that GDGA with a stochastic gradient descent (SGD) subroutine attains the optimal rate of convergence up to a logarithmic factor, shedding light on the possibility of solving filtering-clustering problems efficiently in the online setting. Experiments on $\ell_1$-trend filtering problems illustrate the favorable performance of our algorithms over competing algorithms. |
Tasks | |
Published | 2019-04-16 |
URL | https://arxiv.org/abs/1904.07462v2 |
PDF | https://arxiv.org/pdf/1904.07462v2.pdf |
PWC | https://paperswithcode.com/paper/global-error-bounds-and-linear-convergence |
Repo | |
Framework | |
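For the simplest filtering-clustering instance, first-order trend filtering (the 1-D fused lasso), the dual problem, on which the error-bound result is stated, is a box-constrained least-squares problem. The sketch solves that dual with plain projected gradient and recovers the primal solution. It is not the paper's GDGA framework (no AGD, Katyusha, or SGD subroutine); the function names and the fixed iteration count are assumptions.

```python
from typing import List


def diff(beta: List[float]) -> List[float]:
    """First-order difference operator D: (D beta)_i = beta[i+1] - beta[i]."""
    return [beta[i + 1] - beta[i] for i in range(len(beta) - 1)]


def diff_transpose(u: List[float]) -> List[float]:
    """D^T u, of length len(u) + 1: (D^T u)_j = u[j-1] - u[j], out-of-range terms = 0."""
    m = len(u)
    return [(u[j - 1] if j > 0 else 0.0) - (u[j] if j < m else 0.0) for j in range(m + 1)]


def fused_lasso_dual_pg(y: List[float], lam: float, iters: int = 2000) -> List[float]:
    """Minimize 1/2 ||y - beta||^2 + lam * ||D beta||_1 via its dual.

    Dual: min_u 1/2 ||y - D^T u||^2 subject to |u_i| <= lam.
    Primal recovery: beta = y - D^T u.
    Step 1/4 is safe because ||D D^T|| <= 4 for first differences.
    """
    u = [0.0] * (len(y) - 1)
    step = 0.25
    for _ in range(iters):
        resid = [a - b for a, b in zip(diff_transpose(u), y)]  # D^T u - y
        grad = diff(resid)                                      # D (D^T u - y)
        u = [min(lam, max(-lam, ui - step * gi)) for ui, gi in zip(u, grad)]  # project onto box
    return [a - b for a, b in zip(y, diff_transpose(u))]


signal = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]
beta_small = fused_lasso_dual_pg(signal, lam=1.0)    # jump shrunk by lam/3 on each side
beta_large = fused_lasso_dual_pg(signal, lam=100.0)  # fully fused to the mean, 5.0
```

For this step function, the fused lasso keeps the jump but pulls each 3-sample segment toward the other by lam/3, and a large enough lam fuses everything to the mean.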
“Jam Me If You Can”: Defeating Jammer with Deep Dueling Neural Network Architecture and Ambient Backscattering Augmented Communications
Title | “Jam Me If You Can”: Defeating Jammer with Deep Dueling Neural Network Architecture and Ambient Backscattering Augmented Communications |
Authors | Nguyen Van Huynh, Diep N. Nguyen, Dinh Thai Hoang, Eryk Dutkiewicz |
Abstract | With conventional anti-jamming solutions like frequency hopping or spread spectrum, legitimate transceivers often tend to “escape” or “hide” themselves from jammers. These reactive anti-jamming approaches are constrained by the lack of timely knowledge of jamming attacks. Bringing together the latest advances in neural network architectures and ambient backscattering communications, this work allows wireless nodes to effectively “face” the jammer by first learning its jamming strategy and then adapting the rate or transmitting information right on the jamming signal. Specifically, to deal with unknown jamming attacks, existing work often relies on reinforcement learning algorithms, e.g., Q-learning. However, the Q-learning algorithm is notorious for its slow convergence to the optimal policy, especially when the system state and action spaces are large, which makes it impractical. To overcome this problem, we design a novel deep reinforcement learning algorithm using the recent dueling neural network architecture. Our proposed algorithm allows the transmitter to effectively learn about the jammer and attain the optimal countermeasures a thousand times faster than the conventional Q-learning algorithm. Through extensive simulations, we show that our design (using ambient backscattering and the deep dueling neural network architecture) can improve the average throughput by up to 426% and reduce the packet loss by 24%. By augmenting devices with ambient backscattering capability and using our algorithm, we observe, interestingly, that the (successful) transmission rate increases with the jamming power. Our proposed solution can find applications in both civil (e.g., ultra-reliable and low-latency communications, or URLLC) and military scenarios (to combat both inadvertent and deliberate jamming). |
Tasks | Q-Learning |
Published | 2019-04-08 |
URL | http://arxiv.org/abs/1904.03897v1 |
PDF | http://arxiv.org/pdf/1904.03897v1.pdf |
PWC | https://paperswithcode.com/paper/jam-me-if-you-can-defeating-jammer-with-deep |
Repo | |
Framework | |
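The dueling architecture splits the network head into a state-value stream V(s) and an advantage stream A(s, a), then recombines them into Q-values. The sketch shows only that recombination step in its standard mean-subtracted form, with hypothetical numbers; the network layers and the paper's anti-jamming state and action definitions are not modeled.

```python
from typing import List


def dueling_q_values(state_value: float, advantages: List[float]) -> List[float]:
    """Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a').

    Subtracting the mean advantage makes the V/A split identifiable: adding a
    constant to every advantage leaves the Q-values unchanged.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


def greedy_action(q_values: List[float]) -> int:
    """Index of the highest-valued action."""
    return max(range(len(q_values)), key=q_values.__getitem__)


q = dueling_q_values(1.0, [1.0, 2.0, 3.0])
print(q, greedy_action(q))  # [0.0, 1.0, 2.0] 2
```

Because V(s) is learned once per state and shared across all actions, the dueling head can update its value estimate even for actions rarely taken, which is the usual explanation for its faster learning than tabular Q-learning on large action spaces.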
Low-latency job scheduling with preemption for the development of deep learning
Title | Low-latency job scheduling with preemption for the development of deep learning |
Authors | Hidehito Yabuuchi, Daisuke Taniwaki, Shingo Omura |
Abstract | One significant challenge in scheduling jobs on computing clusters for the development of deep learning algorithms is the efficient scheduling of trial-and-error (TE) jobs, in which users conduct small-scale experiments while monitoring their progress. Unfortunately, existing job schedulers do not provide well-balanced scheduling for a mixture of TE jobs and best-effort (BE) jobs, or they can handle such a mixture only in limited situations. To fill this niche, we propose an algorithm that can significantly reduce the latency of TE jobs in versatile situations without greatly increasing the slowdown of BE jobs. Our algorithm schedules both TE and BE jobs efficiently by selectively preempting BE jobs that can later be resumed without much delay. In our simulation study with synthetic and real workloads, we reduced the 95th percentile of the TE-job slowdown rates under the standard FIFO strategy by 96.6%, while increasing the median BE slowdown rate by only 18.0% and its 95th percentile by only 23.9%. |
Tasks | |
Published | 2019-02-05 |
URL | http://arxiv.org/abs/1902.01613v1 |
PDF | http://arxiv.org/pdf/1902.01613v1.pdf |
PWC | https://paperswithcode.com/paper/low-latency-job-scheduling-with-preemption |
Repo | |
Framework | |
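The results above are stated as slowdown percentiles and relative changes. A minimal sketch of how such numbers can be computed follows, assuming slowdown means time in system divided by pure execution time and using the nearest-rank percentile definition; both definitions, the function names, and the toy job list are assumptions, and the paper's preemptive scheduler itself is not modeled.

```python
import math
from typing import List


def slowdown(submit_time: float, finish_time: float, run_time: float) -> float:
    """Time in system divided by pure execution time (1.0 means no waiting)."""
    return (finish_time - submit_time) / run_time


def percentile(values: List[float], q: float) -> float:
    """Nearest-rank percentile for 0 < q <= 100."""
    ordered = sorted(values)
    rank = math.ceil(q / 100.0 * len(ordered))
    return ordered[max(rank, 1) - 1]


def relative_change_pct(before: float, after: float) -> float:
    """Signed percentage change; -96.6 would mean a 96.6% reduction."""
    return (after - before) / before * 100.0


# (submit, finish, run time) for five toy TE jobs
jobs = [(0, 1, 1), (0, 2, 2), (0, 4, 2), (0, 40, 10), (0, 50, 1)]
slowdowns = [slowdown(s, f, r) for s, f, r in jobs]
print(slowdowns, percentile(slowdowns, 50), percentile(slowdowns, 95))
```

Tail percentiles matter here because a single short TE job stuck behind long BE jobs (the last toy job waits 49 time units for 1 unit of work) dominates the 95th percentile while barely moving the median.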
A Comparison of Word-based and Context-based Representations for Classification Problems in Health Informatics
Title | A Comparison of Word-based and Context-based Representations for Classification Problems in Health Informatics |
Authors | Aditya Joshi, Sarvnaz Karimi, Ross Sparks, Cecile Paris, C Raina MacIntyre |
Abstract | Distributed representations of text can be used as features when training a statistical classifier. These representations may be created as a composition of word vectors or as context-based sentence vectors. We compare the two kinds of representations (word versus context) on three classification problems: influenza infection classification, drug usage classification, and personal health mention classification. For statistical classifiers trained on each of these problems, context-based representations based on ELMo, Universal Sentence Encoder, Neural-Net Language Model, and FLAIR outperform Word2Vec, GloVe, and the two adapted using the MESH ontology. Using these context-based representations instead of word-based representations improves accuracy by 2-4%. |
Tasks | Language Modelling |
Published | 2019-06-13 |
URL | https://arxiv.org/abs/1906.05468v1 |
PDF | https://arxiv.org/pdf/1906.05468v1.pdf |
PWC | https://paperswithcode.com/paper/a-comparison-of-word-based-and-context-based |
Repo | |
Framework | |
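A word-based representation of the kind compared above can be built by composing word vectors; averaging is the simplest composition and is assumed here. The toy two-dimensional embeddings and the function name are hypothetical. The sketch also makes the limitation visible: the average ignores word order and context, which is what context-based encoders such as ELMo are designed to capture.

```python
from typing import Dict, List, Optional


def average_word_vectors(
    tokens: List[str], embeddings: Dict[str, List[float]]
) -> Optional[List[float]]:
    """Sentence vector = mean of the vectors of in-vocabulary tokens (None if none)."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


toy_embeddings = {"flu": [1.0, 0.0], "fever": [0.0, 1.0]}
print(average_word_vectors(["i", "have", "flu", "and", "fever"], toy_embeddings))  # [0.5, 0.5]
```

The resulting fixed-length vector can be fed to any statistical classifier, exactly like a context-based sentence vector.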
Disparity-preserved Deep Cross-platform Association for Cross-platform Video Recommendation
Title | Disparity-preserved Deep Cross-platform Association for Cross-platform Video Recommendation |
Authors | Shengze Yu, Xin Wang, Wenwu Zhu, Peng Cui, Jingdong Wang |
Abstract | Cross-platform recommendation aims to improve recommendation accuracy by associating information from different platforms. Existing cross-platform recommendation approaches assume that all cross-platform information is mutually consistent and can be aligned. However, two challenges remain unsolved: i) cross-platform association contains inconsistencies due to platform-specific disparity, and ii) data from distinct platforms may have different semantic granularities. In this paper, we propose a cross-platform association model for cross-platform video recommendation, Disparity-preserved Deep Cross-platform Association (DCA), which takes platform-specific disparity and granularity differences into consideration. The proposed DCA model employs a partially-connected multi-modal autoencoder that explicitly captures platform-specific information and uses nonlinear mapping functions to handle granularity differences. We then present a cross-platform video recommendation approach based on the proposed DCA model. Extensive experiments with our cross-platform recommendation framework on a real-world dataset demonstrate that the proposed DCA model significantly outperforms existing cross-platform recommendation methods on various evaluation metrics. |
Tasks | |
Published | 2019-01-01 |
URL | https://arxiv.org/abs/1901.00171v2 |
PDF | https://arxiv.org/pdf/1901.00171v2.pdf |
PWC | https://paperswithcode.com/paper/disparity-preserved-deep-cross-platform |
Repo | |
Framework | |
Bayesian experimental design using regularized determinantal point processes
Title | Bayesian experimental design using regularized determinantal point processes |
Authors | Michał Dereziński, Feynman Liang, Michael W. Mahoney |
Abstract | In experimental design, we are given $n$ vectors in $d$ dimensions, and our goal is to select $k\ll n$ of them to perform expensive measurements, e.g., to obtain labels/responses, for a linear regression task. Many statistical criteria have been proposed for choosing the optimal design, with popular choices including A- and D-optimality. If prior knowledge is given, typically in the form of a $d\times d$ precision matrix $\mathbf A$, then all of the criteria can be extended to incorporate that information via a Bayesian framework. In this paper, we demonstrate a new fundamental connection between Bayesian experimental design and determinantal point processes, the latter being widely used for sampling diverse subsets of data. We use this connection to develop new efficient algorithms for finding $(1+\epsilon)$-approximations of optimal designs under four optimality criteria: A, C, D and V. Our algorithms can achieve this when the desired subset size $k$ is $\Omega(\frac{d_{\mathbf A}}{\epsilon} + \frac{\log 1/\epsilon}{\epsilon^2})$, where $d_{\mathbf A}\leq d$ is the $\mathbf A$-effective dimension, which can often be much smaller than $d$. Our results offer direct improvements over a number of prior works, for both Bayesian and classical experimental design, in terms of algorithm efficiency, approximation quality, and range of applicable criteria. |
Tasks | Point Processes |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.04133v1 |
PDF | https://arxiv.org/pdf/1906.04133v1.pdf |
PWC | https://paperswithcode.com/paper/bayesian-experimental-design-using |
Repo | |
Framework | |
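The Bayesian A-optimality criterion scores a chosen subset S by tr((X_S^T X_S + A)^{-1}), where smaller is better. The sketch evaluates that criterion and, for a toy problem, finds the best size-k subset by exhaustive search. Brute force is only feasible for tiny n; the paper's contribution is a DPP-based method with (1+epsilon) guarantees, which this does not implement. Function names and data are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def solve(m: Matrix, b: List[float]) -> List[float]:
    """Solve m x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(m)
    a = [row[:] + [b[i]] for i, row in enumerate(m)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[p] = a[p], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x


def bayes_a_criterion(X: Sequence[Sequence[float]], subset: Sequence[int], A: Matrix) -> float:
    """tr((X_S^T X_S + A)^{-1}), computed column by column of the inverse."""
    d = len(A)
    m = [[A[i][j] + sum(X[s][i] * X[s][j] for s in subset) for j in range(d)] for i in range(d)]
    return sum(solve(m, [1.0 if r == i else 0.0 for r in range(d)])[i] for i in range(d))


def best_subset_brute_force(
    X: Sequence[Sequence[float]], k: int, A: Matrix
) -> Tuple[Tuple[int, ...], float]:
    """Exhaustive search over all size-k subsets; only for tiny n."""
    return min(
        ((s, bayes_a_criterion(X, s, A)) for s in combinations(range(len(X)), k)),
        key=lambda pair: pair[1],
    )


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # three candidate experiments in d = 2
prior_precision = [[1.0, 0.0], [0.0, 1.0]]  # A = identity
print(best_subset_brute_force(X, 2, prior_precision))  # ((0, 1), 1.0)
```

The optimum picks one measurement per direction rather than repeating the first direction, which is the diversity preference that connects experimental design to determinantal point processes.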