Paper Group ANR 1049
500+ Times Faster Than Deep Learning (A Case Study Exploring Faster Methods for Text Mining StackOverflow)
Title | 500+ Times Faster Than Deep Learning (A Case Study Exploring Faster Methods for Text Mining StackOverflow) |
Authors | Suvodeep Majumder, Nikhila Balaji, Katie Brey, Wei Fu, Tim Menzies |
Abstract | Deep learning methods are useful for high-dimensional data and are becoming widely used in many areas of software engineering. Deep learners utilize extensive computational power and can take a long time to train, making it difficult to widely validate, repeat, and improve their results. Further, they are not the best solution in all domains. For example, recent results show that for finding related Stack Overflow posts, a tuned SVM performs similarly to a deep learner, but is significantly faster to train. This paper extends that recent result by clustering the dataset, then tuning learners within each cluster. This approach is over 500 times faster than deep learning (and over 900 times faster if we use all the cores on a standard laptop computer). Significantly, this faster approach generates classifiers nearly as good (within 2% F1 Score) as the much slower deep learning method. Hence we recommend this faster method since it is much easier to reproduce and utilizes far fewer CPU resources. More generally, we recommend that before researchers release research results, they compare their supposedly sophisticated methods against simpler alternatives (e.g., applying simpler learners to build local models). |
Tasks | |
Published | 2018-02-14 |
URL | http://arxiv.org/abs/1802.05319v1 |
http://arxiv.org/pdf/1802.05319v1.pdf | |
PWC | https://paperswithcode.com/paper/500-times-faster-than-deep-learning-a-case |
Repo | |
Framework | |
A Multi-task Neural Approach for Emotion Attribution, Classification and Summarization
Title | A Multi-task Neural Approach for Emotion Attribution, Classification and Summarization |
Authors | Guoyun Tu, Yanwei Fu, Boyang Li, Jiarui Gao, Yu-Gang Jiang, Xiangyang Xue |
Abstract | Emotional content is a crucial ingredient in user-generated videos. However, the sparsity of emotional expressions in the videos poses an obstacle to visual emotion analysis. In this paper, we propose a new neural approach, Bi-stream Emotion Attribution-Classification Network (BEAC-Net), to solve three related emotion analysis tasks: emotion recognition, emotion attribution, and emotion-oriented summarization, in a single integrated framework. BEAC-Net has two major constituents, an attribution network and a classification network. The attribution network extracts the main emotional segment that classification should focus on in order to mitigate the sparsity issue. The classification network utilizes both the extracted segment and the original video in a bi-stream architecture. We contribute a new dataset for the emotion attribution task with human-annotated ground-truth labels for emotion segments. Experiments on two video datasets demonstrate superior performance of the proposed framework and the complementary nature of the dual classification streams. |
Tasks | Emotion Recognition |
Published | 2018-12-21 |
URL | https://arxiv.org/abs/1812.09041v2 |
https://arxiv.org/pdf/1812.09041v2.pdf | |
PWC | https://paperswithcode.com/paper/a-multi-task-neural-approach-for-emotion |
Repo | |
Framework | |
Bayesian Optimal Design of Experiments For Inferring The Statistical Expectation Of A Black-Box Function
Title | Bayesian Optimal Design of Experiments For Inferring The Statistical Expectation Of A Black-Box Function |
Authors | Piyush Pandita, Ilias Bilionis, Jitesh Panchal |
Abstract | Bayesian optimal design of experiments (BODE) has been successful in acquiring information about a quantity of interest (QoI) which depends on a black-box function. BODE is characterized by sequentially querying the function at specific designs selected by an infill-sampling criterion. However, most current BODE methods operate in specific contexts like optimization, or learning a universal representation of the black-box function. The objective of this paper is to design a BODE for estimating the statistical expectation of a physical response surface. This QoI is omnipresent in uncertainty propagation and design under uncertainty problems. Our hypothesis is that an optimal BODE should maximize the expected information gain in the QoI. We represent the information gain from a hypothetical experiment as the Kullback-Leibler (KL) divergence between the prior and the posterior probability distributions of the QoI. The prior distribution of the QoI is conditioned on the observed data and the posterior distribution of the QoI is conditioned on the observed data and a hypothetical experiment. The main contribution of this paper is the derivation of a semi-analytic mathematical formula for the expected information gain about the statistical expectation of a physical response. The developed BODE is validated on synthetic functions with varying numbers of input dimensions. We demonstrate the performance of the methodology on a steel wire manufacturing problem. |
Tasks | |
Published | 2018-07-26 |
URL | http://arxiv.org/abs/1807.09979v3 |
http://arxiv.org/pdf/1807.09979v3.pdf | |
PWC | https://paperswithcode.com/paper/bayesian-optimal-design-of-experiments-for |
Repo | |
Framework | |
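The KL-based information gain mentioned in the abstract above can be illustrated with a small sketch. It assumes, purely for illustration, that both the prior and the posterior over the QoI are univariate Gaussians, and that the gain is measured as the divergence of the posterior from the prior; the paper's own semi-analytic formula is not reproduced here. The function name `kl_gaussian` and the numbers are hypothetical.

```python
import math


def kl_gaussian(mu_p: float, sigma_p: float, mu_q: float, sigma_q: float) -> float:
    """KL(p || q) for univariate Gaussians p = N(mu_p, sigma_p^2), q = N(mu_q, sigma_q^2)."""
    if sigma_p <= 0 or sigma_q <= 0:
        raise ValueError("standard deviations must be positive")
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sigma_q ** 2)
        - 0.5
    )


# Hypothetical beliefs about the QoI before and after one simulated experiment.
prior_mu, prior_sigma = 0.0, 1.0
post_mu, post_sigma = 0.2, 0.5

# Assumed convention: information gain = KL(posterior || prior).
info_gain = kl_gaussian(post_mu, post_sigma, prior_mu, prior_sigma)
```

In a sequential design loop, a quantity like `info_gain` would be averaged over the possible outcomes of each candidate experiment, and the candidate with the largest expected value would be queried next.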
Age and Gender Classification From Ear Images
Title | Age and Gender Classification From Ear Images |
Authors | Dogucan Yaman, Fevziye Irem Eyiokur, Nurdan Sezgin, Hazım Kemal Ekenel |
Abstract | In this paper, we present a detailed analysis on extracting soft biometric traits, age and gender, from ear images. Although there have been a few previous works on gender classification using ear images, to the best of our knowledge, this study is the first work on age classification from ear images. In the study, we have utilized both geometric features and appearance-based features for ear representation. The utilized geometric features are based on eight anthropometric landmarks and consist of 14 distance measurements and two area calculations. The appearance-based methods employ deep convolutional neural networks for representation and classification. The well-known convolutional neural network models, namely, AlexNet, VGG-16, GoogLeNet, and SqueezeNet have been adopted for the study. They have been fine-tuned on a large-scale ear dataset that has been built from the profile and close-to-profile face images in the Multi-PIE face dataset. This way, we have performed a domain adaptation. The updated models have been fine-tuned once more on the small-scale target ear dataset, which contains only around 270 ear images for training. According to the experimental results, appearance-based methods have been found to be superior to the methods based on geometric features. We have achieved 94% accuracy for gender classification, whereas 52% accuracy has been obtained for age classification. These results indicate that ear images provide useful cues for age and gender classification; however, further work is required for age estimation. |
Tasks | Age And Gender Classification, Age Estimation, Domain Adaptation |
Published | 2018-06-14 |
URL | http://arxiv.org/abs/1806.05742v1 |
http://arxiv.org/pdf/1806.05742v1.pdf | |
PWC | https://paperswithcode.com/paper/age-and-gender-classification-from-ear-images |
Repo | |
Framework | |
Shared Autonomy via Deep Reinforcement Learning
Title | Shared Autonomy via Deep Reinforcement Learning |
Authors | Siddharth Reddy, Anca D. Dragan, Sergey Levine |
Abstract | In shared autonomy, user input is combined with semi-autonomous control to achieve a common goal. The goal is often unknown ex-ante, so prior work enables agents to infer the goal from user input and assist with the task. Such methods tend to assume some combination of knowledge of the dynamics of the environment, the user’s policy given their goal, and the set of possible goals the user might target, which limits their application to real-world scenarios. We propose a deep reinforcement learning framework for model-free shared autonomy that lifts these assumptions. We use human-in-the-loop reinforcement learning with neural network function approximation to learn an end-to-end mapping from environmental observation and user input to agent action values, with task reward as the only form of supervision. This approach poses the challenge of following user commands closely enough to provide the user with real-time action feedback and thereby ensure high-quality user input, but also deviating from the user’s actions when they are suboptimal. We balance these two needs by discarding actions whose values fall below some threshold, then selecting the remaining action closest to the user’s input. Controlled studies with users (n = 12) and synthetic pilots playing a video game, and a pilot study with users (n = 4) flying a real quadrotor, demonstrate the ability of our algorithm to assist users with real-time control tasks in which the agent cannot directly access the user’s private information through observations, but receives a reward signal and user input that both depend on the user’s intent. The agent learns to assist the user without access to this private information, implicitly inferring it from the user’s input. This paper is a proof of concept that illustrates the potential for deep reinforcement learning to enable flexible and practical assistive systems. |
Tasks | |
Published | 2018-02-06 |
URL | http://arxiv.org/abs/1802.01744v2 |
http://arxiv.org/pdf/1802.01744v2.pdf | |
PWC | https://paperswithcode.com/paper/shared-autonomy-via-deep-reinforcement |
Repo | |
Framework | |
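The action-selection rule described in the abstract above (discard low-value actions, then pick the remaining action closest to the user's input) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the absolute `threshold`, the integer action encoding, the default distance, and the greedy fallback when no action survives are all assumptions.

```python
from typing import Callable, Sequence


def assisted_action(
    q_values: Sequence[float],
    user_action: int,
    threshold: float,
    distance: Callable[[int, int], float] = lambda a, b: abs(a - b),
) -> int:
    """Return the acceptable action that is closest to what the user asked for."""
    # Keep only actions whose estimated value is at least the threshold.
    candidates = [a for a, q in enumerate(q_values) if q >= threshold]
    if not candidates:
        # Assumed fallback: if every action is below the threshold, act greedily.
        return max(range(len(q_values)), key=lambda a: q_values[a])
    # Among acceptable actions, stay as close as possible to the user's input.
    return min(candidates, key=lambda a: distance(a, user_action))


# Hypothetical action values for four discrete actions.
q = [0.1, 0.9, 0.8, 0.2]
chosen = assisted_action(q, user_action=3, threshold=0.5)
```

Here the user's action 3 is rejected because its value is below the threshold, and action 2 is chosen as the nearest acceptable alternative. When the user's own action is acceptable, it is returned unchanged.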
Evaluating Hospital Case Cost Prediction Models Using Azure Machine Learning Studio
Title | Evaluating Hospital Case Cost Prediction Models Using Azure Machine Learning Studio |
Authors | Alexei Botchkarev |
Abstract | The ability to accurately model and predict hospital case costs is critical for efficient health care financial management and budgetary planning. A variety of regression machine learning algorithms are known to be effective for health care cost predictions. The purpose of this experiment was to build an Azure Machine Learning Studio tool for rapid assessment of multiple types of regression models. The tool offers an environment for comparing 14 types of regression models in a unified experiment: linear regression, Bayesian linear regression, decision forest regression, boosted decision tree regression, neural network regression, Poisson regression, Gaussian processes for regression, gradient boosted machine, nonlinear least squares regression, projection pursuit regression, random forest regression, robust regression, robust regression with mm-type estimators, and support vector regression. The tool presents assessment results arranged by model accuracy in a single table using five performance metrics. Evaluation of regression machine learning models for hospital case cost prediction demonstrated the advantage of the robust regression model, boosted decision tree regression, and decision forest regression. The operational tool has been published to the web and is openly available for experiments and extensions. |
Tasks | Gaussian Processes |
Published | 2018-04-04 |
URL | http://arxiv.org/abs/1804.01825v2 |
http://arxiv.org/pdf/1804.01825v2.pdf | |
PWC | https://paperswithcode.com/paper/evaluating-hospital-case-cost-prediction |
Repo | |
Framework | |
Cross-domain CNN for Hyperspectral Image Classification
Title | Cross-domain CNN for Hyperspectral Image Classification |
Authors | Hyungtae Lee, Sungmin Eum, Heesung Kwon |
Abstract | In this paper, we address the dataset scarcity issue in hyperspectral image classification. As only a few thousand pixels are available for training, it is difficult to effectively learn high-capacity Convolutional Neural Networks (CNNs). To cope with this problem, we propose a novel cross-domain CNN containing shared parameters which can co-learn across multiple hyperspectral datasets. The network also contains non-shared portions designed to handle the dataset-specific spectral characteristics and the associated classification tasks. Our approach is the first attempt to learn a CNN for multiple hyperspectral datasets in an end-to-end fashion. Moreover, we have experimentally shown that the proposed network trained on three of the widely used datasets outperforms all the baseline networks which are trained on a single dataset. |
Tasks | Hyperspectral Image Classification, Image Classification |
Published | 2018-01-31 |
URL | http://arxiv.org/abs/1802.00093v2 |
http://arxiv.org/pdf/1802.00093v2.pdf | |
PWC | https://paperswithcode.com/paper/cross-domain-cnn-for-hyperspectral-image |
Repo | |
Framework | |
The Intriguing Properties of Model Explanations
Title | The Intriguing Properties of Model Explanations |
Authors | Maruan Al-Shedivat, Avinava Dubey, Eric P. Xing |
Abstract | Linear approximations to the decision boundary of a complex model have become one of the most popular tools for interpreting predictions. In this paper, we study such linear explanations produced either post-hoc by a few recent methods or generated along with predictions with contextual explanation networks (CENs). We focus on two questions: (i) whether linear explanations are always consistent or can be misleading, and (ii) when integrated into the prediction process, whether and how explanations affect the performance of the model. Our analysis sheds more light on certain properties of explanations produced by different methods and suggests that learning models that explain and predict jointly is often advantageous. |
Tasks | |
Published | 2018-01-30 |
URL | http://arxiv.org/abs/1801.09808v1 |
http://arxiv.org/pdf/1801.09808v1.pdf | |
PWC | https://paperswithcode.com/paper/the-intriguing-properties-of-model |
Repo | |
Framework | |
Towards Deeper Understanding of Nonconvex Stochastic Optimization with Momentum using Diffusion Approximations
Title | Towards Deeper Understanding of Nonconvex Stochastic Optimization with Momentum using Diffusion Approximations |
Authors | Tianyi Liu, Zhehui Chen, Enlu Zhou, Tuo Zhao |
Abstract | The Momentum Stochastic Gradient Descent (MSGD) algorithm has been widely applied to many nonconvex optimization problems in machine learning, e.g., training deep neural networks, variational Bayesian inference, etc. Due to current technical limitations, however, establishing convergence properties of MSGD for these highly complicated nonconvex problems is generally infeasible. Therefore, we propose to analyze the algorithm through a simpler but nontrivial nonconvex problem: streaming PCA. This allows us to make progress toward understanding MSGD and gaining new insights for more general problems. Specifically, by applying diffusion approximations, our study shows that the momentum helps escape from saddle points, but hurts the convergence within the neighborhood of optima (without step size annealing). Our theoretical discovery partially corroborates the empirical successes of MSGD in training deep neural networks. Moreover, our analysis applies the martingale method and “Fixed-State-Chain” method from the stochastic approximation literature, which are of independent interest. |
Tasks | Bayesian Inference, Dimensionality Reduction, Stochastic Optimization |
Published | 2018-02-14 |
URL | https://arxiv.org/abs/1802.05155v4 |
https://arxiv.org/pdf/1802.05155v4.pdf | |
PWC | https://paperswithcode.com/paper/toward-deeper-understanding-of-nonconvex |
Repo | |
Framework | |
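As a rough illustration of the setting in the abstract above, the following sketch runs a momentum variant of an Oja-style stochastic update for streaming PCA on synthetic two-dimensional data. It is an assumption-laden toy, not the paper's exact algorithm: the function name, the step size, the momentum value, the deterministic starting point, and the renormalization after each step (added here for numerical stability) are all choices made for this example.

```python
import math
import random


def msgd_streaming_pca(samples, dim, step_size=0.01, momentum=0.5):
    """Estimate the leading eigenvector of E[x x^T] from a stream of samples."""
    v = [1.0 / math.sqrt(dim)] * dim  # deterministic unit-norm starting point
    v_prev = list(v)
    for x in samples:
        proj = sum(xi * vi for xi, vi in zip(x, v))  # x^T v
        # Oja-style stochastic direction: (I - v v^T) x x^T v = proj * (x - proj * v)
        grad = [proj * (xi - proj * vi) for xi, vi in zip(x, v)]
        # Momentum step: reuse a fraction of the previous displacement v - v_prev.
        v_next = [
            vi + momentum * (vi - pi) + step_size * gi
            for vi, pi, gi in zip(v, v_prev, grad)
        ]
        # Assumed stabilizer: project back onto the unit sphere after every step.
        norm = math.sqrt(sum(c * c for c in v_next))
        v_prev, v = v, [c / norm for c in v_next]
    return v


# Synthetic stream whose covariance is diag(9, 0.25), so the top direction is axis 0.
rng = random.Random(0)
stream = [(3.0 * rng.gauss(0, 1), 0.5 * rng.gauss(0, 1)) for _ in range(5000)]
v_hat = msgd_streaming_pca(stream, dim=2)
```

With a constant step size, the estimate keeps fluctuating around the optimum rather than settling exactly on it, which is the regime where the abstract notes that momentum hurts unless the step size is annealed.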
A Nonparametric Delayed Feedback Model for Conversion Rate Prediction
Title | A Nonparametric Delayed Feedback Model for Conversion Rate Prediction |
Authors | Yuya Yoshikawa, Yusaku Imai |
Abstract | Predicting conversion rates (CVRs) in display advertising (e.g., predicting the proportion of users who purchase an item (i.e., a conversion) after its corresponding ad is clicked) is important when measuring the effects of ads shown to users and understanding the interests of the users. There is generally a time delay (i.e., so-called delayed feedback) between the ad click and conversion. Owing to the delayed feedback, samples that are converted after an observation period may be treated as negative. To overcome this drawback, CVR prediction assuming that the time delay follows an exponential distribution has been proposed. In practice, however, there is no guarantee that the delay is generated from the exponential distribution, and the best distribution with which to represent the delay depends on the data. In this paper, we propose a nonparametric delayed feedback model for CVR prediction that represents the distribution of the time delay without assuming a parametric distribution, such as an exponential or Weibull distribution. Because the distribution of the time delay is modeled depending on the content of an ad and the features of a user, various shapes of the distribution can potentially be represented. In experiments, we show that the proposed model can capture the distribution for the time delay on a synthetic dataset, even when the distribution is complicated. Moreover, on a real dataset, we show that the proposed model outperforms the existing method that assumes an exponential distribution for the time delay in terms of conversion rate prediction. |
Tasks | |
Published | 2018-02-01 |
URL | http://arxiv.org/abs/1802.00255v1 |
http://arxiv.org/pdf/1802.00255v1.pdf | |
PWC | https://paperswithcode.com/paper/a-nonparametric-delayed-feedback-model-for |
Repo | |
Framework | |
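To make the delayed-feedback problem in the abstract above concrete, the sketch below uses the exponential-delay assumption that the abstract attributes to earlier work, not the nonparametric model the paper proposes. Given a click with no conversion observed so far, it computes the probability that a conversion will still arrive. The function name and parameter names are hypothetical, and a single fixed conversion probability and delay rate are assumed for simplicity.

```python
import math


def prob_unobserved_conversion(p_convert: float, rate: float, elapsed: float) -> float:
    """P(click will still convert | no conversion seen after `elapsed` time units)."""
    if not 0.0 <= p_convert < 1.0:
        raise ValueError("p_convert must be in [0, 1)")
    if rate < 0 or elapsed < 0:
        raise ValueError("rate and elapsed must be non-negative")
    survive = math.exp(-rate * elapsed)  # P(delay > elapsed) under an Exp(rate) delay
    not_yet = p_convert * survive        # will convert, but later than `elapsed`
    never = 1.0 - p_convert              # will never convert
    return not_yet / (not_yet + never)


# Hypothetical values: 30% conversion probability, mean delay of 2 days (rate 0.5 per day).
soon_after_click = prob_unobserved_conversion(0.3, rate=0.5, elapsed=0.0)
a_week_later = prob_unobserved_conversion(0.3, rate=0.5, elapsed=7.0)
```

Immediately after the click the sample is as likely to convert as any other, while a week later an apparently negative sample is far more likely to be a true negative. Treating both cases as equally negative is the bias that delayed-feedback models aim to correct.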
Image Retrieval with Mixed Initiative and Multimodal Feedback
Title | Image Retrieval with Mixed Initiative and Multimodal Feedback |
Authors | Nils Murrugarra-Llerena, Adriana Kovashka |
Abstract | How would you search for a unique, fashionable shoe that a friend wore and you want to buy, but you didn’t take a picture? Existing approaches propose interactive image search as a promising avenue. However, they either entrust the user with taking the initiative to provide informative feedback, or give all control to the system which determines informative questions to ask. Instead, we propose a mixed-initiative framework where both the user and system can be active participants, depending on whose initiative will be more beneficial for obtaining high-quality search results. We develop a reinforcement learning approach which dynamically decides which of three interaction opportunities to give to the user: drawing a sketch, providing free-form attribute feedback, or answering attribute-based questions. By allowing these three options, our system optimizes both the informativeness and exploration capabilities, allowing faster image retrieval. We outperform three baselines on three datasets and extensive experimental settings. |
Tasks | Image Retrieval |
Published | 2018-05-08 |
URL | http://arxiv.org/abs/1805.03134v1 |
http://arxiv.org/pdf/1805.03134v1.pdf | |
PWC | https://paperswithcode.com/paper/image-retrieval-with-mixed-initiative-and |
Repo | |
Framework | |
Hardware Conditioned Policies for Multi-Robot Transfer Learning
Title | Hardware Conditioned Policies for Multi-Robot Transfer Learning |
Authors | Tao Chen, Adithyavairavan Murali, Abhinav Gupta |
Abstract | Deep reinforcement learning could be used to learn dexterous robotic policies but it is challenging to transfer them to new robots with vastly different hardware properties. It is also prohibitively expensive to learn a new policy from scratch for each robot hardware due to the high sample complexity of modern state-of-the-art algorithms. We propose a novel approach called Hardware Conditioned Policies where we train a universal policy conditioned on a vector representation of robot hardware. We considered robots in simulation with varied dynamics, kinematic structure, kinematic lengths and degrees-of-freedom. First, we use the kinematic structure directly as the hardware encoding and show great zero-shot transfer to completely novel robots not seen during training. For robots with lower zero-shot success rate, we also demonstrate that fine-tuning the policy network is significantly more sample-efficient than training a model from scratch. In tasks where knowing the agent dynamics is important for success, we learn an embedding for robot hardware and show that policies conditioned on the encoding of hardware tend to generalize and transfer well. The code and videos are available on the project webpage: https://sites.google.com/view/robot-transfer-hcp. |
Tasks | Transfer Learning |
Published | 2018-11-24 |
URL | http://arxiv.org/abs/1811.09864v2 |
http://arxiv.org/pdf/1811.09864v2.pdf | |
PWC | https://paperswithcode.com/paper/hardware-conditioned-policies-for-multi-robot |
Repo | |
Framework | |
Provable limitations of deep learning
Title | Provable limitations of deep learning |
Authors | Emmanuel Abbe, Colin Sandon |
Abstract | As the success of deep learning reaches more grounds, one would like to also envision the potential limits of deep learning. This paper gives a first set of results proving that certain deep learning algorithms fail at learning certain efficiently learnable functions. The results put forward a notion of cross-predictability that characterizes when such failures take place. Parity functions provide an extreme example with a cross-predictability that decays exponentially, while a mere super-polynomial decay of the cross-predictability is shown to be sufficient to obtain failures. Examples in community detection and arithmetic learning are also discussed. Recall that it is known that the class of neural networks (NNs) with polynomial network size can express any function that can be implemented in polynomial time, and that their sample complexity scales polynomially with the network size. The challenge is with the optimization error (the ERM is NP-hard), and the success behind deep learning is to train deep NNs with descent algorithms. The failures shown in this paper apply to training poly-size NNs on function distributions of low cross-predictability with a descent algorithm that is either run with limited memory per sample or that is initialized and run with enough randomness. We further claim that such types of constraints are necessary to obtain failures, in that exact SGD with careful non-random initialization can be shown to learn parities. The cross-predictability in our results plays a similar role to the statistical dimension in statistical query (SQ) algorithms, with distinctions explained in the paper. The proof techniques are based on exhibiting algorithmic constraints that imply a statistical indistinguishability between the algorithm’s output on the test model vs. a null model, using information measures to bound the total variation distance. |
Tasks | Community Detection |
Published | 2018-12-16 |
URL | http://arxiv.org/abs/1812.06369v2 |
http://arxiv.org/pdf/1812.06369v2.pdf | |
PWC | https://paperswithcode.com/paper/provable-limitations-of-deep-learning |
Repo | |
Framework | |
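The abstract above singles out parity functions as the extreme case. The brute-force sketch below checks, for small input sizes, that two distinct parities are uncorrelated on uniform random bits, so the average squared correlation over random pairs of parities shrinks as 2^-n. Treating this average as the cross-predictability is an assumption made for this example, since the abstract does not spell out the definition; the function names are hypothetical.

```python
from itertools import product


def parity(subset_mask, x):
    """Return the +1/-1 parity of the bits of x selected by subset_mask."""
    ones = sum(m & b for m, b in zip(subset_mask, x))
    return -1 if ones % 2 else 1


def mean_squared_correlation(n):
    """Average of (E_x f(x) f'(x))^2 over all pairs of parities on n uniform bits."""
    inputs = list(product((0, 1), repeat=n))
    masks = list(product((0, 1), repeat=n))  # one parity function per subset of bits
    total = 0.0
    for s in masks:
        for t in masks:
            # Correlation is 1 when s == t and 0 otherwise.
            corr = sum(parity(s, x) * parity(t, x) for x in inputs) / len(inputs)
            total += corr ** 2
    return total / len(masks) ** 2


value_n3 = mean_squared_correlation(3)
value_n4 = mean_squared_correlation(4)
```

Each extra input bit halves the value, which matches the exponential decay the abstract describes for parities.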
A Hierarchical Attention Model for Social Contextual Image Recommendation
Title | A Hierarchical Attention Model for Social Contextual Image Recommendation |
Authors | Le Wu, Lei Chen, Richang Hong, Yanjie Fu, Xing Xie, Meng Wang |
Abstract | Image-based social networks are among the most popular social networking services in recent years. With a tremendous number of images uploaded every day, understanding users’ preferences on user-generated images and making recommendations have become an urgent need. In fact, many hybrid models have been proposed to fuse various kinds of side information (e.g., image visual representation, social network) and user-item historical behavior for enhancing recommendation performance. However, due to the unique characteristics of the user-generated images in social image platforms, the previous studies failed to capture the complex aspects that influence users’ preferences in a unified framework. Moreover, most of these hybrid models relied on predefined weights in combining different kinds of information, which usually resulted in sub-optimal recommendation performance. To this end, in this paper, we develop a hierarchical attention model for social contextual image recommendation. In addition to basic latent user interest modeling in the popular matrix factorization based recommendation, we identify three key aspects (i.e., upload history, social influence, and owner admiration) that affect each user’s latent preferences, where each aspect summarizes a contextual factor from the complex relationships between users and images. After that, we design a hierarchical attention network that naturally mirrors the hierarchical relationship (elements in each aspect level, and the aspect level) of users’ latent interests with the identified key aspects. Specifically, by taking embeddings from state-of-the-art deep learning models that are tailored for each kind of data, the hierarchical attention network could learn to attend differently to more or less content. Finally, extensive experimental results on real-world datasets clearly show the superiority of our proposed model. |
Tasks | |
Published | 2018-06-03 |
URL | http://arxiv.org/abs/1806.00723v3 |
http://arxiv.org/pdf/1806.00723v3.pdf | |
PWC | https://paperswithcode.com/paper/a-hierarchical-attention-model-for-social |
Repo | |
Framework | |
Creating a New Persian Poet Based on Machine Learning
Title | Creating a New Persian Poet Based on Machine Learning |
Authors | Mehdi Hosseini Moghadam, Bardia Panahbehagh |
Abstract | In this article we describe an application of Machine Learning (ML) and Linguistic Modeling to generate Persian poems. We teach the machine, by having it read and learn Persian poems, to generate fake poems in the same style as the originals. We used the poems of two well-known poets, Hafez (1310-1390) and Saadi (1210-1292). First we feed the machine Hafez’s poems to generate fake poems in the same style, and then we feed the machine both Hafez’s and Saadi’s poems to generate poems in a new style that combines the styles of these two poets, with emotional (Hafez) and rational (Saadi) elements. This idea of combining different styles with ML opens new gates for extending the treasure of past literature of different cultures. Results show that with enough memory, processing power, and time it is possible to generate reasonably good poems. |
Tasks | |
Published | 2018-10-16 |
URL | http://arxiv.org/abs/1810.06898v1 |
http://arxiv.org/pdf/1810.06898v1.pdf | |
PWC | https://paperswithcode.com/paper/creating-a-new-persian-poet-based-on-machine |
Repo | |
Framework | |