Paper Group ANR 595
Reasoning with Sarcasm by Reading In-between
Title | Reasoning with Sarcasm by Reading In-between |
Authors | Yi Tay, Luu Anh Tuan, Siu Cheung Hui, Jian Su |
Abstract | Sarcasm is a sophisticated speech act which commonly manifests in social communities such as Twitter and Reddit. The prevalence of sarcasm on the social web is highly disruptive to opinion mining systems, due not only to its tendency to flip polarity but also to its use of figurative language. Sarcasm commonly manifests with a contrastive theme, either between positive and negative sentiments or between literal and figurative scenarios. In this paper, we revisit the notion of modeling contrast in order to reason with sarcasm. More specifically, we propose an attention-based neural model that looks in-between instead of across, enabling it to explicitly model contrast and incongruity. We conduct extensive experiments on six benchmark datasets from Twitter, Reddit and the Internet Argument Corpus. Our proposed model not only achieves state-of-the-art performance on all datasets but also enjoys improved interpretability. |
Tasks | Sarcasm Detection |
Published | 2018-05-08 |
URL | http://arxiv.org/abs/1805.02856v1 |
http://arxiv.org/pdf/1805.02856v1.pdf | |
PWC | https://paperswithcode.com/paper/reasoning-with-sarcasm-by-reading-in-between |
Repo | |
Framework | |
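The contrast-modeling idea lends itself to a toy sketch. The snippet below is an illustrative word-pair intra-attention; the function name and the max-pooling/softmax choices are ours, not the paper's exact architecture. Every pair of words is scored, each word keeps its best pairwise score, and a softmax over those scores weights the final sentence representation, so strongly contrasting word pairs dominate.

```python
import math

def intra_attention(word_vecs):
    """Toy word-pair intra-attention: score every pair of words, keep each
    word's best pairwise score, softmax those scores, and return the
    attention weights plus the attention-weighted average of the vectors.
    (Illustrative sketch only; the paper's model is more involved.)"""
    n = len(word_vecs)

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # each word's strongest interaction with any other word
    best = []
    for i in range(n):
        best.append(max(dot(word_vecs[i], word_vecs[j])
                        for j in range(n) if j != i))

    # softmax over the per-word best pair scores
    m = max(best)
    exps = [math.exp(s - m) for s in best]
    z = sum(exps)
    attn = [e / z for e in exps]

    dim = len(word_vecs[0])
    pooled = [sum(attn[i] * word_vecs[i][d] for i in range(n))
              for d in range(dim)]
    return attn, pooled
```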
Data-driven satisficing measure and ranking
Title | Data-driven satisficing measure and ranking |
Authors | Wenjie Huang |
Abstract | We propose a computational framework for real-time risk assessment and prioritization of random outcomes without prior information on probability distributions. The basic model is built on the satisficing measure (SM), which yields a single index for risk comparison. Since SM is a dual representation for a family of risk measures, we consider problems constrained by general convex risk measures and specifically by Conditional Value-at-Risk. Starting from offline optimization, we apply the sample average approximation technique and analyze the convergence rate and validation of optimal solutions. In the online stochastic optimization case, we develop primal-dual stochastic approximation algorithms for general risk-constrained problems, and derive their regret bounds. For both the offline and online cases, we illustrate the relationship between risk ranking accuracy and sample size (or iterations). |
Tasks | Stochastic Optimization |
Published | 2018-07-01 |
URL | http://arxiv.org/abs/1807.00325v1 |
http://arxiv.org/pdf/1807.00325v1.pdf | |
PWC | https://paperswithcode.com/paper/data-driven-satisficing-measure-and-ranking |
Repo | |
Framework | |
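As a concrete anchor for the risk measure involved, here is a minimal empirical Conditional Value-at-Risk estimator: a plain average of the worst (1 - alpha) fraction of sampled losses. This is our simplification for illustration; the paper's constrained-optimization and regret-bound machinery is not reproduced.

```python
def cvar(losses, alpha=0.95):
    """Empirical Conditional Value-at-Risk at level alpha: the mean of
    the worst (1 - alpha) fraction of sampled losses (a sample-average
    approximation of the tail expectation)."""
    xs = sorted(losses, reverse=True)          # largest losses first
    k = max(1, int(round(len(xs) * (1 - alpha))))
    return sum(xs[:k]) / k
```

Sweeping alpha toward 1 makes the index focus on ever more extreme tail outcomes, which is what lets a single number rank risky alternatives.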
Inferring the size of the causal universe: features and fusion of causal attribution networks
Title | Inferring the size of the causal universe: features and fusion of causal attribution networks |
Authors | Daniel Berenberg, James P. Bagrow |
Abstract | Cause-and-effect reasoning, the attribution of effects to causes, is one of the most powerful and unique skills humans possess. Multiple surveys are mapping out causal attributions as networks, but it is unclear how well these efforts can be combined. Further, the total size of the collective causal attribution network held by humans is currently unknown, making it challenging to assess the progress of these surveys. Here we study three causal attribution networks to determine how well they can be combined into a single network. Combining these networks requires dealing with ambiguous nodes, as nodes represent written descriptions of causes and effects and different descriptions may exist for the same concept. We introduce NetFUSES, a method for combining networks with ambiguous nodes. Crucially, treating the different causal attribution networks as independent samples allows us to use their overlap to estimate the total size of the collective causal attribution network. We find that existing surveys capture 5.77% $\pm$ 0.781% of the $\approx$293 000 causes and effects estimated to exist, and 0.198% $\pm$ 0.174% of the $\approx$10 200 000 attributed cause-effect relationships. |
Tasks | |
Published | 2018-12-14 |
URL | http://arxiv.org/abs/1812.06038v1 |
http://arxiv.org/pdf/1812.06038v1.pdf | |
PWC | https://paperswithcode.com/paper/inferring-the-size-of-the-causal-universe |
Repo | |
Framework | |
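The size estimate rests on a capture-recapture argument: if two surveys are independent samples of the same underlying network, the size of their overlap constrains the total. A minimal Lincoln-Petersen-style sketch of that idea follows (our simplification; the paper applies the estimate to nodes and edges of networks fused by NetFUSES, not to raw sets):

```python
def capture_recapture(sample_a, sample_b):
    """Lincoln-Petersen estimate of total population size from two
    independent samples: N ~ |A| * |B| / |A intersect B|.  A small
    overlap relative to the sample sizes implies a large universe."""
    a, b = set(sample_a), set(sample_b)
    overlap = len(a & b)
    if overlap == 0:
        raise ValueError("no overlap: size cannot be estimated")
    return len(a) * len(b) / overlap
```

For example, two surveys of 100 concepts each that share only 50 concepts suggest a universe of about 200 concepts.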
Discovering hierarchies using Imitation Learning from hierarchy aware policies
Title | Discovering hierarchies using Imitation Learning from hierarchy aware policies |
Authors | Ameet Deshpande, Harshavardhan Kamarthi, Balaraman Ravindran |
Abstract | Learning options that allow agents to exhibit temporally extended behavior has proven useful in increasing exploration, reducing sample complexity, and in various transfer scenarios. Deep Discovery of Options (DDO) is a generative algorithm that learns a hierarchical policy along with options directly from expert trajectories. We perform a qualitative and quantitative analysis of options inferred from DDO in different domains. To this end, we suggest several value metrics, such as the option termination condition, hinge value function error, and a KL-divergence based distance metric, to compare different methods. Analyzing the termination conditions of the options and the number of time steps the options were run revealed that the options were terminating prematurely. We suggest modifications which can be incorporated easily and alleviate the problems of shorter options and of options collapsing to the same mode. |
Tasks | Imitation Learning |
Published | 2018-12-01 |
URL | https://arxiv.org/abs/1812.00225v2 |
https://arxiv.org/pdf/1812.00225v2.pdf | |
PWC | https://paperswithcode.com/paper/discovering-hierarchies-using-imitation |
Repo | |
Framework | |
Universal Stagewise Learning for Non-Convex Problems with Convergence on Averaged Solutions
Title | Universal Stagewise Learning for Non-Convex Problems with Convergence on Averaged Solutions |
Authors | Zaiyi Chen, Zhuoning Yuan, Jinfeng Yi, Bowen Zhou, Enhong Chen, Tianbao Yang |
Abstract | Although the stochastic gradient descent (SGD) method and its variants (e.g., stochastic momentum methods, AdaGrad) are the algorithms of choice for solving non-convex problems (especially in deep learning), there remain big gaps between the theory and the practice, with many questions unresolved. For example, there is still a lack of convergence theory for SGD and its variants that use a stagewise step size and return an averaged solution in practice. In addition, theoretical insight into why the adaptive step size of AdaGrad can improve on the non-adaptive step size of SGD is still missing for non-convex optimization. This paper aims to address these questions and fill the gap between theory and practice. We propose a universal stagewise optimization framework for a broad family of non-smooth non-convex (namely weakly convex) problems with the following key features: (i) at each stage, any suitable stochastic convex optimization algorithm (e.g., SGD or AdaGrad) that returns an averaged solution can be employed for minimizing a regularized convex problem; (ii) the step size is decreased in a stagewise manner; (iii) an averaged solution is returned as the final solution, selected from all stagewise averaged solutions with sampling probabilities increasing with the stage number. Our theoretical results for stagewise AdaGrad exhibit its adaptive convergence, shedding light on its faster convergence, relative to stagewise SGD, on problems with sparse stochastic gradients. To the best of our knowledge, these new results are the first of their kind to address the unresolved issues of the existing theories mentioned earlier. Besides the theoretical contributions, our empirical studies show that our stagewise SGD and AdaGrad improve the generalization performance of existing variants/implementations of SGD and AdaGrad. |
Tasks | |
Published | 2018-08-20 |
URL | http://arxiv.org/abs/1808.06296v3 |
http://arxiv.org/pdf/1808.06296v3.pdf | |
PWC | https://paperswithcode.com/paper/universal-stagewise-learning-for-non-convex |
Repo | |
Framework | |
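A minimal sketch of the stagewise recipe on a one-dimensional toy problem follows. The step-size halving, the quadratic regularization strength, and the linearly increasing sampling weights are our illustrative choices, not the paper's tuned constants; the structure (regularized subproblem per stage, stage averaging, weighted selection of a final stage average) mirrors the three listed features.

```python
import random

def stagewise_sgd(grad, x0, stages=5, steps=200, eta0=0.5, gamma=1.0, seed=0):
    """Stagewise SGD sketch: at stage s, run SGD with constant step size
    eta0 / 2**s on the regularized objective f(x) + (gamma/2)(x - x_ref)^2,
    keep the stage average as the next reference point, and finally sample
    one stage average with probability increasing in the stage number."""
    random.seed(seed)
    x_ref = x0
    averages = []
    for s in range(stages):
        eta = eta0 / (2 ** s)          # (ii) stagewise step-size decrease
        x, acc = x_ref, 0.0
        for _ in range(steps):
            # stochastic gradient of the regularized convex subproblem (i)
            g = grad(x) + gamma * (x - x_ref)
            x -= eta * g
            acc += x
        x_ref = acc / steps            # averaged stage solution
        averages.append(x_ref)
    # (iii) sample a final solution with weights increasing in the stage
    weights = [s + 1 for s in range(stages)]
    return random.choices(averages, weights=weights, k=1)[0]
```

On a noisy quadratic with minimizer 3, the stage averages march from the regularization-biased early stages toward the true minimizer.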
Making effective use of healthcare data using data-to-text technology
Title | Making effective use of healthcare data using data-to-text technology |
Authors | Steffen Pauws, Albert Gatt, Emiel Krahmer, Ehud Reiter |
Abstract | Healthcare organizations are in a continuous effort to improve health outcomes, reduce costs and enhance the patient experience of care. Data is essential to measure and help achieve these improvements in healthcare delivery. Consequently, an influx of data from various clinical, financial and operational sources is now overwhelming healthcare organizations and their patients. The effective use of this data, however, is a major challenge. Clearly, text is an important medium to make data accessible. Financial reports are produced to assess healthcare organizations on key performance indicators and to steer their healthcare delivery. Similarly, at a clinical level, data on patient status is conveyed by means of textual descriptions to facilitate patient review, shift handover and care transitions. Likewise, patients are informed about data on their health status and treatments via text, in the form of reports or via ehealth platforms, by their doctors. Unfortunately, such text is the outcome of a highly labour-intensive process when it is produced by healthcare professionals. It is also prone to incompleteness and subjectivity, and hard to scale up to different domains, wider audiences and varying communication purposes. Data-to-text is a recent breakthrough technology in artificial intelligence which automatically generates natural language in the form of text or speech from data. This chapter provides a survey of data-to-text technology, with a focus on how it can be deployed in a healthcare setting. It will (1) give an up-to-date synthesis of data-to-text approaches, (2) give a categorized overview of use cases in healthcare, (3) seek to make a strong case for evaluating and implementing data-to-text in a healthcare setting, and (4) highlight recent research challenges. |
Tasks | |
Published | 2018-08-10 |
URL | http://arxiv.org/abs/1808.03507v1 |
http://arxiv.org/pdf/1808.03507v1.pdf | |
PWC | https://paperswithcode.com/paper/making-effective-use-of-healthcare-data-using |
Repo | |
Framework | |
Content-driven, unsupervised clustering of news articles through multiscale graph partitioning
Title | Content-driven, unsupervised clustering of news articles through multiscale graph partitioning |
Authors | M. Tarik Altuncu, Sophia N. Yaliraki, Mauricio Barahona |
Abstract | The explosion in the amount of news and journalistic content being generated across the globe, coupled with extended and instantaneous access to information through online media, makes it difficult and time-consuming to monitor news developments and opinion formation in real time. There is an increasing need for tools that can pre-process, analyse and classify raw text to extract interpretable content; specifically, identifying topics and content-driven groupings of articles. We present here such a methodology that brings together powerful vector embeddings from Natural Language Processing with tools from Graph Theory that exploit diffusive dynamics on graphs to reveal natural partitions across scales. Our framework uses a recent deep neural network text analysis methodology (Doc2vec) to represent text in vector form and then applies a multi-scale community detection method (Markov Stability) to partition a similarity graph of document vectors. The method allows us to obtain clusters of documents with similar content, at different levels of resolution, in an unsupervised manner. We showcase our approach with the analysis of a corpus of 9,000 news articles published by Vox Media over one year. Our results show consistent groupings of documents according to content without a priori assumptions about the number or type of clusters to be found. The multilevel clustering reveals a quasi-hierarchy of topics and subtopics with increased intelligibility and improved topic coherence as compared to external taxonomy services and standard topic detection methods. |
Tasks | Community Detection, graph partitioning |
Published | 2018-08-03 |
URL | http://arxiv.org/abs/1808.01175v1 |
http://arxiv.org/pdf/1808.01175v1.pdf | |
PWC | https://paperswithcode.com/paper/content-driven-unsupervised-clustering-of |
Repo | |
Framework | |
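A highly simplified stand-in for the pipeline: given document vectors (Doc2vec assumed upstream), build a cosine-similarity graph and read off clusters as connected components at a similarity threshold, with the threshold crudely playing the role of a resolution scale. The paper itself uses Markov Stability community detection, which this sketch does not implement.

```python
import math

def threshold_partition(vectors, threshold):
    """Partition documents by connecting pairs whose cosine similarity
    meets the threshold and returning the connected components
    (union-find).  Higher thresholds give finer, more numerous clusters."""
    def cos(u, v):
        num = sum(a * b for a, b in zip(u, v))
        den = (math.sqrt(sum(a * a for a in u))
               * math.sqrt(sum(b * b for b in v)))
        return num / den

    n = len(vectors)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cos(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```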
Multi-view Hybrid Embedding: A Divide-and-Conquer Approach
Title | Multi-view Hybrid Embedding: A Divide-and-Conquer Approach |
Authors | Jiamiao Xu, Shujian Yu, Xinge You, Mengjun Leng, Xiao-Yuan Jing, C. L. Philip Chen |
Abstract | We present a novel cross-view classification algorithm where the gallery and probe data come from different views. A popular approach to this problem is multi-view subspace learning (MvSL), which aims to learn a latent subspace shared by multi-view data. Despite promising results on some applications, the performance of existing methods deteriorates dramatically when the multi-view data are sampled from nonlinear manifolds or suffer from heavy outliers. To circumvent this drawback, motivated by the divide-and-conquer strategy, we propose Multi-view Hybrid Embedding (MvHE), which divides the problem of cross-view classification into three subproblems and builds one model for each. Specifically, the first model is designed to remove view discrepancy, whereas the second and third models attempt to discover the intrinsic nonlinear structure and to increase discriminability in intra-view and inter-view samples respectively. A kernel extension is conducted to further boost the representation power of MvHE. Extensive experiments are conducted on four benchmark datasets. Our methods demonstrate overwhelming advantages over the state-of-the-art MvSL-based cross-view classification approaches in terms of classification accuracy and robustness. |
Tasks | |
Published | 2018-04-19 |
URL | http://arxiv.org/abs/1804.07237v2 |
http://arxiv.org/pdf/1804.07237v2.pdf | |
PWC | https://paperswithcode.com/paper/multi-view-hybrid-embedding-a-divide-and |
Repo | |
Framework | |
Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks
Title | Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks |
Authors | Difan Zou, Yuan Cao, Dongruo Zhou, Quanquan Gu |
Abstract | We study the problem of training deep neural networks with Rectified Linear Unit (ReLU) activation function using gradient descent and stochastic gradient descent. In particular, we study the binary classification problem and show that for a broad family of loss functions, with proper random weight initialization, both gradient descent and stochastic gradient descent can find the global minima of the training loss for an over-parameterized deep ReLU network, under mild assumption on the training data. The key idea of our proof is that Gaussian random initialization followed by (stochastic) gradient descent produces a sequence of iterates that stay inside a small perturbation region centering around the initial weights, in which the empirical loss function of deep ReLU networks enjoys nice local curvature properties that ensure the global convergence of (stochastic) gradient descent. Our theoretical results shed light on understanding the optimization for deep learning, and pave the way for studying the optimization dynamics of training modern deep neural networks. |
Tasks | |
Published | 2018-11-21 |
URL | http://arxiv.org/abs/1811.08888v3 |
http://arxiv.org/pdf/1811.08888v3.pdf | |
PWC | https://paperswithcode.com/paper/stochastic-gradient-descent-optimizes-over |
Repo | |
Framework | |
Financial Aspect-Based Sentiment Analysis using Deep Representations
Title | Financial Aspect-Based Sentiment Analysis using Deep Representations |
Authors | Steve Yang, Jason Rosenfeld, Jacques Makutonin |
Abstract | The topic of aspect-based sentiment analysis (ABSA) has been explored for a variety of industries, but it remains largely unexplored in finance. The recent release of data for an open challenge (FiQA) from the companion proceedings of WWW ‘18 has provided valuable finance-specific annotations. FiQA contains high-quality labels, but it still lacks the data quantity needed to apply traditional ABSA deep learning architectures. In this paper, we employ high-level semantic representations and methods of inductive transfer learning for NLP. We experiment with extensions of recently developed domain adaptation methods and target-task fine-tuning that significantly improve performance on a small dataset. Our results show an 8.7% improvement in F1 score for classification and an 11% improvement in MSE for regression over current state-of-the-art results. |
Tasks | Aspect-Based Sentiment Analysis, Domain Adaptation, Sentiment Analysis, Transfer Learning |
Published | 2018-08-23 |
URL | http://arxiv.org/abs/1808.07931v1 |
http://arxiv.org/pdf/1808.07931v1.pdf | |
PWC | https://paperswithcode.com/paper/financial-aspect-based-sentiment-analysis |
Repo | |
Framework | |
Curve Registered Coupled Low Rank Factorization
Title | Curve Registered Coupled Low Rank Factorization |
Authors | Jeremy Emile Cohen, Rodrigo Cabral Farias, Bertrand Rivet |
Abstract | We propose an extension of the canonical polyadic (CP) tensor model where one of the latent factors is allowed to vary through data slices in a constrained way. The components of the latent factors, which we want to retrieve from data, can vary from one slice to another up to a diffeomorphism. We suppose that the diffeomorphisms are also unknown, thus merging curve registration and tensor decomposition in one model, which we call registered CP. We present an algorithm to retrieve both the latent factors and the diffeomorphism, which is assumed to be in a parametrized form. At the end of the paper, we show simulation results comparing registered CP with other models from the literature. |
Tasks | |
Published | 2018-02-09 |
URL | http://arxiv.org/abs/1802.03203v1 |
http://arxiv.org/pdf/1802.03203v1.pdf | |
PWC | https://paperswithcode.com/paper/curve-registered-coupled-low-rank |
Repo | |
Framework | |
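In symbols (our notation, not necessarily the paper's), the registered CP model lets one factor be warped per data slice:

```latex
% Standard third-order CP model with a functional second mode:
%   X(i, t, k) \approx \sum_{r=1}^{R} a_{ir}\, b_r(t)\, c_{kr}
% Registered CP: the functional factor varies across slices k through an
% unknown, parametrized diffeomorphism \phi_k of the index t:
\[
  X(i, t, k) \;\approx\; \sum_{r=1}^{R} a_{ir}\, b_r\!\bigl(\phi_k(t)\bigr)\, c_{kr},
  \qquad \phi_k \ \text{an unknown parametrized diffeomorphism,}
\]
```

so the algorithm must recover the latent factors $a_{ir}$, $b_r$, $c_{kr}$ and the warps $\phi_k$ jointly, merging curve registration with tensor decomposition.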
ProofWatch: Watchlist Guidance for Large Theories in E
Title | ProofWatch: Watchlist Guidance for Large Theories in E |
Authors | Zarathustra Goertzel, Jan Jakubův, Stephan Schulz, Josef Urban |
Abstract | A watchlist (also hint list) is a mechanism that allows related proofs to guide a proof search for a new conjecture. This mechanism has been used with the Otter and Prover9 theorem provers, both for interactive formalizations and for human-assisted proving of open conjectures in small theories. In this work we explore the use of watchlists in large theories coming from first-order translations of large ITP libraries, aiming at improving hammer-style automation by smarter internal guidance of the ATP systems. In particular, we (i) design watchlist-based clause evaluation heuristics inside the E ATP system, and (ii) develop new proof guiding algorithms that load many previous proofs inside the ATP and focus the proof search using a dynamically updated notion of proof matching. The methods are evaluated on a large set of problems coming from the Mizar library, showing significant improvement over E’s standard portfolio of strategies, and also over the previous best set of strategies invented for Mizar by evolutionary methods. |
Tasks | |
Published | 2018-02-12 |
URL | http://arxiv.org/abs/1802.04007v2 |
http://arxiv.org/pdf/1802.04007v2.pdf | |
PWC | https://paperswithcode.com/paper/proofwatch-watchlist-guidance-for-large |
Repo | |
Framework | |
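A toy rendering of the watchlist idea: generated clauses that match clauses from previous proofs are pushed toward the front of the given-clause queue. The literal-subset test below is a crude propositional stand-in for clause subsumption, and E's actual heuristic additionally tracks per-proof progress dynamically; both simplifications are ours.

```python
def watchlist_priority(clause, watchlist):
    """Score a generated clause by how many watchlist clauses it matches,
    using a literal-subset test as a stand-in for subsumption.  Returns a
    smaller (better) value for more matches, following the convention
    that given-clause queues pop the lowest-weight clause first."""
    lits = set(clause)
    hits = sum(1 for w in watchlist if set(w) <= lits)
    return -hits

def pick_given_clause(unprocessed, watchlist):
    """Select the next given clause: the one most supported by prior
    proofs on the watchlist (ties broken by list order)."""
    return min(unprocessed, key=lambda c: watchlist_priority(c, watchlist))
```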
AIR5: Five Pillars of Artificial Intelligence Research
Title | AIR5: Five Pillars of Artificial Intelligence Research |
Authors | Yew-Soon Ong, Abhishek Gupta |
Abstract | In this article, we provide an overview of what we consider to be some of the most pressing research questions facing the fields of artificial intelligence (AI) and computational intelligence (CI), with the latter focusing on algorithms that are inspired by various natural phenomena. We demarcate these questions using five unique Rs: (i) rationalizability, (ii) resilience, (iii) reproducibility, (iv) realism, and (v) responsibility. Notably, just as air serves as the basic element of biological life, the term AIR5, cumulatively referring to the five aforementioned Rs, is introduced herein to mark some of the basic elements of artificial life (supporting the sustained growth of AI and CI). A brief summary of each of the Rs is presented, highlighting their relevance as pillars of future research in this arena. |
Tasks | Artificial Life |
Published | 2018-12-30 |
URL | http://arxiv.org/abs/1812.11509v2 |
http://arxiv.org/pdf/1812.11509v2.pdf | |
PWC | https://paperswithcode.com/paper/air5-five-pillars-of-artificial-intelligence |
Repo | |
Framework | |
Neural Regression Trees
Title | Neural Regression Trees |
Authors | Shahan Ali Memon, Wenbo Zhao, Bhiksha Raj, Rita Singh |
Abstract | Regression-via-Classification (RvC) is the process of converting a regression problem to a classification one. Current approaches for RvC use ad-hoc discretization strategies and are suboptimal. We propose a neural regression tree model for RvC. In this model, we employ a joint optimization framework where we learn optimal discretization thresholds while simultaneously optimizing the features for each node in the tree. We empirically show the validity of our model by testing it on two challenging regression tasks where we establish the state of the art. |
Tasks | |
Published | 2018-10-01 |
URL | http://arxiv.org/abs/1810.00974v2 |
http://arxiv.org/pdf/1810.00974v2.pdf | |
PWC | https://paperswithcode.com/paper/neural-regression-trees |
Repo | |
Framework | |
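For orientation, here is the ad-hoc discretization baseline that Regression-via-Classification starts from: targets are split into equal-frequency bins, a classifier predicts the bin, and the bin mean maps the class back to a number. The paper's contribution is to learn these thresholds jointly with the per-node classifiers rather than fix them as below; the helper names are ours.

```python
def make_bins(ys, k):
    """Equal-frequency discretization for Regression-via-Classification:
    split the sorted targets into k bins, returning the bin thresholds
    and each bin's mean (the numeric value a class prediction maps to)."""
    ys = sorted(ys)
    n = len(ys)
    edges, means = [], []
    for b in range(k):
        chunk = ys[b * n // k : (b + 1) * n // k]
        means.append(sum(chunk) / len(chunk))
        if b < k - 1:
            edges.append(ys[(b + 1) * n // k])  # upper threshold of bin b
    return edges, means

def to_bin(y, edges):
    """Map a target value to its bin index via the thresholds."""
    for i, e in enumerate(edges):
        if y < e:
            return i
    return len(edges)
```

A classifier trained on `to_bin` labels, combined with `means[predicted_bin]` at test time, turns any classifier into a (coarse) regressor.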
Towards Complex Artificial Life
Title | Towards Complex Artificial Life |
Authors | Lance R. Williams |
Abstract | An object-oriented combinator chemistry was used to construct an artificial organism with a system architecture possessing characteristics necessary for organisms to evolve into more complex forms. This architecture supports modularity by providing a mechanism for the construction of executable modules called methods that can be duplicated and specialized to increase complexity. At the same time, its support for concurrency provides the flexibility in execution order necessary for redundancy, degeneracy and parallelism to mitigate increased replication costs. The organism is a moving, self-replicating, spatially distributed assembly of elemental combinators called a roving pile. The pile hosts an asynchronous message-passing computation implemented by parallel subprocesses encoded by genes distributed throughout the pile like the plasmids of a bacterial cell. |
Tasks | Artificial Life |
Published | 2018-05-16 |
URL | http://arxiv.org/abs/1805.06366v1 |
http://arxiv.org/pdf/1805.06366v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-complex-artificial-life |
Repo | |
Framework | |