Paper Group ANR 566
A New Spectral Method for Latent Variable Models. Knowledge Graph Representation with Jointly Structural and Textual Encoding. Generating Semi-Synthetic Validation Benchmarks for Embryomics. The Singularity Controversy, Part I: Lessons Learned and Open Questions: Conclusions from the Battle on the Legitimacy of the Debate. A Comparative Analysis of …
A New Spectral Method for Latent Variable Models
Title | A New Spectral Method for Latent Variable Models |
Authors | Matteo Ruffini, Marta Casanellas, Ricard Gavaldà |
Abstract | This paper presents an algorithm for the unsupervised learning of latent variable models from unlabeled sets of data. We base our technique on spectral decomposition, yielding a method that proves to be robust both in theory and in practice. We also describe how to use this algorithm to learn the parameters of two well-known text mining models, the single topic model and Latent Dirichlet Allocation, providing in both cases an efficient procedure to retrieve the parameters to feed the algorithm. We compare the results of our algorithm with those of existing algorithms on synthetic data, and we provide examples of applications to real-world text corpora for both the single topic model and LDA, obtaining meaningful results. |
Tasks | Latent Variable Models |
Published | 2016-12-11 |
URL | http://arxiv.org/abs/1612.03409v2 |
http://arxiv.org/pdf/1612.03409v2.pdf | |
PWC | https://paperswithcode.com/paper/a-new-spectral-method-for-latent-variable |
Repo | |
Framework | |
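The single topic model mentioned in the abstract is the classic setting for such spectral methods: its low-order moments factor through the topic-word distributions, so whitening the second moment and eigendecomposing a (randomly probed) third moment recovers the topics. The sketch below runs this pipeline on exact population moments of a toy model to show the mechanism; it illustrates the general moment-decomposition idea, not the paper's actual algorithm, and all dimensions and names are made up.

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(0)
d, k = 20, 3                                   # vocabulary size, number of topics
w = np.array([0.5, 0.3, 0.2])                  # topic weights
mu = rng.dirichlet(np.ones(d), size=k).T       # d x k topic-word distributions

# Exact population moments of the single topic model
M2 = (mu * w) @ mu.T                           # sum_k w_k mu_k mu_k^T
eta = rng.normal(size=d)                       # random probe vector
M3_eta = (mu * (w * (mu.T @ eta))) @ mu.T      # sum_k w_k <mu_k, eta> mu_k mu_k^T

# Whiten M2 so that W^T M2 W = I_k
U, S, _ = np.linalg.svd(M2)
W = U[:, :k] / np.sqrt(S[:k])

# The whitened, probed third moment is diagonalized by the whitened topics
_, V = np.linalg.eigh(W.T @ M3_eta @ W)

# Un-whiten the eigenvectors and renormalise to recover the topic distributions
B = np.linalg.pinv(W.T) @ V
mu_hat = B / B.sum(axis=0, keepdims=True)

# Recovery is exact up to a permutation of the columns
err = min(np.abs(mu - mu_hat[:, list(p)]).max() for p in permutations(range(k)))
print("max abs error after best matching:", err)
```

Replacing the exact moments with empirical word co-occurrence estimates from documents turns this into a learning procedure, which is the noisy setting the abstract's robustness claim refers to.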
Knowledge Graph Representation with Jointly Structural and Textual Encoding
Title | Knowledge Graph Representation with Jointly Structural and Textual Encoding |
Authors | Jiacheng Xu, Kan Chen, Xipeng Qiu, Xuanjing Huang |
Abstract | The objective of knowledge graph embedding is to encode both entities and relations of knowledge graphs into continuous low-dimensional vector spaces. Previously, most works focused on symbolic representations of knowledge graphs built from structural information alone, which cannot handle new entities or entities with few facts well. In this paper, we propose a novel deep architecture that utilizes both structural and textual information about entities. Specifically, we introduce three neural models to encode the valuable information from the text description of an entity, among which an attentive model can select related information as needed. Then, a gating mechanism is applied to integrate the structural and textual representations into a unified architecture. Experiments show that our models outperform the baselines on link prediction and triplet classification tasks. Source code for this paper will be available on GitHub. |
Tasks | Graph Embedding, Knowledge Graph Embedding, Knowledge Graphs, Link Prediction |
Published | 2016-11-26 |
URL | http://arxiv.org/abs/1611.08661v2 |
http://arxiv.org/pdf/1611.08661v2.pdf | |
PWC | https://paperswithcode.com/paper/knowledge-graph-representation-with-jointly |
Repo | |
Framework | |
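As a rough illustration of the gating idea described above — blending a structure-based entity embedding with a text-based one through a learned element-wise gate — here is a minimal PyTorch sketch. The module name, the per-entity gate parameterization, and the random text representation are assumptions made for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class GatedEntityEmbedding(nn.Module):
    """Combine a structural and a textual entity representation with a
    learned element-wise gate: e = g * e_struct + (1 - g) * e_text."""
    def __init__(self, num_entities: int, dim: int):
        super().__init__()
        self.struct = nn.Embedding(num_entities, dim)   # structure-based lookup
        self.gate = nn.Embedding(num_entities, dim)     # per-entity gate logits

    def forward(self, entity_ids: torch.Tensor, text_repr: torch.Tensor):
        g = torch.sigmoid(self.gate(entity_ids))        # gate values in (0, 1)
        return g * self.struct(entity_ids) + (1.0 - g) * text_repr

# Usage: text_repr would come from a text encoder over the entity description
# (e.g. the attentive encoder mentioned in the abstract); here it is random.
model = GatedEntityEmbedding(num_entities=1000, dim=64)
ids = torch.tensor([3, 42])
text_repr = torch.randn(2, 64)
joint = model(ids, text_repr)
print(joint.shape)   # torch.Size([2, 64])
```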
Generating Semi-Synthetic Validation Benchmarks for Embryomics
Title | Generating Semi-Synthetic Validation Benchmarks for Embryomics |
Authors | Johannes Stegmaier, Julian Arz, Benjamin Schott, Jens C. Otte, Andrei Kobitski, G. Ulrich Nienhaus, Uwe Strähle, Peter Sanders, Ralf Mikut |
Abstract | Systematic validation is an essential part of algorithm development. The enormous dataset sizes and the complexity observed in many recent time-resolved 3D fluorescence microscopy imaging experiments, however, prohibit a comprehensive manual ground truth generation. Moreover, existing simulated benchmarks in this field are often too simple or too specialized to sufficiently validate the observed image analysis problems. We present a new semi-synthetic approach to generate realistic 3D+t benchmarks that combines challenging cellular movement dynamics of real embryos with simulated fluorescent nuclei and artificial image distortions including various parametrizable options like cell numbers, acquisition deficiencies or multiview simulations. We successfully applied the approach to simulate the development of a zebrafish embryo with thousands of cells over 14 hours of its early existence. |
Tasks | |
Published | 2016-04-17 |
URL | http://arxiv.org/abs/1604.04906v1 |
http://arxiv.org/pdf/1604.04906v1.pdf | |
PWC | https://paperswithcode.com/paper/generating-semi-synthetic-validation |
Repo | |
Framework | |
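The core of such a semi-synthetic benchmark is rendering simulated fluorescent nuclei at cell positions taken from real embryo tracks and then corrupting the volume with acquisition artifacts. The sketch below shows that idea in a few lines of NumPy, with Gaussian "nuclei" and Poisson shot noise; the parameters and the random centroids are placeholders, not the paper's simulation pipeline.

```python
import numpy as np

def render_frame(centroids, shape=(64, 128, 128), sigma=2.0,
                 peak=200.0, background=10.0, rng=None):
    """Render one synthetic 3D frame: Gaussian 'nuclei' at the given centroids
    (in the benchmark, taken from real tracked embryo cells) plus shot noise."""
    rng = rng or np.random.default_rng(0)
    zz, yy, xx = np.indices(shape)
    img = np.full(shape, background, dtype=float)
    for cz, cy, cx in centroids:
        d2 = (zz - cz) ** 2 + (yy - cy) ** 2 + (xx - cx) ** 2
        img += peak * np.exp(-d2 / (2.0 * sigma ** 2))
    return rng.poisson(img).astype(np.uint16)   # acquisition (shot) noise

# Hypothetical centroids; the benchmark would take these from real embryo tracks.
rng = np.random.default_rng(1)
centroids = rng.uniform([5, 5, 5], [59, 123, 123], size=(30, 3))
frame = render_frame(centroids, rng=rng)
print(frame.shape, frame.dtype, frame.max())
```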
The Singularity Controversy, Part I: Lessons Learned and Open Questions: Conclusions from the Battle on the Legitimacy of the Debate
Title | The Singularity Controversy, Part I: Lessons Learned and Open Questions: Conclusions from the Battle on the Legitimacy of the Debate |
Authors | Amnon H. Eden |
Abstract | This report seeks to inform policy makers on the nature and the merit of the arguments for and against the concerns associated with a potential technological singularity. Part I describes the lessons learned from our investigation of the subject, separating the arguments of merit from the fallacies and misconceptions that confuse the debate and undermine its rational resolution. |
Tasks | |
Published | 2016-01-22 |
URL | http://arxiv.org/abs/1601.05977v2 |
http://arxiv.org/pdf/1601.05977v2.pdf | |
PWC | https://paperswithcode.com/paper/the-singularity-controversy-part-i-lessons |
Repo | |
Framework | |
A Comparative Analysis of classification data mining techniques: Deriving key factors useful for predicting students performance
Title | A Comparative Analysis of classification data mining techniques: Deriving key factors useful for predicting students performance |
Authors | Muhammed Salman Shamsi, Jhansi Lakshmi |
Abstract | The number of students opting for Engineering as their discipline is increasing rapidly. But due to various factors and inappropriate primary education in India, failure rates are high. Students are unable to excel in core engineering because of complex and mathematical subjects, and hence fail in such subjects. With the help of data mining techniques, we can predict the performance of students in terms of grades and failure in subjects. This paper performs a comparative analysis of various classification techniques, such as Naïve Bayes, LibSVM, J48, Random Forest, and JRip, and tries to choose the best among these. Based on the results obtained, we found that Naïve Bayes is the most accurate method in terms of students' failure prediction and JRip is the most accurate in terms of students' grade prediction. We also found that JRip differs only marginally from Naïve Bayes in accuracy for students' failure prediction and gives us a set of rules from which we derive the key factors influencing students' performance. Finally, we suggest various ways to mitigate these factors. This study is limited to Indian education system scenarios. However, the factors found can be helpful in other scenarios as well. |
Tasks | |
Published | 2016-06-18 |
URL | http://arxiv.org/abs/1606.05735v2 |
http://arxiv.org/pdf/1606.05735v2.pdf | |
PWC | https://paperswithcode.com/paper/a-comparative-analysis-of-classification-data |
Repo | |
Framework | |
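The comparison described above can be reproduced in spirit with scikit-learn's cross-validation utilities. Note the substitutions: the paper evaluates WEKA's Naïve Bayes, LibSVM, J48, Random Forest, and JRip, whereas the sketch below uses scikit-learn's nearest equivalents (JRip has no direct counterpart, and a decision tree stands in for J48), and the data are a synthetic stand-in for the student records.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the student records used in the study
X, y = make_classification(n_samples=500, n_features=12, n_informative=6,
                           random_state=0)
models = {
    "NaiveBayes": GaussianNB(),
    "SVM (LibSVM-style)": SVC(),
    "DecisionTree (J48-style)": DecisionTreeClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)   # 10-fold CV accuracy
    print(f"{name:28s} {scores.mean():.3f} +/- {scores.std():.3f}")
```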
Logic-based Clustering and Learning for Time-Series Data
Title | Logic-based Clustering and Learning for Time-Series Data |
Authors | Marcell Vazquez-Chanlatte, Jyotirmoy V. Deshmukh, Xiaoqing Jin, Sanjit A. Seshia |
Abstract | To effectively analyze and design cyberphysical systems (CPS), designers today have to combat the data deluge problem, i.e., the burden of processing intractably large amounts of data produced by complex models and experiments. In this work, we utilize monotonic Parametric Signal Temporal Logic (PSTL) to design features for unsupervised classification of time series data. This enables using off-the-shelf machine learning tools to automatically cluster similar traces with respect to a given PSTL formula. We demonstrate how this technique produces interpretable formulas that are amenable to analysis and understanding using a few representative examples. We illustrate this with case studies related to automotive engine testing, highway traffic analysis, and auto-grading massively open online courses. |
Tasks | Time Series |
Published | 2016-12-22 |
URL | http://arxiv.org/abs/1612.07823v3 |
http://arxiv.org/pdf/1612.07823v3.pdf | |
PWC | https://paperswithcode.com/paper/logic-based-clustering-and-learning-for-time |
Repo | |
Framework | |
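The pipeline sketched in the abstract is: map each trace to the tightest parameter values at which a monotonic PSTL-style template is satisfied, then hand those feature vectors to an off-the-shelf clustering algorithm. A toy version follows; the particular template (peak value plus the time after which the trace stays below half its peak) is an illustrative assumption, not one of the paper's case-study formulas.

```python
import numpy as np
from sklearn.cluster import KMeans

def pstl_features(trace, t):
    peak = trace.max()                                   # tightest c with G(x <= c)
    below = trace <= 0.5 * peak
    settle_idx = next((i for i in range(len(t))
                       if below[i:].all()), len(t) - 1)  # earliest tau with G_[tau,inf)(x <= c/2)
    return np.array([peak, t[settle_idx]])

t = np.linspace(0, 10, 200)
traces = [a * np.exp(-t / tau) for a, tau in [(1, 1), (1.1, 1.2), (3, 4), (2.9, 5)]]
X = np.array([pstl_features(tr, t) for tr in traces])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)   # similar decay behaviours land in the same cluster
```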
Learning Concept Taxonomies from Multi-modal Data
Title | Learning Concept Taxonomies from Multi-modal Data |
Authors | Hao Zhang, Zhiting Hu, Yuntian Deng, Mrinmaya Sachan, Zhicheng Yan, Eric P. Xing |
Abstract | We study the problem of automatically building hypernym taxonomies from textual and visual data. Previous works in taxonomy induction generally ignore the increasingly prominent visual data, which encode important perceptual semantics. Instead, we propose a probabilistic model for taxonomy induction by jointly leveraging text and images. To avoid hand-crafted feature engineering, we design end-to-end features based on distributed representations of images and words. The model is discriminatively trained given a small set of existing ontologies and is capable of building full taxonomies from scratch for a collection of unseen conceptual label items with associated images. We evaluate our model and features on the WordNet hierarchies, where our system outperforms previous approaches by a large gap. |
Tasks | Feature Engineering |
Published | 2016-06-29 |
URL | http://arxiv.org/abs/1606.09239v1 |
http://arxiv.org/pdf/1606.09239v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-concept-taxonomies-from-multi-modal |
Repo | |
Framework | |
Deep Cascaded Bi-Network for Face Hallucination
Title | Deep Cascaded Bi-Network for Face Hallucination |
Authors | Shizhan Zhu, Sifei Liu, Chen Change Loy, Xiaoou Tang |
Abstract | We present a novel framework for hallucinating faces of unconstrained poses and with very low resolution (face size as small as 5pxIOD). In contrast to existing studies that mostly ignore or assume pre-aligned face spatial configuration (e.g. facial landmarks localization or dense correspondence field), we alternatingly optimize two complementary tasks, namely face hallucination and dense correspondence field estimation, in a unified framework. In addition, we propose a new gated deep bi-network that contains two functionality-specialized branches to recover different levels of texture details. Extensive experiments demonstrate that such formulation allows exceptional hallucination quality on in-the-wild low-res faces with significant pose and illumination variations. |
Tasks | Face Hallucination |
Published | 2016-07-18 |
URL | http://arxiv.org/abs/1607.05046v1 |
http://arxiv.org/pdf/1607.05046v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-cascaded-bi-network-for-face |
Repo | |
Framework | |
EmojiNet: Building a Machine Readable Sense Inventory for Emoji
Title | EmojiNet: Building a Machine Readable Sense Inventory for Emoji |
Authors | Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, Derek Doran |
Abstract | Emoji are a contemporary and extremely popular way to enhance electronic communication. Without rigid semantics attached to them, emoji symbols take on different meanings based on the context of a message. Thus, like the word sense disambiguation task in natural language processing, machines also need to disambiguate the meaning or sense of an emoji. In a first step toward achieving this goal, this paper presents EmojiNet, the first machine readable sense inventory for emoji. EmojiNet is a resource enabling systems to link emoji with their context-specific meaning. It is automatically constructed by integrating multiple emoji resources with BabelNet, which is the most comprehensive multilingual sense inventory available to date. The paper discusses its construction, evaluates the automatic resource creation process, and presents a use case where EmojiNet disambiguates emoji usage in tweets. EmojiNet is available online for use at http://emojinet.knoesis.org. |
Tasks | Word Sense Disambiguation |
Published | 2016-10-25 |
URL | http://arxiv.org/abs/1610.07710v1 |
http://arxiv.org/pdf/1610.07710v1.pdf | |
PWC | https://paperswithcode.com/paper/emojinet-building-a-machine-readable-sense |
Repo | |
Framework | |
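A toy illustration of how such a sense inventory gets used downstream: for an emoji in a tweet, pick the sense whose keywords overlap the tweet's context most (a Lesk-style heuristic). The inventory entries below are invented examples, not EmojiNet's actual data or API.

```python
# Hypothetical, hand-written sense entries; EmojiNet builds these automatically.
SENSES = {
    "🔥": [
        {"sense": "fire (noun): combustion, flames",
         "keywords": {"burn", "flames", "smoke", "forest"}},
        {"sense": "excellent (adjective): impressive, cool",
         "keywords": {"track", "album", "mixtape", "awesome"}},
    ],
}

def disambiguate(emoji: str, tweet: str) -> str:
    context = set(tweet.lower().split())
    best = max(SENSES[emoji], key=lambda s: len(s["keywords"] & context))
    return best["sense"]

print(disambiguate("🔥", "this new album is 🔥"))             # -> excellent sense
print(disambiguate("🔥", "forest fire smoke everywhere 🔥"))   # -> fire sense
```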
Simple Does It: Weakly Supervised Instance and Semantic Segmentation
Title | Simple Does It: Weakly Supervised Instance and Semantic Segmentation |
Authors | Anna Khoreva, Rodrigo Benenson, Jan Hosang, Matthias Hein, Bernt Schiele |
Abstract | Semantic labelling and instance segmentation are two tasks that require particularly costly annotations. Starting from weak supervision in the form of bounding box detection annotations, we propose a new approach that does not require modification of the segmentation training procedure. We show that when carefully designing the input labels from given bounding boxes, even a single round of training is enough to improve over previously reported weakly supervised results. Overall, our weak supervision approach reaches ~95% of the quality of the fully supervised model, both for semantic labelling and instance segmentation. |
Tasks | Instance Segmentation, Semantic Segmentation |
Published | 2016-03-24 |
URL | http://arxiv.org/abs/1603.07485v2 |
http://arxiv.org/pdf/1603.07485v2.pdf | |
PWC | https://paperswithcode.com/paper/simple-does-it-weakly-supervised-instance-and |
Repo | |
Framework | |
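The weak-supervision starting point described above is the conversion of box annotations into per-pixel pseudo-labels that an unmodified segmentation network can train on. The sketch below shows the naive variant — filling each box with its class and marking ambiguous overlaps as ignore; the paper studies more careful label constructions (e.g. GrabCut-style refinement inside each box), which this does not reproduce.

```python
import numpy as np

IGNORE = 255   # conventional 'ignore' index for segmentation losses

def boxes_to_pseudo_label(shape, boxes):
    """boxes: list of (class_id, x0, y0, x1, y1); returns an H x W label map."""
    label = np.zeros(shape, dtype=np.uint8)            # 0 = background
    covered = np.zeros(shape, dtype=bool)
    for cls, x0, y0, x1, y1 in boxes:
        region = (slice(y0, y1), slice(x0, x1))
        # pixels claimed by more than one box are ambiguous -> ignore
        label[region] = np.where(covered[region], IGNORE, cls)
        covered[region] = True
    return label

label = boxes_to_pseudo_label((100, 100), [(1, 10, 10, 60, 60), (2, 40, 40, 90, 90)])
print(np.unique(label))   # [0 1 2 255]
```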
A Readable Read: Automatic Assessment of Language Learning Materials based on Linguistic Complexity
Title | A Readable Read: Automatic Assessment of Language Learning Materials based on Linguistic Complexity |
Authors | Ildikó Pilán, Sowmya Vajjala, Elena Volodina |
Abstract | Corpora and web texts can become a rich language learning resource if we have a means of assessing whether they are linguistically appropriate for learners at a given proficiency level. In this paper, we aim at addressing this issue by presenting the first approach for predicting linguistic complexity for Swedish second language learning material on a 5-point scale. After showing that the traditional Swedish readability measure, Läsbarhetsindex (LIX), is not suitable for this task, we propose a supervised machine learning model, based on a range of linguistic features, that can reliably classify texts according to their difficulty level. Our model obtained an accuracy of 81.3% and an F-score of 0.8, which is comparable to the state of the art in English and is considerably higher than previously reported results for other languages. We further studied the utility of our features with single sentences instead of full texts since sentences are a common linguistic unit in language learning exercises. We trained a separate model on sentence-level data with five classes, which yielded 63.4% accuracy. Although this is lower than the document-level performance, we achieved an adjacent accuracy of 92%. Furthermore, we found that using a combination of different features, compared to using lexical features alone, resulted in a 7% improvement in classification accuracy at the sentence level, whereas at the document level, lexical features were more dominant. Our models are intended for use in a freely accessible web-based language learning platform for the automatic generation of exercises. |
Tasks | |
Published | 2016-03-29 |
URL | http://arxiv.org/abs/1603.08868v1 |
http://arxiv.org/pdf/1603.08868v1.pdf | |
PWC | https://paperswithcode.com/paper/a-readable-read-automatic-assessment-of |
Repo | |
Framework | |
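A hedged sketch of the supervised setup: represent each text with a few shallow linguistic features, including LIX (the traditional Swedish readability measure the paper compares against), and fit a classifier over proficiency levels. The two example sentences and labels below are made up, and the paper's actual feature set is far richer.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def lix(text: str) -> float:
    """Läsbarhetsindex: words per sentence + percentage of words longer than 6 chars."""
    words = text.split()
    sentences = max(text.count(".") + text.count("!") + text.count("?"), 1)
    long_words = sum(len(w) > 6 for w in words)
    return len(words) / sentences + 100.0 * long_words / len(words)

def features(text: str) -> np.ndarray:
    words = text.split()
    return np.array([lix(text), len(words), np.mean([len(w) for w in words])])

texts = ["En katt sitter på mattan .",
         "Regeringens utredning diskuterar konsekvenserna av internationaliseringen ."]
levels = [0, 4]          # hypothetical proficiency labels (0 = easiest, 4 = hardest)
X = np.array([features(t) for t in texts])
clf = RandomForestClassifier(random_state=0).fit(X, levels)
print(clf.predict(X))
```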
Discriminating Similar Languages: Evaluations and Explorations
Title | Discriminating Similar Languages: Evaluations and Explorations |
Authors | Cyril Goutte, Serge Léger, Shervin Malmasi, Marcos Zampieri |
Abstract | We present an analysis of the performance of machine learning classifiers on discriminating between similar languages and language varieties. We carried out a number of experiments using the results of the two editions of the Discriminating between Similar Languages (DSL) shared task. We investigate the progress made between the two tasks, estimate an upper bound on possible performance using ensemble and oracle combination, and provide learning curves to help us understand which languages are more challenging. A number of difficult sentences are identified and investigated further with human annotation. |
Tasks | |
Published | 2016-09-30 |
URL | http://arxiv.org/abs/1610.00031v1 |
http://arxiv.org/pdf/1610.00031v1.pdf | |
PWC | https://paperswithcode.com/paper/discriminating-similar-languages-evaluations |
Repo | |
Framework | |
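The two combination schemes mentioned in the abstract are easy to state in code: a plurality vote over the systems' per-sentence predictions, and an "oracle" that counts a sentence as correct if any system labels it correctly, giving an upper bound on ensemble performance. The labels below are invented DSL-style language codes.

```python
from collections import Counter

def plurality_vote(predictions):          # predictions: list of per-system label lists
    return [Counter(col).most_common(1)[0][0] for col in zip(*predictions)]

def oracle_accuracy(predictions, gold):
    hits = sum(any(p[i] == g for p in predictions) for i, g in enumerate(gold))
    return hits / len(gold)

gold  = ["hr", "bs", "sr", "hr"]
preds = [["hr", "bs", "bs", "sr"],        # system 1
         ["hr", "sr", "sr", "hr"],        # system 2
         ["bs", "bs", "sr", "hr"]]        # system 3
print(plurality_vote(preds))              # majority label per sentence
print(oracle_accuracy(preds, gold))       # upper bound on combined performance
```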
On the Convergence of Stochastic Gradient MCMC Algorithms with High-Order Integrators
Title | On the Convergence of Stochastic Gradient MCMC Algorithms with High-Order Integrators |
Authors | Changyou Chen, Nan Ding, Lawrence Carin |
Abstract | Recent advances in Bayesian learning with large-scale data have witnessed the emergence of stochastic gradient MCMC algorithms (SG-MCMC), such as stochastic gradient Langevin dynamics (SGLD), stochastic gradient Hamiltonian MCMC (SGHMC), and the stochastic gradient thermostat. While finite-time convergence properties of the SGLD with a 1st-order Euler integrator have recently been studied, corresponding theory for general SG-MCMCs has not been explored. In this paper we consider general SG-MCMCs with high-order integrators, and develop theory to analyze finite-time convergence properties and their asymptotic invariant measures. Our theoretical results show faster convergence rates and more accurate invariant measures for SG-MCMCs with higher-order integrators. For example, with the proposed efficient 2nd-order symmetric splitting integrator, the {\em mean square error} (MSE) of the posterior average for the SGHMC achieves an optimal convergence rate of $L^{-4/5}$ at $L$ iterations, compared to $L^{-2/3}$ for the SGHMC and SGLD with 1st-order Euler integrators. Furthermore, convergence results of decreasing-step-size SG-MCMCs are also developed, with the same convergence rates as their fixed-step-size counterparts for a specific decreasing sequence. Experiments on both synthetic and real datasets verify our theory, and show advantages of the proposed method in two large-scale real applications. |
Tasks | |
Published | 2016-10-21 |
URL | http://arxiv.org/abs/1610.06665v1 |
http://arxiv.org/pdf/1610.06665v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-convergence-of-stochastic-gradient |
Repo | |
Framework | |
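For concreteness, here is the simplest member of the SG-MCMC family discussed above: SGLD with a 1st-order Euler integrator, targeting a toy Gaussian posterior with minibatch gradients. This is the baseline the paper's higher-order symmetric splitting scheme improves upon; the splitting integrator itself is not reproduced here, and the step size and model are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=1000)    # y_i ~ N(theta, 1)
N, batch, eps, n_iter = len(data), 50, 1e-3, 20000

theta, samples = 0.0, []
for _ in range(n_iter):
    idx = rng.integers(0, N, size=batch)
    # stochastic gradient of the log posterior (flat prior): (N/batch) * sum (y_i - theta)
    grad = (N / batch) * np.sum(data[idx] - theta)
    # SGLD / Euler step: half-step drift plus Gaussian injection noise
    theta += 0.5 * eps * grad + np.sqrt(eps) * rng.normal()
    samples.append(theta)

burned = np.array(samples[n_iter // 2:])
print("posterior mean ~", burned.mean(), " (data mean:", data.mean(), ")")
```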
Revisiting Human Action Recognition: Personalization vs. Generalization
Title | Revisiting Human Action Recognition: Personalization vs. Generalization |
Authors | Andrea Zunino, Jacopo Cavazza, Vittorio Murino |
Abstract | By thoroughly revisiting the classic human action recognition paradigm, this paper aims at proposing a new approach for the design of effective action classification systems. Taking publicly available three-dimensional (MoCap) action/activity datasets as a testbed, we analyzed and validated different training/testing strategies. In particular, considering that each human action in the datasets is performed several times by different subjects, we were able to precisely quantify the effect of inter- and intra-subject variability, so as to figure out the impact of several learning approaches in terms of classification performance. The net result is that standard testing strategies consisting in cross-validating the algorithm using typical splits of the data (holdout, k-fold, or one-subject-out) are always outperformed by a “personalization” strategy which learns how a subject is performing an action. In other words, it is advantageous to customize (i.e., personalize) the method to learn the actions carried out by each subject, rather than trying to generalize action execution across subjects. Consequently, we finally propose an action recognition framework consisting of a two-stage classification approach where, given a test action, the subject is first identified before the actual recognition of the action takes place. Despite the basic, off-the-shelf descriptors and standard classifiers adopted, we noted a relevant increase in performance with respect to standard state-of-the-art algorithms, thus motivating the use of personalized approaches for designing effective action recognition systems. |
Tasks | Action Classification, Temporal Action Localization |
Published | 2016-05-02 |
URL | http://arxiv.org/abs/1605.00392v1 |
http://arxiv.org/pdf/1605.00392v1.pdf | |
PWC | https://paperswithcode.com/paper/revisiting-human-action-recognition |
Repo | |
Framework | |
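The proposed two-stage scheme can be sketched with off-the-shelf classifiers: stage one identifies the subject from the descriptor, stage two applies that subject's personalized action model. The sketch below uses synthetic stand-ins for the MoCap descriptors and SVMs for both stages; the classifier choice and data are assumptions, not the paper's exact setup.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, dim = 300, 10
X = rng.normal(size=(n, dim))                   # stand-in action descriptors
subjects = rng.integers(0, 5, size=n)           # 5 subjects
actions = rng.integers(0, 4, size=n)            # 4 action classes
X += subjects[:, None] * 0.5                    # subject-specific bias in the features

subject_clf = SVC().fit(X, subjects)                          # stage 1: who is it?
action_clfs = {s: SVC().fit(X[subjects == s], actions[subjects == s])
               for s in np.unique(subjects)}                  # stage 2: per-subject models

def predict(x):
    s = subject_clf.predict(x[None, :])[0]          # identify the subject first
    return action_clfs[s].predict(x[None, :])[0]    # then use that subject's model

print(predict(X[0]), actions[0])
```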
CITlab ARGUS for historical handwritten documents
Title | CITlab ARGUS for historical handwritten documents |
Authors | Gundram Leifert, Tobias Strauß, Tobias Grüning, Roger Labahn |
Abstract | We describe CITlab’s recognition system for the HTRtS competition attached to the 13th International Conference on Document Analysis and Recognition (ICDAR 2015). The task comprises the recognition of historical handwritten documents. The core algorithms of our system are based on multi-dimensional recurrent neural networks (MDRNN) and connectionist temporal classification (CTC). The software modules behind that, as well as the basic utility technologies, are essentially powered by PLANET’s ARGUS framework for intelligent text recognition and image processing. |
Tasks | |
Published | 2016-05-26 |
URL | http://arxiv.org/abs/1605.08412v1 |
http://arxiv.org/pdf/1605.08412v1.pdf | |
PWC | https://paperswithcode.com/paper/citlab-argus-for-historical-handwritten-1 |
Repo | |
Framework | |
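The CTC component mentioned above can be illustrated with PyTorch's built-in CTC loss; a bidirectional LSTM stands in for the multi-dimensional recurrent network, and the proprietary ARGUS framework is not involved. Shapes, charset size, and input features are placeholder assumptions.

```python
import torch
import torch.nn as nn

T, N, C, H = 50, 4, 80, 128          # time steps, batch, charset size (incl. blank), hidden
rnn = nn.LSTM(input_size=32, hidden_size=H, bidirectional=True)
head = nn.Linear(2 * H, C)
ctc = nn.CTCLoss(blank=0)

x = torch.randn(T, N, 32)                        # per-column image features (placeholder)
logits = head(rnn(x)[0])                         # (T, N, C)
log_probs = logits.log_softmax(dim=-1)

targets = torch.randint(1, C, (N, 12))           # label indices, 0 reserved for blank
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 12, dtype=torch.long)

loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()                                   # gradients flow back through the RNN
print(float(loss))
```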