May 5, 2019

2154 words 11 mins read

Paper Group NANR 80

Annotating Spelling Errors in German Texts Produced by Primary School Children. Stochastic Gradient Methods for Distributionally Robust Optimization with f-divergences. High-dimensional omics data analysis using a variable screening protocol with prior knowledge integration (SKI). MayoNLP at SemEval-2016 Task 1: Semantic Textual Similarity based on …

Annotating Spelling Errors in German Texts Produced by Primary School Children

Title Annotating Spelling Errors in German Texts Produced by Primary School Children
Authors Ronja Laarmann-Quante, Lukas Knichel, Stefanie Dipper, Carina Betken
Abstract
Tasks
Published 2016-08-01
URL https://www.aclweb.org/anthology/W16-1705/
PDF https://www.aclweb.org/anthology/W16-1705
PWC https://paperswithcode.com/paper/annotating-spelling-errors-in-german-texts
Repo
Framework

Stochastic Gradient Methods for Distributionally Robust Optimization with f-divergences

Title Stochastic Gradient Methods for Distributionally Robust Optimization with f-divergences
Authors Hongseok Namkoong, John C. Duchi
Abstract We develop efficient solution methods for a robust empirical risk minimization problem designed to give calibrated confidence intervals on performance and provide optimal tradeoffs between bias and variance. Our methods apply to distributionally robust optimization problems proposed by Ben-Tal et al., which put more weight on observations inducing high loss via a worst-case approach over a non-parametric uncertainty set on the underlying data distribution. Our algorithm solves the resulting minimax problems with nearly the same computational cost as stochastic gradient descent through the use of several carefully designed data structures. For a sample of size n, the per-iteration cost of our method scales as O(log n), which allows us to give optimality certificates that distributionally robust optimization provides at little extra cost compared to empirical risk minimization and stochastic gradient methods.
Tasks
Published 2016-12-01
URL http://papers.nips.cc/paper/6040-stochastic-gradient-methods-for-distributionally-robust-optimization-with-f-divergences
PDF http://papers.nips.cc/paper/6040-stochastic-gradient-methods-for-distributionally-robust-optimization-with-f-divergences.pdf
PWC https://paperswithcode.com/paper/stochastic-gradient-methods-for
Repo
Framework
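The minimax reweighting idea above can be sketched in a few lines. Note that this is not the paper's algorithm: Namkoong and Duchi solve f-divergence-constrained problems with specialized O(log n) data structures, while the toy below uses the simpler closed-form worst case of a KL-penalized minibatch objective. The penalty lam, step size, and synthetic data are all illustrative assumptions.

```python
# Toy distributionally robust SGD: reweight minibatch gradients toward
# high-loss examples via a worst-case distribution over the minibatch.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)
lam, lr, batch = 1.0, 0.05, 32  # robustness penalty, step size, batch size

for step in range(500):
    idx = rng.choice(n, batch, replace=False)
    resid = X[idx] @ w - y[idx]
    losses = 0.5 * resid ** 2
    # Worst case of  max_p <p, losses> - lam * KL(p || uniform)  over the
    # minibatch has the closed form p_i proportional to exp(losses_i / lam).
    p = np.exp((losses - losses.max()) / lam)
    p /= p.sum()
    w -= lr * (X[idx].T @ (p * resid))  # gradient reweighted by worst-case p

print("parameter error:", np.linalg.norm(w - w_true))
```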

High-dimensional omics data analysis using a variable screening protocol with prior knowledge integration (SKI)

Title High-dimensional omics data analysis using a variable screening protocol with prior knowledge integration (SKI)
Authors Cong Liu, Jianping Jiang, Jianlei Gu, Zhangsheng Yu, Tao Wang, Hui Lu
Abstract Background High-throughput technology can generate thousands to millions of biomarker measurements in one experiment. However, results from high-throughput analysis are often barely reproducible due to small sample sizes. Different statistical methods have been proposed to tackle this “small n and large p” scenario; for example, different datasets can be pooled or integrated to provide an effective way to improve reproducibility. However, raw data are often either unavailable or hard to integrate due to differing experimental conditions, so there is an emerging need for a method of “knowledge integration” in high-throughput data analysis. Results In this study, we propose an integrative prescreening approach, SKI, for high-throughput data analysis. A new rank is generated from two initial ranks: (1) a knowledge-based rank; and (2) a marginal-correlation-based rank. Our simulations show that SKI outperforms methods without knowledge integration, yielding a higher true positive rate for the same number of selected variables. We also applied the method in a drug response study and found its performance to be better than that of regular screening methods. Conclusion The proposed method provides an effective way to integrate knowledge into high-throughput analysis. It can easily be implemented with our R package, named SKI.
Tasks
Published 2016-12-23
URL https://bmcsystbiol.biomedcentral.com/articles/10.1186/s12918-016-0358-0
PDF https://bmcsystbiol.biomedcentral.com/articles/10.1186/s12918-016-0358-0
PWC https://paperswithcode.com/paper/high-dimensional-omics-data-analysis-using-a
Repo
Framework
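As a rough illustration of the rank-combination step described in the abstract, here is a minimal sketch. The authors provide an R package (SKI); this Python toy, the synthetic data, the noisy "knowledge" oracle, and the simple weighted-average combination rule are all assumptions made for illustration.

```python
# Sketch of screening with knowledge integration: combine a marginal
# correlation rank with a prior-knowledge rank, then keep the top variables.
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 2000
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:10] = 2.0           # 10 true signal variables
y = X @ beta + rng.normal(size=n)

# Rank 1: marginal correlation with the outcome (rank 0 = strongest).
corr = np.abs(np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)]))
corr_rank = np.argsort(np.argsort(-corr))

# Rank 2: prior-knowledge rank. Here a noisy oracle stands in for domain
# knowledge (e.g. pathway databases); in practice this rank is given.
knowledge_score = -beta + 0.5 * rng.normal(size=p)
knowledge_rank = np.argsort(np.argsort(knowledge_score))

alpha = 0.5                                    # hypothetical mixing weight
combined = alpha * corr_rank + (1 - alpha) * knowledge_rank
selected = np.argsort(combined)[:50]           # screen down to 50 variables
print("true positives kept:", int(np.sum(selected < 10)), "of 10")
```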

MayoNLP at SemEval-2016 Task 1: Semantic Textual Similarity based on Lexical Semantic Net and Deep Learning Semantic Model

Title MayoNLP at SemEval-2016 Task 1: Semantic Textual Similarity based on Lexical Semantic Net and Deep Learning Semantic Model
Authors Naveed Afzal, Yanshan Wang, Hongfang Liu
Abstract
Tasks Information Retrieval, Machine Translation, Semantic Textual Similarity
Published 2016-06-01
URL https://www.aclweb.org/anthology/S16-1103/
PDF https://www.aclweb.org/anthology/S16-1103
PWC https://paperswithcode.com/paper/mayonlp-at-semeval-2016-task-1-semantic
Repo
Framework

Crowdsourced Clustering: Querying Edges vs Triangles

Title Crowdsourced Clustering: Querying Edges vs Triangles
Authors Ramya Korlakai Vinayak, Babak Hassibi
Abstract We consider the task of clustering items using answers from non-expert crowd workers. In such cases, the workers are often not able to label the items directly, however, it is reasonable to assume that they can compare items and judge whether they are similar or not. An important question is what queries to make, and we compare two types: random edge queries, where a pair of items is revealed, and random triangle queries, where a triple of items is revealed. Since it is far too expensive to query all possible edges and/or triangles, we need to work with partial observations subject to a fixed query budget constraint. When a generative model for the data is available (and we consider a few of these), we determine the cost of a query by its entropy; when such models do not exist we use the average response time per query of the workers as a surrogate for the cost. In addition to theoretical justification, through several simulations and experiments on two real data sets on Amazon Mechanical Turk, we empirically demonstrate that, for a fixed budget, triangle queries uniformly outperform edge queries. Even though, in contrast to edge queries, triangle queries reveal dependent edges, they provide more reliable edges and, for a fixed budget, many more of them. We also provide a sufficient condition on the number of observations, edge densities inside and outside the clusters and the minimum cluster size required for the exact recovery of the true adjacency matrix via triangle queries using a convex optimization-based clustering algorithm.
Tasks
Published 2016-12-01
URL http://papers.nips.cc/paper/6499-crowdsourced-clustering-querying-edges-vs-triangles
PDF http://papers.nips.cc/paper/6499-crowdsourced-clustering-querying-edges-vs-triangles.pdf
PWC https://paperswithcode.com/paper/crowdsourced-clustering-querying-edges-vs
Repo
Framework
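The budget argument in the abstract (each triangle query yields three dependent pairwise observations) can be illustrated with a small simulation. The worker noise rate and the per-query costs below are assumptions; the paper derives costs from entropy under generative models or from measured Mechanical Turk response times.

```python
# Toy budget comparison of edge vs. triangle queries with a noisy worker model.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
n_items, k, flip = 60, 3, 0.1
labels = rng.integers(k, size=n_items)

def noisy_same(i, j):
    """Simulated worker judgment: same cluster or not, flipped w.p. `flip`."""
    return bool((labels[i] == labels[j]) ^ (rng.random() < flip))

budget, edge_cost, tri_cost = 300.0, 1.0, 1.5  # hypothetical per-query costs

# Edge queries: one pairwise observation each.
n_edge_obs = int(budget / edge_cost)

# Triangle queries: three (dependent) pairwise observations each.
tri_obs = []
for _ in range(int(budget / tri_cost)):
    triple = rng.choice(n_items, 3, replace=False)
    tri_obs += [noisy_same(i, j) for i, j in combinations(triple, 2)]

print("observations from edge queries:    ", n_edge_obs)
print("observations from triangle queries:", len(tri_obs))
```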

Earthquake magnitude prediction in Hindukush region using machine learning techniques

Title Earthquake magnitude prediction in Hindukush region using machine learning techniques
Authors Khawaja Asim, Francisco Martínez-Álvarez, Abdul Basit, Talat Iqbal
Abstract Earthquake magnitude prediction for the Hindukush region has been carried out in this research using the temporal sequence of historic seismic activities in combination with machine learning classifiers. Prediction is made on the basis of eight mathematically calculated seismic indicators derived from the earthquake catalog of the region. These parameters are based on well-known geophysical facts: the Gutenberg–Richter inverse law, the distribution of characteristic earthquake magnitudes, and seismic quiescence. In this research, four machine learning techniques, namely a pattern recognition neural network, a recurrent neural network, random forest, and a linear programming boost ensemble classifier, are separately applied to model relationships between the calculated seismic parameters and future earthquake occurrences. The problem is formulated as a binary classification task and predictions are made for earthquakes of magnitude greater than or equal to 5.5 ($M \ge 5.5$) over a duration of one month. Furthermore, the prediction results are analyzed for every machine learning classifier in terms of sensitivity, specificity, and true and false predictive values; accuracy is another performance measure considered. Earthquake magnitude prediction for the Hindukush using these techniques shows significant and encouraging results, constituting a step toward a robust prediction mechanism, which is not yet available.
Tasks
Published 2016-09-08
URL https://www.researchgate.net/publication/307951466_Earthquake_magnitude_prediction_in_Hindukush_region_using_machine_learning_techniques
PDF https://www.researchgate.net/publication/307951466_Earthquake_magnitude_prediction_in_Hindukush_region_using_machine_learning_techniques
PWC https://paperswithcode.com/paper/earthquake-magnitude-prediction-in-hindukush
Repo
Framework
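A hedged sketch of the classification setup described above: eight features stand in for the paper's seismic indicators (random placeholders here, not computed from a real catalog), and a random forest, one of the four classifiers the paper compares, is scored by sensitivity and specificity.

```python
# Binary classification sketch: "M >= 5.5 within the next month" from eight
# placeholder seismic indicators, evaluated by sensitivity and specificity.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 8))   # stand-ins for the eight seismic indicators
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=500) > 1.5).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X[:400], y[:400])                       # simple holdout split
tn, fp, fn, tp = confusion_matrix(y[400:], clf.predict(X[400:])).ravel()
print(f"sensitivity={tp / (tp + fn):.2f}  specificity={tn / (tn + fp):.2f}")
```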

Palabras: Crowdsourcing Transcriptions of L2 Speech

Title Palabras: Crowdsourcing Transcriptions of L2 Speech
Authors Eric Sanders, Pepi Burgos, Catia Cucchiarini, Roel van Hout
Abstract We developed a web application for crowdsourcing transcriptions of Dutch words spoken by Spanish L2 learners. In this paper we discuss the design of the application and the influence of metadata and various forms of feedback. Useful data were obtained from 159 participants, with an average of over 20 transcriptions per item, which seems a satisfactory result for this type of research. Informing participants about how many items they still had to complete, rather than how many they had already completed, turned out to be an incentive to do more items. Assigning participants a score for their performance made the transcription task more attractive to them, but this seemed to influence their performance. We discuss possible advantages and disadvantages in connection with the aim of the research and consider possible lessons for designing future experiments.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1508/
PDF https://www.aclweb.org/anthology/L16-1508
PWC https://paperswithcode.com/paper/palabras-crowdsourcing-transcriptions-of-l2
Repo
Framework

Solving Verbal Questions in IQ Test by Knowledge-Powered Word Embedding

Title Solving Verbal Questions in IQ Test by Knowledge-Powered Word Embedding
Authors Huazheng Wang, Fei Tian, Bin Gao, Chengjieren Zhu, Jiang Bian, Tie-Yan Liu
Abstract
Tasks Face Recognition, Question Answering, Speech Recognition
Published 2016-11-01
URL https://www.aclweb.org/anthology/D16-1052/
PDF https://www.aclweb.org/anthology/D16-1052
PWC https://paperswithcode.com/paper/solving-verbal-questions-in-iq-test-by
Repo
Framework

Emergent: a novel data-set for stance classification

Title Emergent: a novel data-set for stance classification
Authors William Ferreira, Andreas Vlachos
Abstract
Tasks Natural Language Inference, Reading Comprehension, Rumour Detection
Published 2016-06-01
URL https://www.aclweb.org/anthology/N16-1138/
PDF https://www.aclweb.org/anthology/N16-1138
PWC https://paperswithcode.com/paper/emergent-a-novel-data-set-for-stance
Repo
Framework

Source Language Adaptation Approaches for Resource-Poor Machine Translation

Title Source Language Adaptation Approaches for Resource-Poor Machine Translation
Authors Pidong Wang, Preslav Nakov, Hwee Tou Ng
Abstract
Tasks Machine Translation
Published 2016-06-01
URL https://www.aclweb.org/anthology/J16-2004/
PDF https://www.aclweb.org/anthology/J16-2004
PWC https://paperswithcode.com/paper/source-language-adaptation-approaches-for
Repo
Framework

Lifetime Achievement Award: Linguistics: The Garden and the Bush

Title Lifetime Achievement Award: Linguistics: The Garden and the Bush
Authors Joan Bresnan
Abstract
Tasks
Published 2016-12-01
URL https://www.aclweb.org/anthology/J16-4001/
PDF https://www.aclweb.org/anthology/J16-4001
PWC https://paperswithcode.com/paper/lifetime-achievement-award-linguistics-the
Repo
Framework

An in-network data cleaning approach for wireless sensor networks

Title An in-network data cleaning approach for wireless sensor networks
Authors Jianjun Lei, Haiyang Bi, Ying Xia, Jun Huang, Haeyoung Bae
Abstract Wireless Sensor Networks (WSNs) are widely used for monitoring physical phenomena in the environment. However, the data gathered by WSNs may be inaccurate and unreliable due to power exhaustion, noise and other causes. Transmitting unnecessary data, such as erroneous or redundant readings, wastes a great deal of energy. To improve data reliability and reduce energy consumption, we propose an in-network processing architecture for data cleaning that divides the task into four stages implemented on different nodes. This strategy keeps the cleaning algorithms computationally lightweight on local nodes and energy-efficient, since it introduces almost no communication overhead. In addition, we present detection algorithms for data faults and event outliers, which use related attributes from the local sensor node together with cooperation from its relaying neighbor. Experimental results show that our proposed approach is accurate and energy-efficient.
Tasks Clustering Algorithms Evaluation
Published 2016-03-17
URL http://www.tandfonline.com/loi/tasj20
PDF http://www.tandfonline.com/loi/tasj20
PWC https://paperswithcode.com/paper/an-in-network-data-cleaning-approach-for
Repo
Framework
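The neighbor-cooperation idea in the abstract can be sketched as a two-test rule: flag a reading as a fault only if it deviates sharply both from the node's own recent history and from its relaying neighbor. The window size, thresholds, and signal model below are assumptions, not the paper's algorithm.

```python
# Two-test fault detection: a local-history check AND a neighbor check.
import numpy as np

rng = np.random.default_rng(4)
t = np.arange(200)
truth = 20 + 2 * np.sin(t / 20)                  # shared physical signal
node = truth + 0.2 * rng.normal(size=t.size)     # local sensor readings
neighbor = truth + 0.2 * rng.normal(size=t.size) # relaying neighbor's readings
node[[50, 120]] += 8                             # injected sensor faults

window, thresh = 5, 1.0                          # hypothetical parameters
flags = []
for i in range(window, t.size):
    local_dev = abs(node[i] - node[i - window:i].mean())
    neigh_dev = abs(node[i] - neighbor[i])
    if local_dev > thresh and neigh_dev > thresh:  # both checks must fire
        flags.append(i)

print("flagged samples:", flags)                 # expect roughly [50, 120]
```

Requiring both checks keeps the test lightweight on the local node while the neighbor comparison suppresses false alarms from ordinary signal drift.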

Upper Bound of Entropy Rate Revisited —A New Extrapolation of Compressed Large-Scale Corpora—

Title Upper Bound of Entropy Rate Revisited —A New Extrapolation of Compressed Large-Scale Corpora—
Authors Ryosuke Takahira, Kumiko Tanaka-Ishii, Łukasz Dębowski
Abstract The article presents results of entropy rate estimation for human languages across six languages by using large, state-of-the-art corpora of up to 7.8 gigabytes. To obtain the estimates for data length tending to infinity, we use an extrapolation function given by an ansatz. Whereas some ansatzes of this kind were proposed in previous research papers, here we introduce a stretched exponential extrapolation function that has a smaller error of fit. In this way, we uncover a possibility that the entropy rates of human languages are positive but 20% smaller than previously reported.
Tasks
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4124/
PDF https://www.aclweb.org/anthology/W16-4124
PWC https://paperswithcode.com/paper/upper-bound-of-entropy-rate-revisited-a-a-new
Repo
Framework
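The extrapolation step can be sketched as a curve fit. The exact stretched-exponential ansatz in the paper may differ from the form below, and the "measurements" here are synthetic; both are assumptions for illustration, with h the extrapolated entropy rate as n tends to infinity.

```python
# Fit a stretched-exponential decay to compression rate vs. corpus size and
# read off the asymptote h as the entropy-rate estimate.
import numpy as np
from scipy.optimize import curve_fit

def ansatz(n, h, A, c, beta):
    # One plausible stretched-exponential form (an assumption, see lead-in).
    return h + A * np.exp(-(n / c) ** beta)

# Synthetic "bits per character vs. corpus size" data (placeholder values).
n = np.logspace(3, 9, 20)
rng = np.random.default_rng(5)
rate = ansatz(n, 1.2, 3.0, 1e5, 0.4) + 0.01 * rng.normal(size=n.size)

popt, _ = curve_fit(ansatz, n, rate, p0=[1.0, 2.0, 1e5, 0.5],
                    bounds=([0, 0, 1.0, 0.01], [10, 10, 1e12, 1.0]))
print(f"extrapolated entropy rate h = {popt[0]:.3f} bits/char")
```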

The Social Mood of News: Self-reported Annotations to Design Automatic Mood Detection Systems

Title The Social Mood of News: Self-reported Annotations to Design Automatic Mood Detection Systems
Authors Firoj Alam, Fabio Celli, Evgeny A. Stepanov, Arindam Ghosh, Giuseppe Riccardi
Abstract In this paper, we address the issue of automatic prediction of readers’ mood from newspaper articles and comments. As online newspapers are becoming more and more similar to social media platforms, users can provide affective feedback, such as mood and emotion. We have exploited the self-reported annotation of mood categories obtained from the metadata of the Italian online newspaper corriere.it to design and evaluate a system for predicting five different mood categories from news articles and comments: indignation, disappointment, worry, satisfaction, and amusement. The outcome of our experiments shows that, overall, bag-of-word-ngrams perform better than all other feature sets; however, stylometric features perform better for the mood score prediction of articles. Our study shows that self-reported annotations can be used to design automatic mood prediction systems.
Tasks
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4316/
PDF https://www.aclweb.org/anthology/W16-4316
PWC https://paperswithcode.com/paper/the-social-mood-of-news-self-reported
Repo
Framework
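A minimal bag-of-word-ngrams baseline in the spirit of the system described above. The corriere.it data is not reproduced here; the five mood labels come from the abstract, while the toy texts and the logistic-regression classifier are placeholder assumptions.

```python
# Unigram+bigram counts feeding a linear classifier over five mood labels.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [  # toy stand-ins for news comments
    "this policy is a disgrace",
    "prices keep rising, very worrying",
    "what a delightful, funny story",
    "the team finally delivered",
    "another broken promise, so disappointing",
]
moods = ["indignation", "worry", "amusement", "satisfaction", "disappointment"]

model = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),   # word unigrams + bigrams
    LogisticRegression(max_iter=1000),
)
model.fit(texts, moods)
print(model.predict(["rising costs are worrying everyone"]))
```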

Improved Error Bounds for Tree Representations of Metric Spaces

Title Improved Error Bounds for Tree Representations of Metric Spaces
Authors Samir Chowdhury, Facundo Mémoli, Zane T. Smith
Abstract Estimating optimal phylogenetic trees or hierarchical clustering trees from metric data is an important problem in evolutionary biology and data analysis. Intuitively, the goodness-of-fit of a metric space to a tree depends on its inherent treeness, as well as other metric properties such as intrinsic dimension. Existing algorithms for embedding metric spaces into tree metrics provide distortion bounds depending on cardinality. Because cardinality is a simple property of any set, we argue that such bounds do not fully capture the rich structure endowed by the metric. We consider an embedding of a metric space into a tree proposed by Gromov. By proving a stability result, we obtain an improved additive distortion bound depending only on the hyperbolicity and doubling dimension of the metric. We observe that Gromov’s method is dual to the well-known single linkage hierarchical clustering (SLHC) method. By means of this duality, we are able to transport our results to the setting of SLHC, where such additive distortion bounds were previously unknown.
Tasks
Published 2016-12-01
URL http://papers.nips.cc/paper/6431-improved-error-bounds-for-tree-representations-of-metric-spaces
PDF http://papers.nips.cc/paper/6431-improved-error-bounds-for-tree-representations-of-metric-spaces.pdf
PWC https://paperswithcode.com/paper/improved-error-bounds-for-tree
Repo
Framework
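The SLHC side of the duality mentioned in the abstract can be sketched directly: single-linkage clustering of a finite metric space yields an ultrametric (the cophenetic distances), and the additive distortion is the sup-norm gap to the original metric. This illustrates SLHC only, not Gromov's embedding itself; the random point cloud is an assumption.

```python
# Single-linkage ultrametric and its additive distortion from the input metric.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, cophenet

rng = np.random.default_rng(6)
points = rng.normal(size=(30, 3))
d = pdist(points)                       # condensed pairwise distances

Z = linkage(d, method="single")         # single-linkage dendrogram
coph = cophenet(Z)                      # ultrametric (cophenetic) distances
distortion = np.max(np.abs(coph - d))   # additive distortion of the embedding
print(f"additive distortion: {distortion:.3f}")
```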