Paper Group NANR 80
Annotating Spelling Errors in German Texts Produced by Primary School Children
Title | Annotating Spelling Errors in German Texts Produced by Primary School Children |
Authors | Ronja Laarmann-Quante, Lukas Knichel, Stefanie Dipper, Carina Betken |
Abstract | |
Tasks | |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-1705/ |
PWC | https://paperswithcode.com/paper/annotating-spelling-errors-in-german-texts |
Repo | |
Framework | |
Stochastic Gradient Methods for Distributionally Robust Optimization with f-divergences
Title | Stochastic Gradient Methods for Distributionally Robust Optimization with f-divergences |
Authors | Hongseok Namkoong, John C. Duchi |
Abstract | We develop efficient solution methods for a robust empirical risk minimization problem designed to give calibrated confidence intervals on performance and provide optimal tradeoffs between bias and variance. Our methods apply to distributionally robust optimization problems proposed by Ben-Tal et al., which put more weight on observations inducing high loss via a worst-case approach over a non-parametric uncertainty set on the underlying data distribution. Our algorithm solves the resulting minimax problems at nearly the same computational cost as stochastic gradient descent, through the use of several carefully designed data structures. For a sample of size n, the per-iteration cost of our method scales as O(log n), which allows us to give the optimality certificates that distributionally robust optimization provides at little extra cost compared to empirical risk minimization and stochastic gradient methods. |
Tasks | |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6040-stochastic-gradient-methods-for-distributionally-robust-optimization-with-f-divergences |
PDF | http://papers.nips.cc/paper/6040-stochastic-gradient-methods-for-distributionally-robust-optimization-with-f-divergences.pdf |
PWC | https://paperswithcode.com/paper/stochastic-gradient-methods-for |
Repo | |
Framework | |
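The paper above centers on minimax problems of the form min over θ of the worst-case weighted loss over an f-divergence uncertainty set, solved at near-SGD cost. As a hedged illustration only (not the paper's O(log n) data-structure algorithm), the sketch below uses the KL-penalized variant of f-divergence DRO, whose inner maximization has a closed form; the function name, step sizes, and the logistic-loss choice are assumptions.

```python
import numpy as np

def kl_dro_logistic(X, y, lam=1.0, lr=0.1, iters=500):
    """KL-penalized DRO for logistic regression: a minimal sketch.

    Solves min_theta max_p  sum_i p_i * loss_i(theta) - lam * KL(p || uniform);
    the inner max has the closed form p_i ∝ exp(loss_i / lam), so each
    iteration reweights examples toward high loss (the worst-case view).
    """
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(iters):
        margins = y * (X @ theta)                  # labels y in {-1, +1}
        losses = np.logaddexp(0.0, -margins)       # per-example logistic loss
        w = np.exp((losses - losses.max()) / lam)  # worst-case weights (stable)
        w /= w.sum()
        sigma = 1.0 / (1.0 + np.exp(margins))      # derivative of logistic loss
        grad = X.T @ (w * (-sigma * y))            # gradient of weighted loss
        theta -= lr * grad
    return theta

# theta = kl_dro_logistic(X_train, y_train)  # X: (n, d), y: (n,) in {-1, +1}
```

Smaller `lam` makes the weights concentrate on high-loss examples (more robust, higher variance); `lam` very large recovers plain empirical risk minimization.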
High-dimensional omics data analysis using a variable screening protocol with prior knowledge integration (SKI)
Title | High-dimensional omics data analysis using a variable screening protocol with prior knowledge integration (SKI) |
Authors | Cong Liu, Jianping Jiang, Jianlei Gu, Zhangsheng Yu, Tao Wang, Hui Lu |
Abstract | Background: High-throughput technologies can generate thousands to millions of biomarker measurements in a single experiment. However, results from high-throughput analyses are often barely reproducible due to small sample sizes. Different statistical methods have been proposed to tackle this “small n and large p” scenario; for example, pooling or integrating different datasets is an effective way to improve reproducibility. However, raw data are often unavailable or hard to integrate due to differing experimental conditions, so there is an emerging need for “knowledge integration” methods in high-throughput data analysis. Results: In this study, we propose an integrative prescreening approach, SKI, for high-throughput data analysis. A new rank is generated from two initial ranks: (1) a knowledge-based rank; and (2) a marginal-correlation-based rank. Our simulations show that SKI outperforms methods without knowledge integration, yielding a higher true positive rate for the same number of variables selected. We also applied our method in a drug response study and found it performed better than regular screening methods. Conclusion: The proposed method provides an effective way to integrate knowledge for high-throughput analysis. It can easily be implemented with our R package, SKI. |
Tasks | |
Published | 2016-12-23 |
URL | https://bmcsystbiol.biomedcentral.com/articles/10.1186/s12918-016-0358-0 |
PWC | https://paperswithcode.com/paper/high-dimensional-omics-data-analysis-using-a |
Repo | |
Framework | |
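The core of SKI, per the abstract above, is combining a knowledge-based rank with a marginal-correlation rank before selecting variables. A minimal sketch follows; the rank-averaging combination rule, the function name, and the top-k selection are assumptions (the paper's exact weighting may differ), and the authors' actual implementation is the R package SKI, not this Python snippet.

```python
import numpy as np
from scipy.stats import rankdata

def ski_screen(X, y, knowledge_score, k=100):
    """Prescreening with knowledge integration: a minimal sketch.

    Builds (1) a knowledge-based rank and (2) a marginal-correlation rank,
    combines them by simple averaging (an assumption; the paper's rule may
    differ), and keeps the k best variables.
    """
    Xc = X - X.mean(axis=0)                 # center each of the p columns
    yc = y - y.mean()
    corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    corr_rank = rankdata(-corr)             # rank 1 = strongest marginal signal
    know_rank = rankdata(-np.asarray(knowledge_score))  # higher score = stronger prior
    combined = (corr_rank + know_rank) / 2.0
    return np.argsort(combined)[:k]         # indices of the selected variables
```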
MayoNLP at SemEval-2016 Task 1: Semantic Textual Similarity based on Lexical Semantic Net and Deep Learning Semantic Model
Title | MayoNLP at SemEval-2016 Task 1: Semantic Textual Similarity based on Lexical Semantic Net and Deep Learning Semantic Model |
Authors | Naveed Afzal, Yanshan Wang, Hongfang Liu |
Abstract | |
Tasks | Information Retrieval, Machine Translation, Semantic Textual Similarity |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/S16-1103/ |
PWC | https://paperswithcode.com/paper/mayonlp-at-semeval-2016-task-1-semantic |
Repo | |
Framework | |
Crowdsourced Clustering: Querying Edges vs Triangles
Title | Crowdsourced Clustering: Querying Edges vs Triangles |
Authors | Ramya Korlakai Vinayak, Babak Hassibi |
Abstract | We consider the task of clustering items using answers from non-expert crowd workers. In such cases, the workers are often not able to label the items directly; however, it is reasonable to assume that they can compare items and judge whether they are similar or not. An important question is what queries to make, and we compare two types: random edge queries, where a pair of items is revealed, and random triangle queries, where a triple of items is revealed. Since it is far too expensive to query all possible edges and/or triangles, we need to work with partial observations subject to a fixed query budget constraint. When a generative model for the data is available (and we consider a few of these), we determine the cost of a query by its entropy; when such models do not exist, we use the average response time per query of the workers as a surrogate for the cost. In addition to theoretical justification, through several simulations and experiments on two real data sets on Amazon Mechanical Turk, we empirically demonstrate that, for a fixed budget, triangle queries uniformly outperform edge queries. Even though, in contrast to edge queries, triangle queries reveal dependent edges, they provide more reliable edges and, for a fixed budget, many more of them. We also provide a sufficient condition on the number of observations, the edge densities inside and outside the clusters, and the minimum cluster size required for exact recovery of the true adjacency matrix via triangle queries, using a convex optimization-based clustering algorithm. |
Tasks | |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6499-crowdsourced-clustering-querying-edges-vs-triangles |
PDF | http://papers.nips.cc/paper/6499-crowdsourced-clustering-querying-edges-vs-triangles.pdf |
PWC | https://paperswithcode.com/paper/crowdsourced-clustering-querying-edges-vs |
Repo | |
Framework | |
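The budget argument in the abstract above is easy to check numerically: one triangle query reveals three pairwise judgments at less than three times an edge query's cost. The simulation below is illustrative only; the 1.5x relative cost of a triangle query is an assumed stand-in for the per-query worker response times the paper measures.

```python
import numpy as np

def revealed_pairs(n_items, budget, cost_edge=1.0, cost_tri=1.5, seed=0):
    """Distinct pairwise observations per budget: a minimal sketch.

    An edge query reveals 1 pair; a triangle query reveals 3 pairs at an
    assumed 1.5x edge cost, so triangles yield about twice the pairs for
    the same budget (before accounting for collisions).
    """
    rng = np.random.default_rng(seed)
    edge_pairs, tri_pairs = set(), set()
    for _ in range(int(budget / cost_edge)):
        i, j = sorted(rng.choice(n_items, size=2, replace=False))
        edge_pairs.add((i, j))
    for _ in range(int(budget / cost_tri)):
        i, j, k = sorted(rng.choice(n_items, size=3, replace=False))
        tri_pairs |= {(i, j), (j, k), (i, k)}   # 3 pairs per triangle
    return len(edge_pairs), len(tri_pairs)

print(revealed_pairs(100, budget=500))  # roughly twice as many pairs via triangles
```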
Earthquake magnitude prediction in Hindukush region using machine learning techniques
Title | Earthquake magnitude prediction in Hindukush region using machine learning techniques |
Authors | Khawaja Asim, Francisco Martínez-Álvarez, Abdul Basit, Talat Iqbal |
Abstract | Earthquake magnitude prediction for the Hindukush region has been carried out in this research using the temporal sequence of historic seismic activities in combination with machine learning classifiers. Predictions are made on the basis of eight mathematically calculated seismic indicators derived from the earthquake catalog of the region. These parameters are based on well-known geophysical facts: the Gutenberg-Richter inverse law, the distribution of characteristic earthquake magnitudes, and seismic quiescence. In this research, four machine learning techniques, namely a pattern recognition neural network, a recurrent neural network, random forest, and a linear programming boost ensemble classifier, are separately applied to model relationships between the calculated seismic parameters and future earthquake occurrences. The problem is formulated as a binary classification task, and predictions are made for earthquakes of magnitude greater than or equal to 5.5 (M ≥ 5.5) over a one-month horizon. Furthermore, the prediction results are analyzed for every machine learning classifier in terms of sensitivity, specificity, and true and false predictive values; accuracy is considered as a further performance measure. Earthquake magnitude prediction for the Hindukush using these techniques shows significant and encouraging results, constituting a step toward a final robust prediction mechanism, which is not yet available. |
Tasks | |
Published | 2016-09-08 |
URL | https://www.researchgate.net/publication/307951466_Earthquake_magnitude_prediction_in_Hindukush_region_using_machine_learning_techniques |
PWC | https://paperswithcode.com/paper/earthquake-magnitude-prediction-in-hindukush |
Repo | |
Framework | |
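As a hedged sketch of the pipeline described above: derive indicators from a rolling window of the catalog and train a classifier on whether a large event follows. Only the Gutenberg-Richter b-value plus two toy features stand in for the paper's eight indicators; the window length, the next-event label (a simplification of the paper's one-month horizon), and the synthetic catalog are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def b_value(mags, m_min=3.0):
    """Maximum-likelihood Gutenberg-Richter b-value (Aki's estimator)."""
    m = mags[mags >= m_min]
    return np.log10(np.e) / (m.mean() - m_min)

def make_dataset(catalog_mags, window=50, target_mag=5.5):
    """Rolling-window features -> does the next event reach M >= 5.5?

    A sketch: the paper's eight indicators are not reproduced; the b-value
    plus two toy summary features stand in for them here.
    """
    X, y = [], []
    for t in range(window, len(catalog_mags) - 1):
        w = np.asarray(catalog_mags[t - window:t])
        X.append([b_value(w), w.max(), w.mean()])
        y.append(int(catalog_mags[t] >= target_mag))
    return np.array(X), np.array(y)

# Illustrative use on a synthetic catalog (magnitudes in time order).
rng = np.random.default_rng(0)
catalog = np.round(3.0 + rng.exponential(0.8, size=500), 1)
X, y = make_dataset(catalog)
clf = RandomForestClassifier(n_estimators=100).fit(X, y)
```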
Palabras: Crowdsourcing Transcriptions of L2 Speech
Title | Palabras: Crowdsourcing Transcriptions of L2 Speech |
Authors | Eric Sanders, Pepi Burgos, Catia Cucchiarini, Roel van Hout |
Abstract | We developed a web application for crowdsourcing transcriptions of Dutch words spoken by Spanish L2 learners. In this paper we discuss the design of the application and the influence of metadata and various forms of feedback. Useful data were obtained from 159 participants, with an average of over 20 transcriptions per item, which seems a satisfactory result for this type of research. Informing participants about how many items they still had to complete, rather than how many they had already completed, turned out to be an incentive to do more items. Assigning participants a score for their performance made it more attractive for them to carry out the transcription task, but this seemed to influence their performance. We discuss possible advantages and disadvantages in connection with the aim of the research and consider possible lessons for designing future experiments. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1508/ |
PWC | https://paperswithcode.com/paper/palabras-crowdsourcing-transcriptions-of-l2 |
Repo | |
Framework | |
Solving Verbal Questions in IQ Test by Knowledge-Powered Word Embedding
Title | Solving Verbal Questions in IQ Test by Knowledge-Powered Word Embedding |
Authors | Huazheng Wang, Fei Tian, Bin Gao, Chengjieren Zhu, Jiang Bian, Tie-Yan Liu |
Abstract | |
Tasks | Face Recognition, Question Answering, Speech Recognition |
Published | 2016-11-01 |
URL | https://www.aclweb.org/anthology/D16-1052/ |
PWC | https://paperswithcode.com/paper/solving-verbal-questions-in-iq-test-by |
Repo | |
Framework | |
Emergent: a novel data-set for stance classification
Title | Emergent: a novel data-set for stance classification |
Authors | William Ferreira, Andreas Vlachos |
Abstract | |
Tasks | Natural Language Inference, Reading Comprehension, Rumour Detection |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/N16-1138/ |
PWC | https://paperswithcode.com/paper/emergent-a-novel-data-set-for-stance |
Repo | |
Framework | |
Source Language Adaptation Approaches for Resource-Poor Machine Translation
Title | Source Language Adaptation Approaches for Resource-Poor Machine Translation |
Authors | Pidong Wang, Preslav Nakov, Hwee Tou Ng |
Abstract | |
Tasks | Machine Translation |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/J16-2004/ |
PWC | https://paperswithcode.com/paper/source-language-adaptation-approaches-for |
Repo | |
Framework | |
Lifetime Achievement Award: Linguistics: The Garden and the Bush
Title | Lifetime Achievement Award: Linguistics: The Garden and the Bush |
Authors | Joan Bresnan |
Abstract | |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/J16-4001/ |
PWC | https://paperswithcode.com/paper/lifetime-achievement-award-linguistics-the |
Repo | |
Framework | |
An in-network data cleaning approach for wireless sensor networks
Title | An in-network data cleaning approach for wireless sensor networks |
Authors | Jianjun Lei, Haiyang Bi, Ying Xia, Jun Huang, Haeyoung Bae |
Abstract | Wireless Sensor Networks (WSNs) are widely used for monitoring physical phenomena in the environment. However, the data gathered by WSNs may be inaccurate and unreliable due to power exhaustion, noise, and other causes, and transmitting unnecessary data, such as erroneous or redundant readings, wastes a great deal of energy. To improve data reliability and reduce energy consumption, we propose an in-network processing architecture for data cleaning that divides the task into four stages implemented in different nodes. This strategy keeps the cleaning algorithms computationally lightweight in local nodes and energy-efficient, with almost no communication overhead. In addition, we present detection algorithms for data faults and event outliers, which exploit related attributes of the local sensor node together with cooperation from its relaying neighbor. Experimental results show that our proposed approach is accurate and energy-efficient. |
Tasks | Clustering Algorithms Evaluation |
Published | 2016-03-17 |
URL | http://www.tandfonline.com/loi/tasj20 |
PWC | https://paperswithcode.com/paper/an-in-network-data-cleaning-approach-for |
Repo | |
Framework | |
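A flavor of the local detection step described above, as a minimal sketch: a reading is flagged only when it both departs from the node's own recent median and disagrees with the relaying neighbor. The thresholds, window size, and function name are illustrative assumptions; the paper's four-stage architecture is not reproduced here.

```python
from collections import deque

def is_faulty(reading, history, neighbor_reading,
              jump_thresh=5.0, neighbor_thresh=8.0):
    """Lightweight in-node fault check: a minimal sketch.

    Flags a reading as faulty if it both jumps away from the node's own
    recent median and disagrees with the relaying neighbor. Thresholds are
    illustrative, not the paper's.
    """
    recent = sorted(history)
    median = recent[len(recent) // 2]
    jumps = abs(reading - median) > jump_thresh          # local temporal check
    disagrees = abs(reading - neighbor_reading) > neighbor_thresh  # spatial check
    return jumps and disagrees

history = deque([21.0, 21.3, 20.9, 21.1, 21.2], maxlen=5)
print(is_faulty(35.0, history, neighbor_reading=21.4))   # True: likely a fault
```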
Upper Bound of Entropy Rate Revisited —A New Extrapolation of Compressed Large-Scale Corpora—
Title | Upper Bound of Entropy Rate Revisited —A New Extrapolation of Compressed Large-Scale Corpora— |
Authors | Ryosuke Takahira, Kumiko Tanaka-Ishii, Łukasz Dębowski |
Abstract | The article presents entropy rate estimates for six human languages, obtained using large, state-of-the-art corpora of up to 7.8 gigabytes. To obtain estimates for data length tending to infinity, we use an extrapolation function given by an ansatz. Whereas several ansatzes of this kind have been proposed in previous work, here we introduce a stretched exponential extrapolation function that has a smaller error of fit. In this way, we uncover the possibility that the entropy rates of human languages are positive but 20% smaller than previously reported. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4124/ |
PWC | https://paperswithcode.com/paper/upper-bound-of-entropy-rate-revisited-a-a-new |
Repo | |
Framework | |
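A hedged sketch of the extrapolation idea above: fit an ansatz to per-symbol compression rates measured at growing prefix lengths and read off the limit h as the entropy rate estimate. The functional form r(n) = h·exp(A·n^(β−1)) is one stretched-exponential variant assumed here for illustration (consult the paper for its exact ansatz), and the data points below are made up.

```python
import numpy as np
from scipy.optimize import curve_fit

def ansatz(n, h, A, beta):
    """Assumed stretched-exponential form r(n) = h * exp(A * n**(beta - 1)).
    As n -> infinity, r(n) -> h, the extrapolated entropy rate."""
    return h * np.exp(A * n ** (beta - 1.0))

# n[i]: corpus prefix length in characters; r[i]: compressed bits per character.
n = np.array([1e4, 1e5, 1e6, 1e7, 1e8])
r = np.array([4.10, 2.60, 1.95, 1.63, 1.45])   # illustrative numbers only
(h, A, beta), _ = curve_fit(ansatz, n, r, p0=[1.5, 7.0, 0.8],
                            bounds=([0, 0, 0], [8, 50, 1]))
print(f"extrapolated entropy rate: {h:.2f} bits/char")
```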
The Social Mood of News: Self-reported Annotations to Design Automatic Mood Detection Systems
Title | The Social Mood of News: Self-reported Annotations to Design Automatic Mood Detection Systems |
Authors | Firoj Alam, Fabio Celli, Evgeny A. Stepanov, Arindam Ghosh, Giuseppe Riccardi |
Abstract | In this paper, we address the automatic prediction of readers' mood from newspaper articles and comments. As online newspapers become more and more similar to social media platforms, users can provide affective feedback such as mood and emotion. We exploited the self-reported mood annotations obtained from the metadata of the Italian online newspaper corriere.it to design and evaluate a system for predicting five mood categories from news articles and comments: indignation, disappointment, worry, satisfaction, and amusement. The outcome of our experiments shows that, overall, bag-of-word n-grams perform better than all other feature sets; however, stylometric features perform better for the mood score prediction of articles. Our study shows that self-reported annotations can be used to design automatic mood prediction systems. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4316/ |
PWC | https://paperswithcode.com/paper/the-social-mood-of-news-self-reported |
Repo | |
Framework | |
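Given the finding above that bag-of-word n-grams are the strongest overall feature set, a minimal sketch of such a baseline follows. The vectorizer settings, the logistic-regression classifier, and the toy English texts are assumptions, not the paper's exact system (which works on Italian corriere.it data).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# The five self-reported mood categories named in the abstract.
MOODS = ["indignation", "disappointment", "worry", "satisfaction", "amusement"]

# Toy stand-ins for articles/comments (hypothetical data).
texts = [
    "the new tax plan worries small businesses",
    "what a hilarious take on the football match",
    "the council ignored residents yet again",
    "great news for commuters as the line reopens",
]
labels = ["worry", "amusement", "indignation", "satisfaction"]

# Word n-gram bag-of-features with a linear classifier: a minimal baseline.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)
print(model.predict(["readers fear the plant closure"]))
```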
Improved Error Bounds for Tree Representations of Metric Spaces
Title | Improved Error Bounds for Tree Representations of Metric Spaces |
Authors | Samir Chowdhury, Facundo Mémoli, Zane T. Smith |
Abstract | Estimating optimal phylogenetic trees or hierarchical clustering trees from metric data is an important problem in evolutionary biology and data analysis. Intuitively, the goodness-of-fit of a metric space to a tree depends on its inherent treeness, as well as other metric properties such as intrinsic dimension. Existing algorithms for embedding metric spaces into tree metrics provide distortion bounds depending on cardinality. Because cardinality is a simple property of any set, we argue that such bounds do not fully capture the rich structure endowed by the metric. We consider an embedding of a metric space into a tree proposed by Gromov. By proving a stability result, we obtain an improved additive distortion bound depending only on the hyperbolicity and doubling dimension of the metric. We observe that Gromov’s method is dual to the well-known single linkage hierarchical clustering (SLHC) method. By means of this duality, we are able to transport our results to the setting of SLHC, where such additive distortion bounds were previously unknown. |
Tasks | |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6431-improved-error-bounds-for-tree-representations-of-metric-spaces |
PDF | http://papers.nips.cc/paper/6431-improved-error-bounds-for-tree-representations-of-metric-spaces.pdf |
PWC | https://paperswithcode.com/paper/improved-error-bounds-for-tree |
Repo | |
Framework | |
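The SLHC duality noted in the abstract above makes the additive distortion easy to compute for a concrete metric space: the single-linkage cophenetic distance is exactly the ultrametric output, so max |d(x,y) − u(x,y)| can be measured directly. A minimal sketch, with random points assumed as input:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, cophenet

def slhc_distortion(points):
    """Additive distortion of the single-linkage ultrametric: a minimal sketch.

    SLHC maps a finite metric space to an ultrametric (the cophenetic
    distance of the single-linkage dendrogram); the paper bounds
    max |d(x,y) - u(x,y)| via hyperbolicity and doubling dimension.
    """
    d = pdist(points)                            # original distances (condensed)
    u = cophenet(linkage(d, method="single"))    # ultrametric distances
    return np.max(np.abs(d - u))                 # worst-case additive distortion

pts = np.random.default_rng(0).normal(size=(30, 2))
print(slhc_distortion(pts))
```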