Paper Group NANR 80
Annotating Spelling Errors in German Texts Produced by Primary School Children
Title | Annotating Spelling Errors in German Texts Produced by Primary School Children |
Authors | Ronja Laarmann-Quante, Lukas Knichel, Stefanie Dipper, Carina Betken |
Abstract | |
Tasks | |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-1705/ |
PWC | https://paperswithcode.com/paper/annotating-spelling-errors-in-german-texts |
Repo | |
Framework | |
Stochastic Gradient Methods for Distributionally Robust Optimization with f-divergences
Title | Stochastic Gradient Methods for Distributionally Robust Optimization with f-divergences |
Authors | Hongseok Namkoong, John C. Duchi |
Abstract | We develop efficient solution methods for a robust empirical risk minimization problem designed to give calibrated confidence intervals on performance and provide optimal tradeoffs between bias and variance. Our methods apply to distributionally robust optimization problems proposed by Ben-Tal et al., which put more weight on observations inducing high loss via a worst-case approach over a non-parametric uncertainty set on the underlying data distribution. Our algorithm solves the resulting minimax problems at nearly the same computational cost as stochastic gradient descent, through the use of several carefully designed data structures. For a sample of size n, the per-iteration cost of our method scales as O(log n), which allows us to give the optimality certificates that distributionally robust optimization provides at little extra cost compared to empirical risk minimization and stochastic gradient methods. |
Tasks | |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6040-stochastic-gradient-methods-for-distributionally-robust-optimization-with-f-divergences |
PDF | http://papers.nips.cc/paper/6040-stochastic-gradient-methods-for-distributionally-robust-optimization-with-f-divergences.pdf |
PWC | https://paperswithcode.com/paper/stochastic-gradient-methods-for |
Repo | |
Framework | |
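The paper above centers on minimax problems of the form min over θ of the worst-case weighted loss over an f-divergence uncertainty set, solved at near-SGD cost. As a hedged illustration only (not the paper's O(log n) data-structure algorithm), the sketch below uses the KL-penalized variant of f-divergence DRO, whose inner maximization has a closed form; the function name, step sizes, and the logistic-loss choice are assumptions.

```python
import numpy as np

def kl_dro_logistic(X, y, lam=1.0, lr=0.1, iters=500):
    """KL-penalized DRO for logistic regression: a minimal sketch.

    Solves min_theta max_p  sum_i p_i * loss_i(theta) - lam * KL(p || uniform);
    the inner max has the closed form p_i ∝ exp(loss_i / lam), so each
    iteration reweights examples toward high loss (the worst-case view).
    """
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(iters):
        margins = y * (X @ theta)                  # labels y in {-1, +1}
        losses = np.logaddexp(0.0, -margins)       # per-example logistic loss
        w = np.exp((losses - losses.max()) / lam)  # worst-case weights (stable)
        w /= w.sum()
        sigma = 1.0 / (1.0 + np.exp(margins))      # derivative of logistic loss
        grad = X.T @ (w * (-sigma * y))            # gradient of weighted loss
        theta -= lr * grad
    return theta

# theta = kl_dro_logistic(X_train, y_train)  # X: (n, d), y: (n,) in {-1, +1}
```

Smaller `lam` makes the weights concentrate on high-loss examples (more robust, higher variance); `lam` very large recovers plain empirical risk minimization.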
High-dimensional omics data analysis using a variable screening protocol with prior knowledge integration (SKI)
Title | High-dimensional omics data analysis using a variable screening protocol with prior knowledge integration (SKI) |
Authors | Cong Liu, Jianping Jiang, Jianlei Gu, Zhangsheng Yu, Tao Wang, Hui Lu |
Abstract | Background: High-throughput technologies can generate thousands to millions of biomarker measurements in a single experiment. However, results from high-throughput analyses are often barely reproducible due to small sample sizes. Different statistical methods have been proposed to tackle this “small n and large p” scenario; for example, pooling or integrating different datasets is an effective way to improve reproducibility. However, raw data are often unavailable or hard to integrate due to differing experimental conditions, so there is an emerging need for “knowledge integration” methods in high-throughput data analysis. Results: In this study, we propose an integrative prescreening approach, SKI, for high-throughput data analysis. A new rank is generated from two initial ranks: (1) a knowledge-based rank; and (2) a marginal-correlation-based rank. Our simulations show that SKI outperforms methods without knowledge integration, yielding a higher true positive rate for the same number of variables selected. We also applied our method in a drug response study and found it performed better than regular screening methods. Conclusion: The proposed method provides an effective way to integrate knowledge for high-throughput analysis. It can easily be implemented with our R package, SKI. |
Tasks | |
Published | 2016-12-23 |
URL | https://bmcsystbiol.biomedcentral.com/articles/10.1186/s12918-016-0358-0 |
PWC | https://paperswithcode.com/paper/high-dimensional-omics-data-analysis-using-a |
Repo | |
Framework | |
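The core of SKI, per the abstract above, is combining a knowledge-based rank with a marginal-correlation rank before selecting variables. A minimal sketch follows; the rank-averaging combination rule, the function name, and the top-k selection are assumptions (the paper's exact weighting may differ), and the authors' actual implementation is the R package SKI, not this Python snippet.

```python
import numpy as np
from scipy.stats import rankdata

def ski_screen(X, y, knowledge_score, k=100):
    """Prescreening with knowledge integration: a minimal sketch.

    Builds (1) a knowledge-based rank and (2) a marginal-correlation rank,
    combines them by simple averaging (an assumption; the paper's rule may
    differ), and keeps the k best variables.
    """
    Xc = X - X.mean(axis=0)                 # center each of the p columns
    yc = y - y.mean()
    corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    corr_rank = rankdata(-corr)             # rank 1 = strongest marginal signal
    know_rank = rankdata(-np.asarray(knowledge_score))  # higher score = stronger prior
    combined = (corr_rank + know_rank) / 2.0
    return np.argsort(combined)[:k]         # indices of the selected variables
```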
MayoNLP at SemEval-2016 Task 1: Semantic Textual Similarity based on Lexical Semantic Net and Deep Learning Semantic Model
Title | MayoNLP at SemEval-2016 Task 1: Semantic Textual Similarity based on Lexical Semantic Net and Deep Learning Semantic Model |
Authors | Naveed Afzal, Yanshan Wang, Hongfang Liu |
Abstract | |
Tasks | Information Retrieval, Machine Translation, Semantic Textual Similarity |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/S16-1103/ |
PWC | https://paperswithcode.com/paper/mayonlp-at-semeval-2016-task-1-semantic |
Repo | |
Framework | |
Crowdsourced Clustering: Querying Edges vs Triangles
Title | Crowdsourced Clustering: Querying Edges vs Triangles |
Authors | Ramya Korlakai Vinayak, Babak Hassibi |
Abstract | We consider the task of clustering items using answers from non-expert crowd workers. In such cases, the workers are often not able to label the items directly; however, it is reasonable to assume that they can compare items and judge whether they are similar or not. An important question is what queries to make, and we compare two types: random edge queries, where a pair of items is revealed, and random triangle queries, where a triple of items is revealed. Since it is far too expensive to query all possible edges and/or triangles, we need to work with partial observations subject to a fixed query budget constraint. When a generative model for the data is available (and we consider a few of these), we determine the cost of a query by its entropy; when such models do not exist, we use the average response time per query of the workers as a surrogate for the cost. In addition to theoretical justification, through several simulations and experiments on two real data sets on Amazon Mechanical Turk, we empirically demonstrate that, for a fixed budget, triangle queries uniformly outperform edge queries. Even though, in contrast to edge queries, triangle queries reveal dependent edges, they provide more reliable edges and, for a fixed budget, many more of them. We also provide a sufficient condition on the number of observations, the edge densities inside and outside the clusters, and the minimum cluster size required for exact recovery of the true adjacency matrix via triangle queries, using a convex optimization-based clustering algorithm. |
Tasks | |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6499-crowdsourced-clustering-querying-edges-vs-triangles |
PDF | http://papers.nips.cc/paper/6499-crowdsourced-clustering-querying-edges-vs-triangles.pdf |
PWC | https://paperswithcode.com/paper/crowdsourced-clustering-querying-edges-vs |
Repo | |
Framework | |
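The budget argument in the abstract above is easy to check numerically: one triangle query reveals three pairwise judgments at less than three times an edge query's cost. The simulation below is illustrative only; the 1.5x relative cost of a triangle query is an assumed stand-in for the per-query worker response times the paper measures.

```python
import numpy as np

def revealed_pairs(n_items, budget, cost_edge=1.0, cost_tri=1.5, seed=0):
    """Distinct pairwise observations per budget: a minimal sketch.

    An edge query reveals 1 pair; a triangle query reveals 3 pairs at an
    assumed 1.5x edge cost, so triangles yield about twice the pairs for
    the same budget (before accounting for collisions).
    """
    rng = np.random.default_rng(seed)
    edge_pairs, tri_pairs = set(), set()
    for _ in range(int(budget / cost_edge)):
        i, j = sorted(rng.choice(n_items, size=2, replace=False))
        edge_pairs.add((i, j))
    for _ in range(int(budget / cost_tri)):
        i, j, k = sorted(rng.choice(n_items, size=3, replace=False))
        tri_pairs |= {(i, j), (j, k), (i, k)}   # 3 pairs per triangle
    return len(edge_pairs), len(tri_pairs)

print(revealed_pairs(100, budget=500))  # roughly twice as many pairs via triangles
```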
Earthquake magnitude prediction in Hindukush region using machine learning techniques
Title | Earthquake magnitude prediction in Hindukush region using machine learning techniques |
Authors | Khawaja Asim, Francisco Martínez-Álvarez, Abdul Basit, Talat Iqbal |
Abstract | Earthquake magnitude prediction for the Hindukush region has been carried out in this research using the temporal sequence of historic seismic activities in combination with machine learning classifiers. Predictions are made on the basis of eight mathematically calculated seismic indicators derived from the earthquake catalog of the region. These parameters are based on well-known geophysical facts: the Gutenberg-Richter inverse law, the distribution of characteristic earthquake magnitudes, and seismic quiescence. In this research, four machine learning techniques, namely a pattern recognition neural network, a recurrent neural network, random forest, and a linear programming boost ensemble classifier, are separately applied to model relationships between the calculated seismic parameters and future earthquake occurrences. The problem is formulated as a binary classification task, and predictions are made for earthquakes of magnitude greater than or equal to 5.5 (M ≥ 5.5) over a one-month horizon. Furthermore, the prediction results are analyzed for every machine learning classifier in terms of sensitivity, specificity, and true and false predictive values; accuracy is considered as a further performance measure. Earthquake magnitude prediction for the Hindukush using these techniques shows significant and encouraging results, constituting a step toward a final robust prediction mechanism, which is not yet available. |
Tasks | |
Published | 2016-09-08 |
URL | https://www.researchgate.net/publication/307951466_Earthquake_magnitude_prediction_in_Hindukush_region_using_machine_learning_techniques |
PWC | https://paperswithcode.com/paper/earthquake-magnitude-prediction-in-hindukush |
Repo | |
Framework | |
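As a hedged sketch of the pipeline described above: derive indicators from a rolling window of the catalog and train a classifier on whether a large event follows. Only the Gutenberg-Richter b-value plus two toy features stand in for the paper's eight indicators; the window length, the next-event label (a simplification of the paper's one-month horizon), and the synthetic catalog are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def b_value(mags, m_min=3.0):
    """Maximum-likelihood Gutenberg-Richter b-value (Aki's estimator)."""
    m = mags[mags >= m_min]
    return np.log10(np.e) / (m.mean() - m_min)

def make_dataset(catalog_mags, window=50, target_mag=5.5):
    """Rolling-window features -> does the next event reach M >= 5.5?

    A sketch: the paper's eight indicators are not reproduced; the b-value
    plus two toy summary features stand in for them here.
    """
    X, y = [], []
    for t in range(window, len(catalog_mags) - 1):
        w = np.asarray(catalog_mags[t - window:t])
        X.append([b_value(w), w.max(), w.mean()])
        y.append(int(catalog_mags[t] >= target_mag))
    return np.array(X), np.array(y)

# Illustrative use on a synthetic catalog (magnitudes in time order).
rng = np.random.default_rng(0)
catalog = np.round(3.0 + rng.exponential(0.8, size=500), 1)
X, y = make_dataset(catalog)
clf = RandomForestClassifier(n_estimators=100).fit(X, y)
```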
Palabras: Crowdsourcing Transcriptions of L2 Speech
Title | Palabras: Crowdsourcing Transcriptions of L2 Speech |
Authors | Eric Sanders, Pepi Burgos, Catia Cucchiarini, Roel van Hout |
Abstract | We developed a web application for crowdsourcing transcriptions of Dutch words spoken by Spanish L2 learners. In this paper we discuss the design of the application and the influence of metadata and various forms of feedback. Useful data were obtained from 159 participants, with an average of over 20 transcriptions per item, which seems a satisfactory result for this type of research. Informing participants about how many items they still had to complete, rather than how many they had already completed, turned out to be an incentive to do more items. Assigning participants a score for their performance made it more attractive for them to carry out the transcription task, but this seemed to influence their performance. We discuss possible advantages and disadvantages in connection with the aim of the research and consider possible lessons for designing future experiments. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1508/ |
PWC | https://paperswithcode.com/paper/palabras-crowdsourcing-transcriptions-of-l2 |
Repo | |
Framework | |
Solving Verbal Questions in IQ Test by Knowledge-Powered Word Embedding
Title | Solving Verbal Questions in IQ Test by Knowledge-Powered Word Embedding |
Authors | Huazheng Wang, Fei Tian, Bin Gao, Chengjieren Zhu, Jiang Bian, Tie-Yan Liu |
Abstract | |
Tasks | Face Recognition, Question Answering, Speech Recognition |
Published | 2016-11-01 |
URL | https://www.aclweb.org/anthology/D16-1052/ |
PWC | https://paperswithcode.com/paper/solving-verbal-questions-in-iq-test-by |
Repo | |
Framework | |
Emergent: a novel data-set for stance classification
Title | Emergent: a novel data-set for stance classification |
Authors | William Ferreira, Andreas Vlachos |
Abstract | |
Tasks | Natural Language Inference, Reading Comprehension, Rumour Detection |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/N16-1138/ |
PWC | https://paperswithcode.com/paper/emergent-a-novel-data-set-for-stance |
Repo | |
Framework | |
Source Language Adaptation Approaches for Resource-Poor Machine Translation
Title | Source Language Adaptation Approaches for Resource-Poor Machine Translation |
Authors | Pidong Wang, Preslav Nakov, Hwee Tou Ng |
Abstract | |
Tasks | Machine Translation |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/J16-2004/ |
PWC | https://paperswithcode.com/paper/source-language-adaptation-approaches-for |
Repo | |
Framework | |
Lifetime Achievement Award: Linguistics: The Garden and the Bush
Title | Lifetime Achievement Award: Linguistics: The Garden and the Bush |
Authors | Joan Bresnan |
Abstract | |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/J16-4001/ |
PWC | https://paperswithcode.com/paper/lifetime-achievement-award-linguistics-the |
Repo | |
Framework | |
An in-network data cleaning approach for wireless sensor networks
Title | An in-network data cleaning approach for wireless sensor networks |
Authors | Jianjun Lei, Haiyang Bi, Ying Xia, Jun Huang, Haeyoung Bae |
Abstract | Wireless Sensor Networks (WSNs) are widely used for monitoring physical phenomena in the environment. However, the data gathered by WSNs may be inaccurate and unreliable due to power exhaustion, noise, and other causes, and transmitting unnecessary data, such as erroneous or redundant readings, wastes a great deal of energy. To improve data reliability and reduce energy consumption, we propose an in-network processing architecture for data cleaning that divides the task into four stages implemented in different nodes. This strategy keeps the cleaning algorithms computationally lightweight in local nodes and energy-efficient, with almost no communication overhead. In addition, we present detection algorithms for data faults and event outliers, which exploit related attributes of the local sensor node together with cooperation from its relaying neighbor. Experimental results show that our proposed approach is accurate and energy-efficient. |
Tasks | Clustering Algorithms Evaluation |
Published | 2016-03-17 |
URL | http://www.tandfonline.com/loi/tasj20 |
PWC | https://paperswithcode.com/paper/an-in-network-data-cleaning-approach-for |
Repo | |
Framework | |
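A flavor of the local detection step described above, as a minimal sketch: a reading is flagged only when it both departs from the node's own recent median and disagrees with the relaying neighbor. The thresholds, window size, and function name are illustrative assumptions; the paper's four-stage architecture is not reproduced here.

```python
from collections import deque

def is_faulty(reading, history, neighbor_reading,
              jump_thresh=5.0, neighbor_thresh=8.0):
    """Lightweight in-node fault check: a minimal sketch.

    Flags a reading as faulty if it both jumps away from the node's own
    recent median and disagrees with the relaying neighbor. Thresholds are
    illustrative, not the paper's.
    """
    recent = sorted(history)
    median = recent[len(recent) // 2]
    jumps = abs(reading - median) > jump_thresh          # local temporal check
    disagrees = abs(reading - neighbor_reading) > neighbor_thresh  # spatial check
    return jumps and disagrees

history = deque([21.0, 21.3, 20.9, 21.1, 21.2], maxlen=5)
print(is_faulty(35.0, history, neighbor_reading=21.4))   # True: likely a fault
```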
Upper Bound of Entropy Rate Revisited —A New Extrapolation of Compressed Large-Scale Corpora—
Title | Upper Bound of Entropy Rate Revisited —A New Extrapolation of Compressed Large-Scale Corpora— |
Authors | Ryosuke Takahira, Kumiko Tanaka-Ishii, Łukasz Dębowski |
Abstract | The article presents entropy rate estimates for six human languages, obtained using large, state-of-the-art corpora of up to 7.8 gigabytes. To obtain estimates for data length tending to infinity, we use an extrapolation function given by an ansatz. Whereas several ansatzes of this kind have been proposed in previous work, here we introduce a stretched exponential extrapolation function that has a smaller error of fit. In this way, we uncover the possibility that the entropy rates of human languages are positive but 20% smaller than previously reported. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4124/ |
PWC | https://paperswithcode.com/paper/upper-bound-of-entropy-rate-revisited-a-a-new |
Repo | |
Framework | |
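A hedged sketch of the extrapolation idea above: fit an ansatz to per-symbol compression rates measured at growing prefix lengths and read off the limit h as the entropy rate estimate. The functional form r(n) = h·exp(A·n^(β−1)) is one stretched-exponential variant assumed here for illustration (consult the paper for its exact ansatz), and the data points below are made up.

```python
import numpy as np
from scipy.optimize import curve_fit

def ansatz(n, h, A, beta):
    """Assumed stretched-exponential form r(n) = h * exp(A * n**(beta - 1)).
    As n -> infinity, r(n) -> h, the extrapolated entropy rate."""
    return h * np.exp(A * n ** (beta - 1.0))

# n[i]: corpus prefix length in characters; r[i]: compressed bits per character.
n = np.array([1e4, 1e5, 1e6, 1e7, 1e8])
r = np.array([4.10, 2.60, 1.95, 1.63, 1.45])   # illustrative numbers only
(h, A, beta), _ = curve_fit(ansatz, n, r, p0=[1.5, 7.0, 0.8],
                            bounds=([0, 0, 0], [8, 50, 1]))
print(f"extrapolated entropy rate: {h:.2f} bits/char")
```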
The Social Mood of News: Self-reported Annotations to Design Automatic Mood Detection Systems
Title | The Social Mood of News: Self-reported Annotations to Design Automatic Mood Detection Systems |
Authors | Firoj Alam, Fabio Celli, Evgeny A. Stepanov, Arindam Ghosh, Giuseppe Riccardi |
Abstract | In this paper, we address the automatic prediction of readers' mood from newspaper articles and comments. As online newspapers become more and more similar to social media platforms, users can provide affective feedback such as mood and emotion. We exploited the self-reported mood annotations obtained from the metadata of the Italian online newspaper corriere.it to design and evaluate a system for predicting five mood categories from news articles and comments: indignation, disappointment, worry, satisfaction, and amusement. The outcome of our experiments shows that, overall, bag-of-word n-grams perform better than all other feature sets; however, stylometric features perform better for the mood score prediction of articles. Our study shows that self-reported annotations can be used to design automatic mood prediction systems. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4316/ |
PWC | https://paperswithcode.com/paper/the-social-mood-of-news-self-reported |
Repo | |
Framework | |
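Given the finding above that bag-of-word n-grams are the strongest overall feature set, a minimal sketch of such a baseline follows. The vectorizer settings, the logistic-regression classifier, and the toy English texts are assumptions, not the paper's exact system (which works on Italian corriere.it data).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# The five self-reported mood categories named in the abstract.
MOODS = ["indignation", "disappointment", "worry", "satisfaction", "amusement"]

# Toy stand-ins for articles/comments (hypothetical data).
texts = [
    "the new tax plan worries small businesses",
    "what a hilarious take on the football match",
    "the council ignored residents yet again",
    "great news for commuters as the line reopens",
]
labels = ["worry", "amusement", "indignation", "satisfaction"]

# Word n-gram bag-of-features with a linear classifier: a minimal baseline.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)
print(model.predict(["readers fear the plant closure"]))
```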
Improved Error Bounds for Tree Representations of Metric Spaces
Title | Improved Error Bounds for Tree Representations of Metric Spaces |
Authors | Samir Chowdhury, Facundo Mémoli, Zane T. Smith |
Abstract | Estimating optimal phylogenetic trees or hierarchical clustering trees from metric data is an important problem in evolutionary biology and data analysis. Intuitively, the goodness-of-fit of a metric space to a tree depends on its inherent treeness, as well as other metric properties such as intrinsic dimension. Existing algorithms for embedding metric spaces into tree metrics provide distortion bounds depending on cardinality. Because cardinality is a simple property of any set, we argue that such bounds do not fully capture the rich structure endowed by the metric. We consider an embedding of a metric space into a tree proposed by Gromov. By proving a stability result, we obtain an improved additive distortion bound depending only on the hyperbolicity and doubling dimension of the metric. We observe that Gromov’s method is dual to the well-known single linkage hierarchical clustering (SLHC) method. By means of this duality, we are able to transport our results to the setting of SLHC, where such additive distortion bounds were previously unknown. |
Tasks | |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6431-improved-error-bounds-for-tree-representations-of-metric-spaces |
PDF | http://papers.nips.cc/paper/6431-improved-error-bounds-for-tree-representations-of-metric-spaces.pdf |
PWC | https://paperswithcode.com/paper/improved-error-bounds-for-tree |
Repo | |
Framework | |
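The SLHC duality noted in the abstract above makes the additive distortion easy to compute for a concrete metric space: the single-linkage cophenetic distance is exactly the ultrametric output, so max |d(x,y) − u(x,y)| can be measured directly. A minimal sketch, with random points assumed as input:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, cophenet

def slhc_distortion(points):
    """Additive distortion of the single-linkage ultrametric: a minimal sketch.

    SLHC maps a finite metric space to an ultrametric (the cophenetic
    distance of the single-linkage dendrogram); the paper bounds
    max |d(x,y) - u(x,y)| via hyperbolicity and doubling dimension.
    """
    d = pdist(points)                            # original distances (condensed)
    u = cophenet(linkage(d, method="single"))    # ultrametric distances
    return np.max(np.abs(d - u))                 # worst-case additive distortion

pts = np.random.default_rng(0).normal(size=(30, 2))
print(slhc_distortion(pts))
```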