July 26, 2019

Paper Group NANR 143

Alibaba at IJCNLP-2017 Task 1: Embedding Grammatical Features into LSTMs for Chinese Grammatical Error Diagnosis Task

Title Alibaba at IJCNLP-2017 Task 1: Embedding Grammatical Features into LSTMs for Chinese Grammatical Error Diagnosis Task
Authors Yi Yang, Pengjun Xie, Jun Tao, Guangwei Xu, Linlin Li, Luo Si
Abstract This paper introduces the Alibaba NLP team's system for IJCNLP 2017 Shared Task 1, Chinese Grammatical Error Diagnosis (CGED). The task is to diagnose four types of grammatical errors: redundant words (R), missing words (M), bad word selection (S), and disordered words (W). We treat the task as a sequence tagging problem and design handcrafted features to solve it. Our system is mainly based on the LSTM-CRF model, and three ensemble strategies are applied to improve performance. At the identification level and the position level, our system achieves the highest F1 scores. At the position level, which is the most difficult level, we perform best on all metrics.
Tasks
Published 2017-12-01
URL https://www.aclweb.org/anthology/I17-4006/
PDF https://www.aclweb.org/anthology/I17-4006
PWC https://paperswithcode.com/paper/alibaba-at-ijcnlp-2017-task-1-embedding
Repo
Framework
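
The abstract above frames error diagnosis as sequence tagging over the four error types R, M, S and W. As a rough illustration of what that framing means for the data, the sketch below turns error spans into one tag per token. The BIO scheme, the span format, and the function name `spans_to_bio` are assumptions made for this example; the abstract does not say which tagging scheme the system uses.

```python
from typing import List, Tuple

# Hypothetical span format: (start, end_exclusive, error_type), where
# error_type is one of the four CGED labels from the abstract: R, M, S, W.
Span = Tuple[int, int, str]


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Convert error spans into one BIO tag per token (assumed scheme)."""
    tags = ["O"] * num_tokens
    for start, end, err in spans:
        if not (0 <= start < end <= num_tokens):
            raise ValueError(f"span out of range: {(start, end, err)}")
        tags[start] = f"B-{err}"          # first token of the error span
        for i in range(start + 1, end):   # remaining tokens of the span
            tags[i] = f"I-{err}"
    return tags


tokens = ["t0", "t1", "t2", "t3", "t4", "t5"]  # placeholder tokens
spans = [(1, 2, "R"), (3, 5, "S")]             # made-up annotations
tags = spans_to_bio(len(tokens), spans)
print(list(zip(tokens, tags)))
```

A tagger such as an LSTM-CRF would then be trained to predict `tags` from `tokens`; that model is not shown here.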

Communication-Efficient Distributed Learning of Discrete Distributions

Title Communication-Efficient Distributed Learning of Discrete Distributions
Authors Ilias Diakonikolas, Elena Grigorescu, Jerry Li, Abhiram Natarajan, Krzysztof Onak, Ludwig Schmidt
Abstract We initiate a systematic investigation of distribution learning (density estimation) when the data is distributed across multiple servers. The servers must communicate with a referee and the goal is to estimate the underlying distribution with as few bits of communication as possible. We focus on non-parametric density estimation of discrete distributions with respect to the l1 and l2 norms. We provide the first non-trivial upper and lower bounds on the communication complexity of this basic estimation task in various settings of interest. Specifically, our results include the following: 1. When the unknown discrete distribution is unstructured and each server has only one sample, we show that any blackboard protocol (i.e., any protocol in which servers interact arbitrarily using public messages) that learns the distribution must essentially communicate the entire sample. 2. For the case of structured distributions, such as k-histograms and monotone distributions, we design distributed learning algorithms that achieve significantly better communication guarantees than the naive ones, and obtain tight upper and lower bounds in several regimes. Our distributed learning algorithms run in near-linear time and are robust to model misspecification. Our results provide insights on the interplay between structure and communication efficiency for a range of fundamental distribution estimation tasks.
Tasks Density Estimation
Published 2017-12-01
URL http://papers.nips.cc/paper/7218-communication-efficient-distributed-learning-of-discrete-distributions
PDF http://papers.nips.cc/paper/7218-communication-efficient-distributed-learning-of-discrete-distributions.pdf
PWC https://paperswithcode.com/paper/communication-efficient-distributed-learning
Repo
Framework
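
To make the communication cost in the abstract above concrete, here is a minimal sketch of the naive baseline in which every server forwards its single sample verbatim and the referee builds the empirical distribution. The function names and the toy samples are assumptions for illustration; the structured-distribution algorithms from the paper are not reproduced.

```python
import math
from collections import Counter
from typing import List


def naive_protocol_bits(num_servers: int, domain_size: int) -> int:
    """Total bits when each server sends its one sample as a fixed-width index."""
    bits_per_sample = max(1, math.ceil(math.log2(domain_size)))
    return num_servers * bits_per_sample


def empirical_distribution(samples: List[int], domain_size: int) -> List[float]:
    """Referee-side estimate: relative frequency of each domain element."""
    counts = Counter(samples)
    return [counts.get(i, 0) / len(samples) for i in range(domain_size)]


def l1_distance(p: List[float], q: List[float]) -> float:
    return sum(abs(a - b) for a, b in zip(p, q))


domain_size = 8
samples = [0, 0, 1, 3, 3, 3, 7, 7]   # one made-up sample per server
bits = naive_protocol_bits(len(samples), domain_size)
estimate = empirical_distribution(samples, domain_size)
uniform = [1.0 / domain_size] * domain_size
print(bits, l1_distance(estimate, uniform))
```

With 8 servers and a domain of size 8, the naive protocol spends 3 bits per sample; the first result in the abstract says that for unstructured distributions a protocol cannot do essentially better than this.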

Weighted-Entropy-Based Quantization for Deep Neural Networks

Title Weighted-Entropy-Based Quantization for Deep Neural Networks
Authors Eunhyeok Park, Junwhan Ahn, Sungjoo Yoo
Abstract Quantization is considered as one of the most effective methods to optimize the inference cost of neural network models for their deployment to mobile and embedded systems, which have tight resource constraints. In such approaches, it is critical to provide low-cost quantization under a tight accuracy loss constraint (e.g., 1%). In this paper, we propose a novel method for quantizing weights and activations based on the concept of weighted entropy. Unlike recent work on binary-weight neural networks, our approach is multi-bit quantization, in which weights and activations can be quantized by any number of bits depending on the target accuracy. This facilitates much more flexible exploitation of accuracy-performance trade-off provided by different levels of quantization. Moreover, our scheme provides an automated quantization flow based on conventional training algorithms, which greatly reduces the design-time effort to quantize the network. According to our extensive evaluations based on practical neural network models for image classification (AlexNet, GoogLeNet and ResNet-50/101), object detection (R-FCN with 50-layer ResNet), and language modeling (an LSTM network), our method achieves significant reductions in both the model size and the amount of computation with minimal accuracy loss. Also, compared to existing quantization schemes, ours provides higher accuracy with a similar resource constraint and requires much lower design effort.
Tasks Image Classification, Language Modelling, Object Detection, Quantization
Published 2017-07-01
URL http://openaccess.thecvf.com/content_cvpr_2017/html/Park_Weighted-Entropy-Based_Quantization_for_CVPR_2017_paper.html
PDF http://openaccess.thecvf.com/content_cvpr_2017/papers/Park_Weighted-Entropy-Based_Quantization_for_CVPR_2017_paper.pdf
PWC https://paperswithcode.com/paper/weighted-entropy-based-quantization-for-deep
Repo
Framework
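
The abstract above does not define weighted entropy, so the following is only a sketch under stated assumptions: it uses the classical weighted-entropy form, -sum_n I_n * P_n * log(P_n), where P_n is the fraction of weights assigned to cluster n and I_n is the mean importance of that cluster. Using the squared weight as the importance score is an assumption of this example, as is the function name.

```python
import math
from typing import List


def weighted_entropy(clusters: List[List[float]]) -> float:
    """Weighted entropy of a grouping of weights into quantization clusters.

    P_n = share of all weights in cluster n.
    I_n = mean importance in cluster n, with importance assumed to be w**2.
    """
    total = sum(len(c) for c in clusters)
    entropy = 0.0
    for cluster in clusters:
        if not cluster:
            continue  # empty clusters contribute nothing
        p = len(cluster) / total
        importance = sum(w * w for w in cluster) / len(cluster)
        entropy -= importance * p * math.log(p)
    return entropy


# Two made-up groupings of the same weights into two clusters.
grouping_a = [[-0.1, 0.0, 0.1], [0.5, 1.0, -1.0]]
grouping_b = [[-0.1, 0.0, 0.1, 0.5], [1.0, -1.0]]
print(weighted_entropy(grouping_a), weighted_entropy(grouping_b))
```

A quantizer built on this idea could compare candidate groupings by such a score; how the paper searches over groupings and handles activations is not covered here.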

How Close Are the Eigenvectors of the Sample and Actual Covariance Matrices?

Title How Close Are the Eigenvectors of the Sample and Actual Covariance Matrices?
Authors Andreas Loukas
Abstract How many samples are sufficient to guarantee that the eigenvectors of the sample covariance matrix are close to those of the actual covariance matrix? For a wide family of distributions, including distributions with finite second moment and sub-gaussian distributions supported in a centered Euclidean ball, we prove that the inner product between eigenvectors of the sample and actual covariance matrices decreases proportionally to the respective eigenvalue distance and the number of samples. Our findings imply non-asymptotic concentration bounds for eigenvectors and eigenvalues and carry strong consequences for the non-asymptotic analysis of PCA and its applications. For instance, they provide conditions for separating components estimated from $O(1)$ samples and show that even few samples can be sufficient to perform dimensionality reduction, especially for low-rank covariances.
Tasks Dimensionality Reduction
Published 2017-08-01
URL https://icml.cc/Conferences/2017/Schedule?showEvent=489
PDF http://proceedings.mlr.press/v70/loukas17a/loukas17a.pdf
PWC https://paperswithcode.com/paper/how-close-are-the-eigenvectors-of-the-sample
Repo
Framework
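
The question posed in the abstract above can be explored numerically in a toy setting. The sketch below is an illustration, not the paper's analysis: it assumes a 2-D Gaussian with actual covariance diag(4, 1), so the actual eigenvectors are the coordinate axes, and it measures the inner product between the top sample eigenvector and the second actual eigenvector for growing sample sizes. All names and parameters are chosen for this example.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def sample_covariance_2d(points: List[Point]) -> Tuple[float, float, float]:
    """Return (sxx, sxy, syy) of the unbiased sample covariance matrix."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    return sxx, sxy, syy


def top_eigenvector_2d(sxx: float, sxy: float, syy: float) -> Point:
    """Unit eigenvector of [[sxx, sxy], [sxy, syy]] for the largest eigenvalue."""
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # principal-axis angle
    return math.cos(theta), math.sin(theta)


def cross_inner_product(num_samples: int, seed: int = 0) -> float:
    """|<top sample eigenvector, second actual eigenvector>| for diag(4, 1)."""
    rng = random.Random(seed)
    points = [(rng.gauss(0.0, 2.0), rng.gauss(0.0, 1.0))
              for _ in range(num_samples)]
    _, vy = top_eigenvector_2d(*sample_covariance_2d(points))
    return abs(vy)  # the second actual eigenvector is (0, 1)


for n in (10, 100, 1000, 5000):
    print(n, cross_inner_product(n))
```

In this setup the printed value should shrink as the sample size grows, since the sample eigenvector lines up with its actual counterpart.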

Detection of Chinese Word Usage Errors for Non-Native Chinese Learners with Bidirectional LSTM

Title Detection of Chinese Word Usage Errors for Non-Native Chinese Learners with Bidirectional LSTM
Authors Yow-Ting Shiue, Hen-Hsen Huang, Hsin-Hsi Chen
Abstract Selecting appropriate words to compose a sentence is one common problem faced by non-native Chinese learners. In this paper, we propose (bidirectional) LSTM sequence labeling models and explore various features to detect word usage errors in Chinese sentences. By combining CWINDOW word embedding features and POS information, the best bidirectional LSTM model achieves accuracy 0.5138 and MRR 0.6789 on the HSK dataset. For 80.79% of the test data, the model ranks the ground-truth within the top two at position level.
Tasks Grammatical Error Detection
Published 2017-07-01
URL https://www.aclweb.org/anthology/P17-2064/
PDF https://www.aclweb.org/anthology/P17-2064
PWC https://paperswithcode.com/paper/detection-of-chinese-word-usage-errors-for
Repo
Framework
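
The abstract above reports accuracy, MRR, and a top-two rate. As a reminder of how such ranking metrics are usually computed, here is a minimal sketch; it assumes each test item yields the 1-based rank the model assigned to the ground-truth position, and the ranks used below are made up.

```python
from typing import List


def mean_reciprocal_rank(ranks: List[int]) -> float:
    """Average of 1/rank, where rank is the 1-based position of the ground truth."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hit_rate_at_k(ranks: List[int], k: int) -> float:
    """Fraction of items whose ground truth is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


ranks = [1, 2, 4, 1]  # made-up ranks of the ground truth for four sentences
print(mean_reciprocal_rank(ranks))  # (1 + 1/2 + 1/4 + 1) / 4
print(hit_rate_at_k(ranks, 1))      # top-1 rate, i.e. exact-match accuracy
print(hit_rate_at_k(ranks, 2))      # top-two rate
```

Under this reading, the reported accuracy would correspond to the top-1 rate and the 80.79% figure to the top-two rate, though the abstract does not spell out the exact definitions.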

Multi-Channel Lexicon Integrated CNN-BiLSTM Models for Sentiment Analysis

Title Multi-Channel Lexicon Integrated CNN-BiLSTM Models for Sentiment Analysis
Authors Joosung Yoon, Hyeoncheol Kim
Abstract
Tasks Opinion Mining, Sentiment Analysis
Published 2017-11-01
URL https://www.aclweb.org/anthology/O17-1023/
PDF https://www.aclweb.org/anthology/O17-1023
PWC https://paperswithcode.com/paper/multi-channel-lexicon-integrated-cnn-bilstm
Repo
Framework

Multi-Domain Aspect Extraction Using Support Vector Machines

Title Multi-Domain Aspect Extraction Using Support Vector Machines
Authors Nadheesh Jihan, Yasas Senarath, Dulanjaya Tennekoon, Mithila Wickramarathne, Surangika Ranathunga
Abstract
Tasks Aspect-Based Sentiment Analysis, Aspect Extraction, Sentiment Analysis
Published 2017-11-01
URL https://www.aclweb.org/anthology/O17-1029/
PDF https://www.aclweb.org/anthology/O17-1029
PWC https://paperswithcode.com/paper/multi-domain-aspect-extraction-using-support
Repo
Framework

以軟體為基礎建構語音增強系統使用者介面 (Development of a software-based User-Interface of Speech Enhancement System) [In Chinese]

Title 以軟體為基礎建構語音增強系統使用者介面 (Development of a software-based User-Interface of Speech Enhancement System) [In Chinese]
Authors Tao-Wei Wang, Yu Tsao, Ying-Hui Lai, Hsiang-Ping Hsu, Chia-Lung Wu
Abstract
Tasks Speech Enhancement
Published 2017-11-01
URL https://www.aclweb.org/anthology/O17-1030/
PDF https://www.aclweb.org/anthology/O17-1030
PWC https://paperswithcode.com/paper/aeecoaocaoeae3aa14c3ca12c-eae-development-of
Repo
Framework

序列標記與配對方法用於語音辨識錯誤偵測及修正 (On the Use of Sequence Labeling and Matching Methods for ASR Error Detection and Correction) [In Chinese]

Title 序列標記與配對方法用於語音辨識錯誤偵測及修正 (On the Use of Sequence Labeling and Matching Methods for ASR Error Detection and Correction) [In Chinese]
Authors Chia-Hua Wu, Chun-I Tsai, Hsiao-Tsung Hung, Yu-Chen Kao, Berlin Chen
Abstract
Tasks Speech Recognition
Published 2017-11-01
URL https://www.aclweb.org/anthology/O17-1032/
PDF https://www.aclweb.org/anthology/O17-1032
PWC https://paperswithcode.com/paper/aoa-e-eea13c-14eae3e34-ee-eaa-aa-on-the-use
Repo
Framework

Nonlinear random matrix theory for deep learning

Title Nonlinear random matrix theory for deep learning
Authors Jeffrey Pennington, Pratik Worah
Abstract Neural network configurations with random weights play an important role in the analysis of deep learning. They define the initial loss landscape and are closely related to kernel and random feature methods. Despite the fact that these networks are built out of random matrices, the vast and powerful machinery of random matrix theory has so far found limited success in studying them. A main obstacle in this direction is that neural networks are nonlinear, which prevents the straightforward utilization of many of the existing mathematical results. In this work, we open the door for direct applications of random matrix theory to deep learning by demonstrating that the pointwise nonlinearities typically applied in neural networks can be incorporated into a standard method of proof in random matrix theory known as the moments method. The test case for our study is the Gram matrix $Y^TY$, $Y=f(WX)$, where $W$ is a random weight matrix, $X$ is a random data matrix, and $f$ is a pointwise nonlinear activation function. We derive an explicit representation for the trace of the resolvent of this matrix, which defines its limiting spectral distribution. We apply these results to the computation of the asymptotic performance of single-layer random feature methods on a memorization task and to the analysis of the eigenvalues of the data covariance matrix as it propagates through a neural network. As a byproduct of our analysis, we identify an intriguing new class of activation functions with favorable properties.
Tasks
Published 2017-12-01
URL http://papers.nips.cc/paper/6857-nonlinear-random-matrix-theory-for-deep-learning
PDF http://papers.nips.cc/paper/6857-nonlinear-random-matrix-theory-for-deep-learning.pdf
PWC https://paperswithcode.com/paper/nonlinear-random-matrix-theory-for-deep
Repo
Framework
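
The object studied in the abstract above, the Gram matrix $Y^TY$ with $Y=f(WX)$, is easy to construct for small sizes. The sketch below builds it in plain Python; the matrix sizes, the standard Gaussian entries, and the choice of tanh for $f$ are assumptions for illustration, and any normalization the paper may apply is omitted.

```python
import math
import random
from typing import Callable, List

Matrix = List[List[float]]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Matrix with i.i.d. standard Gaussian entries (assumed distribution)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def gram_matrix(w: Matrix, x: Matrix, f: Callable[[float], float]) -> Matrix:
    """Compute Y^T Y where Y = f(W X) and f is applied entrywise."""
    y = [[f(v) for v in row] for row in matmul(w, x)]
    return matmul(transpose(y), y)


rng = random.Random(0)
n0, n1, m = 4, 5, 3            # input dim, hidden width, number of data points
W = random_matrix(n1, n0, rng)  # random weight matrix
X = random_matrix(n0, m, rng)   # random data matrix
G = gram_matrix(W, X, math.tanh)
for row in G:
    print(row)
```

The result is an m-by-m symmetric matrix; the paper's results concern the spectrum of such matrices in the large-dimension limit, which this toy example does not attempt to show.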

Medication and Adverse Event Extraction from Noisy Text

Title Medication and Adverse Event Extraction from Noisy Text
Authors Xiang Dai, Sarvnaz Karimi, Cecile Paris
Abstract
Tasks Named Entity Recognition
Published 2017-12-01
URL https://www.aclweb.org/anthology/U17-1009/
PDF https://www.aclweb.org/anthology/U17-1009
PWC https://paperswithcode.com/paper/medication-and-adverse-event-extraction-from
Repo
Framework

Results of the WNUT2017 Shared Task on Novel and Emerging Entity Recognition

Title Results of the WNUT2017 Shared Task on Novel and Emerging Entity Recognition
Authors Leon Derczynski, Eric Nichols, Marieke van Erp, Nut Limsopatham
Abstract This shared task focuses on identifying unusual, previously-unseen entities in the context of emerging discussions. Named entities form the basis of many modern approaches to other tasks (like event clustering and summarization), but recall on them is a real problem in noisy text - even among annotators. This drop tends to be due to novel entities and surface forms. Take for example the tweet "so.. kktny in 30 mins?!" - even human experts find the entity 'kktny' hard to detect and resolve. The goal of this task is to provide a definition of emerging and of rare entities, and based on that, also datasets for detecting these entities. The task as described in this paper evaluated the ability of participating entries to detect and classify novel and emerging named entities in noisy text.
Tasks Named Entity Recognition
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-4418/
PDF https://www.aclweb.org/anthology/W17-4418
PWC https://paperswithcode.com/paper/results-of-the-wnut2017-shared-task-on-novel
Repo
Framework

Learning Fine-Grained Expressions to Solve Math Word Problems

Title Learning Fine-Grained Expressions to Solve Math Word Problems
Authors Danqing Huang, Shuming Shi, Chin-Yew Lin, Jian Yin
Abstract This paper presents a novel template-based method to solve math word problems. This method learns the mappings between math concept phrases in math word problems and their math expressions from training data. For each equation template, we automatically construct a rich template sketch by aggregating information from various problems with the same template. Our approach is implemented in a two-stage system. It first retrieves a few relevant equation system templates and aligns numbers in math word problems to those templates for candidate equation generation. It then does a fine-grained inference to obtain the final answer. Experiment results show that our method achieves an accuracy of 28.4% on the linear Dolphin18K benchmark, which is 10% (54% relative) higher than previous state-of-the-art systems while achieving an accuracy increase of 12% (59% relative) on the TS6 benchmark subset.
Tasks Math Word Problem Solving
Published 2017-09-01
URL https://www.aclweb.org/anthology/D17-1084/
PDF https://www.aclweb.org/anthology/D17-1084
PWC https://paperswithcode.com/paper/learning-fine-grained-expressions-to-solve
Repo
Framework
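
The abstract above quotes each gain both in absolute and in relative terms. The short calculation below shows how the two relate, assuming the plain percentages are absolute differences in percentage points and that all figures are rounded; the implied baselines are derived here and are not stated in the abstract.

```python
def relative_gain(new_acc: float, abs_gain: float) -> float:
    """Relative improvement over the baseline implied by an absolute gain."""
    baseline = new_acc - abs_gain
    return abs_gain / baseline


def implied_baseline(abs_gain: float, rel_gain: float) -> float:
    """Baseline accuracy implied by an absolute gain and a relative gain."""
    return abs_gain / rel_gain


# Dolphin18K (linear): 28.4% accuracy, 10 points higher than prior systems.
dolphin_rel = relative_gain(28.4, 10.0)
print(round(dolphin_rel * 100, 1))   # close to the quoted 54% relative

# TS6 subset: only the gains are quoted (12 points, 59% relative).
ts6_baseline = implied_baseline(12.0, 0.59)
print(round(ts6_baseline, 1))        # approximate, since inputs are rounded
```

The first result is consistent with the quoted 54% relative gain; the TS6 baseline is only an estimate because the abstract gives rounded inputs.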

Deep Neural Solver for Math Word Problems

Title Deep Neural Solver for Math Word Problems
Authors Yan Wang, Xiaojiang Liu, Shuming Shi
Abstract This paper presents a deep neural solver to automatically solve math word problems. In contrast to previous statistical learning approaches, we directly translate math word problems to equation templates using a recurrent neural network (RNN) model, without sophisticated feature engineering. We further design a hybrid model that combines the RNN model and a similarity-based retrieval model to achieve additional performance improvement. Experiments conducted on a large dataset show that the RNN model and the hybrid model significantly outperform state-of-the-art statistical learning methods for math word problem solving.
Tasks Feature Engineering, Machine Translation, Math Word Problem Solving, Semantic Parsing
Published 2017-09-01
URL https://www.aclweb.org/anthology/D17-1088/
PDF https://www.aclweb.org/anthology/D17-1088
PWC https://paperswithcode.com/paper/deep-neural-solver-for-math-word-problems
Repo
Framework
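
The abstract above describes translating a word problem into an equation template. One way to picture the template idea is sketched below: numbers in the text are replaced by slots, and a template over those slots is evaluated once the slots are filled back in. The slot names (`n1`, `n2`), the postfix template encoding, and the example problem are assumptions for this illustration; the RNN that would predict the template is not shown.

```python
import operator
import re
from typing import Dict, List, Tuple

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def map_numbers(problem: str) -> Tuple[str, Dict[str, float]]:
    """Replace each number in the text with a slot token n1, n2, ..."""
    mapping: Dict[str, float] = {}

    def to_slot(match: "re.Match[str]") -> str:
        slot = f"n{len(mapping) + 1}"
        mapping[slot] = float(match.group())
        return slot

    return re.sub(r"\d+(?:\.\d+)?", to_slot, problem), mapping


def instantiate(template_postfix: List[str], mapping: Dict[str, float]) -> float:
    """Evaluate a postfix template such as ['n1', 'n2', '*'] with slot values."""
    stack: List[float] = []
    for token in template_postfix:
        if token in OPS:
            right, left = stack.pop(), stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(mapping[token])
    return stack.pop()


problem = "Tom has 3 boxes with 12 pencils in each box. How many pencils?"
slotted, mapping = map_numbers(problem)
template = ["n1", "n2", "*"]   # hypothetical predicted template: x = n1 * n2
answer = instantiate(template, mapping)
print(slotted)
print(answer)
```

Because the template refers only to slots, the same template can be reused for any problem with the same structure but different numbers.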

The BreakingNews Dataset

Title The BreakingNews Dataset
Authors Arnau Ramisa, Fei Yan, Francesc Moreno-Noguer, Krystian Mikolajczyk
Abstract We present BreakingNews, a novel dataset with approximately 100K news articles including images, text and captions, and enriched with heterogeneous meta-data (e.g. GPS coordinates and popularity metrics). The tenuous connection between the images and text in news data is appropriate to take work at the intersection of Computer Vision and Natural Language Processing to the next step, hence we hope this dataset will help spur progress in the field.
Tasks Image Captioning, Sentiment Analysis
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-2005/
PDF https://www.aclweb.org/anthology/W17-2005
PWC https://paperswithcode.com/paper/the-breakingnews-dataset
Repo
Framework