Paper Group NANR 143
Alibaba at IJCNLP-2017 Task 1: Embedding Grammatical Features into LSTMs for Chinese Grammatical Error Diagnosis Task
Title | Alibaba at IJCNLP-2017 Task 1: Embedding Grammatical Features into LSTMs for Chinese Grammatical Error Diagnosis Task |
Authors | Yi Yang, Pengjun Xie, Jun Tao, Guangwei Xu, Linlin Li, Luo Si |
Abstract | This paper introduces the Alibaba NLP team's system for IJCNLP 2017 Shared Task 1: Chinese Grammatical Error Diagnosis (CGED). The task is to diagnose four types of grammatical errors: redundant words (R), missing words (M), bad word selection (S), and disordered words (W). We treat the task as a sequence tagging problem and design handcrafted features to solve it. Our system is mainly based on the LSTM-CRF model, and three ensemble strategies are applied to improve performance. Our system achieves the highest F1 scores at both the identification level and the position level. At the position level, which is the most difficult, we perform best on all metrics. |
Tasks | |
Published | 2017-12-01 |
URL | https://www.aclweb.org/anthology/I17-4006/ |
PWC | https://paperswithcode.com/paper/alibaba-at-ijcnlp-2017-task-1-embedding |
Repo | |
Framework | |
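The entry above casts CGED as sequence tagging over the R/M/S/W error types. As a minimal, hypothetical sketch of that framing (PyTorch; not the authors' code), the snippet below wires up a BiLSTM tagger with a per-token softmax head. The CRF layer, handcrafted grammatical features, and ensembling from the paper are omitted, and every dimension, as well as the assumed BIO tag set, is an illustrative choice.

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Minimal BiLSTM sequence tagger. In the paper's LSTM-CRF, a CRF
    transition layer would replace this independent per-token softmax."""
    def __init__(self, vocab_size, emb_dim=128, hidden=256, num_tags=9):
        super().__init__()
        # num_tags=9 assumes a BIO scheme over R/M/S/W errors plus O
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.proj = nn.Linear(2 * hidden, num_tags)

    def forward(self, token_ids):                # (batch, seq_len)
        h, _ = self.lstm(self.embed(token_ids))  # (batch, seq_len, 2*hidden)
        return self.proj(h)                      # per-token tag scores

model = BiLSTMTagger(vocab_size=5000)
logits = model(torch.randint(0, 5000, (2, 20)))  # toy batch of 2 sentences
print(logits.shape)  # torch.Size([2, 20, 9])
```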
Communication-Efficient Distributed Learning of Discrete Distributions
Title | Communication-Efficient Distributed Learning of Discrete Distributions |
Authors | Ilias Diakonikolas, Elena Grigorescu, Jerry Li, Abhiram Natarajan, Krzysztof Onak, Ludwig Schmidt |
Abstract | We initiate a systematic investigation of distribution learning (density estimation) when the data is distributed across multiple servers. The servers must communicate with a referee and the goal is to estimate the underlying distribution with as few bits of communication as possible. We focus on non-parametric density estimation of discrete distributions with respect to the l1 and l2 norms. We provide the first non-trivial upper and lower bounds on the communication complexity of this basic estimation task in various settings of interest. Specifically, our results include the following: 1. When the unknown discrete distribution is unstructured and each server has only one sample, we show that any blackboard protocol (i.e., any protocol in which servers interact arbitrarily using public messages) that learns the distribution must essentially communicate the entire sample. 2. For the case of structured distributions, such as k-histograms and monotone distributions, we design distributed learning algorithms that achieve significantly better communication guarantees than the naive ones, and obtain tight upper and lower bounds in several regimes. Our distributed learning algorithms run in near-linear time and are robust to model misspecification. Our results provide insights on the interplay between structure and communication efficiency for a range of fundamental distribution estimation tasks. |
Tasks | Density Estimation |
Published | 2017-12-01 |
URL | http://papers.nips.cc/paper/7218-communication-efficient-distributed-learning-of-discrete-distributions |
PDF | http://papers.nips.cc/paper/7218-communication-efficient-distributed-learning-of-discrete-distributions.pdf |
PWC | https://paperswithcode.com/paper/communication-efficient-distributed-learning |
Repo | |
Framework | |
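To make the communication setting concrete, here is a toy simulation (our own, not from the paper) of the naive blackboard protocol that the abstract's Result 1 says is essentially unavoidable for unstructured distributions: every server transmits its single sample verbatim and the referee outputs the empirical distribution. The domain size, server count, and Dirichlet prior are arbitrary choices.

```python
import numpy as np

def naive_protocol(samples, domain_size):
    """Referee's estimate after every server sends its one sample
    (about log2(domain_size) bits per server)."""
    counts = np.bincount(samples, minlength=domain_size)
    return counts / counts.sum()

rng = np.random.default_rng(0)
k, n = 100, 10_000                         # domain size, number of servers
true_p = rng.dirichlet(np.ones(k))         # stand-in unknown distribution
samples = rng.choice(k, size=n, p=true_p)  # one draw per server
est = naive_protocol(samples, k)
print("l1 error:", np.abs(est - true_p).sum())
print("bits communicated ~", n * int(np.ceil(np.log2(k))))
```

The paper's contribution is showing when structure (k-histograms, monotonicity) lets the servers beat this bit count.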
Weighted-Entropy-Based Quantization for Deep Neural Networks
Title | Weighted-Entropy-Based Quantization for Deep Neural Networks |
Authors | Eunhyeok Park, Junwhan Ahn, Sungjoo Yoo |
Abstract | Quantization is considered one of the most effective methods to optimize the inference cost of neural network models for their deployment to mobile and embedded systems, which have tight resource constraints. In such approaches, it is critical to provide low-cost quantization under a tight accuracy loss constraint (e.g., 1%). In this paper, we propose a novel method for quantizing weights and activations based on the concept of weighted entropy. Unlike recent work on binary-weight neural networks, our approach is multi-bit quantization, in which weights and activations can be quantized by any number of bits depending on the target accuracy. This facilitates much more flexible exploitation of the accuracy-performance trade-off provided by different levels of quantization. Moreover, our scheme provides an automated quantization flow based on conventional training algorithms, which greatly reduces the design-time effort to quantize the network. According to our extensive evaluations based on practical neural network models for image classification (AlexNet, GoogLeNet and ResNet-50/101), object detection (R-FCN with 50-layer ResNet), and language modeling (an LSTM network), our method achieves significant reductions in both the model size and the amount of computation with minimal accuracy loss. Also, compared to existing quantization schemes, ours provides higher accuracy with a similar resource constraint and requires much lower design effort. |
Tasks | Image Classification, Language Modelling, Object Detection, Quantization |
Published | 2017-07-01 |
URL | http://openaccess.thecvf.com/content_cvpr_2017/html/Park_Weighted-Entropy-Based_Quantization_for_CVPR_2017_paper.html |
PDF | http://openaccess.thecvf.com/content_cvpr_2017/papers/Park_Weighted-Entropy-Based_Quantization_for_CVPR_2017_paper.pdf |
PWC | https://paperswithcode.com/paper/weighted-entropy-based-quantization-for-deep |
Repo | |
Framework | |
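As a very loose illustration of multi-bit, importance-aware quantization (a stand-in heuristic, not the paper's weighted-entropy algorithm), the sketch below scores weights by squared magnitude, splits them into 2^bits groups of roughly equal total importance, and snaps each group to its mean:

```python
import numpy as np

def quantize_weights(w, bits=3):
    """Toy multi-bit quantizer. Squared magnitude stands in for the
    paper's weighted-entropy criterion; groups of roughly equal total
    importance are each collapsed to their mean value."""
    levels = 2 ** bits
    order = np.argsort(np.abs(w))          # weights sorted by magnitude
    cum = np.cumsum(w[order] ** 2)
    cum /= cum[-1]                         # cumulative importance in (0, 1]
    group = np.minimum((cum * levels).astype(int), levels - 1)
    wq = np.empty_like(w)
    for g in range(levels):
        idx = order[group == g]
        wq[idx] = w[idx].mean() if idx.size else 0.0
    return wq

rng = np.random.default_rng(0)
w = rng.normal(size=10_000).astype(np.float32)
wq = quantize_weights(w, bits=3)
print("levels used:", np.unique(wq).size, " mse:", np.mean((w - wq) ** 2))
```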
How Close Are the Eigenvectors of the Sample and Actual Covariance Matrices?
Title | How Close Are the Eigenvectors of the Sample and Actual Covariance Matrices? |
Authors | Andreas Loukas |
Abstract | How many samples are sufficient to guarantee that the eigenvectors of the sample covariance matrix are close to those of the actual covariance matrix? For a wide family of distributions, including distributions with finite second moment and sub-gaussian distributions supported in a centered Euclidean ball, we prove that the inner product between eigenvectors of the sample and actual covariance matrices decreases proportionally to the respective eigenvalue distance and the number of samples. Our findings imply non-asymptotic concentration bounds for eigenvectors and eigenvalues and carry strong consequences for the non-asymptotic analysis of PCA and its applications. For instance, they provide conditions for separating components estimated from $O(1)$ samples and show that even few samples can be sufficient to perform dimensionality reduction, especially for low-rank covariances. |
Tasks | Dimensionality Reduction |
Published | 2017-08-01 |
URL | https://icml.cc/Conferences/2017/Schedule?showEvent=489 |
PDF | http://proceedings.mlr.press/v70/loukas17a/loukas17a.pdf |
PWC | https://paperswithcode.com/paper/how-close-are-the-eigenvectors-of-the-sample |
Repo | |
Framework | |
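A quick empirical check of the stated phenomenon (our own toy experiment, not the paper's): draw n Gaussian samples from a known covariance with a well-separated spectrum, and watch the eigenvector alignment improve with n.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 20
eigvals = np.arange(d, 0, -1, dtype=float)    # well-separated spectrum
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # actual eigenvectors
Sigma = Q @ np.diag(eigvals) @ Q.T

for n in (50, 500, 5000):
    X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
    S = X.T @ X / n                            # sample covariance
    _, V = np.linalg.eigh(S)
    V = V[:, ::-1]                             # descending eigenvalue order
    align = np.abs(np.sum(V * Q, axis=0))      # |<v_i, q_i>| per component
    print(f"n={n:5d}  top-3 eigenvector alignment:", np.round(align[:3], 3))
```

Consistent with the abstract, alignment is strongest for eigenvectors whose eigenvalues sit far from the rest of the spectrum, and it improves as n grows.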
Detection of Chinese Word Usage Errors for Non-Native Chinese Learners with Bidirectional LSTM
Title | Detection of Chinese Word Usage Errors for Non-Native Chinese Learners with Bidirectional LSTM |
Authors | Yow-Ting Shiue, Hen-Hsen Huang, Hsin-Hsi Chen |
Abstract | Selecting appropriate words to compose a sentence is one common problem faced by non-native Chinese learners. In this paper, we propose (bidirectional) LSTM sequence labeling models and explore various features to detect word usage errors in Chinese sentences. By combining CWINDOW word embedding features and POS information, the best bidirectional LSTM model achieves an accuracy of 0.5138 and an MRR of 0.6789 on the HSK dataset. For 80.79% of the test data, the model ranks the ground truth within the top two at the position level. |
Tasks | Grammatical Error Detection |
Published | 2017-07-01 |
URL | https://www.aclweb.org/anthology/P17-2064/ |
PWC | https://paperswithcode.com/paper/detection-of-chinese-word-usage-errors-for |
Repo | |
Framework | |
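The accuracy, MRR, and top-2 numbers above are standard ranking metrics over candidate error positions. A hypothetical helper (names and toy data invented here) showing how such figures are computed:

```python
def mrr_and_topk(ranked_predictions, gold, k=2):
    """ranked_predictions: per-sentence candidate error positions,
    best first; gold: the true position for each sentence."""
    rr, hits = 0.0, 0
    for ranked, g in zip(ranked_predictions, gold):
        if g in ranked:
            rr += 1.0 / (ranked.index(g) + 1)  # reciprocal rank
        hits += g in ranked[:k]                # top-k hit
    n = len(gold)
    return rr / n, hits / n

preds = [[3, 1, 7], [0, 2, 5], [4, 6, 1]]
gold = [3, 2, 1]
mrr, top2 = mrr_and_topk(preds, gold)
print(f"MRR={mrr:.4f}  top-2 accuracy={top2:.4f}")  # 0.6111, 0.6667
```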
Multi-Channel Lexicon Integrated CNN-BiLSTM Models for Sentiment Analysis
Title | Multi-Channel Lexicon Integrated CNN-BiLSTM Models for Sentiment Analysis |
Authors | Joosung Yoon, Hyeoncheol Kim |
Abstract | |
Tasks | Opinion Mining, Sentiment Analysis |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/O17-1023/ |
PWC | https://paperswithcode.com/paper/multi-channel-lexicon-integrated-cnn-bilstm |
Repo | |
Framework | |
Multi-Domain Aspect Extraction Using Support Vector Machines
Title | Multi-Domain Aspect Extraction Using Support Vector Machines |
Authors | Nadheesh Jihan, Yasas Senarath, Dulanjaya Tennekoon, Mithila Wickramarathne, Surangika Ranathunga |
Abstract | |
Tasks | Aspect-Based Sentiment Analysis, Aspect Extraction, Sentiment Analysis |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/O17-1029/ |
PWC | https://paperswithcode.com/paper/multi-domain-aspect-extraction-using-support |
Repo | |
Framework | |
以軟體為基礎建構語音增強系統使用者介面 (Development of a software-based User-Interface of Speech Enhancement System) [In Chinese]
Title | 以軟體為基礎建構語音增強系統使用者介面 (Development of a software-based User-Interface of Speech Enhancement System) [In Chinese] |
Authors | Tao-Wei Wang, Yu Tsao, Ying-Hui Lai, Hsiang-Ping Hsu, Chia-Lung Wu |
Abstract | |
Tasks | Speech Enhancement |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/O17-1030/ |
PWC | https://paperswithcode.com/paper/aeecoaocaoeae3aa14c3ca12c-eae-development-of |
Repo | |
Framework | |
序列標記與配對方法用於語音辨識錯誤偵測及修正 (On the Use of Sequence Labeling and Matching Methods for ASR Error Detection and Correction) [In Chinese]
Title | 序列標記與配對方法用於語音辨識錯誤偵測及修正 (On the Use of Sequence Labeling and Matching Methods for ASR Error Detection and Correction) [In Chinese] |
Authors | Chia-Hua Wu, Chun-I Tsai, Hsiao-Tsung Hung, Yu-Chen Kao, Berlin Chen |
Abstract | |
Tasks | Speech Recognition |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/O17-1032/ |
PWC | https://paperswithcode.com/paper/aoa-e-eea13c-14eae3e34-ee-eaa-aa-on-the-use |
Repo | |
Framework | |
Nonlinear random matrix theory for deep learning
Title | Nonlinear random matrix theory for deep learning |
Authors | Jeffrey Pennington, Pratik Worah |
Abstract | Neural network configurations with random weights play an important role in the analysis of deep learning. They define the initial loss landscape and are closely related to kernel and random feature methods. Despite the fact that these networks are built out of random matrices, the vast and powerful machinery of random matrix theory has so far found limited success in studying them. A main obstacle in this direction is that neural networks are nonlinear, which prevents the straightforward utilization of many of the existing mathematical results. In this work, we open the door for direct applications of random matrix theory to deep learning by demonstrating that the pointwise nonlinearities typically applied in neural networks can be incorporated into a standard method of proof in random matrix theory known as the moments method. The test case for our study is the Gram matrix $Y^TY$, $Y=f(WX)$, where $W$ is a random weight matrix, $X$ is a random data matrix, and $f$ is a pointwise nonlinear activation function. We derive an explicit representation for the trace of the resolvent of this matrix, which defines its limiting spectral distribution. We apply these results to the computation of the asymptotic performance of single-layer random feature methods on a memorization task and to the analysis of the eigenvalues of the data covariance matrix as it propagates through a neural network. As a byproduct of our analysis, we identify an intriguing new class of activation functions with favorable properties. |
Tasks | |
Published | 2017-12-01 |
URL | http://papers.nips.cc/paper/6857-nonlinear-random-matrix-theory-for-deep-learning |
PDF | http://papers.nips.cc/paper/6857-nonlinear-random-matrix-theory-for-deep-learning.pdf |
PWC | https://paperswithcode.com/paper/nonlinear-random-matrix-theory-for-deep |
Repo | |
Framework | |
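A small numerical illustration of the object the abstract studies, the Gram matrix $Y^TY$ with $Y=f(WX)$ (our own simulation; the 1/n scalings and the tanh nonlinearity are assumed choices, not the paper's exact setup):

```python
import numpy as np

rng = np.random.default_rng(0)
n0, n1, m = 500, 500, 1000                   # data dim, width, sample count
W = rng.normal(0, 1 / np.sqrt(n0), (n1, n0)) # random weight matrix
X = rng.normal(0, 1, (n0, m))                # random data matrix
Y = np.tanh(W @ X)                           # pointwise nonlinearity f
M = Y.T @ Y / n1                             # Gram matrix from the abstract
eigs = np.linalg.eigvalsh(M)
print("spectral range: [%.3f, %.3f]" % (eigs.min(), eigs.max()))
# normalized trace of the resolvent (M - zI)^{-1}, evaluated at z = -1
print("resolvent trace at z=-1: %.3f" % np.mean(1.0 / (eigs + 1.0)))
```

The paper derives this resolvent trace, and hence the limiting spectral distribution, analytically via the moments method.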
Medication and Adverse Event Extraction from Noisy Text
Title | Medication and Adverse Event Extraction from Noisy Text |
Authors | Xiang Dai, Sarvnaz Karimi, Cecile Paris |
Abstract | |
Tasks | Named Entity Recognition |
Published | 2017-12-01 |
URL | https://www.aclweb.org/anthology/U17-1009/ |
PWC | https://paperswithcode.com/paper/medication-and-adverse-event-extraction-from |
Repo | |
Framework | |
Results of the WNUT2017 Shared Task on Novel and Emerging Entity Recognition
Title | Results of the WNUT2017 Shared Task on Novel and Emerging Entity Recognition |
Authors | Leon Derczynski, Eric Nichols, Marieke van Erp, Nut Limsopatham |
Abstract | This shared task focuses on identifying unusual, previously-unseen entities in the context of emerging discussions. Named entities form the basis of many modern approaches to other tasks (like event clustering and summarization), but recall on them is a real problem in noisy text - even among annotators. This drop tends to be due to novel entities and surface forms. Take for example the tweet "so.. kktny in 30 mins?!" - even human experts find the entity 'kktny' hard to detect and resolve. The goal of this task is to provide a definition of emerging and of rare entities, and based on that, also datasets for detecting these entities. The task as described in this paper evaluated the ability of participating entries to detect and classify novel and emerging named entities in noisy text. |
Tasks | Named Entity Recognition |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-4418/ |
PWC | https://paperswithcode.com/paper/results-of-the-wnut2017-shared-task-on-novel |
Repo | |
Framework | |
Learning Fine-Grained Expressions to Solve Math Word Problems
Title | Learning Fine-Grained Expressions to Solve Math Word Problems |
Authors | Danqing Huang, Shuming Shi, Chin-Yew Lin, Jian Yin |
Abstract | This paper presents a novel template-based method to solve math word problems. This method learns the mappings between math concept phrases in math word problems and their math expressions from training data. For each equation template, we automatically construct a rich template sketch by aggregating information from various problems with the same template. Our approach is implemented in a two-stage system. It first retrieves a few relevant equation system templates and aligns numbers in math word problems to those templates for candidate equation generation. It then does a fine-grained inference to obtain the final answer. Experiment results show that our method achieves an accuracy of 28.4% on the linear Dolphin18K benchmark, which is 10% (54% relative) higher than previous state-of-the-art systems while achieving an accuracy increase of 12% (59% relative) on the TS6 benchmark subset. |
Tasks | Math Word Problem Solving |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/D17-1084/ |
PWC | https://paperswithcode.com/paper/learning-fine-grained-expressions-to-solve |
Repo | |
Framework | |
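To ground the two-stage idea (template retrieval, then number-to-slot alignment), here is a deliberately tiny, hypothetical pipeline; the cue words, templates, and align-by-order rule are all invented for illustration and bear no relation to the paper's learned mappings:

```python
import re

# Hypothetical equation templates; slots are filled with the numbers
# extracted from the problem text, in order of appearance.
TEMPLATES = {
    "sum":  lambda n1, n2: n1 + n2,
    "diff": lambda n1, n2: n1 - n2,
}
CUES = {"altogether": "sum", "in total": "sum", "left": "diff"}

def solve(problem):
    """Toy two-stage solver: retrieve a template from lexical cues,
    align numbers to its slots, then evaluate."""
    nums = [float(x) for x in re.findall(r"\d+(?:\.\d+)?", problem)]
    text = problem.lower()
    for cue, name in CUES.items():
        if cue in text and len(nums) >= 2:
            return TEMPLATES[name](nums[0], nums[1])
    return None  # no template retrieved

print(solve("Tom has 3 apples and Ann has 5. How many altogether?"))  # 8.0
```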
Deep Neural Solver for Math Word Problems
Title | Deep Neural Solver for Math Word Problems |
Authors | Yan Wang, Xiaojiang Liu, Shuming Shi |
Abstract | This paper presents a deep neural solver to automatically solve math word problems. In contrast to previous statistical learning approaches, we directly translate math word problems to equation templates using a recurrent neural network (RNN) model, without sophisticated feature engineering. We further design a hybrid model that combines the RNN model and a similarity-based retrieval model to achieve additional performance improvement. Experiments conducted on a large dataset show that the RNN model and the hybrid model significantly outperform state-of-the-art statistical learning methods for math word problem solving. |
Tasks | Feature Engineering, Machine Translation, Math Word Problem Solving, Semantic Parsing |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/D17-1088/ |
PWC | https://paperswithcode.com/paper/deep-neural-solver-for-math-word-problems |
Repo | |
Framework | |
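The hybrid design above combines the RNN with a similarity-based retrieval model. A minimal sketch of just that retrieval component (the corpus, threshold, and string-overlap similarity are invented stand-ins; the paper's retrieval model is more involved):

```python
from difflib import SequenceMatcher

# Tiny stand-in corpus of already-solved problems with their templates.
SOLVED = [
    ("A pen costs 2 dollars. How much do 5 pens cost?", "n1 * n2"),
    ("Lily had 9 candies and ate 4. How many are left?", "n1 - n2"),
]

def retrieve_template(problem, threshold=0.4):
    """Return the template of the most similar solved problem, or None
    if nothing clears the threshold (the RNN's output is used instead)."""
    best, best_sim = None, threshold
    for text, template in SOLVED:
        sim = SequenceMatcher(None, problem.lower(), text.lower()).ratio()
        if sim > best_sim:
            best, best_sim = template, sim
    return best

print(retrieve_template("A book costs 3 dollars. How much do 7 books cost?"))
# -> 'n1 * n2'
```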
The BreakingNews Dataset
Title | The BreakingNews Dataset |
Authors | Arnau Ramisa, Fei Yan, Francesc Moreno-Noguer, Krystian Mikolajczyk |
Abstract | We present BreakingNews, a novel dataset with approximately 100K news articles including images, text and captions, and enriched with heterogeneous meta-data (e.g., GPS coordinates and popularity metrics). The tenuous connection between images and text in news data makes it a suitable testbed for taking work at the intersection of Computer Vision and Natural Language Processing to the next step, and we hope this dataset will help spur progress in the field. |
Tasks | Image Captioning, Sentiment Analysis |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-2005/ |
PWC | https://paperswithcode.com/paper/the-breakingnews-dataset |
Repo | |
Framework | |