Paper Group NANR 54
Training Neural Machines with Partial Traces. Neural sequence modelling for learner error prediction. Towards Text Generation with Adversarially Learned Neural Outlines. A Dataset and Reranking Method for Multimodal MT of User-Generated Image Captions. Annotation and Analysis of Extractive Summaries for the Kyutech Corpus. Learning to Reconstruct H …
Training Neural Machines with Partial Traces
Title | Training Neural Machines with Partial Traces |
Authors | Matthew Mirman, Dimitar Dimitrov, Pavle Djordjevich, Timon Gehr, Martin Vechev |
Abstract | We present a novel approach for training neural abstract architectures which incorporates (partial) supervision over the machine’s interpretable components. To cleanly capture the set of neural architectures to which our method applies, we introduce the concept of a differential neural computational machine (∂NCM) and show that several existing architectures (e.g., NTMs, NRAMs) can be instantiated as a ∂NCM and can thus benefit from any amount of additional supervision over their interpretable components. Based on our method, we performed a detailed experimental evaluation with both the NTM and NRAM architectures, and showed that the approach leads to significantly better convergence and generalization capabilities of the learning phase than when training using only input-output examples. |
Tasks | |
Published | 2018-01-01 |
URL | https://openreview.net/forum?id=S1q_Cz-Cb |
https://openreview.net/pdf?id=S1q_Cz-Cb | |
PWC | https://paperswithcode.com/paper/training-neural-machines-with-partial-traces |
Repo | |
Framework | |
Neural sequence modelling for learner error prediction
Title | Neural sequence modelling for learner error prediction |
Authors | Zheng Yuan |
Abstract | This paper describes our use of two recurrent neural network sequence models: sequence labelling and sequence-to-sequence models, for the prediction of future learner errors in our submission to the 2018 Duolingo Shared Task on Second Language Acquisition Modeling (SLAM). We show that these two models capture complementary information as combining them improves performance. Furthermore, the same network architecture and group of features can be used directly to build competitive prediction models in all three language tracks, demonstrating that our approach generalises well across languages. |
Tasks | Grammatical Error Detection, Language Acquisition |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/W18-0547/ |
https://www.aclweb.org/anthology/W18-0547 | |
PWC | https://paperswithcode.com/paper/neural-sequence-modelling-for-learner-error |
Repo | |
Framework | |
Towards Text Generation with Adversarially Learned Neural Outlines
Title | Towards Text Generation with Adversarially Learned Neural Outlines |
Authors | Sandeep Subramanian, Sai Rajeswar Mudumba, Alessandro Sordoni, Adam Trischler, Aaron C. Courville, Chris Pal |
Abstract | Recent progress in deep generative models has been fueled by two paradigms – autoregressive and adversarial models. We propose a combination of both approaches with the goal of learning generative models of text. Our method first produces a high-level sentence outline and then generates words sequentially, conditioning on both the outline and the previous outputs. We generate outlines with an adversarial model trained to approximate the distribution of sentences in a latent space induced by general-purpose sentence encoders. This provides strong, informative conditioning for the autoregressive stage. Our quantitative evaluations suggest that conditioning information from generated outlines is able to guide the autoregressive model to produce realistic samples, comparable to maximum-likelihood trained language models, even at high temperatures with multinomial sampling. Qualitative results also demonstrate that this generative procedure yields natural-looking sentences and interpolations. |
Tasks | Text Generation |
Published | 2018-12-01 |
URL | http://papers.nips.cc/paper/7983-towards-text-generation-with-adversarially-learned-neural-outlines |
http://papers.nips.cc/paper/7983-towards-text-generation-with-adversarially-learned-neural-outlines.pdf | |
PWC | https://paperswithcode.com/paper/towards-text-generation-with-adversarially |
Repo | |
Framework | |
A Dataset and Reranking Method for Multimodal MT of User-Generated Image Captions
Title | A Dataset and Reranking Method for Multimodal MT of User-Generated Image Captions |
Authors | Shigehiko Schamoni, Julian Hitschler, Stefan Riezler |
Abstract | |
Tasks | Image Captioning, Machine Translation, Multimodal Machine Translation |
Published | 2018-03-01 |
URL | https://www.aclweb.org/anthology/W18-1814/ |
https://www.aclweb.org/anthology/W18-1814 | |
PWC | https://paperswithcode.com/paper/a-dataset-and-reranking-method-for-multimodal |
Repo | |
Framework | |
Annotation and Analysis of Extractive Summaries for the Kyutech Corpus
Title | Annotation and Analysis of Extractive Summaries for the Kyutech Corpus |
Authors | Takashi Yamamura, Kazutaka Shimada |
Abstract | |
Tasks | Abstractive Text Summarization, Decision Making, Document Summarization, Meeting Summarization |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1508/ |
https://www.aclweb.org/anthology/L18-1508 | |
PWC | https://paperswithcode.com/paper/annotation-and-analysis-of-extractive |
Repo | |
Framework | |
Learning to Reconstruct High-quality 3D Shapes with Cascaded Fully Convolutional Networks
Title | Learning to Reconstruct High-quality 3D Shapes with Cascaded Fully Convolutional Networks |
Authors | Yan-Pei Cao, Zheng-Ning Liu, Zheng-Fei Kuang, Leif Kobbelt, Shi-Min Hu |
Abstract | We present a data-driven approach to reconstructing high-resolution and detailed volumetric representations of 3D shapes. Although well studied, algorithms for volumetric fusion from multi-view depth scans are still prone to scanning noise and occlusions, making it hard to obtain high-fidelity 3D reconstructions. In this paper, inspired by recent advances in efficient 3D deep learning techniques, we introduce a novel cascaded 3D convolutional network architecture, which learns to reconstruct implicit surface representations from noisy and incomplete depth maps in a progressive, coarse-to-fine manner. To this end, we also develop an algorithm for end-to-end training of the proposed cascaded structure. Qualitative and quantitative experimental results on both simulated and real-world datasets demonstrate that the presented approach outperforms existing state-of-the-art work in terms of quality and fidelity of reconstructed models. |
Tasks | |
Published | 2018-09-01 |
URL | http://openaccess.thecvf.com/content_ECCV_2018/html/Yan-Pei_Cao_Learning_to_Reconstruct_ECCV_2018_paper.html |
http://openaccess.thecvf.com/content_ECCV_2018/papers/Yan-Pei_Cao_Learning_to_Reconstruct_ECCV_2018_paper.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-reconstruct-high-quality-3d |
Repo | |
Framework | |
Addressing Troublesome Words in Neural Machine Translation
Title | Addressing Troublesome Words in Neural Machine Translation |
Authors | Yang Zhao, Jiajun Zhang, Zhongjun He, Chengqing Zong, Hua Wu |
Abstract | One of the weaknesses of Neural Machine Translation (NMT) is in handling low-frequency and ambiguous words, which we refer to as troublesome words. To address this problem, we propose a novel memory-enhanced NMT method. First, we investigate different strategies to define and detect the troublesome words. Then, a contextual memory is constructed to memorize which target words should be produced in what situations. Finally, we design a hybrid model to dynamically access the contextual memory so as to correctly translate the troublesome words. The extensive experiments on Chinese-to-English and English-to-German translation tasks demonstrate that our method significantly outperforms the strong baseline models in translation quality, especially in handling troublesome words. |
Tasks | Machine Translation |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/D18-1036/ |
https://www.aclweb.org/anthology/D18-1036 | |
PWC | https://paperswithcode.com/paper/addressing-troublesome-words-in-neural |
Repo | |
Framework | |
Sentiment Analysis using Imperfect Views from Spoken Language and Acoustic Modalities
Title | Sentiment Analysis using Imperfect Views from Spoken Language and Acoustic Modalities |
Authors | Imran Sheikh, Sri Harsha Dumpala, Rupayan Chakraborty, Sunil Kumar Kopparapu |
Abstract | Multimodal sentiment classification in practical applications may have to rely on erroneous and imperfect views, namely (a) language transcription from a speech recognizer and (b) under-performing acoustic views. This work focuses on improving the representations of these views by performing a deep canonical correlation analysis with the representations of the better performing manual transcription view. Enhanced representations of the imperfect views can be obtained even in the absence of the perfect views and give improved performance during test conditions. Evaluations on the CMU-MOSI and CMU-MOSEI datasets demonstrate the effectiveness of the proposed approach. |
Tasks | Sentiment Analysis, Speech Recognition |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/W18-3305/ |
https://www.aclweb.org/anthology/W18-3305 | |
PWC | https://paperswithcode.com/paper/sentiment-analysis-using-imperfect-views-from |
Repo | |
Framework | |
Model-Agnostic Private Learning
Title | Model-Agnostic Private Learning |
Authors | Raef Bassily, Abhradeep Guha Thakurta, Om Dipakbhai Thakkar |
Abstract | We design differentially private learning algorithms that are agnostic to the learning model, assuming access to a limited amount of unlabeled public data. First, we give a new differentially private algorithm for answering a sequence of $m$ online classification queries (given by a sequence of $m$ unlabeled public feature vectors) based on a private training set. Our private algorithm follows the paradigm of subsample-and-aggregate, in which any generic non-private learner is trained on disjoint subsets of the private training set, then for each classification query, the votes of the resulting ensemble of classifiers are aggregated in a differentially private fashion. Our private aggregation is based on a novel combination of the distance-to-instability framework [Smith & Thakurta 2013] and the sparse-vector technique [Dwork et al. 2009, Hardt & Talwar 2010]. We show that our algorithm makes a conservative use of the privacy budget. In particular, if the underlying non-private learner yields classification error at most $\alpha\in (0, 1)$, then our construction answers more queries, by at least a factor of $1/\alpha$ in some cases, than what is implied by a straightforward application of the advanced composition theorem for differential privacy. Next, we apply the knowledge transfer technique to construct a private learner that outputs a classifier, which can be used to answer an unlimited number of queries. In the PAC model, we analyze our construction and prove upper bounds on the sample complexity for both the realizable and the non-realizable cases. As in non-private sample complexity, our bounds are completely characterized by the VC dimension of the concept class. |
Tasks | Transfer Learning |
Published | 2018-12-01 |
URL | http://papers.nips.cc/paper/7941-model-agnostic-private-learning |
http://papers.nips.cc/paper/7941-model-agnostic-private-learning.pdf | |
PWC | https://paperswithcode.com/paper/model-agnostic-private-learning |
Repo | |
Framework | |
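
The subsample-and-aggregate paradigm named in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the toy threshold learner, every function name, and the Laplace noisy-argmax vote are assumptions introduced here for illustration, and the noisy-argmax stands in for the paper's combination of distance-to-instability and the sparse-vector technique. The noise scale is an illustrative choice, not a vetted privacy calibration.

```python
import random
from collections import Counter


def train_threshold_learner(subset):
    """Toy non-private learner (hypothetical): threshold at the midpoint of class means."""
    pos = [x for x, y in subset if y == 1]
    neg = [x for x, y in subset if y == 0]
    # Fall back to a threshold of 0.0 if a subset happens to miss a class.
    mid = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2 if pos and neg else 0.0
    return lambda x: 1 if x > mid else 0


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def subsample_and_train(private_data, k):
    """Split the private set into k disjoint subsets and train one learner per subset."""
    return [train_threshold_learner(private_data[i::k]) for i in range(k)]


def noisy_vote(classifiers, x, epsilon, rng):
    """Aggregate the ensemble's votes for one query, adding Laplace noise to each count."""
    votes = Counter(clf(x) for clf in classifiers)
    # Assumed scale 2/epsilon: one private example can move a single vote between labels.
    noisy = {label: votes[label] + laplace(2 / epsilon, rng) for label in (0, 1)}
    return max(noisy, key=noisy.get)


rng = random.Random(0)
private_data = [(rng.gauss(2.0, 0.5), 1) for _ in range(100)]
private_data += [(rng.gauss(-2.0, 0.5), 0) for _ in range(100)]
rng.shuffle(private_data)

ensemble = subsample_and_train(private_data, k=10)
prediction = noisy_vote(ensemble, 3.0, epsilon=5.0, rng=rng)
```

Because each private example lands in exactly one subset, it can influence at most one classifier's vote, which is what makes noisy aggregation of the votes a natural fit for this paradigm.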
Entropy-Based Subword Mining with an Application to Word Embeddings
Title | Entropy-Based Subword Mining with an Application to Word Embeddings |
Authors | Ahmed El-Kishky, Frank Xu, Aston Zhang, Stephen Macke, Jiawei Han |
Abstract | Recent literature has shown a wide variety of benefits to mapping traditional one-hot representations of words and phrases to lower-dimensional real-valued vectors known as word embeddings. Traditionally, most word embedding algorithms treat each word as the finest meaningful semantic granularity and perform embedding by learning distinct embedding vectors for each word. Contrary to this line of thought, technical domains such as scientific and medical literature compose words from subword structures such as prefixes, suffixes, and root-words as well as compound words. Treating individual words as the finest-granularity unit discards meaningful shared semantic structure between words sharing substructures. This not only leads to poor embeddings for text corpora that have long-tail distributions, but also heuristic methods for handling out-of-vocabulary words. In this paper we propose SubwordMine, an entropy-based subword mining algorithm that is fast, unsupervised, and fully data-driven. We show that this allows for great cross-domain performance in identifying semantically meaningful subwords. We then investigate utilizing the mined subwords within the FastText embedding model and compare performance of the learned representations in a downstream language modeling task. |
Tasks | Language Modelling, Machine Translation, Sentiment Analysis, Text Classification, Word Embeddings |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/W18-1202/ |
https://www.aclweb.org/anthology/W18-1202 | |
PWC | https://paperswithcode.com/paper/entropy-based-subword-mining-with-an |
Repo | |
Framework | |
How Much Attention Do You Need? A Granular Analysis of Neural Machine Translation Architectures
Title | How Much Attention Do You Need? A Granular Analysis of Neural Machine Translation Architectures |
Authors | Tobias Domhan |
Abstract | With recent advances in network architectures for Neural Machine Translation (NMT), recurrent models have effectively been replaced by either convolutional or self-attentional approaches, such as in the Transformer. While the main innovation of the Transformer architecture is its use of self-attentional layers, there are several other aspects, such as attention with multiple heads and the use of many attention layers, that distinguish the model from previous baselines. In this work we take a fine-grained look at the different architectures for NMT. We introduce an Architecture Definition Language (ADL) allowing for a flexible combination of common building blocks. Making use of this language we show in experiments that one can bring recurrent and convolutional models very close to the Transformer performance by borrowing concepts from the Transformer architecture, but not using self-attention. Additionally, we find that self-attention is much more important on the encoder side than on the decoder side, where it can be replaced by an RNN or CNN without a loss in performance in most settings. Surprisingly, even a model without any target side self-attention performs well. |
Tasks | Machine Translation |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/P18-1167/ |
https://www.aclweb.org/anthology/P18-1167 | |
PWC | https://paperswithcode.com/paper/how-much-attention-do-you-need-a-granular |
Repo | |
Framework | |
Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors
Title | Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors |
Authors | Junhyug Noh, Soochan Lee, Beomsu Kim, Gunhee Kim |
Abstract | We propose methods of addressing two critical issues of pedestrian detection: (i) occlusion of target objects as false negative failure, and (ii) confusion with hard negative examples like vertical structures as false positive failure. Our solutions to these two problems are general and flexible enough to be applicable to any single-stage detection model. We implement our methods into four state-of-the-art single-stage models, including SqueezeDet+, YOLOv2, SSD, and DSSD. We empirically validate that our approach indeed improves the performance of those four models on the Caltech pedestrian and CityPersons datasets. Moreover, in some heavy occlusion settings, our approach achieves the best reported performance. Specifically, our two solutions are as follows. For better occlusion handling, we update the output tensors of single-stage models so that they include the prediction of part confidence scores, from which we compute a final occlusion-aware detection score. For reducing confusion with hard negative examples, we introduce average grid classifiers as post-refinement classifiers, trainable in an end-to-end fashion with little memory and time overhead (e.g. increase of 1–5 MB in memory and 1–2 ms in inference time). |
Tasks | Pedestrian Detection |
Published | 2018-06-01 |
URL | http://openaccess.thecvf.com/content_cvpr_2018/html/Noh_Improving_Occlusion_and_CVPR_2018_paper.html |
http://openaccess.thecvf.com/content_cvpr_2018/papers/Noh_Improving_Occlusion_and_CVPR_2018_paper.pdf | |
PWC | https://paperswithcode.com/paper/improving-occlusion-and-hard-negative |
Repo | |
Framework | |
WILDTRACK: A Multi-Camera HD Dataset for Dense Unscripted Pedestrian Detection
Title | WILDTRACK: A Multi-Camera HD Dataset for Dense Unscripted Pedestrian Detection |
Authors | Tatjana Chavdarova, Pierre Baqué, Stéphane Bouquet, Andrii Maksai, Cijo Jose, Timur Bagautdinov, Louis Lettry, Pascal Fua, Luc Van Gool, François Fleuret |
Abstract | People detection methods are highly sensitive to occlusions between pedestrians, which are extremely frequent in many situations where cameras have to be mounted at a limited height. The reduction of camera prices allows for the generalization of static multi-camera set-ups. Using joint visual information from multiple synchronized cameras gives the opportunity to improve detection performance. In this paper, we present a new large-scale and high-resolution dataset. It has been captured with seven static cameras in a public open area, and unscripted dense groups of pedestrians standing and walking. Together with the camera frames, we provide an accurate joint (extrinsic and intrinsic) calibration, as well as 7 series of 400 annotated frames for detection at a rate of 2 frames per second. This results in over 40,000 bounding boxes delimiting every person present in the area of interest, for a total of more than 300 individuals. We provide a series of benchmark results using baseline algorithms published over the recent months for multi-view detection with deep neural networks, and trajectory estimation using a non-Markovian model. |
Tasks | Calibration, Pedestrian Detection |
Published | 2018-06-01 |
URL | http://openaccess.thecvf.com/content_cvpr_2018/html/Chavdarova_WILDTRACK_A_Multi-Camera_CVPR_2018_paper.html |
http://openaccess.thecvf.com/content_cvpr_2018/papers/Chavdarova_WILDTRACK_A_Multi-Camera_CVPR_2018_paper.pdf | |
PWC | https://paperswithcode.com/paper/wildtrack-a-multi-camera-hd-dataset-for-dense |
Repo | |
Framework | |
Exploit the Unknown Gradually: One-Shot Video-Based Person Re-Identification by Stepwise Learning
Title | Exploit the Unknown Gradually: One-Shot Video-Based Person Re-Identification by Stepwise Learning |
Authors | Yu Wu, Yutian Lin, Xuanyi Dong, Yan Yan, Wanli Ouyang, Yi Yang |
Abstract | We focus on one-shot learning for video-based person re-identification (re-ID). Unlabeled tracklets for the person re-ID tasks can be easily obtained by pre-processing, such as pedestrian detection and tracking. In this paper, we propose an approach to exploiting unlabeled tracklets by gradually but steadily improving the discriminative capability of the Convolutional Neural Network (CNN) feature representation via stepwise learning. We first initialize a CNN model using one labeled tracklet for each identity. Then we update the CNN model by the following two steps iteratively: 1. sample a few candidates with most reliable pseudo labels from unlabeled tracklets; 2. update the CNN model according to the selected data. Instead of the static sampling strategy applied in existing works, we propose a progressive sampling method to increase the number of the selected pseudo-labeled candidates step by step. We systematically investigate how pseudo-labeled tracklets should be selected into the training set to make the best use of them. Notably, the rank-1 accuracy of our method outperforms the state-of-the-art method by 21.46 points (absolute, i.e., 62.67% vs. 41.21%) on the MARS dataset, and 16.53 points on the DukeMTMC-VideoReID dataset. |
Tasks | One-Shot Learning, Pedestrian Detection, Person Re-Identification, Video-Based Person Re-Identification |
Published | 2018-06-01 |
URL | http://openaccess.thecvf.com/content_cvpr_2018/html/Wu_Exploit_the_Unknown_CVPR_2018_paper.html |
http://openaccess.thecvf.com/content_cvpr_2018/papers/Wu_Exploit_the_Unknown_CVPR_2018_paper.pdf | |
PWC | https://paperswithcode.com/paper/exploit-the-unknown-gradually-one-shot-video |
Repo | |
Framework | |
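
The progressive sampling idea in the abstract above (select a growing number of the most reliable pseudo-labeled candidates at each step) can be sketched in a few lines of Python. The abstract does not state the reliability criterion or the growth schedule, so both are assumptions here: reliability is taken to be a small feature-space distance to a labeled tracklet, the selection size grows linearly, and all function and parameter names are hypothetical.

```python
def progressive_selection_size(step, growth, num_unlabeled):
    """Number of pseudo-labeled candidates kept at a given step (assumed linear growth)."""
    return min(step * growth, num_unlabeled)


def select_reliable(distances, step, growth):
    """Return indices of the most reliable candidates, i.e. those with the smallest distance."""
    n = progressive_selection_size(step, growth, len(distances))
    ranked = sorted(range(len(distances)), key=lambda i: distances[i])
    return ranked[:n]


# Hypothetical distances from five unlabeled tracklets to their nearest labeled tracklet.
distances = [0.9, 0.1, 0.4, 0.7, 0.2]
for step in range(1, 4):
    print(step, select_reliable(distances, step, growth=2))
```

In a full training loop the model would be retrained on the selected candidates after each step and the distances recomputed; the sketch keeps them fixed to isolate the sampling schedule.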
Label-Free Distant Supervision for Relation Extraction via Knowledge Graph Embedding
Title | Label-Free Distant Supervision for Relation Extraction via Knowledge Graph Embedding |
Authors | Guanying Wang, Wen Zhang, Ruoxu Wang, Yalin Zhou, Xi Chen, Wei Zhang, Hai Zhu, Huajun Chen |
Abstract | Distant supervision is an effective method to generate large scale labeled data for relation extraction, which assumes that if a pair of entities appears in some relation of a Knowledge Graph (KG), all sentences containing those entities in a large unlabeled corpus are then labeled with that relation to train a relation classifier. However, when the pair of entities has multiple relationships in the KG, this assumption may produce noisy relation labels. This paper proposes a label-free distant supervision method, which makes no use of the relation labels under this inadequate assumption, but only uses the prior knowledge derived from the KG to supervise the learning of the classifier directly and softly. Specifically, we make use of the type information and the translation law derived from a typical KG embedding model to learn embeddings for certain sentence patterns. As the supervision signal is only determined by the two aligned entities, neither hard relation labels nor an extra noise-reduction model for the bag of sentences is needed in this way. The experiments show that the approach performs well on a current distant supervision dataset. |
Tasks | Graph Embedding, Knowledge Graph Embedding, Relation Extraction |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/D18-1248/ |
https://www.aclweb.org/anthology/D18-1248 | |
PWC | https://paperswithcode.com/paper/label-free-distant-supervision-for-relation |
Repo | |
Framework | |
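
The "translation law" mentioned in the abstract above can be illustrated with a short Python sketch. It assumes a TransE-style embedding model in which head, relation, and tail vectors satisfy h + r ≈ t, so the entity pair alone implies a target vector t − h for the sentence-pattern embedding. The abstract does not name the embedding model or the loss, so the function names, the Euclidean distance, and the example vectors are all hypothetical.

```python
import math


def translation_target(head, tail):
    """Under the assumed translation law h + r ≈ t, the implied relation vector is t - h."""
    return [t - h for h, t in zip(head, tail)]


def soft_supervision_loss(sentence_embedding, head, tail):
    """Euclidean distance between a sentence-pattern embedding and the target t - h."""
    target = translation_target(head, tail)
    return math.sqrt(sum((s - r) ** 2 for s, r in zip(sentence_embedding, target)))


# Hypothetical 2-dimensional entity embeddings for one aligned entity pair.
head = [0.25, 0.5]
tail = [1.0, 0.0]
print(soft_supervision_loss([0.75, -0.5], head, tail))
print(soft_supervision_loss([0.75, 0.5], head, tail))
```

The supervision signal depends only on the two aligned entities, which matches the abstract's point that no hard relation label is needed.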