Paper Group NAWR 3
Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning
Title | Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning |
Authors | Piyush Sharma, Nan Ding, Sebastian Goodman, Radu Soricut |
Abstract | We present a new dataset of image caption annotations, Conceptual Captions, which contains an order of magnitude more images than the MS-COCO dataset (Lin et al., 2014) and represents a wider variety of both images and image caption styles. We achieve this by extracting and filtering image caption annotations from billions of webpages. We also present quantitative evaluations of a number of image captioning models and show that a model architecture based on Inception-ResNetv2 (Szegedy et al., 2016) for image-feature extraction and Transformer (Vaswani et al., 2017) for sequence modeling achieves the best performance when trained on the Conceptual Captions dataset. |
Tasks | Image Captioning |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/P18-1238/ |
PWC | https://paperswithcode.com/paper/conceptual-captions-a-cleaned-hypernymed |
Repo | https://github.com/google-research-datasets/conceptual-captions |
Framework | none |
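Since Conceptual Captions is distributed as TSV files of caption/URL pairs rather than images, a tiny reader sketch may help. It assumes the two-column caption<TAB>url layout of the released splits; the function name and the length check are illustrative, not from the repo.

```python
import csv

def read_conceptual_captions(tsv_path):
    """Yield (caption, image_url) pairs from a Conceptual Captions split.

    Assumes the two-column caption<TAB>url layout of the released TSVs;
    see the repo's README for the authoritative format.
    """
    with open(tsv_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) >= 2:            # skip malformed lines defensively
                yield row[0], row[1]
```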
Killing Four Birds with Two Stones: Multi-Task Learning for Non-Literal Language Detection
Title | Killing Four Birds with Two Stones: Multi-Task Learning for Non-Literal Language Detection |
Authors | Erik-Lân Do Dinh, Steffen Eger, Iryna Gurevych |
Abstract | Non-literal language phenomena such as idioms or metaphors are commonly studied in isolation from each other in NLP. However, similar definitions and features are often used for different phenomena, challenging the distinction between them. Instead, we propose to view detection as a generalized non-literal language classification problem. In this paper we investigate multi-task learning for related non-literal language phenomena. We show that, in contrast to simply joining the data of multiple tasks, multi-task learning consistently improves upon four metaphor and idiom detection tasks in two languages, English and German. Comparing two state-of-the-art multi-task learning architectures, we also investigate when soft parameter sharing and learned information flow can be beneficial for our related tasks. We make our adapted code publicly available. |
Tasks | Multi-Task Learning |
Published | 2018-08-01 |
URL | https://www.aclweb.org/anthology/C18-1132/ |
PWC | https://paperswithcode.com/paper/killing-four-birds-with-two-stones-multi-task |
Repo | https://github.com/UKPLab/coling2018-nonliteral-mtl |
Framework | none |
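As context for the multi-task setup described above, here is a minimal hard-parameter-sharing sketch in PyTorch: one shared encoder with a separate head per metaphor/idiom task. The paper's contribution concerns soft sharing and learned information flow between such task-specific models; this sketch only shows the simplest shared baseline, and all names and layer sizes are assumptions.

```python
import torch.nn as nn

class HardSharedTagger(nn.Module):
    """Hard parameter sharing: one shared BiLSTM encoder, one
    per-token classification head per non-literal-language task."""

    def __init__(self, vocab_size, num_tasks, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.enc = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.heads = nn.ModuleList(
            [nn.Linear(2 * hidden, 2) for _ in range(num_tasks)]
        )

    def forward(self, tokens, task_id):
        h, _ = self.enc(self.emb(tokens))   # (batch, seq, 2*hidden)
        return self.heads[task_id](h)       # literal/non-literal logits per token
```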
Variational Approach for Capsule Video Frame Interpolation
Title | Variational Approach for Capsule Video Frame Interpolation |
Authors | Ahmed Mohammed, Ivar Farup, Sule Yildirim, Marius Pedersen, Øistein Hovde |
Abstract | Capsule video endoscopy, which uses a wireless camera to visualize the digestive tract, is emerging as an alternative to traditional colonoscopy. Colonoscopy is considered the gold standard for visualizing the colon and captures 30 frames per second. Capsule images, on the other hand, are taken at a low frame rate (five frames per second on average), which makes it difficult to find pathology and causes eye fatigue during viewing. In this paper, we propose a variational algorithm to smooth the video temporally and create a visually pleasant video. The main objective of the paper is to increase the frame rate to be closer to that of colonoscopy. We propose a variational energy that takes into consideration both motion estimation and intermediate-frame intensity interpolation using the surrounding frames. The proposed formulation incorporates both pixel intensity and texture features in the optical flow objective function, such that the interpolation at the intermediate frame is directly modeled. The main feature of this formulation is that the error in motion estimation is incorporated in our model, so that only robust motion estimates are used in estimating the intensity of the intermediate frame. We derive the Euler-Lagrange equations and present an efficient numerical scheme that can be implemented on graphics hardware. Finally, a motion-compensated frame rate doubling version of our method is implemented. We evaluate the quality of both 90 and 100% of the frames for the medical diagnosis domain through objective image quality metrics. Our method improves on state-of-the-art results for 90% of the frames while performing equivalently to other existing methods for the remaining cases. In the last section, we show the application of frame interpolation to informative frame segment visualization and to reducing power consumption. |
Tasks | Medical Diagnosis, Motion Estimation, Optical Flow Estimation, Video Frame Interpolation |
Published | 2018-11-06 |
URL | https://jivp-eurasipjournals.springeropen.com/articles/10.1186/s13640-018-0267-9 |
PWC | https://paperswithcode.com/paper/variational-approach-for-capsule-video-frame |
Repo | https://github.com/ahme0307/TSR |
Framework | none |
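The paper's formulation solves a single variational energy for flow and intermediate intensities jointly; purely as a reference point, the sketch below shows the standard motion-compensated blending step that frame-rate doubling methods build on. It assumes `flow` is the forward displacement field from `f0` to `f1`, and uses nearest-neighbor sampling to keep things short.

```python
import numpy as np

def interpolate_frame(f0, f1, flow, t=0.5):
    """Synthesize the frame at time t by warping both neighbors along
    the estimated flow and blending them (nearest-neighbor sampling)."""
    H, W = f0.shape[:2]
    ys, xs = np.mgrid[0:H, 0:W]
    # The intermediate pixel at p is seen in f0 at p - t*flow ...
    x0 = np.clip(np.rint(xs - t * flow[..., 0]), 0, W - 1).astype(int)
    y0 = np.clip(np.rint(ys - t * flow[..., 1]), 0, H - 1).astype(int)
    # ... and in f1 at p + (1 - t)*flow.
    x1 = np.clip(np.rint(xs + (1 - t) * flow[..., 0]), 0, W - 1).astype(int)
    y1 = np.clip(np.rint(ys + (1 - t) * flow[..., 1]), 0, H - 1).astype(int)
    return (1 - t) * f0[y0, x0] + t * f1[y1, x1]
```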
Semantic Structure-based Unsupervised Deep Hashing
Title | Semantic Structure-based Unsupervised Deep Hashing |
Authors | Erkun Yang, Cheng Deng, Tongliang Liu, Wei Liu, Dacheng Tao |
Abstract | Hashing is becoming increasingly popular for approximate nearest neighbor search in massive databases due to its storage and search efficiency. Recent supervised hashing methods, which usually construct semantic similarity matrices to guide hash code learning using label information, have shown promising results. However, it is relatively difficult to capture and utilize the semantic relationships between points in unsupervised settings. To address this problem, we propose a novel unsupervised deep framework called Semantic Structure-based unsupervised Deep Hashing (SSDH). We first empirically study the deep feature statistics and find that the distribution of the cosine distances between point pairs can be estimated by two half-Gaussian distributions. Based on this observation, we construct the semantic structure by considering points with distances obviously smaller than the others as semantically similar, and points with distances obviously larger than the others as semantically dissimilar. We then design a deep architecture and a pairwise loss function to preserve this semantic structure in Hamming space. Extensive experiments show that SSDH significantly outperforms current state-of-the-art methods. |
Tasks | Semantic Similarity, Semantic Textual Similarity |
Published | 2018-05-01 |
URL | https://www.researchgate.net/publication/326206331_Semantic_Structure-based_Unsupervised_Deep_Hashing |
PDF | https://www.ijcai.org/proceedings/2018/0148.pdf |
PWC | https://paperswithcode.com/paper/semantic-structure-based-unsupervised-deep |
Repo | https://github.com/yangerkun/IJCAI2018_SSDH |
Framework | tf |
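The semantic-structure construction lends itself to a short sketch: threshold pairwise cosine distances between deep features into similar, dissimilar, and ignored pairs. In the paper, the two thresholds come from the fitted half-Gaussian distributions; here they are left as free parameters `d_low` and `d_high`.

```python
import numpy as np

def semantic_structure(features, d_low, d_high):
    """Label pairs with clearly small cosine distance as similar (+1),
    clearly large as dissimilar (-1), and leave the rest unlabeled (0)."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    dist = 1.0 - f @ f.T          # pairwise cosine distance
    S = np.zeros_like(dist)
    S[dist < d_low] = 1.0
    S[dist > d_high] = -1.0
    return S
```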
Baseline: A Library for Rapid Modeling, Experimentation and Development of Deep Learning Algorithms targeting NLP
Title | Baseline: A Library for Rapid Modeling, Experimentation and Development of Deep Learning Algorithms targeting NLP |
Authors | Daniel Pressel, Sagnik Ray Choudhury, Brian Lester, Yanjie Zhao, Matt Barta |
Abstract | We introduce Baseline: a library for reproducible deep learning research and fast model development for NLP. The library provides easily extensible abstractions and implementations for data loading, model development, training, and export of deep learning architectures. It also provides implementations of simple, high-performance deep learning models for various NLP tasks, against which newly developed models can be compared. Deep learning experiments are hard to reproduce; Baseline provides functionality to track them. The goal is to allow a researcher to focus on model development, delegating the repetitive tasks to the library. |
Tasks | Language Modelling, Machine Translation, Named Entity Recognition, Part-Of-Speech Tagging, Slot Filling |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/W18-2506/ |
PWC | https://paperswithcode.com/paper/baseline-a-library-for-rapid-modeling |
Repo | https://github.com/dpressel/baseline |
Framework | tf |
A Deeply-initialized Coarse-to-fine Ensemble of Regression Trees for Face Alignment
Title | A Deeply-initialized Coarse-to-fine Ensemble of Regression Trees for Face Alignment |
Authors | Roberto Valle, Jose M. Buenaposada, Antonio Valdes, Luis Baumela |
Abstract | In this paper we present DCFE, a real-time facial landmark regression method based on a coarse-to-fine Ensemble of Regression Trees (ERT). We use a simple Convolutional Neural Network (CNN) to generate probability maps of landmark locations. These are further refined with the ERT regressor, which is initialized by fitting a 3D face model to the landmark maps. The coarse-to-fine structure of the ERT lets us address the combinatorial explosion of part deformations. With the 3D model we also tackle other key problems, such as robust regressor initialization, self-occlusions, and simultaneous frontal and profile face analysis. In our experiments, DCFE achieves the best reported results on the AFLW, COFW, and 300W private and common public data sets. |
Tasks | Face Alignment, Facial Landmark Detection |
Published | 2018-09-01 |
URL | http://openaccess.thecvf.com/content_ECCV_2018/html/Roberto_Valle_A_Deeply-initialized_Coarse-to-fine_ECCV_2018_paper.html |
PDF | http://openaccess.thecvf.com/content_ECCV_2018/papers/Roberto_Valle_A_Deeply-initialized_Coarse-to-fine_ECCV_2018_paper.pdf |
PWC | https://paperswithcode.com/paper/a-deeply-initialized-coarse-to-fine-ensemble |
Repo | https://github.com/bobetocalo/bobetocalo_eccv18 |
Framework | none |
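As a rough illustration of the first stage, the sketch below reads one coarse landmark per CNN probability map via argmax. Note that the paper actually initializes the ERT by fitting a 3D face model to the maps, which is more robust than this per-map reading; the function here is a simplification.

```python
import numpy as np

def init_landmarks(prob_maps):
    """Take the argmax of each landmark probability map as a coarse
    (x, y) estimate to seed a subsequent refinement stage."""
    pts = []
    for m in prob_maps:                               # prob_maps: (L, H, W)
        y, x = np.unravel_index(np.argmax(m), m.shape)
        pts.append((float(x), float(y)))
    return np.array(pts)
```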
Rumor Detection on Twitter with Tree-structured Recursive Neural Networks
Title | Rumor Detection on Twitter with Tree-structured Recursive Neural Networks |
Authors | Jing Ma, Wei Gao, Kam-Fai Wong |
Abstract | Automatic rumor detection is technically very challenging. In this work, we learn discriminative features from tweet content by following its non-sequential propagation structure, and generate more powerful representations for identifying different types of rumors. We propose two recursive neural models, based on bottom-up and top-down tree-structured neural networks, for rumor representation learning and classification; these naturally conform to the propagation layout of tweets. Results on two public Twitter datasets demonstrate that our recursive neural models 1) achieve much better performance than state-of-the-art approaches and 2) demonstrate superior capacity for detecting rumors at a very early stage. |
Tasks | Feature Engineering, Representation Learning |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/P18-1184/ |
PWC | https://paperswithcode.com/paper/rumor-detection-on-twitter-with-tree |
Repo | https://github.com/majingCUHK/Rumor_RvNN |
Framework | none |
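A minimal sketch of the bottom-up composition over a propagation tree, with each node holding tweet features `x` and a list of reply `children`. The paper's units are GRU-based and the root state feeds a rumor classifier; this plain `tanh` recursive unit only illustrates the tree-structured recursion.

```python
import numpy as np

def bottom_up(node, W, U, b):
    """Compute a node's hidden state from its own tweet features and the
    recursively combined hidden states of its replies (children)."""
    h_children = np.zeros_like(b)
    for child in node["children"]:
        h_children = h_children + bottom_up(child, W, U, b)
    return np.tanh(W @ node["x"] + U @ h_children + b)
```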
Representation Learning of Compositional Data
Title | Representation Learning of Compositional Data |
Authors | Marta Avalos, Richard Nock, Cheng Soon Ong, Julien Rouar, Ke Sun |
Abstract | We consider the problem of learning a low-dimensional representation for compositional data. Compositional data consists of a collection of nonnegative data that sum to a constant value. Since the parts of the collection are statistically dependent, many standard tools cannot be directly applied; instead, compositional data must first be transformed before analysis. Focusing on principal component analysis (PCA), we propose an approach that allows low-dimensional representation learning directly from the original data. Our approach combines the benefits of the log-ratio transformation from compositional data analysis and exponential family PCA. A key tool in its derivation is a generalization of the scaled Bregman theorem, which relates the perspective transform of a Bregman divergence to the Bregman divergence of a perspective transform and a remainder conformal divergence. Our proposed approach includes a convenient surrogate (upper-bound) loss for exponential family PCA which has an easy-to-optimize form. We also derive the corresponding form for nonlinear autoencoders. Experiments on simulated data and microbiome data show the promise of our method. |
Tasks | Representation Learning |
Published | 2018-12-01 |
URL | http://papers.nips.cc/paper/7902-representation-learning-of-compositional-data |
PDF | http://papers.nips.cc/paper/7902-representation-learning-of-compositional-data.pdf |
PWC | https://paperswithcode.com/paper/representation-learning-of-compositional-data |
Repo | https://github.com/sistm/CoDa-PCA |
Framework | none |
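For context, the classical pipeline the paper improves on first applies a log-ratio transform and then runs PCA; CoDa-PCA instead folds the log-ratio idea into an exponential-family PCA loss. The sketch below shows only the classical centered log-ratio (clr) step, with `eps` added as an assumption to guard against zero parts.

```python
import numpy as np

def clr(x, eps=1e-9):
    """Centered log-ratio transform: log each part, then subtract the
    row-wise mean so each composition maps to a zero-sum vector."""
    logx = np.log(x + eps)
    return logx - logx.mean(axis=1, keepdims=True)
```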
Learning Target-Specific Representations of Financial News Documents For Cumulative Abnormal Return Prediction
Title | Learning Target-Specific Representations of Financial News Documents For Cumulative Abnormal Return Prediction |
Authors | Junwen Duan, Yue Zhang, Xiao Ding, Ching-Yun Chang, Ting Liu |
Abstract | Texts from the Internet serve as important data sources for financial market modeling. Early statistical approaches rely on manually defined features to capture lexical, sentiment, and event information, and suffer from feature sparsity. Recent work has considered learning dense representations for news titles and abstracts. Compared to news titles, full documents can contain more potentially helpful information, but also more noise than events and sentences; they have been less investigated in previous work. To fill this gap, we propose a novel target-specific, abstract-guided news document representation model. The model uses a target-sensitive representation of the news abstract to weigh sentences in the news content, so as to select and combine the most informative sentences for market modeling. Results show that document representations can give better performance for estimating cumulative abnormal returns of companies when compared to titles and abstracts. Our model is especially effective when it is used to combine information from multiple document sources, compared to the sentence-level baselines. |
Tasks | Information Retrieval, Stock Market Prediction |
Published | 2018-08-01 |
URL | https://www.aclweb.org/anthology/C18-1239/ |
PWC | https://paperswithcode.com/paper/learning-target-specific-representations-of |
Repo | https://github.com/sudy/coling2018 |
Framework | pytorch |
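A minimal sketch of abstract-guided sentence weighting: score each sentence vector against the abstract representation, normalize the scores, and pool. The dot-product scorer is an assumption; the paper's target-sensitive attention function may differ.

```python
import torch
import torch.nn.functional as F

def abstract_guided_doc_vector(sent_vecs, abstract_vec):
    """Score each sentence against the abstract representation, softmax
    the scores, and pool sentences into one document vector."""
    scores = sent_vecs @ abstract_vec     # (num_sentences,)
    weights = F.softmax(scores, dim=0)    # attention over sentences
    return weights @ sent_vecs            # weighted sum, shape (dim,)
```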
Enriching the WebNLG corpus
Title | Enriching the WebNLG corpus |
Authors | Thiago Castro Ferreira, Diego Moussallem, Emiel Krahmer, Sander Wubben |
Abstract | This paper describes the enrichment of the WebNLG corpus (Gardent et al., 2017a,b), with the aim of further extending its usefulness as a resource for evaluating common NLG tasks, including Discourse Ordering, Lexicalization and Referring Expression Generation. We also produce a silver-standard German translation of the corpus to enable the exploitation of NLG approaches in languages other than English. The enriched corpus is publicly available. |
Tasks | Machine Translation, Text Generation |
Published | 2018-11-01 |
URL | https://www.aclweb.org/anthology/W18-6521/ |
PWC | https://paperswithcode.com/paper/enriching-the-webnlg-corpus |
Repo | https://github.com/ThiagoCF05/webnlg |
Framework | none |
Syntactic Manipulation for Generating more Diverse and Interesting Texts
Title | Syntactic Manipulation for Generating more Diverse and Interesting Texts |
Authors | Jan Milan Deriu, Mark Cieliebak |
Abstract | Natural Language Generation plays an important role in the domain of dialogue systems, as it determines how users perceive the system. Recently, deep-learning based systems have been proposed to tackle this task, as they generalize better and require less manual effort to implement for new domains. However, deep learning systems usually adopt a very homogeneous-sounding writing style which expresses little variation. In this work, we present our system for Natural Language Generation, where we control various aspects of the surface realization in order to increase the lexical variability of the utterances, such that they sound more diverse and interesting. For this, we use a Semantically Controlled Long Short-term Memory Network (SC-LSTM), and apply its specialized cell to control various syntactic features of the generated texts. We present an in-depth human evaluation in which we show the effects of these surface manipulations on the perception of potential users. |
Tasks | Text Generation |
Published | 2018-11-01 |
URL | https://www.aclweb.org/anthology/W18-6503/ |
PWC | https://paperswithcode.com/paper/syntactic-manipulation-for-generating-more |
Repo | https://github.com/jderiu/e2e_nlg |
Framework | tf |
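The SC-LSTM's distinguishing piece is a reading gate that gradually consumes a dialogue-act (DA) vector as words are emitted, which is what makes the realization controllable. Below is a sketch of just that update, assuming the gate dimension matches the DA vector; the full cell also feeds the remaining DA features into the cell state, which is omitted here.

```python
import torch

def dialogue_act_update(d_prev, x_t, h_prev, W_r, U_r):
    """SC-LSTM-style reading gate: each step 'consumes' part of the
    remaining dialogue-act vector."""
    r_t = torch.sigmoid(W_r @ x_t + U_r @ h_prev)  # read gate
    return r_t * d_prev                            # remaining DA features
```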
A PID Controller Approach for Stochastic Optimization of Deep Networks
Title | A PID Controller Approach for Stochastic Optimization of Deep Networks |
Authors | Wangpeng An, Haoqian Wang, Qingyun Sun, Jun Xu, Qionghai Dai, Lei Zhang |
Abstract | Deep neural networks have demonstrated their power in many computer vision applications. State-of-the-art deep architectures such as VGG, ResNet, and DenseNet are mostly optimized by the SGD-Momentum algorithm, which updates the weights by considering their past and current gradients. Nonetheless, SGD-Momentum suffers from the overshoot problem, which hinders the convergence of network training. Inspired by the prominent success of the proportional-integral-derivative (PID) controller in automatic control, we propose a PID approach for accelerating deep network optimization. We first reveal the intrinsic connections between SGD-Momentum and PID-based controllers, then present an optimization algorithm which exploits the past, current, and change of gradients to update the network parameters. The proposed PID method substantially reduces the overshoot phenomenon of SGD-Momentum, and it achieves up to 50% acceleration on popular deep network architectures with competitive accuracy, as verified by our experiments on benchmark datasets including CIFAR10, CIFAR100, and Tiny-ImageNet. |
Tasks | Stochastic Optimization |
Published | 2018-06-01 |
URL | http://openaccess.thecvf.com/content_cvpr_2018/html/An_A_PID_Controller_CVPR_2018_paper.html |
PDF | http://openaccess.thecvf.com/content_cvpr_2018/papers/An_A_PID_Controller_CVPR_2018_paper.pdf |
PWC | https://paperswithcode.com/paper/a-pid-controller-approach-for-stochastic |
Repo | https://github.com/jettify/pytorch-optimizer |
Framework | pytorch |
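A sketch of the PID idea as a PyTorch optimizer: the momentum buffer covers the proportional/integral role over past and current gradients, and a derivative term reacts to the change in gradient. The gain `kd` and the exact combination of terms are illustrative assumptions; the paper derives its gains differently, and the linked repo has a tested implementation.

```python
import torch

class PIDSketch(torch.optim.Optimizer):
    """I = momentum buffer over past and current gradients,
    D = change of gradient between consecutive steps."""

    def __init__(self, params, lr=0.01, momentum=0.9, kd=10.0):
        super().__init__(params, dict(lr=lr, momentum=momentum, kd=kd))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            lr, mu, kd = group["lr"], group["momentum"], group["kd"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                g, state = p.grad, self.state[p]
                if not state:                          # lazy state init
                    state["I"] = torch.zeros_like(p)
                    state["g_prev"] = g.clone()
                I = state["I"].mul_(mu).add_(g)        # I_t = mu*I_{t-1} + g_t
                D = g - state["g_prev"]                # D_t = g_t - g_{t-1}
                state["g_prev"] = g.clone()
                p.add_(I + kd * D, alpha=-lr)          # theta -= lr*(I + kd*D)
```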
Unsupervised Morphology Learning with Statistical Paradigms
Title | Unsupervised Morphology Learning with Statistical Paradigms |
Authors | Hongzhi Xu, Mitchell Marcus, Charles Yang, Lyle Ungar |
Abstract | This paper describes an unsupervised model for morphological segmentation that exploits the notion of paradigms, which are sets of morphological categories (e.g., suffixes) that can be applied to a homogeneous set of words (e.g., nouns or verbs). Our algorithm identifies statistically reliable paradigms from the morphological segmentation result of a probabilistic model, and chooses reliable suffixes from them. The new suffixes can be fed back iteratively to improve the accuracy of the probabilistic model. Finally, the unreliable paradigms are subjected to pruning to eliminate unreliable morphological relations between words. The paradigm-based algorithm significantly improves segmentation accuracy. Our method achieves state-of-the-art results in experiments on the Morpho-Challenge data, including English, Turkish, and Finnish. |
Tasks | Information Retrieval, Text Generation |
Published | 2018-08-01 |
URL | https://www.aclweb.org/anthology/C18-1005/ |
PWC | https://paperswithcode.com/paper/unsupervised-morphology-learning-with |
Repo | https://github.com/xuhongzhi/ParaMA |
Framework | none |
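The paradigm notion can be made concrete in a few lines: group stems by the suffix sets they are observed with, so suffix sets shared by many stems emerge as candidate paradigms. The reliability scoring, iterative feedback, and pruning steps from the paper are omitted in this sketch.

```python
from collections import defaultdict

def candidate_paradigms(segmentations):
    """Group stems by their attested suffix sets; a suffix set shared by
    many stems is a candidate paradigm."""
    stem_suffixes = defaultdict(set)
    for stem, suffix in segmentations:        # e.g. ("walk", "ed")
        stem_suffixes[stem].add(suffix)
    paradigms = defaultdict(list)
    for stem, sufs in stem_suffixes.items():
        paradigms[frozenset(sufs)].append(stem)
    return paradigms                          # suffix set -> supporting stems
```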
Multiple Instance Learning for Efficient Sequential Data Classification on Resource-constrained Devices
Title | Multiple Instance Learning for Efficient Sequential Data Classification on Resource-constrained Devices |
Authors | Don Dennis, Chirag Pabbaraju, Harsha Vardhan Simhadri, Prateek Jain |
Abstract | We study the problem of fast and efficient classification of sequential data (such as time series) on tiny devices, which is critical for various IoT-related applications like audio keyword detection or gesture detection. Such tasks are cast as a standard classification task by sliding windows over the data stream to construct data points. Deploying such classification modules on tiny devices is challenging, as predictions over sliding windows of data need to be invoked continuously at a high frequency. Each such predictor instance is itself expensive, as it evaluates large models over long windows of data. In this paper, we address this challenge by exploiting the following two observations about classification tasks arising in typical IoT-related applications: (a) the “signature” of a particular class (e.g. an audio keyword) typically occupies a small fraction of the overall data, and (b) class signatures tend to be discernible early on in the data. We propose a method, EMI-RNN, that exploits these observations by using a multiple instance learning formulation along with an early prediction technique to learn a model that achieves better accuracy compared to baseline models, while simultaneously reducing computation by a large fraction. For instance, on a gesture detection benchmark [25], EMI-RNN improves a standard LSTM model's accuracy by up to 1% while requiring 72x less computation. This enables us to deploy such models for continuous real-time prediction on small devices such as the Raspberry Pi0 and Arduino variants, a task that the baseline LSTM could not achieve. Finally, we also provide an analysis of our multiple instance learning algorithm in a simple setting and show that the proposed algorithm converges to the global optimum at a linear rate, one of the first such results in this domain. The code for EMI-RNN is available at: https://github.com/Microsoft/EdgeML/tree/master/tf/examples/EMI-RNN |
Tasks | Multiple Instance Learning, Time Series, Time Series Classification |
Published | 2018-12-01 |
URL | http://papers.nips.cc/paper/8292-multiple-instance-learning-for-efficient-sequential-data-classification-on-resource-constrained-devices |
PDF | http://papers.nips.cc/paper/8292-multiple-instance-learning-for-efficient-sequential-data-classification-on-resource-constrained-devices.pdf |
PWC | https://paperswithcode.com/paper/multiple-instance-learning-for-efficient |
Repo | https://github.com/Microsoft/EdgeML |
Framework | tf |
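A sketch of the multiple-instance construction: each sliding window becomes a bag of shorter overlapping sub-instances that inherit the window's label, so the model can fire on the short class "signature" and stop early. The instance count and length here are free parameters, not the paper's settings; the linked repo has the full EMI-RNN training loop.

```python
import numpy as np

def window_to_instances(window, num_instances, instance_len):
    """Split one sliding window of a time series into overlapping
    sub-instances: the 'bag' in the multiple-instance formulation."""
    T = window.shape[0]
    starts = np.linspace(0, T - instance_len, num_instances).astype(int)
    return np.stack([window[s:s + instance_len] for s in starts])
```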
Out-of-Distribution Detection Using an Ensemble of Self Supervised Leave-out Classifiers
Title | Out-of-Distribution Detection Using an Ensemble of Self Supervised Leave-out Classifiers |
Authors | Apoorv Vyas, Nataraj Jammalamadaka, Xia Zhu, Dipankar Das, Bharat Kaul, Theodore L. Willke |
Abstract | As deep learning methods form a critical part of commercially important applications such as autonomous driving and medical diagnostics, it is important to reliably detect out-of-distribution (OOD) inputs while employing these algorithms. In this work, we propose an OOD detection algorithm which comprises an ensemble of classifiers. We train each classifier in a self-supervised manner by leaving out a random subset of training data as OOD data and using the rest as in-distribution (ID) data. We propose a novel margin-based loss over the softmax output which seeks to maintain at least a margin m between the average entropy of the OOD and in-distribution samples. In conjunction with the standard cross-entropy loss, we minimize the novel loss to train an ensemble of classifiers. We also propose a novel method to combine the outputs of the ensemble of classifiers to obtain an OOD detection score and class prediction. Overall, our method convincingly outperforms Hendrycks et al. [7] and the current state-of-the-art ODIN [13] on several OOD detection benchmarks. |
Tasks | Autonomous Driving, Out-of-Distribution Detection |
Published | 2018-09-01 |
URL | http://openaccess.thecvf.com/content_ECCV_2018/html/Apoorv_Vyas_Out-of-Distribution_Detection_Using_ECCV_2018_paper.html |
PDF | http://openaccess.thecvf.com/content_ECCV_2018/papers/Apoorv_Vyas_Out-of-Distribution_Detection_Using_ECCV_2018_paper.pdf |
PWC | https://paperswithcode.com/paper/out-of-distribution-detection-using-an |
Repo | https://github.com/YU1ut/Ensemble-of-Leave-out-Classifiers |
Framework | pytorch |
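A sketch of the margin-based entropy term described above: a hinge that pushes the average entropy of held-out (OOD) samples above that of in-distribution samples by at least `margin`, to be added to the standard cross-entropy loss. Exact weighting and sign conventions may differ from the paper's formulation.

```python
import torch
import torch.nn.functional as F

def margin_entropy_loss(logits_id, logits_ood, margin):
    """Hinge on average entropies: penalize the batch unless OOD entropy
    exceeds ID entropy by at least `margin`."""
    def avg_entropy(logits):
        logp = F.log_softmax(logits, dim=1)
        return -(logp.exp() * logp).sum(dim=1).mean()
    return F.relu(margin + avg_entropy(logits_id) - avg_entropy(logits_ood))
```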