October 16, 2019


Paper Group NAWR 3

Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning

Title Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning
Authors Piyush Sharma, Nan Ding, Sebastian Goodman, Radu Soricut
Abstract We present a new dataset of image caption annotations, Conceptual Captions, which contains an order of magnitude more images than the MS-COCO dataset (Lin et al., 2014) and represents a wider variety of both images and image caption styles. We achieve this by extracting and filtering image caption annotations from billions of webpages. We also present quantitative evaluations of a number of image captioning models and show that a model architecture based on Inception-ResNetv2 (Szegedy et al., 2016) for image-feature extraction and Transformer (Vaswani et al., 2017) for sequence modeling achieves the best performance when trained on the Conceptual Captions dataset.
Tasks Image Captioning
Published 2018-07-01
URL https://www.aclweb.org/anthology/P18-1238/
PDF https://www.aclweb.org/anthology/P18-1238
PWC https://paperswithcode.com/paper/conceptual-captions-a-cleaned-hypernymed
Repo https://github.com/google-research-datasets/conceptual-captions
Framework none

Killing Four Birds with Two Stones: Multi-Task Learning for Non-Literal Language Detection

Title Killing Four Birds with Two Stones: Multi-Task Learning for Non-Literal Language Detection
Authors Erik-Lân Do Dinh, Steffen Eger, Iryna Gurevych
Abstract Non-literal language phenomena such as idioms or metaphors are commonly studied in isolation from each other in NLP. However, often similar definitions and features are being used for different phenomena, challenging the distinction. Instead, we propose to view the detection problem as a generalized non-literal language classification problem. In this paper we investigate multi-task learning for related non-literal language phenomena. We show that in contrast to simply joining the data of multiple tasks, multi-task learning consistently improves upon four metaphor and idiom detection tasks in two languages, English and German. Comparing two state-of-the-art multi-task learning architectures, we also investigate when soft parameter sharing and learned information flow can be beneficial for our related tasks. We make our adapted code publicly available.
Tasks Multi-Task Learning
Published 2018-08-01
URL https://www.aclweb.org/anthology/C18-1132/
PDF https://www.aclweb.org/anthology/C18-1132
PWC https://paperswithcode.com/paper/killing-four-birds-with-two-stones-multi-task
Repo https://github.com/UKPLab/coling2018-nonliteral-mtl
Framework none

Variational Approach for Capsule Video Frame Interpolation

Title Variational Approach for Capsule Video Frame Interpolation
Authors Ahmed Mohammed, Ivar Farup, Sule Yildirim, Marius Pedersen, Øistein Hovde
Abstract Capsule video endoscopy, which uses a wireless camera to visualize the digestive tract, is emerging as an alternative to traditional colonoscopy. Colonoscopy is considered the gold standard for visualizing the colon and captures 30 frames per second. Capsule images, on the other hand, are taken at a low frame rate (five frames per second on average), which makes it difficult to find pathology and causes eye fatigue during viewing. In this paper, we propose a variational algorithm to smooth the video temporally and create a visually pleasant video. The main objective of the paper is to increase the frame rate to be closer to that of colonoscopy. We propose a variational energy that takes into consideration both motion estimation and intermediate frame intensity interpolation using the surrounding frames. The proposed formulation incorporates both pixel intensity and texture features in the optical flow objective function such that the interpolation at the intermediate frame is directly modeled. The main feature of this formulation is that error in motion estimation is incorporated in our model, so that only robust motion estimates are used in estimating the intensity of the intermediate frame. We derived Euler-Lagrange equations and showed an efficient numerical scheme that can be implemented on graphics hardware. Finally, a motion compensated frame rate doubling version of our method is implemented. We evaluate the quality of both 90 and 100% of the frames for the medical diagnosis domain through objective image quality metrics. Our method improves on the state-of-the-art result for 90% of the frames while performing on par with existing methods in the remaining cases. In the last section, we show applications of frame interpolation to informative frame segment visualization and to reducing power consumption.
Tasks Medical Diagnosis, Motion Estimation, Optical Flow Estimation, Video Frame Interpolation
Published 2018-11-06
URL https://jivp-eurasipjournals.springeropen.com/articles/10.1186/s13640-018-0267-9
PDF https://jivp-eurasipjournals.springeropen.com/articles/10.1186/s13640-018-0267-9
PWC https://paperswithcode.com/paper/variational-approach-for-capsule-video-frame
Repo https://github.com/ahme0307/TSR
Framework none

Semantic Structure-based Unsupervised Deep Hashing

Title Semantic Structure-based Unsupervised Deep Hashing
Authors Erkun Yang, Cheng Deng, Tongliang Liu, Wei Liu, Dacheng Tao
Abstract Hashing is becoming increasingly popular for approximate nearest neighbor searching in massive databases due to its storage and search efficiency. Recent supervised hashing methods, which usually construct semantic similarity matrices to guide hash code learning using label information, have shown promising results. However, it is relatively difficult to capture and utilize the semantic relationships between points in unsupervised settings. To address this problem, we propose a novel unsupervised deep framework called Semantic Structure-based unsupervised Deep Hashing (SSDH). We first empirically study the deep feature statistics, and find that the distribution of the cosine distance for point pairs can be estimated by two half Gaussian distributions. Based on this observation, we construct the semantic structure by considering points with distances obviously smaller than the others as semantically similar and points with distances obviously larger than the others as semantically dissimilar. We then design a deep architecture and a pair-wise loss function to preserve this semantic structure in Hamming space. Extensive experiments show that SSDH significantly outperforms current state-of-the-art methods.
Tasks Semantic Similarity, Semantic Textual Similarity
Published 2018-05-01
URL https://www.researchgate.net/publication/326206331_Semantic_Structure-based_Unsupervised_Deep_Hashing
PDF https://www.ijcai.org/proceedings/2018/0148.pdf
PWC https://paperswithcode.com/paper/semantic-structure-based-unsupervised-deep
Repo https://github.com/yangerkun/IJCAI2018_SSDH
Framework tf
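
The pair-labeling step described in the SSDH abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names and the fixed `low` and `high` thresholds are assumptions made here, whereas the abstract says the thresholds come from two half Gaussian distributions fitted to the cosine distances.

```python
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def semantic_structure(features, low, high):
    """Label every pair: 1 = similar, -1 = dissimilar, 0 = undecided.

    `low` and `high` are hypothetical hand-picked thresholds; the paper
    estimates them from the distance distribution instead.
    """
    labels = {}
    n = len(features)
    for i in range(n):
        for j in range(i + 1, n):
            dist = cosine_distance(features[i], features[j])
            if dist < low:
                labels[(i, j)] = 1
            elif dist > high:
                labels[(i, j)] = -1
            else:
                labels[(i, j)] = 0
    return labels


feats = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [1.0, 1.0]]
pair_labels = semantic_structure(feats, low=0.1, high=0.8)
```

Pairs labeled 0 fall between the two thresholds and would contribute nothing to a pair-wise loss built on this structure.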

Baseline: A Library for Rapid Modeling, Experimentation and Development of Deep Learning Algorithms targeting NLP

Title Baseline: A Library for Rapid Modeling, Experimentation and Development of Deep Learning Algorithms targeting NLP
Authors Daniel Pressel, Sagnik Ray Choudhury, Brian Lester, Yanjie Zhao, Matt Barta
Abstract We introduce Baseline: a library for reproducible deep learning research and fast model development for NLP. The library provides easily extensible abstractions and implementations for data loading, model development, training and export of deep learning architectures. It also provides implementations for simple, high-performance, deep learning models for various NLP tasks, against which newly developed models can be compared. Deep learning experiments are hard to reproduce; Baseline provides functionality to track them. The goal is to allow a researcher to focus on model development, delegating the repetitive tasks to the library.
Tasks Language Modelling, Machine Translation, Named Entity Recognition, Part-Of-Speech Tagging, Slot Filling
Published 2018-07-01
URL https://www.aclweb.org/anthology/W18-2506/
PDF https://www.aclweb.org/anthology/W18-2506
PWC https://paperswithcode.com/paper/baseline-a-library-for-rapid-modeling
Repo https://github.com/dpressel/baseline
Framework tf

A Deeply-initialized Coarse-to-fine Ensemble of Regression Trees for Face Alignment

Title A Deeply-initialized Coarse-to-fine Ensemble of Regression Trees for Face Alignment
Authors Roberto Valle, Jose M. Buenaposada, Antonio Valdes, Luis Baumela
Abstract In this paper we present DCFE, a real-time facial landmark regression method based on a coarse-to-fine Ensemble of Regression Trees (ERT). We use a simple Convolutional Neural Network (CNN) to generate probability maps of landmark locations. These are further refined with the ERT regressor, which is initialized by fitting a 3D face model to the landmark maps. The coarse-to-fine structure of the ERT lets us address the combinatorial explosion of parts deformation. With the 3D model we also tackle other key problems such as robust regressor initialization, self occlusions, and simultaneous frontal and profile face analysis. In the experiments DCFE achieves the best reported result in AFLW, COFW, and 300W private and common public data sets.
Tasks Face Alignment, Facial Landmark Detection
Published 2018-09-01
URL http://openaccess.thecvf.com/content_ECCV_2018/html/Roberto_Valle_A_Deeply-initialized_Coarse-to-fine_ECCV_2018_paper.html
PDF http://openaccess.thecvf.com/content_ECCV_2018/papers/Roberto_Valle_A_Deeply-initialized_Coarse-to-fine_ECCV_2018_paper.pdf
PWC https://paperswithcode.com/paper/a-deeply-initialized-coarse-to-fine-ensemble
Repo https://github.com/bobetocalo/bobetocalo_eccv18
Framework none

Rumor Detection on Twitter with Tree-structured Recursive Neural Networks

Title Rumor Detection on Twitter with Tree-structured Recursive Neural Networks
Authors Jing Ma, Wei Gao, Kam-Fai Wong
Abstract Automatic rumor detection is technically very challenging. In this work, we try to learn discriminative features from tweet content by following their non-sequential propagation structure and generate more powerful representations for identifying different types of rumors. We propose two recursive neural models based on bottom-up and top-down tree-structured neural networks for rumor representation learning and classification, which naturally conform to the propagation layout of tweets. Results on two public Twitter datasets demonstrate that our recursive neural models 1) achieve much better performance than state-of-the-art approaches; 2) demonstrate superior capacity for detecting rumors at a very early stage.
Tasks Feature Engineering, Representation Learning
Published 2018-07-01
URL https://www.aclweb.org/anthology/P18-1184/
PDF https://www.aclweb.org/anthology/P18-1184
PWC https://paperswithcode.com/paper/rumor-detection-on-twitter-with-tree
Repo https://github.com/majingCUHK/Rumor_RvNN
Framework none

Representation Learning of Compositional Data

Title Representation Learning of Compositional Data
Authors Marta Avalos, Richard Nock, Cheng Soon Ong, Julien Rouar, Ke Sun
Abstract We consider the problem of learning a low dimensional representation for compositional data. Compositional data consists of a collection of nonnegative data that sum to a constant value. Since the parts of the collection are statistically dependent, many standard tools cannot be directly applied. Instead, compositional data must be first transformed before analysis. Focusing on principal component analysis (PCA), we propose an approach that allows low dimensional representation learning directly from the original data. Our approach combines the benefits of the log-ratio transformation from compositional data analysis and exponential family PCA. A key tool in its derivation is a generalization of the scaled Bregman theorem, that relates the perspective transform of a Bregman divergence to the Bregman divergence of a perspective transform and a remainder conformal divergence. Our proposed approach includes a convenient surrogate (upper bound) loss of the exponential family PCA which has an easy to optimize form. We also derive the corresponding form for nonlinear autoencoders. Experiments on simulated data and microbiome data show the promise of our method.
Tasks Representation Learning
Published 2018-12-01
URL http://papers.nips.cc/paper/7902-representation-learning-of-compositional-data
PDF http://papers.nips.cc/paper/7902-representation-learning-of-compositional-data.pdf
PWC https://paperswithcode.com/paper/representation-learning-of-compositional-data
Repo https://github.com/sistm/CoDa-PCA
Framework none
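
As a rough illustration of the log-ratio transformation mentioned in the abstract, the sketch below applies the centered log-ratio (clr) transform, a standard tool from compositional data analysis. The abstract does not say which log-ratio variant the paper builds on, so the choice of clr is an assumption, and the function names are illustrative only. All parts must be strictly positive; zero handling is left out.

```python
import math


def closure(parts):
    """Rescale nonnegative parts so that they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]


def clr(parts):
    """Centered log-ratio transform: log of each part minus the mean log.

    Assumes every part is strictly positive (no zero replacement here).
    """
    logs = [math.log(p) for p in closure(parts)]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


sample = [1.0, 2.0, 3.0]
transformed = clr(sample)
```

The transformed coordinates sum to zero and are unchanged when the input is rescaled, which is why standard tools such as PCA are usually applied after a step like this rather than to the raw parts.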

Learning Target-Specific Representations of Financial News Documents For Cumulative Abnormal Return Prediction

Title Learning Target-Specific Representations of Financial News Documents For Cumulative Abnormal Return Prediction
Authors Junwen Duan, Yue Zhang, Xiao Ding, Ching-Yun Chang, Ting Liu
Abstract Texts from the Internet serve as important data sources for financial market modeling. Early statistical approaches rely on manually defined features to capture lexical, sentiment and event information, which suffer from feature sparsity. Recent work has considered learning dense representations for news titles and abstracts. Compared to news titles, full documents can contain more potentially helpful information, but also noise compared to events and sentences, which has been less investigated in previous work. To fill this gap, we propose a novel target-specific abstract-guided news document representation model. The model uses a target-sensitive representation of the news abstract to weigh sentences in the news content, so as to select and combine the most informative sentences for market modeling. Results show that document representations can give better performance for estimating cumulative abnormal returns of companies when compared to titles and abstracts. Our model is especially effective when it is used to combine information from multiple document sources compared to the sentence-level baselines.
Tasks Information Retrieval, Stock Market Prediction
Published 2018-08-01
URL https://www.aclweb.org/anthology/C18-1239/
PDF https://www.aclweb.org/anthology/C18-1239
PWC https://paperswithcode.com/paper/learning-target-specific-representations-of
Repo https://github.com/sudy/coling2018
Framework pytorch

Enriching the WebNLG corpus

Title Enriching the WebNLG corpus
Authors Thiago Castro Ferreira, Diego Moussallem, Emiel Krahmer, Sander Wubben
Abstract This paper describes the enrichment of the WebNLG corpus (Gardent et al., 2017a,b), with the aim to further extend its usefulness as a resource for evaluating common NLG tasks, including Discourse Ordering, Lexicalization and Referring Expression Generation. We also produce a silver-standard German translation of the corpus to enable the exploitation of NLG approaches to languages other than English. The enriched corpus is publicly available.
Tasks Machine Translation, Text Generation
Published 2018-11-01
URL https://www.aclweb.org/anthology/W18-6521/
PDF https://www.aclweb.org/anthology/W18-6521
PWC https://paperswithcode.com/paper/enriching-the-webnlg-corpus
Repo https://github.com/ThiagoCF05/webnlg
Framework none

Syntactic Manipulation for Generating more Diverse and Interesting Texts

Title Syntactic Manipulation for Generating more Diverse and Interesting Texts
Authors Jan Milan Deriu, Mark Cieliebak
Abstract Natural Language Generation plays an important role in the domain of dialogue systems as it determines how users perceive the system. Recently, deep-learning based systems have been proposed to tackle this task, as they generalize better and require less manual effort to implement for new domains. However, deep learning systems usually adopt a very homogeneous-sounding writing style which expresses little variation. In this work, we present our system for Natural Language Generation where we control various aspects of the surface realization in order to increase the lexical variability of the utterances, such that they sound more diverse and interesting. For this, we use a Semantically Controlled Long Short-term Memory Network (SC-LSTM), and apply its specialized cell to control various syntactic features of the generated texts. We present an in-depth human evaluation where we show the effects of these surface manipulations on the perception of potential users.
Tasks Text Generation
Published 2018-11-01
URL https://www.aclweb.org/anthology/W18-6503/
PDF https://www.aclweb.org/anthology/W18-6503
PWC https://paperswithcode.com/paper/syntactic-manipulation-for-generating-more
Repo https://github.com/jderiu/e2e_nlg
Framework tf

A PID Controller Approach for Stochastic Optimization of Deep Networks

Title A PID Controller Approach for Stochastic Optimization of Deep Networks
Authors Wangpeng An, Haoqian Wang, Qingyun Sun, Jun Xu, Qionghai Dai, Lei Zhang
Abstract Deep neural networks have demonstrated their power in many computer vision applications. State-of-the-art deep architectures such as VGG, ResNet, and DenseNet are mostly optimized by the SGD-Momentum algorithm, which updates the weights by considering their past and current gradients. Nonetheless, SGD-Momentum suffers from the overshoot problem, which hinders the convergence of network training. Inspired by the prominent success of the proportional-integral-derivative (PID) controller in automatic control, we propose a PID approach for accelerating deep network optimization. We first reveal the intrinsic connections between SGD-Momentum and the PID-based controller, then present the optimization algorithm, which exploits the past, current, and change of gradients to update the network parameters. The proposed PID method greatly reduces the overshoot phenomenon of SGD-Momentum, and it achieves up to 50% acceleration on popular deep network architectures with competitive accuracy, as verified by our experiments on benchmark datasets including CIFAR10, CIFAR100, and Tiny-ImageNet.
Tasks Stochastic Optimization
Published 2018-06-01
URL http://openaccess.thecvf.com/content_cvpr_2018/html/An_A_PID_Controller_CVPR_2018_paper.html
PDF http://openaccess.thecvf.com/content_cvpr_2018/papers/An_A_PID_Controller_CVPR_2018_paper.pdf
PWC https://paperswithcode.com/paper/a-pid-controller-approach-for-stochastic
Repo https://github.com/jettify/pytorch-optimizer
Framework pytorch
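
To make the idea of using "the past, current, and change of gradients" concrete, here is a minimal scalar sketch of a PID-style update. It is an assumption-laden illustration rather than the paper's exact rule: the function name `pid_step`, the gain names `lr`, `momentum`, and `kd`, and their default values are all chosen here. The momentum buffer plays the role of accumulated past and current gradients, and a smoothed gradient difference acts as the derivative term.

```python
def pid_step(theta, grad, prev_grad, v, d, lr=0.1, momentum=0.9, kd=0.5):
    """One scalar PID-style parameter update (illustrative gains only).

    v: momentum buffer built from past and current gradients.
    d: smoothed change of the gradient between consecutive steps.
    """
    v = momentum * v - lr * grad
    d = momentum * d + (1.0 - momentum) * (grad - prev_grad)
    theta = theta + v - kd * d  # the derivative term counteracts overshoot
    return theta, v, d


# Example: a few steps on f(x) = x**2, whose gradient is 2 * x.
theta, v, d, prev_grad = 1.0, 0.0, 0.0, 0.0
for _ in range(5):
    grad = 2.0 * theta
    theta, v, d = pid_step(theta, grad, prev_grad, v, d)
    prev_grad = grad
```

With `kd=0` the update reduces to plain SGD with momentum, which is one way to see the connection the abstract refers to.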

Unsupervised Morphology Learning with Statistical Paradigms

Title Unsupervised Morphology Learning with Statistical Paradigms
Authors Hongzhi Xu, Mitchell Marcus, Charles Yang, Lyle Ungar
Abstract This paper describes an unsupervised model for morphological segmentation that exploits the notion of paradigms, which are sets of morphological categories (e.g., suffixes) that can be applied to a homogeneous set of words (e.g., nouns or verbs). Our algorithm identifies statistically reliable paradigms from the morphological segmentation result of a probabilistic model, and chooses reliable suffixes from them. The new suffixes can be fed back iteratively to improve the accuracy of the probabilistic model. Finally, the unreliable paradigms are subjected to pruning to eliminate unreliable morphological relations between words. The paradigm-based algorithm significantly improves segmentation accuracy. Our method achieves state-of-the-art results on experiments using the Morpho-Challenge data, including English, Turkish, and Finnish.
Tasks Information Retrieval, Text Generation
Published 2018-08-01
URL https://www.aclweb.org/anthology/C18-1005/
PDF https://www.aclweb.org/anthology/C18-1005
PWC https://paperswithcode.com/paper/unsupervised-morphology-learning-with
Repo https://github.com/xuhongzhi/ParaMA
Framework none

Multiple Instance Learning for Efficient Sequential Data Classification on Resource-constrained Devices

Title Multiple Instance Learning for Efficient Sequential Data Classification on Resource-constrained Devices
Authors Don Dennis, Chirag Pabbaraju, Harsha Vardhan Simhadri, Prateek Jain
Abstract We study the problem of fast and efficient classification of sequential data (such as time-series) on tiny devices, which is critical for various IoT related applications like audio keyword detection or gesture detection. Such tasks are cast as a standard classification task by sliding windows over the data stream to construct data points. Deploying such classification modules on tiny devices is challenging as predictions over sliding windows of data need to be invoked continuously at a high frequency. Each such predictor instance in itself is expensive as it evaluates large models over long windows of data. In this paper, we address this challenge by exploiting the following two observations about classification tasks arising in typical IoT related applications: (a) the “signature” of a particular class (e.g. an audio keyword) typically occupies a small fraction of the overall data, and (b) class signatures tend to be discernible early on in the data. We propose a method, EMI-RNN, that exploits these observations by using a multiple instance learning formulation along with an early prediction technique to learn a model that achieves better accuracy compared to baseline models, while simultaneously reducing computation by a large fraction. For instance, on a gesture detection benchmark [25], EMI-RNN improves a standard LSTM model’s accuracy by up to 1% while requiring 72x less computation. This enables us to deploy such models for continuous real-time prediction on a small device such as Raspberry Pi0 and Arduino variants, a task that the baseline LSTM could not achieve. Finally, we also provide an analysis of our multiple instance learning algorithm in a simple setting and show that the proposed algorithm converges to the global optimum at a linear rate, one of the first such results in this domain. The code for EMI-RNN is available at: https://github.com/Microsoft/EdgeML/tree/master/tf/examples/EMI-RNN
Tasks Multiple Instance Learning, Time Series, Time Series Classification
Published 2018-12-01
URL http://papers.nips.cc/paper/8292-multiple-instance-learning-for-efficient-sequential-data-classification-on-resource-constrained-devices
PDF http://papers.nips.cc/paper/8292-multiple-instance-learning-for-efficient-sequential-data-classification-on-resource-constrained-devices.pdf
PWC https://paperswithcode.com/paper/multiple-instance-learning-for-efficient
Repo https://github.com/Microsoft/EdgeML
Framework tf
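
The windowing step that the abstract takes as its starting point (sliding windows over the data stream to construct data points) can be sketched as below. This covers only that preprocessing step, not EMI-RNN itself; the function name and the `width` and `stride` parameters are illustrative assumptions.

```python
def sliding_windows(stream, width, stride):
    """Cut a sequence into overlapping windows of fixed width.

    Windows that would run past the end of the stream are dropped.
    """
    last_start = len(stream) - width
    return [stream[start:start + width]
            for start in range(0, last_start + 1, stride)]


readings = [0, 1, 2, 3, 4, 5]
windows = sliding_windows(readings, width=4, stride=2)
```

Each window becomes one input to the classifier, so a smaller stride means more predictor calls per second, which is the cost the abstract is concerned with.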

Out-of-Distribution Detection Using an Ensemble of Self Supervised Leave-out Classifiers

Title Out-of-Distribution Detection Using an Ensemble of Self Supervised Leave-out Classifiers
Authors Apoorv Vyas, Nataraj Jammalamadaka, Xia Zhu, Dipankar Das, Bharat Kaul, Theodore L. Willke
Abstract As deep learning methods form a critical part of commercially important applications such as autonomous driving and medical diagnostics, it is important to reliably detect out-of-distribution (OOD) inputs while employing these algorithms. In this work, we propose an OOD detection algorithm which comprises an ensemble of classifiers. We train each classifier in a self-supervised manner by leaving out a random subset of training data as OOD data and the rest as in-distribution (ID) data. We propose a novel margin-based loss over the softmax output which seeks to maintain at least a margin m between the average entropy of the OOD and in-distribution samples. In conjunction with the standard cross-entropy loss, we minimize the novel loss to train an ensemble of classifiers. We also propose a novel method to combine the outputs of the ensemble of classifiers to obtain an OOD detection score and class prediction. Overall, our method convincingly outperforms Hendrycks et al. [7] and the current state-of-the-art ODIN [13] on several OOD detection benchmarks.
Tasks Autonomous Driving, Out-of-Distribution Detection
Published 2018-09-01
URL http://openaccess.thecvf.com/content_ECCV_2018/html/Apoorv_Vyas_Out-of-Distribution_Detection_Using_ECCV_2018_paper.html
PDF http://openaccess.thecvf.com/content_ECCV_2018/papers/Apoorv_Vyas_Out-of-Distribution_Detection_Using_ECCV_2018_paper.pdf
PWC https://paperswithcode.com/paper/out-of-distribution-detection-using-an
Repo https://github.com/YU1ut/Ensemble-of-Leave-out-Classifiers
Framework pytorch
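
One plausible reading of the margin-based entropy loss described in the abstract is a hinge on the gap between the average softmax entropy of OOD samples and that of in-distribution samples. The sketch below follows that reading; the exact form used in the paper is not given in the abstract, so the hinge formulation and all function names here are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy(batch_logits):
    """Average softmax entropy over a batch of logit vectors."""
    return sum(entropy(softmax(row)) for row in batch_logits) / len(batch_logits)


def entropy_margin_loss(id_logits, ood_logits, margin):
    """Hinge penalty (assumed form): zero once mean OOD entropy exceeds
    mean in-distribution entropy by at least `margin`."""
    gap = mean_entropy(ood_logits) - mean_entropy(id_logits)
    return max(0.0, margin - gap)


confident = [[10.0, 0.0, 0.0]]  # low-entropy prediction
flat = [[0.0, 0.0, 0.0]]        # maximum-entropy prediction
loss = entropy_margin_loss(confident, flat, margin=0.4)
```

In training, a term like this would be added to the usual cross-entropy loss on the in-distribution samples, as the abstract describes.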