October 16, 2019

Paper Group NAWR 3

Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning


Title	Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning
Authors	Piyush Sharma, Nan Ding, Sebastian Goodman, Radu Soricut
Abstract	We present a new dataset of image caption annotations, Conceptual Captions, which contains an order of magnitude more images than the MS-COCO dataset (Lin et al., 2014) and represents a wider variety of both images and image caption styles. We achieve this by extracting and filtering image caption annotations from billions of webpages. We also present quantitative evaluations of a number of image captioning models and show that a model architecture based on Inception-ResNetv2 (Szegedy et al., 2016) for image-feature extraction and Transformer (Vaswani et al., 2017) for sequence modeling achieves the best performance when trained on the Conceptual Captions dataset.
Tasks	Image Captioning
Published	2018-07-01
URL	https://www.aclweb.org/anthology/P18-1238/
PDF	https://www.aclweb.org/anthology/P18-1238
PWC	https://paperswithcode.com/paper/conceptual-captions-a-cleaned-hypernymed
Repo	https://github.com/google-research-datasets/conceptual-captions
Framework	none

Killing Four Birds with Two Stones: Multi-Task Learning for Non-Literal Language Detection


Title	Killing Four Birds with Two Stones: Multi-Task Learning for Non-Literal Language Detection
Authors	Erik-Lân Do Dinh, Steffen Eger, Iryna Gurevych
Abstract	Non-literal language phenomena such as idioms or metaphors are commonly studied in isolation from each other in NLP. However, often similar definitions and features are being used for different phenomena, challenging the distinction. Instead, we propose to view the detection problem as a generalized non-literal language classification problem. In this paper we investigate multi-task learning for related non-literal language phenomena. We show that in contrast to simply joining the data of multiple tasks, multi-task learning consistently improves upon four metaphor and idiom detection tasks in two languages, English and German. Comparing two state-of-the-art multi-task learning architectures, we also investigate when soft parameter sharing and learned information flow can be beneficial for our related tasks. We make our adapted code publicly available.
Tasks	Multi-Task Learning
Published	2018-08-01
URL	https://www.aclweb.org/anthology/C18-1132/
PDF	https://www.aclweb.org/anthology/C18-1132
PWC	https://paperswithcode.com/paper/killing-four-birds-with-two-stones-multi-task
Repo	https://github.com/UKPLab/coling2018-nonliteral-mtl
Framework	none

Variational Approach for Capsule Video Frame Interpolation


Title	Variational Approach for Capsule Video Frame Interpolation
Authors	Ahmed Mohammed, Ivar Farup, Sule Yildirim, Marius Pedersen, Øistein Hovde
Abstract	Capsule video endoscopy, which uses a wireless camera to visualize the digestive tract, is emerging as an alternative to traditional colonoscopy. Colonoscopy is considered the gold standard for visualizing the colon and captures 30 frames per second. Capsule images, on the other hand, are taken at a low frame rate (five frames per second on average), which makes it difficult to find pathology and causes eye fatigue for the viewer. In this paper, we propose a variational algorithm to smooth the video temporally and create a visually pleasant video. The main objective of the paper is to increase the frame rate to be closer to that of colonoscopy. We propose a variational energy that takes into consideration both motion estimation and intermediate frame intensity interpolation using the surrounding frames. The proposed formulation incorporates both pixel intensity and texture features in the optical flow objective function such that the interpolation at the intermediate frame is directly modeled. The main feature of this formulation is that error in motion estimation is incorporated in our model, so that only robust motion estimates are used in estimating the intensity of the intermediate frame. We derive Euler-Lagrange equations and present an efficient numerical scheme that can be implemented on graphics hardware. Finally, a motion-compensated frame rate doubling version of our method is implemented. We evaluate the quality of both 90 and 100% of the frames for the medical diagnosis domain through objective image quality metrics. Our method improves on the state-of-the-art result for 90% of the frames while performing on par with other existing methods in the remaining cases. In the last section, we show applications of frame interpolation to informative frame segment visualization and to reducing power consumption.
Tasks	Medical Diagnosis, Motion Estimation, Optical Flow Estimation, Video Frame Interpolation
Published	2018-11-06
URL	https://jivp-eurasipjournals.springeropen.com/articles/10.1186/s13640-018-0267-9
PDF	https://jivp-eurasipjournals.springeropen.com/articles/10.1186/s13640-018-0267-9
PWC	https://paperswithcode.com/paper/variational-approach-for-capsule-video-frame
Repo	https://github.com/ahme0307/TSR
Framework	none

Semantic Structure-based Unsupervised Deep Hashing


Title	Semantic Structure-based Unsupervised Deep Hashing
Authors	Erkun Yang, Cheng Deng, Tongliang Liu, Wei Liu, Dacheng Tao
Abstract	Hashing is becoming increasingly popular for approximate nearest neighbor searching in massive databases due to its storage and search efficiency. Recent supervised hashing methods, which usually construct semantic similarity matrices to guide hash code learning using label information, have shown promising results. However, it is relatively difficult to capture and utilize the semantic relationships between points in unsupervised settings. To address this problem, we propose a novel unsupervised deep framework called Semantic Structure-based unsupervised Deep Hashing (SSDH). We first empirically study the deep feature statistics, and find that the distribution of the cosine distance for point pairs can be estimated by two half Gaussian distributions. Based on this observation, we construct the semantic structure by considering points with distances obviously smaller than the others as semantically similar and points with distances obviously larger than the others as semantically dissimilar. We then design a deep architecture and a pair-wise loss function to preserve this semantic structure in Hamming space. Extensive experiments show that SSDH significantly outperforms current state-of-the-art methods.
Tasks	Semantic Similarity, Semantic Textual Similarity
Published	2018-05-01
URL	https://www.researchgate.net/publication/326206331_Semantic_Structure-based_Unsupervised_Deep_Hashing
PDF	https://www.ijcai.org/proceedings/2018/0148.pdf
PWC	https://paperswithcode.com/paper/semantic-structure-based-unsupervised-deep
Repo	https://github.com/yangerkun/IJCAI2018_SSDH
Framework	tf
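
The abstract above describes building a semantic structure by treating pairs with clearly small cosine distances as similar and pairs with clearly large distances as dissimilar. A minimal sketch of that labeling step follows. It assumes the caller supplies the two cut-offs: `low` and `high` are hypothetical parameters, whereas the paper derives its thresholds from the fitted half-Gaussian distributions, which this sketch does not attempt. The function names are illustrative and are not taken from the authors' repository.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(u, v) for two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def semantic_structure(features, low, high):
    """Label every pair (i, j) with i < j.

    +1 = treated as similar (distance below `low`),
    -1 = treated as dissimilar (distance above `high`),
     0 = undecided (distance in between).
    `low` and `high` are hypothetical, caller-chosen thresholds.
    """
    labels = {}
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            d = cosine_distance(features[i], features[j])
            if d < low:
                labels[(i, j)] = 1
            elif d > high:
                labels[(i, j)] = -1
            else:
                labels[(i, j)] = 0
    return labels


# Toy deep features: the first two point in nearly the same direction.
features = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
structure = semantic_structure(features, low=0.1, high=0.8)
```

Pairs labeled 0 would simply be left out of a pair-wise loss in this simplified view; how the paper weights or uses such pairs is not stated in the abstract.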

Baseline: A Library for Rapid Modeling, Experimentation and Development of Deep Learning Algorithms targeting NLP


Title	Baseline: A Library for Rapid Modeling, Experimentation and Development of Deep Learning Algorithms targeting NLP
Authors	Daniel Pressel, Sagnik Ray Choudhury, Brian Lester, Yanjie Zhao, Matt Barta
Abstract	We introduce Baseline: a library for reproducible deep learning research and fast model development for NLP. The library provides easily extensible abstractions and implementations for data loading, model development, training and export of deep learning architectures. It also provides implementations for simple, high-performance, deep learning models for various NLP tasks, against which newly developed models can be compared. Deep learning experiments are hard to reproduce; Baseline provides functionality to track them. The goal is to allow a researcher to focus on model development, delegating the repetitive tasks to the library.
Tasks	Language Modelling, Machine Translation, Named Entity Recognition, Part-Of-Speech Tagging, Slot Filling
Published	2018-07-01
URL	https://www.aclweb.org/anthology/W18-2506/
PDF	https://www.aclweb.org/anthology/W18-2506
PWC	https://paperswithcode.com/paper/baseline-a-library-for-rapid-modeling
Repo	https://github.com/dpressel/baseline
Framework	tf

A Deeply-initialized Coarse-to-fine Ensemble of Regression Trees for Face Alignment


Title	A Deeply-initialized Coarse-to-fine Ensemble of Regression Trees for Face Alignment
Authors	Roberto Valle, Jose M. Buenaposada, Antonio Valdes, Luis Baumela
Abstract	In this paper we present DCFE, a real-time facial landmark regression method based on a coarse-to-fine Ensemble of Regression Trees (ERT). We use a simple Convolutional Neural Network (CNN) to generate probability maps of landmarks location. These are further refined with the ERT regressor, which is initialized by fitting a 3D face model to the landmark maps. The coarse-to-fine structure of the ERT lets us address the combinatorial explosion of parts deformation. With the 3D model we also tackle other key problems such as robust regressor initialization, self occlusions, and simultaneous frontal and profile face analysis. In the experiments DCFE achieves the best reported result in AFLW, COFW, and 300W private and common public data sets.
Tasks	Face Alignment, Facial Landmark Detection
Published	2018-09-01
URL	http://openaccess.thecvf.com/content_ECCV_2018/html/Roberto_Valle_A_Deeply-initialized_Coarse-to-fine_ECCV_2018_paper.html
PDF	http://openaccess.thecvf.com/content_ECCV_2018/papers/Roberto_Valle_A_Deeply-initialized_Coarse-to-fine_ECCV_2018_paper.pdf
PWC	https://paperswithcode.com/paper/a-deeply-initialized-coarse-to-fine-ensemble
Repo	https://github.com/bobetocalo/bobetocalo_eccv18
Framework	none

Rumor Detection on Twitter with Tree-structured Recursive Neural Networks


Title	Rumor Detection on Twitter with Tree-structured Recursive Neural Networks
Authors	Jing Ma, Wei Gao, Kam-Fai Wong
Abstract	Automatic rumor detection is technically very challenging. In this work, we try to learn discriminative features from the content of tweets by following their non-sequential propagation structure and generate more powerful representations for identifying different types of rumors. We propose two recursive neural models based on bottom-up and top-down tree-structured neural networks for rumor representation learning and classification, which naturally conform to the propagation layout of tweets. Results on two public Twitter datasets demonstrate that our recursive neural models 1) achieve much better performance than state-of-the-art approaches; 2) demonstrate superior capacity for detecting rumors at a very early stage.
Tasks	Feature Engineering, Representation Learning
Published	2018-07-01
URL	https://www.aclweb.org/anthology/P18-1184/
PDF	https://www.aclweb.org/anthology/P18-1184
PWC	https://paperswithcode.com/paper/rumor-detection-on-twitter-with-tree
Repo	https://github.com/majingCUHK/Rumor_RvNN
Framework	none

Representation Learning of Compositional Data


Title	Representation Learning of Compositional Data
Authors	Marta Avalos, Richard Nock, Cheng Soon Ong, Julien Rouar, Ke Sun
Abstract	We consider the problem of learning a low dimensional representation for compositional data. Compositional data consists of a collection of nonnegative data that sum to a constant value. Since the parts of the collection are statistically dependent, many standard tools cannot be directly applied. Instead, compositional data must be first transformed before analysis. Focusing on principal component analysis (PCA), we propose an approach that allows low dimensional representation learning directly from the original data. Our approach combines the benefits of the log-ratio transformation from compositional data analysis and exponential family PCA. A key tool in its derivation is a generalization of the scaled Bregman theorem, that relates the perspective transform of a Bregman divergence to the Bregman divergence of a perspective transform and a remainder conformal divergence. Our proposed approach includes a convenient surrogate (upper bound) loss of the exponential family PCA which has an easy to optimize form. We also derive the corresponding form for nonlinear autoencoders. Experiments on simulated data and microbiome data show the promise of our method.
Tasks	Representation Learning
Published	2018-12-01
URL	http://papers.nips.cc/paper/7902-representation-learning-of-compositional-data
PDF	http://papers.nips.cc/paper/7902-representation-learning-of-compositional-data.pdf
PWC	https://paperswithcode.com/paper/representation-learning-of-compositional-data
Repo	https://github.com/sistm/CoDa-PCA
Framework	none
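
The abstract above refers to the log-ratio transformation from compositional data analysis. As an illustration of that preprocessing step only, here is a minimal sketch of the centered log-ratio (clr) transform, one common member of the log-ratio family. The abstract does not say which variant the authors use, so choosing clr is an assumption, and the sketch does not implement the paper's proposed method.

```python
import math


def closure(parts):
    """Rescale non-negative parts so that they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]


def clr(parts):
    """Centered log-ratio transform.

    Each output is log(x_i) minus the mean of the logs, so the outputs
    sum to zero. Requires strictly positive parts (zeros need separate
    handling that is not shown here).
    """
    logs = [math.log(p) for p in closure(parts)]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


composition = [2.0, 3.0, 5.0]
transformed = clr(composition)
```

Because the parts are closed to a constant sum first, rescaling the whole composition leaves the clr coordinates unchanged, which is the property that makes ordinary PCA applicable after the transform.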

Learning Target-Specific Representations of Financial News Documents For Cumulative Abnormal Return Prediction


Title	Learning Target-Specific Representations of Financial News Documents For Cumulative Abnormal Return Prediction
Authors	Junwen Duan, Yue Zhang, Xiao Ding, Ching-Yun Chang, Ting Liu
Abstract	Texts from the Internet serve as important data sources for financial market modeling. Early statistical approaches rely on manually defined features to capture lexical, sentiment and event information, which suffer from feature sparsity. Recent work has considered learning dense representations for news titles and abstracts. Compared to news titles, full documents can contain more potentially helpful information, but also noise compared to events and sentences, which has been less investigated in previous work. To fill this gap, we propose a novel target-specific abstract-guided news document representation model. The model uses a target-sensitive representation of the news abstract to weigh sentences in the news content, so as to select and combine the most informative sentences for market modeling. Results show that document representations can give better performance for estimating cumulative abnormal returns of companies when compared to titles and abstracts. Our model is especially effective when it is used to combine information from multiple document sources compared to the sentence-level baselines.
Tasks	Information Retrieval, Stock Market Prediction
Published	2018-08-01
URL	https://www.aclweb.org/anthology/C18-1239/
PDF	https://www.aclweb.org/anthology/C18-1239
PWC	https://paperswithcode.com/paper/learning-target-specific-representations-of
Repo	https://github.com/sudy/coling2018
Framework	pytorch

Enriching the WebNLG corpus


Title	Enriching the WebNLG corpus
Authors	Thiago Castro Ferreira, Diego Moussallem, Emiel Krahmer, Sander Wubben
Abstract	This paper describes the enrichment of WebNLG corpus (Gardent et al., 2017a,b), with the aim to further extend its usefulness as a resource for evaluating common NLG tasks, including Discourse Ordering, Lexicalization and Referring Expression Generation. We also produce a silver-standard German translation of the corpus to enable the exploitation of NLG approaches to other languages than English. The enriched corpus is publicly available.
Tasks	Machine Translation, Text Generation
Published	2018-11-01
URL	https://www.aclweb.org/anthology/W18-6521/
PDF	https://www.aclweb.org/anthology/W18-6521
PWC	https://paperswithcode.com/paper/enriching-the-webnlg-corpus
Repo	https://github.com/ThiagoCF05/webnlg
Framework	none

Syntactic Manipulation for Generating more Diverse and Interesting Texts


Title	Syntactic Manipulation for Generating more Diverse and Interesting Texts
Authors	Jan Milan Deriu, Mark Cieliebak
Abstract	Natural Language Generation plays an important role in the domain of dialogue systems as it determines how users perceive the system. Recently, deep-learning based systems have been proposed to tackle this task, as they generalize better and require less manual effort to implement for new domains. However, deep learning systems usually adopt a very homogeneous-sounding writing style which expresses little variation. In this work, we present our system for Natural Language Generation where we control various aspects of the surface realization in order to increase the lexical variability of the utterances, such that they sound more diverse and interesting. For this, we use a Semantically Controlled Long Short-term Memory Network (SC-LSTM), and apply its specialized cell to control various syntactic features of the generated texts. We present an in-depth human evaluation where we show the effects of these surface manipulations on the perception of potential users.
Tasks	Text Generation
Published	2018-11-01
URL	https://www.aclweb.org/anthology/W18-6503/
PDF	https://www.aclweb.org/anthology/W18-6503
PWC	https://paperswithcode.com/paper/syntactic-manipulation-for-generating-more
Repo	https://github.com/jderiu/e2e_nlg
Framework	tf

A PID Controller Approach for Stochastic Optimization of Deep Networks


Title	A PID Controller Approach for Stochastic Optimization of Deep Networks
Authors	Wangpeng An, Haoqian Wang, Qingyun Sun, Jun Xu, Qionghai Dai, Lei Zhang
Abstract	Deep neural networks have demonstrated their power in many computer vision applications. State-of-the-art deep architectures such as VGG, ResNet, and DenseNet are mostly optimized by the SGD-Momentum algorithm, which updates the weights by considering their past and current gradients. Nonetheless, SGD-Momentum suffers from the overshoot problem, which hinders the convergence of network training. Inspired by the prominent success of proportional-integral-derivative (PID) controller in automatic control, we propose a PID approach for accelerating deep network optimization. We first reveal the intrinsic connections between SGD-Momentum and PID based controller, then present the optimization algorithm which exploits the past, current, and change of gradients to update the network parameters. The proposed PID method reduces much the overshoot phenomena of SGD-Momentum, and it achieves up to 50% acceleration on popular deep network architectures with competitive accuracy, as verified by our experiments on the benchmark datasets including CIFAR10, CIFAR100, and Tiny-ImageNet.
Tasks	Stochastic Optimization
Published	2018-06-01
URL	http://openaccess.thecvf.com/content_cvpr_2018/html/An_A_PID_Controller_CVPR_2018_paper.html
PDF	http://openaccess.thecvf.com/content_cvpr_2018/papers/An_A_PID_Controller_CVPR_2018_paper.pdf
PWC	https://paperswithcode.com/paper/a-pid-controller-approach-for-stochastic
Repo	https://github.com/jettify/pytorch-optimizer
Framework	pytorch
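
The abstract above describes an update that uses the past, current, and change of gradients. A minimal scalar sketch of a PID-style update in that spirit follows. The gains `kp`, `ki`, `kd`, the `decay` factor on the accumulated gradient, and the function name are all hypothetical choices made for illustration; this is not the paper's exact update rule and not the API of the linked repository.

```python
def pid_minimize(grad_fn, x0, steps=300, lr=0.05,
                 kp=1.0, ki=0.1, kd=0.1, decay=0.9):
    """Minimize a scalar function with a PID-style gradient update.

    P term: the current gradient.
    I term: a decayed running sum of past gradients (momentum-like).
    D term: the change in gradient since the previous step.
    All gains and the decay are hypothetical illustration values.
    """
    x = x0
    integral = 0.0
    prev_grad = None
    for _ in range(steps):
        grad = grad_fn(x)
        integral = decay * integral + grad
        # No previous gradient exists on the first step, so D starts at 0.
        derivative = 0.0 if prev_grad is None else grad - prev_grad
        x -= lr * (kp * grad + ki * integral + kd * derivative)
        prev_grad = grad
    return x


# Toy objective f(x) = x^2, whose gradient is 2x and whose minimum is x = 0.
x_final = pid_minimize(lambda x: 2.0 * x, x0=5.0)
```

Setting `kd=0.0` removes the derivative term and leaves a momentum-like rule, which gives a simple way to compare the two behaviors on a toy problem.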

Unsupervised Morphology Learning with Statistical Paradigms


Title	Unsupervised Morphology Learning with Statistical Paradigms
Authors	Hongzhi Xu, Mitchell Marcus, Charles Yang, Lyle Ungar
Abstract	This paper describes an unsupervised model for morphological segmentation that exploits the notion of paradigms, which are sets of morphological categories (e.g., suffixes) that can be applied to a homogeneous set of words (e.g., nouns or verbs). Our algorithm identifies statistically reliable paradigms from the morphological segmentation result of a probabilistic model, and chooses reliable suffixes from them. The new suffixes can be fed back iteratively to improve the accuracy of the probabilistic model. Finally, the unreliable paradigms are subjected to pruning to eliminate unreliable morphological relations between words. The paradigm-based algorithm significantly improves segmentation accuracy. Our method achieves state-of-the-art results on experiments using the Morpho-Challenge data, including English, Turkish, and Finnish.
Tasks	Information Retrieval, Text Generation
Published	2018-08-01
URL	https://www.aclweb.org/anthology/C18-1005/
PDF	https://www.aclweb.org/anthology/C18-1005
PWC	https://paperswithcode.com/paper/unsupervised-morphology-learning-with
Repo	https://github.com/xuhongzhi/ParaMA
Framework	none

Multiple Instance Learning for Efficient Sequential Data Classification on Resource-constrained Devices


Title	Multiple Instance Learning for Efficient Sequential Data Classification on Resource-constrained Devices
Authors	Don Dennis, Chirag Pabbaraju, Harsha Vardhan Simhadri, Prateek Jain
Abstract	We study the problem of fast and efficient classification of sequential data (such as time-series) on tiny devices, which is critical for various IoT related applications like audio keyword detection or gesture detection. Such tasks are cast as a standard classification task by sliding windows over the data stream to construct data points. Deploying such classification modules on tiny devices is challenging as predictions over sliding windows of data need to be invoked continuously at a high frequency. Each such predictor instance in itself is expensive as it evaluates large models over long windows of data. In this paper, we address this challenge by exploiting the following two observations about classification tasks arising in typical IoT related applications: (a) the “signature” of a particular class (e.g. an audio keyword) typically occupies a small fraction of the overall data, and (b) class signatures tend to be discernible early on in the data. We propose a method, EMI-RNN, that exploits these observations by using a multiple instance learning formulation along with an early prediction technique to learn a model that achieves better accuracy compared to baseline models, while simultaneously reducing computation by a large fraction. For instance, on a gesture detection benchmark [25], EMI-RNN improves the accuracy of a standard LSTM model by up to 1% while requiring 72x less computation. This enables us to deploy such models for continuous real-time prediction on a small device such as Raspberry Pi0 and Arduino variants, a task that the baseline LSTM could not achieve. Finally, we also provide an analysis of our multiple instance learning algorithm in a simple setting and show that the proposed algorithm converges to the global optima at a linear rate, one of the first such results in this domain. The code for EMI-RNN is available at: https://github.com/Microsoft/EdgeML/tree/master/tf/examples/EMI-RNN
Tasks	Multiple Instance Learning, Time Series, Time Series Classification
Published	2018-12-01
URL	http://papers.nips.cc/paper/8292-multiple-instance-learning-for-efficient-sequential-data-classification-on-resource-constrained-devices
PDF	http://papers.nips.cc/paper/8292-multiple-instance-learning-for-efficient-sequential-data-classification-on-resource-constrained-devices.pdf
PWC	https://paperswithcode.com/paper/multiple-instance-learning-for-efficient
Repo	https://github.com/Microsoft/EdgeML
Framework	tf
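
The abstract above notes that such tasks are cast as standard classification by sliding windows over the data stream to construct data points. A minimal sketch of that windowing step follows. The function name and the `width` and `stride` parameters are hypothetical, and the sketch covers only the construction of data points, not EMI-RNN itself.

```python
def sliding_windows(stream, width, stride=1):
    """Cut a sequence into fixed-width windows taken every `stride` samples.

    A trailing partial window is dropped. If the stream is shorter than
    `width`, the result is an empty list.
    """
    if width <= 0 or stride <= 0:
        raise ValueError("width and stride must be positive")
    last_start = len(stream) - width
    return [list(stream[start:start + width])
            for start in range(0, last_start + 1, stride)]


readings = [1, 2, 3, 4, 5]
windows = sliding_windows(readings, width=3)
```

With a stride of 1, consecutive windows overlap in all but one sample, which is why running a large model on every window is costly on a small device.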

Out-of-Distribution Detection Using an Ensemble of Self Supervised Leave-out Classifiers


Title	Out-of-Distribution Detection Using an Ensemble of Self Supervised Leave-out Classifiers
Authors	Apoorv Vyas, Nataraj Jammalamadaka, Xia Zhu, Dipankar Das, Bharat Kaul, Theodore L. Willke
Abstract	As deep learning methods form a critical part in commercially important applications such as autonomous driving and medical diagnostics, it is important to reliably detect out-of-distribution (OOD) inputs while employing these algorithms. In this work, we propose an OOD detection algorithm which comprises an ensemble of classifiers. We train each classifier in a self-supervised manner by leaving out a random subset of training data as OOD data and the rest as in-distribution (ID) data. We propose a novel margin-based loss over the softmax output which seeks to maintain at least a margin m between the average entropy of the OOD and in-distribution samples. In conjunction with the standard cross-entropy loss, we minimize the novel loss to train an ensemble of classifiers. We also propose a novel method to combine the outputs of the ensemble of classifiers to obtain an OOD detection score and class prediction. Overall, our method convincingly outperforms Hendrycks et al. [7] and the current state-of-the-art ODIN [13] on several OOD detection benchmarks.
Tasks	Autonomous Driving, Out-of-Distribution Detection
Published	2018-09-01
URL	http://openaccess.thecvf.com/content_ECCV_2018/html/Apoorv_Vyas_Out-of-Distribution_Detection_Using_ECCV_2018_paper.html
PDF	http://openaccess.thecvf.com/content_ECCV_2018/papers/Apoorv_Vyas_Out-of-Distribution_Detection_Using_ECCV_2018_paper.pdf
PWC	https://paperswithcode.com/paper/out-of-distribution-detection-using-an
Repo	https://github.com/YU1ut/Ensemble-of-Leave-out-Classifiers
Framework	pytorch
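
The abstract above describes a margin-based loss that keeps at least a margin m between the average softmax entropy of OOD and in-distribution samples. A minimal sketch of one way to write such a term is shown here, assuming a hinge form `max(0, m + H_id - H_ood)`. That exact form and the function names are assumptions for illustration; the abstract does not give the formula, and the sketch omits the cross-entropy term and the ensemble.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy(batch_logits):
    """Average softmax entropy over a batch of logit vectors."""
    return sum(entropy(softmax(row)) for row in batch_logits) / len(batch_logits)


def entropy_margin_loss(id_logits, ood_logits, margin):
    """Hinge-style penalty (an assumed form, not taken from the paper).

    The loss is zero once the mean OOD entropy exceeds the mean
    in-distribution entropy by at least `margin`.
    """
    gap = mean_entropy(ood_logits) - mean_entropy(id_logits)
    return max(0.0, margin - gap)


confident = [[10.0, 0.0, 0.0]]  # peaked prediction, low entropy
uncertain = [[0.0, 0.0, 0.0]]   # uniform prediction, entropy ln(3)
loss_ok = entropy_margin_loss(confident, uncertain, margin=0.4)
loss_bad = entropy_margin_loss(uncertain, confident, margin=0.4)
```

In the first call the in-distribution batch is confident and the held-out batch is uniform, so the margin is already satisfied; swapping the roles produces a positive penalty.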