October 15, 2019

Paper Group NANR 85

Discovering Canonical Indian English Accents: A Crowdsourcing-based Approach

Title Discovering Canonical Indian English Accents: A Crowdsourcing-based Approach
Authors Sunayana Sitaram, Varun Manjunath, Varun Bharadwaj, Monojit Choudhury, Kalika Bali, Michael Tjalve
Abstract
Tasks Speech Recognition
Published 2018-05-01
URL https://www.aclweb.org/anthology/L18-1455/
PDF https://www.aclweb.org/anthology/L18-1455
PWC https://paperswithcode.com/paper/discovering-canonical-indian-english-accents
Repo
Framework

Enhancing the Spatial Resolution of Stereo Images Using a Parallax Prior

Title Enhancing the Spatial Resolution of Stereo Images Using a Parallax Prior
Authors Daniel S. Jeon, Seung-Hwan Baek, Inchang Choi, Min H. Kim
Abstract We present a novel method that can enhance the spatial resolution of stereo images using a parallax prior. While traditional stereo imaging has focused on estimating depth from stereo images, our method utilizes stereo images to enhance spatial resolution instead of estimating disparity. The critical challenge for enhancing spatial resolution from stereo images: how to register corresponding pixels with subpixel accuracy. Since disparity in traditional stereo imaging is calculated per pixel, it is directly inappropriate for enhancing spatial resolution. We, therefore, learn a parallax prior from stereo image datasets by jointly training two-stage networks. The first network learns how to enhance the spatial resolution of stereo images in luminance, and the second network learns how to reconstruct a high-resolution color image from high-resolution luminance and chrominance of the input image. Our two-stage joint network enhances the spatial resolution of stereo images significantly more than single-image super-resolution methods. The proposed method is directly applicable to any stereo depth imaging methods, enabling us to enhance the spatial resolution of stereo images.
Tasks Image Super-Resolution, Super-Resolution
Published 2018-06-01
URL http://openaccess.thecvf.com/content_cvpr_2018/html/Jeon_Enhancing_the_Spatial_CVPR_2018_paper.html
PDF http://openaccess.thecvf.com/content_cvpr_2018/papers/Jeon_Enhancing_the_Spatial_CVPR_2018_paper.pdf
PWC https://paperswithcode.com/paper/enhancing-the-spatial-resolution-of-stereo
Repo
Framework

Thread Popularity Prediction and Tracking with a Permutation-invariant Model

Title Thread Popularity Prediction and Tracking with a Permutation-invariant Model
Authors Hou Pong Chan, Irwin King
Abstract The task of thread popularity prediction and tracking aims to recommend a few popular comments to subscribed users when a batch of new comments arrive in a discussion thread. This task has been formulated as a reinforcement learning problem, in which the reward of the agent is the sum of positive responses received by the recommended comments. In this work, we propose a novel approach to tackle this problem. First, we propose a deep neural network architecture to model the expected cumulative reward (Q-value) of a recommendation (action). Unlike the state-of-the-art approach, which treats an action as a sequence, our model uses an attention mechanism to integrate information from a set of comments. Thus, the prediction of Q-value is invariant to the permutation of the comments, which leads to a more consistent agent behavior. Second, we employ a greedy procedure to approximate the action that maximizes the predicted Q-value from a combinatorial action space. Different from the state-of-the-art approach, this procedure does not require an additional pre-trained model to generate candidate actions. Experiments on five real-world datasets show that our approach outperforms the state-of-the-art.
Tasks
Published 2018-10-01
URL https://www.aclweb.org/anthology/D18-1376/
PDF https://www.aclweb.org/anthology/D18-1376
PWC https://paperswithcode.com/paper/thread-popularity-prediction-and-tracking
Repo
Framework
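The two ideas in the abstract above, a Q-value that ignores the order of the recommended comments and a greedy search over the combinatorial action space, can be illustrated roughly as follows. This is a minimal NumPy sketch, not the authors' architecture: the bilinear attention form, the linear output layer, and the names `q_value`, `greedy_select`, `w_att` and `w_out` are all assumptions made for illustration.

```python
import numpy as np

def q_value(state, comments, w_att, w_out):
    """Score a set of comments; the result does not depend on row order.

    state: (d_s,), comments: (n, d_c), w_att: (d_c, d_s), w_out: (d_s + d_c,).
    All shapes and parameters are hypothetical.
    """
    # One attention logit per comment, conditioned on the thread state.
    logits = comments @ w_att @ state
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # A weighted sum over the set is unchanged when the rows are permuted.
    pooled = weights @ comments
    return float(np.concatenate([state, pooled]) @ w_out)

def greedy_select(state, candidates, k, w_att, w_out):
    """Grow the recommended set one comment at a time (assumes k <= len(candidates))."""
    chosen = []
    remaining = list(range(len(candidates)))
    for _ in range(k):
        # Pick the comment whose addition gives the highest predicted Q-value.
        best = max(
            remaining,
            key=lambda i: q_value(state, candidates[chosen + [i]], w_att, w_out),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

Because the pooling step is a weighted sum, shuffling the rows of `comments` leaves the score unchanged up to floating-point rounding, which is the property the abstract credits for more consistent agent behavior.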

Feature Super-Resolution: Make Machine See More Clearly

Title Feature Super-Resolution: Make Machine See More Clearly
Authors Weimin Tan, Bo Yan, Bahetiyaer Bare
Abstract Identifying small size images or small objects is a notoriously challenging problem, as discriminative representations are difficult to learn from the limited information contained in them with poor-quality appearance and unclear object structure. Existing research works usually increase the resolution of low-resolution image in the pixel space in order to provide better visual quality for human viewing. However, the improved performance of such methods is usually limited or even trivial in the case of very small image size (we will show it in this paper explicitly). In this paper, different from image super-resolution (ISR), we propose a novel super-resolution technique called feature super-resolution (FSR), which aims at enhancing the discriminatory power of small size image in order to provide high recognition precision for machine. To achieve this goal, we propose a new Feature Super-Resolution Generative Adversarial Network (FSR-GAN) model that transforms the raw poor features of small size images to highly discriminative ones by performing super-resolution in the feature space. Our FSR-GAN consists of two subnetworks: a feature generator network G and a feature discriminator network D. By training the G and the D networks in an alternative manner, we encourage the G network to discover the latent distribution correlations between small size and large size images and then use G to improve the representations of small images. Extensive experiment results on Oxford5K, Paris, Holidays, and Flick100k datasets demonstrate that the proposed FSR approach can effectively enhance the discriminatory ability of features. Even when the resolution of query images is reduced greatly, e.g., 1/64 original size, the query feature enhanced by our FSR approach achieves surprisingly high retrieval performance at different image resolutions and increases the retrieval precision by 25% compared to the raw query feature.
Tasks Image Super-Resolution, Super-Resolution
Published 2018-06-01
URL http://openaccess.thecvf.com/content_cvpr_2018/html/Tan_Feature_Super-Resolution_Make_CVPR_2018_paper.html
PDF http://openaccess.thecvf.com/content_cvpr_2018/papers/Tan_Feature_Super-Resolution_Make_CVPR_2018_paper.pdf
PWC https://paperswithcode.com/paper/feature-super-resolution-make-machine-see
Repo
Framework

Learning Covariate-Specific Embeddings with Tensor Decompositions

Title Learning Covariate-Specific Embeddings with Tensor Decompositions
Authors Kevin Tian, Teng Zhang, James Zou
Abstract Word embedding is a useful approach to capture co-occurrence structures in a large corpus of text. In addition to the text data itself, we often have additional covariates associated with individual documents in the corpus (e.g. the demographic of the author, time and venue of publication, etc.) and we would like the embedding to naturally capture the information of the covariates. In this paper, we propose a new tensor decomposition model for word embeddings with covariates. Our model jointly learns a base embedding for all the words as well as a weighted diagonal transformation to model how each covariate modifies the base embedding. To obtain the specific embedding for a particular author or venue, for example, we can then simply multiply the base embedding by the transformation matrix associated with that time or venue. The main advantages of our approach are data efficiency and interpretability of the covariate transformation matrix. Our experiments demonstrate that our joint model learns substantially better embeddings conditioned on each covariate compared to the standard approach of learning a separate embedding for each covariate using only the relevant subset of data. Furthermore, our model encourages the embeddings to be "topic-aligned" in the sense that the dimensions have specific independent meanings. This allows our covariate-specific embeddings to be compared by topic, enabling downstream differential analysis. We empirically evaluate the benefits of our algorithm on several datasets, and demonstrate how it can be used to address many natural questions about the effects of covariates.
Tasks Word Embeddings
Published 2018-01-01
URL https://openreview.net/forum?id=B1suU-bAW
PDF https://openreview.net/pdf?id=B1suU-bAW
PWC https://paperswithcode.com/paper/learning-covariate-specific-embeddings-with
Repo
Framework
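The step described in the abstract above, multiplying a shared base embedding by a covariate's diagonal transformation, amounts to rescaling each embedding dimension. The sketch below shows only that step with random placeholder values; the names `base`, `covariate_weights` and `covariate_embedding`, the sizes, and the two example covariates are hypothetical, and the training objective of the tensor decomposition is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 5, 4

# Shared base embedding, one row per word (random placeholder values).
base = rng.normal(size=(vocab_size, dim))

# Hypothetical diagonal weights, one vector per covariate.
covariate_weights = {
    "venue_A": np.array([1.0, 0.5, 2.0, 0.0]),
    "venue_B": np.array([0.2, 1.0, 1.0, 1.5]),
}

def covariate_embedding(base, diag):
    """Return base @ diag(diag) without materialising the diagonal matrix."""
    # Broadcasting rescales each embedding dimension independently.
    return base * diag

emb_a = covariate_embedding(base, covariate_weights["venue_A"])
emb_b = covariate_embedding(base, covariate_weights["venue_B"])
```

Since every covariate reuses the same `base` and differs only in a length-`dim` weight vector, the weights can be compared dimension by dimension across covariates, which is one way to read the interpretability claim in the abstract.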

Fast and Accurate Inference with Adaptive Ensemble Prediction for Deep Networks

Title Fast and Accurate Inference with Adaptive Ensemble Prediction for Deep Networks
Authors Hiroshi Inoue
Abstract Ensembling multiple predictions is a widely-used technique to improve the accuracy of various machine learning tasks. In image classification tasks, for example, averaging the predictions for multiple patches extracted from the input image significantly improves accuracy. Using multiple networks trained independently to make predictions improves accuracy further. One obvious drawback of the ensembling technique is its higher execution cost during inference. If we average 100 local predictions, the execution cost will be 100 times as high as the cost without the ensemble. This higher cost limits the real-world use of ensembling. In this paper, we first describe our insights on the relationship between the probability of the prediction and the effect of ensembling with current deep neural networks; ensembling does not help mispredictions for inputs predicted with a high probability, i.e. the output from the softmax. This finding motivates us to develop a new technique called adaptive ensemble prediction, which achieves the benefits of ensembling with much smaller additional execution costs. Hence, we calculate the confidence level of the prediction for each input from the probabilities of the local predictions during the ensembling computation. If the prediction for an input reaches a high enough probability on the basis of the confidence level, we stop ensembling for this input to avoid wasting computation power. We evaluated the adaptive ensembling by using various datasets and showed that it reduces the computation cost significantly while achieving similar accuracy to the naive ensembling. We also showed that our statistically rigorous confidence-level-based termination condition reduces the burden of the task-dependent parameter tuning compared to the naive termination based on the pre-defined threshold in addition to yielding a better accuracy with the same cost.
Tasks Image Classification
Published 2018-01-01
URL https://openreview.net/forum?id=SkBcLugC-
PDF https://openreview.net/pdf?id=SkBcLugC-
PWC https://paperswithcode.com/paper/fast-and-accurate-inference-with-adaptive-1
Repo
Framework
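The early-stopping idea in the abstract above can be sketched as a loop that averages local softmax outputs and stops once the leading class is confidently ahead. The stopping rule used here is an assumption, not taken from the paper: a normal-approximation confidence interval on the top class's mean probability, with the loop ending when its lower bound exceeds 0.5. The names `adaptive_ensemble`, `local_predictors`, `z` and `min_preds` are hypothetical.

```python
import math
import numpy as np

def adaptive_ensemble(local_predictors, x, z=1.96, min_preds=2):
    """Average local predictions for x, stopping early when confident.

    local_predictors: non-empty sequence of callables returning a probability vector.
    Returns (predicted_class, number_of_local_predictions_used).
    """
    probs = []
    for predict in local_predictors:
        probs.append(predict(x))
        if len(probs) < min_preds:
            continue  # need at least two samples to estimate a spread
        stacked = np.stack(probs)
        mean = stacked.mean(axis=0)
        top = int(mean.argmax())
        # Half-width of the interval on the top class's mean probability.
        half_width = z * stacked[:, top].std(ddof=1) / math.sqrt(len(probs))
        # Assumed rule: stop once the interval's lower bound clears 0.5.
        if mean[top] - half_width > 0.5:
            break
    mean = np.stack(probs).mean(axis=0)
    return int(mean.argmax()), len(probs)
```

With this rule, an input on which the local predictions agree strongly is resolved after `min_preds` evaluations, while an ambiguous input consumes the full ensemble.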

Appearance-Based Gaze Estimation via Evaluation-Guided Asymmetric Regression

Title Appearance-Based Gaze Estimation via Evaluation-Guided Asymmetric Regression
Authors Yihua Cheng, Feng Lu, Xucong Zhang
Abstract Eye gaze estimation has been increasingly demanded by recent intelligent systems to accomplish a range of interaction-related tasks, by using simple eye images as input. However, learning the highly complex regression between eye images and gaze directions is nontrivial, and thus the problem is yet to be solved efficiently. In this paper, we propose the Asymmetric Regression-Evaluation Network (ARE-Net), and try to improve the gaze estimation performance to its full extent. At the core of our method is the notion of "two eye asymmetry" observed during gaze estimation for the left and right eyes. Inspired by this, we design the multi-stream ARE-Net; one asymmetric regression network (AR-Net) predicts 3D gaze directions for both eyes with a novel asymmetric strategy, and the evaluation network (E-Net) adaptively adjusts the strategy by evaluating the two eyes in terms of their performance during optimization. By training the whole network, our method achieves promising results and surpasses the state-of-the-art methods on multiple public datasets.
Tasks Gaze Estimation
Published 2018-09-01
URL http://openaccess.thecvf.com/content_ECCV_2018/html/Yihua_Cheng_Appearance-Based_Gaze_Estimation_ECCV_2018_paper.html
PDF http://openaccess.thecvf.com/content_ECCV_2018/papers/Yihua_Cheng_Appearance-Based_Gaze_Estimation_ECCV_2018_paper.pdf
PWC https://paperswithcode.com/paper/appearance-based-gaze-estimation-via
Repo
Framework

Analyzing Correlated Evolution of Multiple Features Using Latent Representations

Title Analyzing Correlated Evolution of Multiple Features Using Latent Representations
Authors Yugo Murawaki
Abstract Statistical phylogenetic models have allowed the quantitative analysis of the evolution of a single categorical feature and a pair of binary features, but correlated evolution involving multiple discrete features is yet to be explored. Here we propose latent representation-based analysis in which (1) a sequence of discrete surface features is projected to a sequence of independent binary variables and (2) phylogenetic inference is performed on the latent space. In the experiments, we analyze the features of linguistic typology, with a special focus on the order of subject, object and verb. Our analysis suggests that languages sharing the same word order are not necessarily a coherent group but exhibit varying degrees of diachronic stability depending on other features.
Tasks
Published 2018-10-01
URL https://www.aclweb.org/anthology/D18-1468/
PDF https://www.aclweb.org/anthology/D18-1468
PWC https://paperswithcode.com/paper/analyzing-correlated-evolution-of-multiple
Repo
Framework

Can You Spot the Semantic Predicate in this Video?

Title Can You Spot the Semantic Predicate in this Video?
Authors Christopher Reale, Claire Bonial, Heesung Kwon, Clare Voss
Abstract We propose a method to improve human activity recognition in video by leveraging semantic information about the target activities from an expert-defined linguistic resource, VerbNet. Our hypothesis is that activities that share similar event semantics, as defined by the semantic predicates of VerbNet, will be more likely to share some visual components. We use a deep convolutional neural network approach as a baseline and incorporate linguistic information from VerbNet through multi-task learning. We present results of experiments showing the added information has negligible impact on recognition performance. We discuss how this may be because the lexical semantic information defined by VerbNet is generally not visually salient given the video processing approach used here, and how we may handle this in future approaches.
Tasks Activity Recognition, Human Activity Recognition, Multi-Task Learning
Published 2018-08-01
URL https://www.aclweb.org/anthology/W18-4307/
PDF https://www.aclweb.org/anthology/W18-4307
PWC https://paperswithcode.com/paper/can-you-spot-the-semantic-predicate-in-this
Repo
Framework

Museum Exhibit Identification Challenge for the Supervised Domain Adaptation and Beyond

Title Museum Exhibit Identification Challenge for the Supervised Domain Adaptation and Beyond
Authors Piotr Koniusz, Yusuf Tas, Hongguang Zhang, Mehrtash Harandi, Fatih Porikli, Rui Zhang
Abstract We study an open problem of artwork identification and propose a new dataset dubbed Open Museum Identification Challenge (Open MIC). It contains photos of exhibits captured in 10 distinct exhibition spaces of several museums which showcase paintings, timepieces, sculptures, glassware, relics, science exhibits, natural history pieces, ceramics, pottery, tools and indigenous crafts. The goal of Open MIC is to stimulate research in domain adaptation, egocentric recognition and few-shot learning by providing a testbed complementary to the famous Office dataset which reaches ~90% accuracy. To form our dataset, we captured a number of images per art piece with a mobile phone and wearable cameras to form the source and target data splits, respectively. To achieve robust baselines, we build on a recent approach that aligns per-class scatter matrices of the source and target CNN streams. Moreover, we exploit the positive definite nature of such representations by using end-to-end Bregman divergences and the Riemannian metric. We present baselines such as training/evaluation per exhibition and training/evaluation on the combined set covering 866 exhibit identities. As each exhibition poses distinct challenges e.g., quality of lighting, motion blur, occlusions, clutter, viewpoint and scale variations, rotations, glares, transparency, non-planarity, clipping, we break down results w.r.t. these factors.
Tasks Domain Adaptation, Few-Shot Learning
Published 2018-09-01
URL http://openaccess.thecvf.com/content_ECCV_2018/html/Piotr_Koniusz_Museum_Exhibit_Identification_ECCV_2018_paper.html
PDF http://openaccess.thecvf.com/content_ECCV_2018/papers/Piotr_Koniusz_Museum_Exhibit_Identification_ECCV_2018_paper.pdf
PWC https://paperswithcode.com/paper/museum-exhibit-identification-challenge-for-1
Repo
Framework

Diversity-Promoting GAN: A Cross-Entropy Based Generative Adversarial Network for Diversified Text Generation

Title Diversity-Promoting GAN: A Cross-Entropy Based Generative Adversarial Network for Diversified Text Generation
Authors Jingjing Xu, Xuancheng Ren, Junyang Lin, Xu Sun
Abstract Existing text generation methods tend to produce repeated and "boring" expressions. To tackle this problem, we propose a new text generation model, called Diversity-Promoting Generative Adversarial Network (DP-GAN). The proposed model assigns low reward for repeatedly generated text and high reward for "novel" and fluent text, encouraging the generator to produce diverse and informative text. Moreover, we propose a novel language-model based discriminator, which can better distinguish novel text from repeated text without the saturation problem compared with existing classifier-based discriminators. The experimental results on review generation and dialogue generation tasks demonstrate that our model can generate substantially more diverse and informative text than existing baselines.
Tasks Dialogue Generation, Language Modelling, Machine Translation, Text Generation, Text Summarization
Published 2018-10-01
URL https://www.aclweb.org/anthology/D18-1428/
PDF https://www.aclweb.org/anthology/D18-1428
PWC https://paperswithcode.com/paper/diversity-promoting-gan-a-cross-entropy-based
Repo
Framework

Feudal Dialogue Management with Jointly Learned Feature Extractors

Title Feudal Dialogue Management with Jointly Learned Feature Extractors
Authors Iñigo Casanueva, Paweł Budzianowski, Stefan Ultes, Florian Kreyssig, Bo-Hsiang Tseng, Yen-chen Wu, Milica Gašić
Abstract Reinforcement learning (RL) is a promising dialogue policy optimisation approach, but traditional RL algorithms fail to scale to large domains. Recently, Feudal Dialogue Management (FDM), has shown to increase the scalability to large domains by decomposing the dialogue management decision into two steps, making use of the domain ontology to abstract the dialogue state in each step. In order to abstract the state space, however, previous work on FDM relies on handcrafted feature functions. In this work, we show that these feature functions can be learned jointly with the policy model while obtaining similar performance, even outperforming the handcrafted features in several environments and domains.
Tasks Dialogue Management, Spoken Dialogue Systems
Published 2018-07-01
URL https://www.aclweb.org/anthology/W18-5038/
PDF https://www.aclweb.org/anthology/W18-5038
PWC https://paperswithcode.com/paper/feudal-dialogue-management-with-jointly
Repo
Framework

Know Who Your Friends Are: Understanding Social Connections from Unstructured Text

Title Know Who Your Friends Are: Understanding Social Connections from Unstructured Text
Authors Léa Deleris, Francesca Bonin, Elizabeth Daly, Stéphane Deparis, Yufang Hou, Charles Jochim, Yassine Lassoued, Killian Levacher
Abstract Having an understanding of interpersonal relationships is helpful in many contexts. Our system seeks to assist humans with that task, using textual information (e.g., case notes, speech transcripts, posts, books) as input. Specifically, our system first extracts qualitative and quantitative information elements (which we call signals) about interactions among persons, aggregates those to provide a condensed view of relationships and then enables users to explore all facets of the resulting social (multi-)graph through a visual interface.
Tasks
Published 2018-06-01
URL https://www.aclweb.org/anthology/N18-5016/
PDF https://www.aclweb.org/anthology/N18-5016
PWC https://paperswithcode.com/paper/know-who-your-friends-are-understanding
Repo
Framework

Cross-linguistically Small World Networks are Ubiquitous in Child-directed Speech

Title Cross-linguistically Small World Networks are Ubiquitous in Child-directed Speech
Authors Steven Moran, Danica Pajović, Sabine Stoll
Abstract
Tasks Language Acquisition
Published 2018-05-01
URL https://www.aclweb.org/anthology/L18-1646/
PDF https://www.aclweb.org/anthology/L18-1646
PWC https://paperswithcode.com/paper/cross-linguistically-small-world-networks-are
Repo
Framework

NT2Lex: A CEFR-Graded Lexical Resource for Dutch as a Foreign Language Linked to Open Dutch WordNet

Title NT2Lex: A CEFR-Graded Lexical Resource for Dutch as a Foreign Language Linked to Open Dutch WordNet
Authors Anaïs Tack, Thomas François, Piet Desmet, Cédrick Fairon
Abstract In this paper, we introduce NT2Lex, a novel lexical resource for Dutch as a foreign language (NT2) which includes frequency distributions of 17,743 words and expressions attested in expert-written textbook texts and readers graded along the scale of the Common European Framework of Reference (CEFR). In essence, the lexicon informs us about what kind of vocabulary should be understood when reading Dutch as a non-native reader at a particular proficiency level. The main novelty of the resource with respect to the previously developed CEFR-graded lexicons concerns the introduction of corpus-based evidence for L2 word sense complexity through the linkage to Open Dutch WordNet (Postma et al., 2016). The resource thus contains, on top of the lemmatised and part-of-speech tagged lexical entries, a total of 11,999 unique word senses and 8,934 distinct synsets.
Tasks Complex Word Identification
Published 2018-06-01
URL https://www.aclweb.org/anthology/W18-0514/
PDF https://www.aclweb.org/anthology/W18-0514
PWC https://paperswithcode.com/paper/nt2lex-a-cefr-graded-lexical-resource-for
Repo
Framework