Paper Group NANR 278
Near Optimal Frequent Directions for Sketching Dense and Sparse Matrices
Title | Near Optimal Frequent Directions for Sketching Dense and Sparse Matrices |
Authors | Zengfeng Huang |
Abstract | Given a large matrix $A\in\mathbb{R}^{n\times d}$, we consider the problem of computing a sketch matrix $B\in\mathbb{R}^{\ell\times d}$ which is significantly smaller than but still well approximates $A$. We are interested in minimizing the covariance error $\|A^TA-B^TB\|_2$. We consider the problems in the streaming model, where the algorithm can only make one pass over the input with limited working space. The popular Frequent Directions algorithm of Liberty (2013) and its variants achieve optimal space-error tradeoff. However, whether the running time can be improved remains an unanswered question. In this paper, we almost settle the time complexity of this problem. In particular, we provide new space-optimal algorithms with faster running times. Moreover, we also show that the running times of our algorithms are near-optimal unless the state-of-the-art running time of matrix multiplication can be improved significantly. |
Tasks | |
Published | 2018-07-01 |
URL | https://icml.cc/Conferences/2018/Schedule?showEvent=2125 |
http://proceedings.mlr.press/v80/huang18a/huang18a.pdf | |
PWC | https://paperswithcode.com/paper/near-optimal-frequent-directions-for |
Repo | |
Framework | |
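As a rough illustration of the baseline this paper speeds up (not the paper's new faster algorithms), the one-pass Frequent Directions sketch of Liberty (2013) can be written in a few lines of NumPy. The function name and the choice of shrinking by the smallest squared singular value follow the original algorithm; this sketch assumes $\ell \le d$.

```python
import numpy as np

def frequent_directions(A, ell):
    """One-pass Frequent Directions sketch (Liberty, 2013).

    Streams the rows of A into an ell x d sketch B satisfying
    0 <= A^T A - B^T B (PSD) and ||A^T A - B^T B||_2 <= ||A||_F^2 / ell.
    Assumes ell <= d.
    """
    _, d = A.shape
    B = np.zeros((ell, d))
    for row in A:
        zero_rows = np.flatnonzero(~B.any(axis=1))
        if zero_rows.size == 0:
            # Sketch is full: rotate to the singular basis and shrink.
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            delta = s[-1] ** 2                     # squared smallest singular value
            s = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
            B = s[:, None] * Vt                    # last row becomes exactly zero
            zero_rows = np.flatnonzero(~B.any(axis=1))
        B[zero_rows[0]] = row
    return B
```

Each full-sketch step costs one SVD of an $\ell\times d$ matrix; the paper's contribution is reducing this per-row cost while keeping the same space-error tradeoff.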
Combining Abstractness and Language-specific Theoretical Indicators for Detecting Non-Literal Usage of Estonian Particle Verbs
Title | Combining Abstractness and Language-specific Theoretical Indicators for Detecting Non-Literal Usage of Estonian Particle Verbs |
Authors | Eleri Aedmaa, Maximilian Köper, Sabine Schulte im Walde |
Abstract | This paper presents two novel datasets and a random-forest classifier to automatically predict literal vs. non-literal language usage for a highly frequent type of multi-word expression in a low-resource language, i.e., Estonian. We demonstrate the value of language-specific indicators induced from theoretical linguistic research, which outperform a high majority baseline when combined with language-independent features of non-literal language (such as abstractness). |
Tasks | |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/N18-4002/ |
https://www.aclweb.org/anthology/N18-4002 | |
PWC | https://paperswithcode.com/paper/combining-abstractness-and-language-specific |
Repo | |
Framework | |
Simulating ASR errors for training SLU systems
Title | Simulating ASR errors for training SLU systems |
Authors | Edwin Simonnet, Sahar Ghannay, Nathalie Camelin, Yannick Estève |
Abstract | |
Tasks | Data Augmentation, Slot Filling, Speech Recognition, Spoken Language Understanding, Word Embeddings |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1499/ |
https://www.aclweb.org/anthology/L18-1499 | |
PWC | https://paperswithcode.com/paper/simulating-asr-errors-for-training-slu |
Repo | |
Framework | |
HandMap: Robust Hand Pose Estimation via Intermediate Dense Guidance Map Supervision
Title | HandMap: Robust Hand Pose Estimation via Intermediate Dense Guidance Map Supervision |
Authors | Xiaokun Wu, Daniel Finnegan, Eamonn O’Neill, Yong-Liang Yang |
Abstract | This work presents a novel hand pose estimation framework via intermediate dense guidance map supervision. By leveraging the advantage of predicting heat maps of hand joints in detection-based methods, we propose to use dense feature maps through intermediate supervision in a regression-based framework that is not limited to the resolution of the heat map. Our dense feature maps are delicately designed to encode the hand geometry and the spatial relation between local joints and the global hand. The proposed framework significantly improves the state of the art in both 2D and 3D on recent benchmark datasets. |
Tasks | Hand Pose Estimation, Pose Estimation |
Published | 2018-09-01 |
URL | http://openaccess.thecvf.com/content_ECCV_2018/html/Xiaokun_Wu_HandMap_Robust_Hand_ECCV_2018_paper.html |
http://openaccess.thecvf.com/content_ECCV_2018/papers/Xiaokun_Wu_HandMap_Robust_Hand_ECCV_2018_paper.pdf | |
PWC | https://paperswithcode.com/paper/handmap-robust-hand-pose-estimation-via |
Repo | |
Framework | |
Measuring Innovation in Speech and Language Processing Publications.
Title | Measuring Innovation in Speech and Language Processing Publications. |
Authors | Joseph Mariani, Gil Francopoulo, Patrick Paroubek |
Abstract | |
Tasks | Information Retrieval, Optical Character Recognition |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1297/ |
https://www.aclweb.org/anthology/L18-1297 | |
PWC | https://paperswithcode.com/paper/measuring-innovation-in-speech-and-language |
Repo | |
Framework | |
Parameterized Algorithms for the Matrix Completion Problem
Title | Parameterized Algorithms for the Matrix Completion Problem |
Authors | Robert Ganian, Iyad Kanj, Sebastian Ordyniak, Stefan Szeider |
Abstract | We consider two matrix completion problems, in which we are given a matrix with missing entries and the task is to complete the matrix in a way that (1) minimizes the rank, or (2) minimizes the number of distinct rows. We study the parameterized complexity of the two aforementioned problems with respect to several parameters of interest, including the minimum number of matrix rows, columns, and rows plus columns needed to cover all missing entries. We obtain new algorithmic results showing that, for the bounded domain case, both problems are fixed-parameter tractable with respect to all aforementioned parameters. We complement these results with a lower-bound result for the unbounded domain case that rules out fixed-parameter tractability w.r.t. some of the parameters under consideration. |
Tasks | Matrix Completion |
Published | 2018-07-01 |
URL | https://icml.cc/Conferences/2018/Schedule?showEvent=2254 |
http://proceedings.mlr.press/v80/ganian18a/ganian18a.pdf | |
PWC | https://paperswithcode.com/paper/parameterized-algorithms-for-the-matrix |
Repo | |
Framework | |
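The bounded-domain setting studied in the abstract above can be made concrete with a tiny brute-force solver for variant (1), minimum-rank completion. This is purely illustrative (exponential in the number of missing entries) and is not one of the paper's fixed-parameter algorithms; the helper name `complete_min_rank` is hypothetical.

```python
import itertools
import numpy as np

def complete_min_rank(M, domain):
    """Fill None entries of M from `domain` so as to minimize the rank.

    Brute force over all completions -- illustrative only, exponential
    in the number of missing entries.
    """
    missing = [(i, j) for i, row in enumerate(M)
               for j, v in enumerate(row) if v is None]
    best_rank, best = None, None
    for values in itertools.product(domain, repeat=len(missing)):
        filled = [row[:] for row in M]
        for (i, j), v in zip(missing, values):
            filled[i][j] = v
        r = np.linalg.matrix_rank(np.array(filled, dtype=float))
        if best_rank is None or r < best_rank:
            best_rank, best = r, filled
    return best_rank, best
```

For example, `[[1, None], [2, 4]]` over domain `{0, 1, 2}` is completed with 2, making the second row a multiple of the first (rank 1). The paper's parameters (rows/columns covering the missing entries) bound exactly how much of this search space matters.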
Graph Based Semi-Supervised Learning Approach for Tamil POS tagging
Title | Graph Based Semi-Supervised Learning Approach for Tamil POS tagging |
Authors | Mokanarangan Thayaparan, Surangika Ranathunga, Uthayasanker Thayasivam |
Abstract | |
Tasks | Graph Similarity, Information Retrieval, Metric Learning, Question Answering, Word Embeddings |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1624/ |
https://www.aclweb.org/anthology/L18-1624 | |
PWC | https://paperswithcode.com/paper/graph-based-semi-supervised-learning-approach |
Repo | |
Framework | |
A Robust Method for Strong Rolling Shutter Effects Correction Using Lines With Automatic Feature Selection
Title | A Robust Method for Strong Rolling Shutter Effects Correction Using Lines With Automatic Feature Selection |
Authors | Yizhen Lao, Omar Ait-Aider |
Abstract | We present a robust method which compensates rolling shutter (RS) distortions in a single image using a set of image curves, based on the knowledge that they correspond to 3D straight lines. Unlike existing work, no a priori knowledge about the line directions (e.g. a Manhattan World assumption) is required. We first formulate a parametric equation for the projection of a 3D straight line viewed by a moving rolling shutter camera under a uniform motion model. Then we propose a method which efficiently estimates the ego angular velocity separately from the pose parameters, using at least 4 image curves. Moreover, we propose for the first time a RANSAC-like strategy to select image curves which really correspond to 3D straight lines and reject those corresponding to actual curves in the 3D world. A comparative experimental study with both synthetic and real data from well-known benchmarks shows that the proposed method outperforms existing state-of-the-art techniques. |
Tasks | Feature Selection |
Published | 2018-06-01 |
URL | http://openaccess.thecvf.com/content_cvpr_2018/html/Lao_A_Robust_Method_CVPR_2018_paper.html |
http://openaccess.thecvf.com/content_cvpr_2018/papers/Lao_A_Robust_Method_CVPR_2018_paper.pdf | |
PWC | https://paperswithcode.com/paper/a-robust-method-for-strong-rolling-shutter |
Repo | |
Framework | |
Weakly Consistent Optimal Pricing Algorithms in Repeated Posted-Price Auctions with Strategic Buyer
Title | Weakly Consistent Optimal Pricing Algorithms in Repeated Posted-Price Auctions with Strategic Buyer |
Authors | Alexey Drutsa |
Abstract | We study revenue optimization learning algorithms for repeated posted-price auctions where a seller interacts with a single strategic buyer that holds a fixed private valuation for a good and seeks to maximize his cumulative discounted surplus. We propose a novel algorithm that never decreases offered prices and has a tight strategic regret bound of $\Theta(\log\log T)$. This result closes the open research question on the existence of a no-regret, horizon-independent, weakly consistent pricing algorithm. We also show that the property of non-decreasing prices is nearly necessary for a weakly consistent algorithm to be no-regret. |
Tasks | |
Published | 2018-07-01 |
URL | https://icml.cc/Conferences/2018/Schedule?showEvent=1911 |
http://proceedings.mlr.press/v80/drutsa18a/drutsa18a.pdf | |
PWC | https://paperswithcode.com/paper/weakly-consistent-optimal-pricing-algorithms |
Repo | |
Framework | |
Human Appearance Transfer
Title | Human Appearance Transfer |
Authors | Mihai Zanfir, Alin-Ionut Popa, Andrei Zanfir, Cristian Sminchisescu |
Abstract | We propose an automatic person-to-person appearance transfer model based on explicit parametric 3D human representations and learned, constrained deep translation network architectures for photographic image synthesis. Given a single source image and a single target image, each corresponding to different human subjects, wearing different clothing and in different poses, our goal is to photo-realistically transfer the appearance from the source image onto the target image while preserving the target shape and clothing segmentation layout. Our solution to this new problem is formulated in terms of a computational pipeline that combines (1) 3D human pose and body shape estimation from monocular images, (2) identifying 3D surface color elements (mesh triangles) visible in both images, which can be transferred directly using barycentric procedures, and (3) predicting surface appearance missing in the first image but visible in the second one using deep learning-based image synthesis techniques. Our model achieves promising results as supported by a perceptual user study where the participants rated around 65% of our results as good, very good or perfect, as well as in automated tests (Inception scores and a Faster-RCNN human detector responding very similarly to real and model generated images). We further show how the proposed architecture can be profiled to automatically generate images of a person dressed with different clothing transferred from a person in another image, opening paths for applications in entertainment and photo-editing (e.g. embodying and posing as friends or famous actors), the fashion industry, or affordable online shopping of clothing. |
Tasks | Image Generation |
Published | 2018-06-01 |
URL | http://openaccess.thecvf.com/content_cvpr_2018/html/Zanfir_Human_Appearance_Transfer_CVPR_2018_paper.html |
http://openaccess.thecvf.com/content_cvpr_2018/papers/Zanfir_Human_Appearance_Transfer_CVPR_2018_paper.pdf | |
PWC | https://paperswithcode.com/paper/human-appearance-transfer |
Repo | |
Framework | |
M3: Multimodal Memory Modelling for Video Captioning
Title | M3: Multimodal Memory Modelling for Video Captioning |
Authors | Junbo Wang, Wei Wang, Yan Huang, Liang Wang, Tieniu Tan |
Abstract | Video captioning which automatically translates video clips into natural language sentences is a very important task in computer vision. By virtue of recent deep learning technologies, video captioning has made great progress. However, learning an effective mapping from the visual sequence space to the language space is still a challenging problem due to the long-term multimodal dependency modelling and semantic misalignment. Inspired by the facts that memory modelling poses potential advantages to long-term sequential problems [35] and working memory is the key factor of visual attention [33], we propose a Multimodal Memory Model (M3) to describe videos, which builds a visual and textual shared memory to model the long-term visual-textual dependency and further guide visual attention on described visual targets to solve visual-textual alignments. Specifically, similar to [10], the proposed M3 attaches an external memory to store and retrieve both visual and textual contents by interacting with video and sentence with multiple read and write operations. To evaluate the proposed model, we perform experiments on two public datasets: MSVD and MSR-VTT. The experimental results demonstrate that our method outperforms most of the state-of-the-art methods in terms of BLEU and METEOR. |
Tasks | Video Captioning |
Published | 2018-06-01 |
URL | http://openaccess.thecvf.com/content_cvpr_2018/html/Wang_M3_Multimodal_Memory_CVPR_2018_paper.html |
http://openaccess.thecvf.com/content_cvpr_2018/papers/Wang_M3_Multimodal_Memory_CVPR_2018_paper.pdf | |
PWC | https://paperswithcode.com/paper/m3-multimodal-memory-modelling-for-video |
Repo | |
Framework | |
Recurrent Tubelet Proposal and Recognition Networks for Action Detection
Title | Recurrent Tubelet Proposal and Recognition Networks for Action Detection |
Authors | Dong Li, Zhaofan Qiu, Qi Dai, Ting Yao, Tao Mei |
Abstract | Detecting actions in videos is a challenging task as video is an information intensive media with complex variations. Existing approaches predominantly generate action proposals for each individual frame or fixed-length clip independently, while overlooking temporal context across them. Such temporal contextual relations are vital for action detection as an action is by nature a sequence of movements. This motivates us to leverage the localized action proposals in previous frames when determining action regions in the current one. Specifically, we present a novel deep architecture called Recurrent Tubelet Proposal and Recognition (RTPR) networks to incorporate temporal context for action detection. The proposed RTPR consists of two correlated networks, i.e., Recurrent Tubelet Proposal (RTP) networks and Recurrent Tubelet Recognition (RTR) networks. The RTP initializes action proposals of the start frame through a Region Proposal Network and then estimates the movements of proposals in the next frame in a recurrent manner. The action proposals of different frames are linked to form the tubelet proposals. The RTR capitalizes on a multi-channel architecture, where in each channel, a tubelet proposal is fed into a CNN plus LSTM to recurrently recognize the action in the tubelet. We conduct extensive experiments on four benchmark datasets and demonstrate superior results over state-of-the-art methods. More remarkably, we obtain mAP of 98.6%, 81.3%, 77.9% and 22.3% with gains of 2.9%, 4.3%, 0.7% and 3.9% over the best competitors on UCF-Sports, J-HMDB, UCF-101 and AVA, respectively. |
Tasks | Action Detection |
Published | 2018-09-01 |
URL | http://openaccess.thecvf.com/content_ECCV_2018/html/Dong_Li_Recurrent_Tubelet_Proposal_ECCV_2018_paper.html |
http://openaccess.thecvf.com/content_ECCV_2018/papers/Dong_Li_Recurrent_Tubelet_Proposal_ECCV_2018_paper.pdf | |
PWC | https://paperswithcode.com/paper/recurrent-tubelet-proposal-and-recognition |
Repo | |
Framework | |
Learning to Embed Semantic Correspondence for Natural Language Understanding
Title | Learning to Embed Semantic Correspondence for Natural Language Understanding |
Authors | Sangkeun Jung, Jinsik Lee, Jiwon Kim |
Abstract | While learning embedding models has yielded fruitful results in several NLP subfields, most notably Word2Vec, embedding correspondence remains relatively unexplored, especially in the context of natural language understanding (NLU), a task that typically extracts structured semantic knowledge from text. An NLU embedding model can facilitate analyzing and understanding relationships between unstructured texts and their corresponding structured semantic knowledge, which is essential for both researchers and practitioners of NLU. Toward this end, we propose a framework that learns to embed the semantic correspondence between text and its extracted semantic knowledge, called a semantic frame. One key contribution is semantic frame reconstruction, used to derive a one-to-one mapping between embedded vectors and their corresponding semantic frames. Embedding into semantically meaningful vectors and computing their distances in vector space provides a simple but effective way to measure semantic similarities. With the proposed framework, we demonstrate three key areas where the embedding model can be effective: visualization, semantic search and re-ranking. |
Tasks | Slot Filling |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/K18-1013/ |
https://www.aclweb.org/anthology/K18-1013 | |
PWC | https://paperswithcode.com/paper/learning-to-embed-semantic-correspondence-for |
Repo | |
Framework | |
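The distance-based semantic search described in the abstract above reduces, at query time, to ranking frame embeddings by similarity to a query embedding. A minimal sketch (cosine similarity is an assumed choice of distance; the paper's learned embedding model is not reproduced here):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def semantic_search(query, frame_embeddings):
    """Rank frame-embedding indices by decreasing similarity to the query."""
    scores = [cosine_similarity(query, f) for f in frame_embeddings]
    return sorted(range(len(frame_embeddings)), key=lambda i: -scores[i])
```

Re-ranking works the same way: candidates from a first-stage retriever are re-ordered by their embedding distance to the query.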
Assisted Nominalization for Academic English Writing
Title | Assisted Nominalization for Academic English Writing |
Authors | John Lee, Dariush Saberi, Marvin Lam, Jonathan Webster |
Abstract | |
Tasks | Lexical Simplification, Text Generation |
Published | 2018-11-01 |
URL | https://www.aclweb.org/anthology/W18-6706/ |
https://www.aclweb.org/anthology/W18-6706 | |
PWC | https://paperswithcode.com/paper/assisted-nominalization-for-academic-english |
Repo | |
Framework | |
Super-Resolving Very Low-Resolution Face Images With Supplementary Attributes
Title | Super-Resolving Very Low-Resolution Face Images With Supplementary Attributes |
Authors | Xin Yu, Basura Fernando, Richard Hartley, Fatih Porikli |
Abstract | Given a tiny face image, conventional face hallucination methods aim to super-resolve its high-resolution (HR) counterpart by learning a mapping from an exemplar dataset. Since a low-resolution (LR) input patch may correspond to many HR candidate patches, this ambiguity may lead to erroneous HR facial details and thus distort final results, such as gender reversal. An LR input contains the low-frequency facial components of its HR version, while its residual face image, defined as the difference between the HR ground-truth and interpolated LR images, contains the missing high-frequency facial details. We demonstrate that supplementing residual images or feature maps with facial attribute information can significantly reduce the ambiguity in face super-resolution. To explore this idea, we develop an attribute-embedded upsampling network, which consists of an upsampling network and a discriminative network. The upsampling network is composed of an autoencoder with skip-connections, which incorporates facial attribute vectors into the residual features of LR inputs at the bottleneck of the autoencoder, and deconvolutional layers used for upsampling. The discriminative network is designed to examine whether super-resolved faces contain the desired attributes or not, and its loss is then used for updating the upsampling network. In this manner, we can super-resolve tiny unaligned (16$\times$16 pixels) face images with a large upscaling factor of 8$\times$ while reducing the uncertainty of one-to-many mappings significantly. By conducting extensive evaluations on a large-scale dataset, we demonstrate that our method achieves superior face hallucination results and outperforms the state-of-the-art. |
Tasks | Face Hallucination, Super-Resolution |
Published | 2018-06-01 |
URL | http://openaccess.thecvf.com/content_cvpr_2018/html/Yu_Super-Resolving_Very_Low-Resolution_CVPR_2018_paper.html |
http://openaccess.thecvf.com/content_cvpr_2018/papers/Yu_Super-Resolving_Very_Low-Resolution_CVPR_2018_paper.pdf | |
PWC | https://paperswithcode.com/paper/super-resolving-very-low-resolution-face |
Repo | |
Framework | |
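The residual decomposition defined in the abstract above (residual = HR ground truth minus interpolated LR image) can be sketched with NumPy. Nearest-neighbour interpolation via `np.kron` is an assumption made for brevity; the paper interpolates the LR input, and its network then predicts the residual's high-frequency detail.

```python
import numpy as np

def residual_decomposition(hr, upscale):
    """Split an HR face image into an interpolated-LR part and a residual.

    The residual (HR minus interpolated LR) carries the high-frequency
    detail that the upsampling network has to recover.
    """
    lr = hr[::upscale, ::upscale]                      # simulated LR input
    interp = np.kron(lr, np.ones((upscale, upscale)))  # nearest-neighbour upsample
    residual = hr - interp                             # missing high-frequency detail
    return lr, interp, residual
```

By construction, adding the residual back to the interpolated image reconstructs the HR ground truth exactly, which is what makes the residual a complete learning target.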