October 21, 2019

3200 words · 16 min read

Paper Group AWR 42

Soft-PHOC Descriptor for End-to-End Word Spotting in Egocentric Scene Images

Title Soft-PHOC Descriptor for End-to-End Word Spotting in Egocentric Scene Images
Authors Dena Bazazian, Dimosthenis Karatzas, Andrew D. Bagdanov
Abstract Word spotting in natural scene images has many applications in scene understanding and visual assistance. In this paper we propose a technique to create and exploit an intermediate representation of images based on text attributes which are character probability maps. Our representation extends the concept of the Pyramidal Histogram Of Characters (PHOC) by exploiting Fully Convolutional Networks to derive a pixel-wise mapping of the character distribution within candidate word regions. We call this representation the Soft-PHOC. Furthermore, we show how to use Soft-PHOC descriptors for word spotting tasks in egocentric camera streams through an efficient text line proposal algorithm. This is based on the Hough Transform over character attribute maps followed by scoring using Dynamic Time Warping (DTW). We evaluate our results on the ICDAR 2015 Challenge 4 dataset of incidental scene text captured by an egocentric camera.
Tasks Scene Understanding
Published 2018-09-04
URL https://arxiv.org/abs/1809.00854v2
PDF https://arxiv.org/pdf/1809.00854v2.pdf
PWC https://paperswithcode.com/paper/soft-phoc-descriptor-for-end-to-end-word
Repo https://github.com/denabazazian/SoftPHOC_TextDescriptor
Framework tf
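
The word-spotting stage described above scores candidate text lines against a query transcription with Dynamic Time Warping over character probabilities. Below is a minimal NumPy sketch of that scoring step, independent of the authors' TensorFlow repo; the per-step cost (Euclidean distance between a character-probability column and a one-hot query character) is an illustrative assumption.

```python
import numpy as np

def dtw_score(candidate, query):
    """Dynamic Time Warping cost between two sequences of vectors.

    candidate: (T, C) character-probability columns sampled along a text line.
    query:     (L, C) one-hot rows encoding the query transcription.
    Returns the accumulated alignment cost (lower = better match).
    """
    T, L = len(candidate), len(query)
    D = np.full((T + 1, L + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, T + 1):
        for j in range(1, L + 1):
            cost = np.linalg.norm(candidate[i - 1] - query[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[T, L]

# Toy usage: 3-character alphabet, query "ab".
alphabet = 3
query = np.eye(alphabet)[[0, 1]]                  # one-hot rows for 'a', 'b'
candidate = np.array([[0.9, 0.05, 0.05],          # mostly 'a'
                      [0.8, 0.1, 0.1],
                      [0.1, 0.85, 0.05]])         # mostly 'b'
print(dtw_score(candidate, query))
```

A lower accumulated cost means the proposed text line is a better match for the query word.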

Deep Cascade Multi-task Learning for Slot Filling in Online Shopping Assistant

Title Deep Cascade Multi-task Learning for Slot Filling in Online Shopping Assistant
Authors Yu Gong, Xusheng Luo, Yu Zhu, Wenwu Ou, Zhao Li, Muhua Zhu, Kenny Q. Zhu, Lu Duan, Xi Chen
Abstract Slot filling is a critical task in natural language understanding (NLU) for dialog systems. State-of-the-art approaches treat it as a sequence labeling problem and adopt models such as BiLSTM-CRF. While these models work relatively well on standard benchmark datasets, they face challenges in the context of E-commerce where the slot labels are more informative and carry richer expressions. In this work, inspired by the unique structure of the E-commerce knowledge base, we propose a novel multi-task model with cascade and residual connections, which jointly learns segment tagging, named entity tagging and slot filling. Experiments show the effectiveness of the proposed cascade and residual structures. Our model has a 14.6% advantage in F1 score over the strong baseline methods on a new Chinese E-commerce shopping assistant dataset, while achieving competitive accuracies on a standard dataset. Furthermore, an online test deployed on this dominant E-commerce platform shows a 130% improvement in the accuracy of understanding user utterances. Our model has already gone into production on the E-commerce platform.
Tasks Multi-Task Learning, Slot Filling
Published 2018-03-30
URL https://arxiv.org/abs/1803.11326v4
PDF https://arxiv.org/pdf/1803.11326v4.pdf
PWC https://paperswithcode.com/paper/deep-cascade-multi-task-learning-for-slot
Repo https://github.com/pangolulu/DCMTL
Framework none
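
A hedged PyTorch sketch of the cascade structure described above: segment tagging feeds named-entity tagging, which in turn feeds slot filling, with the shared encoder output reused residually at every level. Layer sizes, label counts, and the way lower-level logits are passed upward are illustrative assumptions, and the CRF layers used by sequence-labeling baselines are omitted.

```python
import torch
import torch.nn as nn

class CascadeTagger(nn.Module):
    """Toy cascade multi-task tagger: segment -> NER -> slot, with residual links."""
    def __init__(self, vocab_size, emb_dim=64, hidden=64,
                 n_seg=3, n_ner=5, n_slot=10):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True,
                               bidirectional=True)
        d = 2 * hidden
        self.seg_head = nn.Linear(d, n_seg)
        # cascade: each later head also sees the previous head's logits
        self.ner_head = nn.Linear(d + n_seg, n_ner)
        self.slot_head = nn.Linear(d + n_ner, n_slot)

    def forward(self, tokens):
        h, _ = self.encoder(self.emb(tokens))               # (B, T, 2*hidden)
        seg = self.seg_head(h)
        ner = self.ner_head(torch.cat([h, seg], dim=-1))    # residual: h re-used
        slot = self.slot_head(torch.cat([h, ner], dim=-1))
        return seg, ner, slot

x = torch.randint(0, 1000, (2, 12))                         # batch of token ids
seg, ner, slot = CascadeTagger(vocab_size=1000)(x)
print(seg.shape, ner.shape, slot.shape)
```

In training, each head would receive its own tagging loss and the three losses would be summed.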

Improved Sample Complexity for Stochastic Compositional Variance Reduced Gradient

Title Improved Sample Complexity for Stochastic Compositional Variance Reduced Gradient
Authors Tianyi Lin, Chenyou Fan, Mengdi Wang, Michael I. Jordan
Abstract Convex composition optimization is an emerging topic that covers a wide range of applications arising from stochastic optimal control, reinforcement learning and multi-stage stochastic programming. Existing algorithms suffer from unsatisfactory sample complexity and practical issues since they ignore the convexity structure in the algorithmic design. In this paper, we develop a new stochastic compositional variance-reduced gradient algorithm with the sample complexity of $O((m+n)\log(1/\epsilon)+1/\epsilon^3)$ where $m+n$ is the total number of samples. Our algorithm is near-optimal as the dependence on $m+n$ is optimal up to a logarithmic factor. Experimental results on real-world datasets demonstrate the effectiveness and efficiency of the new algorithm.
Tasks
Published 2018-06-01
URL https://arxiv.org/abs/1806.00458v4
PDF https://arxiv.org/pdf/1806.00458v4.pdf
PWC https://paperswithcode.com/paper/improved-oracle-complexity-for-stochastic
Repo https://github.com/tyDLin/SCVRG
Framework none
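
For context, composition optimization targets problems of the form min_x f(E[g_w(x)]), where both the inner function and its Jacobian must be estimated from samples. The toy NumPy sketch below shows the basic plug-in compositional gradient estimator that the paper's variance-reduced algorithm refines; the quadratic f and linear g are illustrative assumptions, not the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: g_w(x) = A_w x + b_w (random per sample), f(u) = 0.5 * ||u||^2.
A = rng.normal(size=(100, 3, 3))          # 100 "inner" samples
b = rng.normal(size=(100, 3))

def sample_g_and_jac(x, batch):
    """Mini-batch estimates of g(x) and its Jacobian."""
    g_hat = (A[batch] @ x + b[batch]).mean(axis=0)
    J_hat = A[batch].mean(axis=0)
    return g_hat, J_hat

def grad_f(u):
    return u                               # gradient of 0.5 * ||u||^2

x = np.zeros(3)
lr = 0.05
for t in range(500):
    batch = rng.choice(100, size=10, replace=False)
    g_hat, J_hat = sample_g_and_jac(x, batch)
    # chain rule with plug-in estimates: J(x)^T grad_f(g(x))
    x -= lr * J_hat.T @ grad_f(g_hat)

print("final objective:", 0.5 * np.linalg.norm((A @ x + b).mean(axis=0)) ** 2)
```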

Approximate Leave-One-Out for High-Dimensional Non-Differentiable Learning Problems

Title Approximate Leave-One-Out for High-Dimensional Non-Differentiable Learning Problems
Authors Shuaiwen Wang, Wenda Zhou, Arian Maleki, Haihao Lu, Vahab Mirrokni
Abstract Consider the following class of learning schemes: \begin{equation} \label{eq:main-problem1} \hat{\boldsymbol{\beta}} := \underset{\boldsymbol{\beta} \in \mathcal{C}}{\arg\min} \; \sum_{j=1}^n \ell(\boldsymbol{x}_j^\top\boldsymbol{\beta}; y_j) + \lambda R(\boldsymbol{\beta}), \qquad \qquad \qquad (1) \end{equation} where $\boldsymbol{x}_i \in \mathbb{R}^p$ and $y_i \in \mathbb{R}$ denote the $i^{\rm th}$ feature and response variable respectively. Let $\ell$ and $R$ be the convex loss function and regularizer, $\boldsymbol{\beta}$ denote the unknown weights, and $\lambda$ be a regularization parameter. $\mathcal{C} \subset \mathbb{R}^{p}$ is a closed convex set. Finding the optimal choice of $\lambda$ is a challenging problem in high-dimensional regimes where both $n$ and $p$ are large. We propose three frameworks to obtain a computationally efficient approximation of the leave-one-out cross validation (LOOCV) risk for nonsmooth losses and regularizers. Our three frameworks are based on the primal, dual, and proximal formulations of (1). Each framework shows its strength in certain types of problems. We prove the equivalence of the three approaches under smoothness conditions. This equivalence enables us to justify the accuracy of the three methods under such conditions. We use our approaches to obtain a risk estimate for several standard problems, including generalized LASSO, nuclear norm regularization, and support vector machines. We empirically demonstrate the effectiveness of our results for non-differentiable cases.
Tasks
Published 2018-10-04
URL http://arxiv.org/abs/1810.02716v1
PDF http://arxiv.org/pdf/1810.02716v1.pdf
PWC https://paperswithcode.com/paper/approximate-leave-one-out-for-high
Repo https://github.com/wendazhou/alocv-package
Framework none
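
For reference, the quantity being approximated is the exact leave-one-out risk, which requires n separate refits. The brute-force baseline can be sketched as below, with scikit-learn's LASSO standing in for problem (1); the paper's contribution is precisely to avoid these n refits, which this sketch does not do.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, lam = 60, 20, 0.1
X = rng.normal(size=(n, p))
beta_true = np.concatenate([np.ones(3), np.zeros(p - 3)])
y = X @ beta_true + 0.5 * rng.normal(size=n)

# Exact LOOCV: n separate fits, i.e. O(n) times the cost of one fit.
errs = []
for i in range(n):
    keep = np.delete(np.arange(n), i)
    model = Lasso(alpha=lam).fit(X[keep], y[keep])
    errs.append((y[i] - model.predict(X[i:i + 1])[0]) ** 2)

print("exact LOO risk:", np.mean(errs))
```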

Attentive Sequence to Sequence Translation for Localizing Clips of Interest by Natural Language Descriptions

Title Attentive Sequence to Sequence Translation for Localizing Clips of Interest by Natural Language Descriptions
Authors Ke Ning, Linchao Zhu, Ming Cai, Yi Yang, Di Xie, Fei Wu
Abstract We propose a novel attentive sequence to sequence translator (ASST) for clip localization in videos by natural language descriptions. We make two contributions. First, we propose a bi-directional Recurrent Neural Network (RNN) with a finely calibrated vision-language attentive mechanism to comprehensively understand the free-formed natural language descriptions. The RNN parses natural language descriptions in two directions, and the attentive model attends every meaningful word or phrase to each frame, thereby resulting in a more detailed understanding of video content and description semantics. Second, we design a hierarchical architecture for the network to jointly model language descriptions and video content. Given a video-description pair, the network generates a matrix representation, i.e., a sequence of vectors. Each vector in the matrix represents a video frame conditioned by the description. The 2D representation not only preserves the temporal dependencies of frames but also provides an effective way to perform frame-level video-language matching. The hierarchical architecture exploits video content with multiple granularities, ranging from subtle details to global context. Integration of the multiple granularities yields a robust representation for multi-level video-language abstraction. We validate the effectiveness of our ASST on two large-scale datasets. Our ASST outperforms the state-of-the-art by $4.28\%$ in Rank$@1$ on the DiDeMo dataset. On the Charades-STA dataset, we significantly improve the state-of-the-art by $13.41\%$ in Rank$@1,IoU=0.5$.
Tasks Video Description
Published 2018-08-27
URL http://arxiv.org/abs/1808.08803v1
PDF http://arxiv.org/pdf/1808.08803v1.pdf
PWC https://paperswithcode.com/paper/attentive-sequence-to-sequence-translation
Repo https://github.com/NeonKrypton/ASST
Framework none
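
A hedged sketch of the frame-conditioned matrix representation described above: each video frame attends over the word features of the description, and the attended language context is fused with the frame feature, giving one description-conditioned vector per frame. Dimensions and the scaled dot-product form of the attention are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn.functional as F

T, L, d = 8, 6, 32                      # frames, words, feature size
frames = torch.randn(T, d)              # per-frame visual features
words = torch.randn(L, d)               # per-word language features

# vision-language attention: every frame attends to every word
scores = frames @ words.t() / d ** 0.5           # (T, L)
attn = F.softmax(scores, dim=1)
lang_context = attn @ words                      # (T, d) language context per frame

# "2D representation": one description-conditioned vector per frame
matrix_repr = torch.cat([frames, lang_context], dim=1)   # (T, 2d)
print(matrix_repr.shape)
```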

Learning to Cluster for Proposal-Free Instance Segmentation

Title Learning to Cluster for Proposal-Free Instance Segmentation
Authors Yen-Chang Hsu, Zheng Xu, Zsolt Kira, Jiawei Huang
Abstract This work proposed a novel learning objective to train a deep neural network to perform end-to-end image pixel clustering. We applied the approach to instance segmentation, which is at the intersection of image semantic segmentation and object detection. We utilize the most fundamental property of instance labeling – the pairwise relationship between pixels – as the supervision to formulate the learning objective, then apply it to train a fully convolutional network (FCN) for learning to perform pixel-wise clustering. The resulting clusters can be used as the instance labeling directly. To support labeling of an unlimited number of instances, we further formulate ideas from graph coloring theory into the proposed learning objective. The evaluation on the Cityscapes dataset demonstrates strong performance and thereby provides a proof of concept. Moreover, our approach won second place in the lane detection competition of the 2017 CVPR Autonomous Driving Challenge, and was the top performer without using external data.
Tasks Autonomous Driving, Instance Segmentation, Lane Detection, Object Detection, Semantic Segmentation
Published 2018-03-17
URL http://arxiv.org/abs/1803.06459v1
PDF http://arxiv.org/pdf/1803.06459v1.pdf
PWC https://paperswithcode.com/paper/learning-to-cluster-for-proposal-free
Repo https://github.com/GT-RIPL/L2C
Framework pytorch
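
The supervision is the pairwise relationship between pixels: pixels of the same instance should have similar cluster-assignment distributions, and pixels of different instances dissimilar ones. Below is a hedged PyTorch sketch of a pairwise loss in that spirit; the symmetric-KL form and the hinge margin are assumptions for illustration, not necessarily the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def pairwise_cluster_loss(logits, same_instance):
    """logits: (N, K) per-pixel cluster logits; same_instance: (N, N) in {0, 1}."""
    p = F.softmax(logits, dim=1)                         # cluster assignment distributions
    log_p = torch.log(p + 1e-8)
    # KL(p_i || p_j) for every pixel pair, then symmetrized
    kl = (p.unsqueeze(1) * (log_p.unsqueeze(1) - log_p.unsqueeze(0))).sum(-1)
    sym_kl = kl + kl.t()
    # pull same-instance pairs together, push different-instance pairs apart
    pos = same_instance * sym_kl
    neg = (1 - same_instance) * F.relu(2.0 - sym_kl)     # hinge margin (assumed value)
    return (pos + neg).mean()

logits = torch.randn(16, 5, requires_grad=True)          # 16 pixels, 5 cluster slots
labels = torch.randint(0, 3, (16,))                      # toy instance ids
same = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
print(pairwise_cluster_loss(logits, same))
```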

Towards End-to-End Lane Detection: an Instance Segmentation Approach

Title Towards End-to-End Lane Detection: an Instance Segmentation Approach
Authors Davy Neven, Bert De Brabandere, Stamatios Georgoulis, Marc Proesmans, Luc Van Gool
Abstract Modern cars are incorporating an increasing number of driver assist features, among which is automatic lane keeping. The latter allows the car to properly position itself within the road lanes, which is also crucial for any subsequent lane departure or trajectory planning decision in fully autonomous cars. Traditional lane detection methods rely on a combination of highly-specialized, hand-crafted features and heuristics, usually followed by post-processing techniques, that are computationally expensive and prone to scalability issues due to road scene variations. More recent approaches leverage deep learning models, trained for pixel-wise lane segmentation, even when no markings are present in the image due to their big receptive field. Despite their advantages, these methods are limited to detecting a pre-defined, fixed number of lanes, e.g. ego-lanes, and cannot cope with lane changes. In this paper, we go beyond the aforementioned limitations and propose to cast the lane detection problem as an instance segmentation problem - in which each lane forms its own instance - that can be trained end-to-end. To parametrize the segmented lane instances before fitting the lane, we further propose to apply a learned perspective transformation, conditioned on the image, in contrast to a fixed “bird’s-eye view” transformation. By doing so, we ensure a lane fitting which is robust against road plane changes, unlike existing approaches that rely on a fixed, pre-defined transformation. In summary, we propose a fast lane detection algorithm, running at 50 fps, which can handle a variable number of lanes and cope with lane changes. We verify our method on the tuSimple dataset and achieve competitive results.
Tasks Instance Segmentation, Lane Detection, Semantic Segmentation
Published 2018-02-15
URL http://arxiv.org/abs/1802.05591v1
PDF http://arxiv.org/pdf/1802.05591v1.pdf
PWC https://paperswithcode.com/paper/towards-end-to-end-lane-detection-an-instance
Repo https://github.com/harryhan618/LaneNet
Framework pytorch
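
A minimal NumPy sketch of the lane-fitting step described above: project detected lane pixels through a perspective transformation, fit a low-order polynomial in the bird's-eye view, and project the curve back. The homography here is a fixed, hypothetical matrix; in the paper it is predicted per image by a learned network.

```python
import numpy as np

def warp_points(H, pts):
    """Apply a 3x3 homography to (N, 2) pixel coordinates."""
    homo = np.hstack([pts, np.ones((len(pts), 1))])
    out = homo @ H.T
    return out[:, :2] / out[:, 2:3]

# Hypothetical homography to a bird's-eye view (would be predicted per image).
H = np.array([[1.0, -0.3, 0.0],
              [0.0,  0.5, 10.0],
              [0.0, -0.001, 1.0]])

lane_pixels = np.array([[400, 700], [390, 650], [382, 600], [376, 550]], float)

bev = warp_points(H, lane_pixels)
coeffs = np.polyfit(bev[:, 1], bev[:, 0], deg=2)       # fit x = f(y) in bird's-eye view
ys = np.linspace(bev[:, 1].min(), bev[:, 1].max(), 20)
fitted_bev = np.stack([np.polyval(coeffs, ys), ys], axis=1)
lane_curve = warp_points(np.linalg.inv(H), fitted_bev)  # project the fit back to the image
print(lane_curve[:3])
```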

YOLOv3: An Incremental Improvement

Title YOLOv3: An Incremental Improvement
Authors Joseph Redmon, Ali Farhadi
Abstract We present some updates to YOLO! We made a bunch of little design changes to make it better. We also trained this new network that’s pretty swell. It’s a little bigger than last time but more accurate. It’s still fast though, don’t worry. At 320x320 YOLOv3 runs in 22 ms at 28.2 mAP, as accurate as SSD but three times faster. When we look at the old .5 IOU mAP detection metric YOLOv3 is quite good. It achieves 57.9 mAP@50 in 51 ms on a Titan X, compared to 57.5 mAP@50 in 198 ms by RetinaNet, similar performance but 3.8x faster. As always, all the code is online at https://pjreddie.com/yolo/
Tasks Object Detection, Real-Time Object Detection
Published 2018-04-08
URL http://arxiv.org/abs/1804.02767v1
PDF http://arxiv.org/pdf/1804.02767v1.pdf
PWC https://paperswithcode.com/paper/yolov3-an-incremental-improvement
Repo https://github.com/lmeulen/PeopleCounter
Framework none
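
The ".5 IOU" metric mentioned in the abstract counts a detection as a true positive when its box overlaps a matching ground-truth box with intersection-over-union of at least 0.5. A generic IoU helper, unrelated to the YOLOv3 code itself:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

pred, gt = (15, 15, 65, 65), (20, 20, 70, 70)
print(iou(pred, gt), "-> counts as a true positive at mAP@50:", iou(pred, gt) >= 0.5)
```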

Count-Based Exploration with the Successor Representation

Title Count-Based Exploration with the Successor Representation
Authors Marlos C. Machado, Marc G. Bellemare, Michael Bowling
Abstract In this paper we introduce a simple approach for exploration in reinforcement learning (RL) that allows us to develop theoretically justified algorithms in the tabular case but that is also extendable to settings where function approximation is required. Our approach is based on the successor representation (SR), which was originally introduced as a representation defining state generalization by the similarity of successor states. Here we show that the norm of the SR, while it is being learned, can be used as a reward bonus to incentivize exploration. In order to better understand this transient behavior of the norm of the SR we introduce the substochastic successor representation (SSR) and we show that it implicitly counts the number of times each state (or feature) has been observed. We use this result to introduce an algorithm that performs as well as some theoretically sample-efficient approaches. Finally, we extend these ideas to a deep RL algorithm and show that it achieves state-of-the-art performance in Atari 2600 games when in a low sample-complexity regime.
Tasks Atari Games, Efficient Exploration
Published 2018-07-31
URL https://arxiv.org/abs/1807.11622v4
PDF https://arxiv.org/pdf/1807.11622v4.pdf
PWC https://paperswithcode.com/paper/count-based-exploration-with-the-successor
Repo https://github.com/bonniesjli/DQN_SR
Framework pytorch
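
A hedged tabular sketch of the core mechanism: learn the successor representation with TD updates and use the (inverse) norm of the visited state's SR row as an exploration bonus, so that rarely visited states, whose rows are still small, receive larger bonuses. Step sizes and the bonus scale are illustrative assumptions, not the paper's settings.

```python
import numpy as np

n_states, gamma, alpha, beta = 5, 0.95, 0.1, 0.1
psi = np.zeros((n_states, n_states))       # successor representation, one row per state

def sr_update_and_bonus(s, s_next):
    """TD update for the SR row of s, then an exploration bonus from its norm."""
    one_hot = np.eye(n_states)[s]
    td_target = one_hot + gamma * psi[s_next]
    psi[s] += alpha * (td_target - psi[s])
    # rarely visited state -> small row norm -> large bonus
    return beta / (np.linalg.norm(psi[s]) + 1e-8)

rng = np.random.default_rng(0)
s = 0
for t in range(200):
    s_next = rng.integers(n_states)        # stand-in for an environment transition
    bonus = sr_update_and_bonus(s, s_next)
    s = s_next

print(np.round(np.linalg.norm(psi, axis=1), 2))
```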

A Generalized Circuit for the Hamiltonian Dynamics Through the Truncated Series

Title A Generalized Circuit for the Hamiltonian Dynamics Through the Truncated Series
Authors Ammar Daskin, Sabre Kais
Abstract In this paper, we present a method for the Hamiltonian simulation in the context of eigenvalue estimation problems which improves earlier results dealing with Hamiltonian simulation through the truncated Taylor series. In particular, we present a fixed-quantum circuit design for the simulation of the Hamiltonian dynamics, $H(t)$, through the truncated Taylor series method described by Berry et al. \cite{berry2015simulating}. The circuit is general and can be used to simulate any given matrix in the phase estimation algorithm by only changing the angle values of the quantum gates implementing the time variable $t$ in the series. The circuit complexity depends on the number of summation terms composing the Hamiltonian and requires $O(Ln)$ quantum gates for the simulation of a molecular Hamiltonian. Here, $n$ is the number of states of a spin orbital, and $L$ is the number of terms in the molecular Hamiltonian and generally bounded by $O(n^4)$. We also discuss how to use the circuit in adaptive processes and eigenvalue related problems along with a slightly modified version of the iterative phase estimation algorithm. In addition, a simple divide and conquer method is presented for mapping a matrix which is not given as a sum of unitary matrices into the circuit. The complexity of the circuit is directly related to the structure of the matrix and can be bounded by $O(poly(n))$ for a matrix with $poly(n)-$sparsity.
Tasks
Published 2018-01-29
URL http://arxiv.org/abs/1801.09720v3
PDF http://arxiv.org/pdf/1801.09720v3.pdf
PWC https://paperswithcode.com/paper/a-generalized-circuit-for-the-hamiltonian
Repo https://github.com/adaskin/circuitforTaylorseries
Framework none
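
Classically, the series the circuit implements is the truncated Taylor expansion of the time-evolution operator $e^{-iHt}$. A small NumPy/SciPy check of the truncation against the exact matrix exponential (only the numerical series; the quantum circuit construction is not reproduced here):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = (A + A.conj().T) / 2                   # random Hermitian "Hamiltonian"
t, K = 0.3, 8                              # evolution time and truncation order

U_exact = expm(-1j * H * t)
U_taylor = np.zeros_like(U_exact)
term = np.eye(4, dtype=complex)            # k-th term, starting from (-iHt)^0 / 0!
for k in range(K + 1):
    U_taylor += term
    term = term @ (-1j * H * t) / (k + 1)  # next term: (-iHt)^(k+1) / (k+1)!

print("truncation error:", np.linalg.norm(U_exact - U_taylor))
```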

Predict then Propagate: Graph Neural Networks meet Personalized PageRank

Title Predict then Propagate: Graph Neural Networks meet Personalized PageRank
Authors Johannes Klicpera, Aleksandar Bojchevski, Stephan Günnemann
Abstract Neural message passing algorithms for semi-supervised classification on graphs have recently achieved great success. However, for classifying a node these methods only consider nodes that are a few propagation steps away and the size of this utilized neighborhood is hard to extend. In this paper, we use the relationship between graph convolutional networks (GCN) and PageRank to derive an improved propagation scheme based on personalized PageRank. We utilize this propagation procedure to construct a simple model, personalized propagation of neural predictions (PPNP), and its fast approximation, APPNP. Our model’s training time is on par or faster and its number of parameters on par or lower than previous models. It leverages a large, adjustable neighborhood for classification and can be easily combined with any neural network. We show that this model outperforms several recently proposed methods for semi-supervised classification in the most thorough study done so far for GCN-like models. Our implementation is available online.
Tasks Node Classification
Published 2018-10-14
URL http://arxiv.org/abs/1810.05997v5
PDF http://arxiv.org/pdf/1810.05997v5.pdf
PWC https://paperswithcode.com/paper/predict-then-propagate-graph-neural-networks
Repo https://github.com/klicperajo/ppnp
Framework pytorch
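
The approximate propagation step (APPNP) iterates a personalized-PageRank-style update on top of the base model's predictions $H$: $Z \leftarrow (1-\alpha)\hat{A}Z + \alpha H$, with $\hat{A}$ the symmetrically normalized adjacency with self-loops. A minimal dense NumPy sketch (the released code operates on sparse matrices):

```python
import numpy as np

def appnp_propagate(A, H, alpha=0.1, n_iters=10):
    """Approximate personalized PageRank propagation of predictions H."""
    A_hat = A + np.eye(len(A))                                   # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]   # D^-1/2 (A+I) D^-1/2
    Z = H.copy()
    for _ in range(n_iters):
        Z = (1 - alpha) * A_norm @ Z + alpha * H                 # teleport back to H
    return Z

A = np.array([[0, 1, 0, 0],                                      # toy 4-node graph
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], float)
H = np.random.rand(4, 3)                                         # base-model class scores
print(appnp_propagate(A, H).round(3))
```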

Probabilistic Logic Programming with Beta-Distributed Random Variables

Title Probabilistic Logic Programming with Beta-Distributed Random Variables
Authors Federico Cerutti, Lance Kaplan, Angelika Kimmig, Murat Sensoy
Abstract We enable aProbLog—a probabilistic logical programming approach—to reason in presence of uncertain probabilities represented as Beta-distributed random variables. We achieve the same performance of state-of-the-art algorithms for highly specified and engineered domains, while simultaneously we maintain the flexibility offered by aProbLog in handling complex relational domains. Our motivation is that faithfully capturing the distribution of probabilities is necessary to compute an expected utility for effective decision making under uncertainty: unfortunately, these probability distributions can be highly uncertain due to sparse data. To understand and accurately manipulate such probability distributions we need a well-defined theoretical framework that is provided by the Beta distribution, which specifies a distribution of probabilities representing all the possible values of a probability when the exact value is unknown.
Tasks Decision Making, Decision Making Under Uncertainty
Published 2018-09-20
URL http://arxiv.org/abs/1809.07888v3
PDF http://arxiv.org/pdf/1809.07888v3.pdf
PWC https://paperswithcode.com/paper/probabilistic-logic-programming-with-beta
Repo https://github.com/dais-ita/SLProbLog
Framework none
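
A hedged illustration of the underlying idea: propagate uncertainty about probabilities through a logical operation by sampling from Beta distributions and moment-matching the result back onto a Beta. The Monte Carlo route and the independence assumption are simplifications for illustration; the paper derives analytical operators within aProbLog.

```python
import numpy as np

rng = np.random.default_rng(0)

def beta_and(a1, b1, a2, b2, n=100_000):
    """P(A and B) for independent A, B with Beta-distributed probabilities."""
    p = rng.beta(a1, b1, n) * rng.beta(a2, b2, n)
    m, v = p.mean(), p.var()
    # moment-match the product distribution back onto a Beta(a, b)
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common

# Sparse data for A (2 of 3 positive), more data for B (70 of 100 positive).
a, b = beta_and(2 + 1, 1 + 1, 70 + 1, 30 + 1)
print("matched Beta parameters:", round(a, 2), round(b, 2))
print("mean of P(A and B):", round(a / (a + b), 3))
```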

The 2018 PIRM Challenge on Perceptual Image Super-resolution

Title The 2018 PIRM Challenge on Perceptual Image Super-resolution
Authors Yochai Blau, Roey Mechrez, Radu Timofte, Tomer Michaeli, Lihi Zelnik-Manor
Abstract This paper reports on the 2018 PIRM challenge on perceptual super-resolution (SR), held in conjunction with the Perceptual Image Restoration and Manipulation (PIRM) workshop at ECCV 2018. In contrast to previous SR challenges, our evaluation methodology jointly quantifies accuracy and perceptual quality, therefore enabling perceptual-driven methods to compete alongside algorithms that target PSNR maximization. Twenty-one participating teams introduced algorithms that markedly improved upon the existing state-of-the-art methods in perceptual SR, as confirmed by a human opinion study. We also analyze popular image quality measures and draw conclusions regarding which of them correlates best with human opinion scores. We conclude with an analysis of the current trends in perceptual SR, as reflected in the leading submissions.
Tasks Image Restoration, Image Super-Resolution, Super-Resolution
Published 2018-09-20
URL http://arxiv.org/abs/1809.07517v3
PDF http://arxiv.org/pdf/1809.07517v3.pdf
PWC https://paperswithcode.com/paper/the-2018-pirm-challenge-on-perceptual-image
Repo https://github.com/idearibosome/tf-perceptual-eusr
Framework tf
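
PSNR is the distortion measure that the perceptual-driven methods in the challenge are contrasted against. A plain PSNR helper for reference (not part of the challenge's evaluation toolkit, which additionally relies on no-reference perceptual scores):

```python
import numpy as np

def psnr(reference, reconstruction, peak=255.0):
    """Peak signal-to-noise ratio between two uint8-range images."""
    mse = np.mean((reference.astype(float) - reconstruction.astype(float)) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

ref = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
noisy = np.clip(ref + np.random.normal(0, 5, ref.shape), 0, 255).astype(np.uint8)
print(round(psnr(ref, noisy), 2), "dB")
```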

RDPD: Rich Data Helps Poor Data via Imitation

Title RDPD: Rich Data Helps Poor Data via Imitation
Authors Shenda Hong, Cao Xiao, Trong Nghia Hoang, Tengfei Ma, Hongyan Li, Jimeng Sun
Abstract In many situations, we need to build and deploy separate models in related environments with different data qualities. For example, an environment with strong observation equipment (e.g., intensive care units) often provides high-quality multi-modal data, which are acquired from multiple sensory devices and have rich-feature representations. On the other hand, an environment with poor observation equipment (e.g., at home) only provides low-quality, uni-modal data with poor-feature representations. To deploy a competitive model in a poor-data environment without requiring direct access to multi-modal data acquired from a rich-data environment, this paper develops and presents a knowledge distillation (KD) method (RDPD) to enhance a predictive model trained on poor data using knowledge distilled from a high-complexity model trained on rich, private data. We evaluated RDPD on three real-world datasets and showed that its distilled model consistently outperformed all baselines across all datasets, achieving improvements of 24.56% in PR-AUC and 12.21% in ROC-AUC over a model trained only on low-quality data, and of 5.91% in PR-AUC and 4.44% in ROC-AUC over a state-of-the-art KD model.
Tasks
Published 2018-09-06
URL https://arxiv.org/abs/1809.01921v4
PDF https://arxiv.org/pdf/1809.01921v4.pdf
PWC https://paperswithcode.com/paper/rdpd-rich-data-helps-poor-data-via-imitation
Repo https://github.com/hsd1503/RDPD
Framework pytorch
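
A hedged sketch of the generic knowledge-distillation loss RDPD builds on: the student trained on poor, uni-modal inputs matches both the ground-truth labels and the softened outputs of the teacher trained on rich, multi-modal inputs. The temperature and weighting are illustrative, and RDPD's additional imitation of intermediate representations and attention is not shown.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Soft-target KL (teacher -> student) plus hard-label cross-entropy."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student_logits = torch.randn(8, 2, requires_grad=True)   # poor-data model outputs
teacher_logits = torch.randn(8, 2)                        # rich-data model outputs
labels = torch.randint(0, 2, (8,))
print(distillation_loss(student_logits, teacher_logits, labels))
```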

Chinese Poetry Generation with a Working Memory Model

Title Chinese Poetry Generation with a Working Memory Model
Authors Xiaoyuan Yi, Maosong Sun, Ruoyu Li, Zonghan Yang
Abstract As an exquisite and concise literary form, poetry is a gem of human culture. Automatic poetry generation is an essential step towards computer creativity. In recent years, several neural models have been designed for this task. However, among lines of a whole poem, the coherence in meaning and topics still remains a big challenge. In this paper, inspired by the theoretical concept in cognitive psychology, we propose a novel Working Memory model for poetry generation. Different from previous methods, our model explicitly maintains topics and informative limited history in a neural memory. During the generation process, our model reads the most relevant parts from memory slots to generate the current line. After each line is generated, it writes the most salient parts of the previous line into memory slots. By dynamic manipulation of the memory, our model keeps a coherent information flow and learns to express each topic flexibly and naturally. We experiment on three different genres of Chinese poetry: quatrain, iambic and chinoiserie lyric. Both automatic and human evaluation results show that our model outperforms current state-of-the-art methods.
Tasks
Published 2018-09-12
URL http://arxiv.org/abs/1809.04306v1
PDF http://arxiv.org/pdf/1809.04306v1.pdf
PWC https://paperswithcode.com/paper/chinese-poetry-generation-with-a-working
Repo https://github.com/xiaoyuanYi/WMPoetry
Framework tf
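
A hedged sketch of the memory read described above: while generating the current line, the decoder queries a small set of memory slots (topics plus salient parts of previous lines) with attention; the paper then writes the most salient parts of the newly generated line back into the slots. Only the read step is sketched, and the sizes and scaled dot-product form are assumptions.

```python
import torch
import torch.nn.functional as F

d, n_slots = 32, 6
memory = torch.randn(n_slots, d)        # topic slots + history slots
query = torch.randn(d)                  # decoder state while generating the current line

# attention read: weight each slot by its relevance to the current decoding step
weights = F.softmax(memory @ query / d ** 0.5, dim=0)     # (n_slots,)
read_vector = weights @ memory                            # (d,) context fed to the decoder
print(weights, read_vector.shape)
```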