Paper Group AWR 42
Soft-PHOC Descriptor for End-to-End Word Spotting in Egocentric Scene Images
Title | Soft-PHOC Descriptor for End-to-End Word Spotting in Egocentric Scene Images |
Authors | Dena Bazazian, Dimosthenis Karatzas, Andrew D. Bagdanov |
Abstract | Word spotting in natural scene images has many applications in scene understanding and visual assistance. In this paper we propose a technique to create and exploit an intermediate representation of images based on text attributes which are character probability maps. Our representation extends the concept of the Pyramidal Histogram Of Characters (PHOC) by exploiting Fully Convolutional Networks to derive a pixel-wise mapping of the character distribution within candidate word regions. We call this representation the Soft-PHOC. Furthermore, we show how to use Soft-PHOC descriptors for word spotting tasks in egocentric camera streams through an efficient text line proposal algorithm, based on the Hough Transform over character attribute maps followed by scoring with Dynamic Time Warping (DTW). We evaluate our results on the ICDAR 2015 Challenge 4 dataset of incidental scene text captured by an egocentric camera. |
Tasks | Scene Understanding |
Published | 2018-09-04 |
URL | https://arxiv.org/abs/1809.00854v2 |
PDF | https://arxiv.org/pdf/1809.00854v2.pdf
PWC | https://paperswithcode.com/paper/soft-phoc-descriptor-for-end-to-end-word |
Repo | https://github.com/denabazazian/SoftPHOC_TextDescriptor |
Framework | tf |
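For readers unfamiliar with the underlying descriptor, below is a minimal sketch of a classical (hard) PHOC vector for a single word, assuming a lowercase a-z alphabet and pyramid levels 1-3; the paper's Soft-PHOC instead predicts pixel-wise character probability maps with an FCN, but the pyramidal region layout is the same idea.

```python
# Hard PHOC: one binary character histogram per pyramid region.
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
LEVELS = (1, 2, 3)  # pyramid levels: whole word, halves, thirds

def phoc(word: str) -> np.ndarray:
    word = word.lower()
    n = len(word)
    vec = []
    for level in LEVELS:
        for region in range(level):
            lo, hi = region / level, (region + 1) / level
            bits = np.zeros(len(ALPHABET))
            for i, ch in enumerate(word):
                # occupancy interval of character i, normalized to [0, 1]
                c_lo, c_hi = i / n, (i + 1) / n
                overlap = min(hi, c_hi) - max(lo, c_lo)
                # assign the character to a region if at least half of its
                # extent falls inside it (the usual PHOC convention)
                if ch in ALPHABET and overlap >= 0.5 * (c_hi - c_lo):
                    bits[ALPHABET.index(ch)] = 1.0
            vec.append(bits)
    return np.concatenate(vec)

print(phoc("word").shape)  # (1 + 2 + 3) * 26 = 156
```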
Deep Cascade Multi-task Learning for Slot Filling in Online Shopping Assistant
Title | Deep Cascade Multi-task Learning for Slot Filling in Online Shopping Assistant |
Authors | Yu Gong, Xusheng Luo, Yu Zhu, Wenwu Ou, Zhao Li, Muhua Zhu, Kenny Q. Zhu, Lu Duan, Xi Chen |
Abstract | Slot filling is a critical task in natural language understanding (NLU) for dialog systems. State-of-the-art approaches treat it as a sequence labeling problem and adopt models such as BiLSTM-CRF. While these models work relatively well on standard benchmark datasets, they face challenges in the context of E-commerce, where the slot labels are more informative and carry richer expressions. In this work, inspired by the unique structure of E-commerce knowledge bases, we propose a novel multi-task model with cascade and residual connections, which jointly learns segment tagging, named entity tagging and slot filling. Experiments show the effectiveness of the proposed cascade and residual structures. Our model has a 14.6% advantage in F1 score over strong baseline methods on a new Chinese E-commerce shopping assistant dataset, while achieving competitive accuracy on a standard dataset. Furthermore, an online test deployed on this dominant E-commerce platform shows a 130% improvement in the accuracy of understanding user utterances. Our model has already gone into production on the E-commerce platform. |
Tasks | Multi-Task Learning, Slot Filling |
Published | 2018-03-30 |
URL | https://arxiv.org/abs/1803.11326v4 |
PDF | https://arxiv.org/pdf/1803.11326v4.pdf
PWC | https://paperswithcode.com/paper/deep-cascade-multi-task-learning-for-slot |
Repo | https://github.com/pangolulu/DCMTL |
Framework | none |
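A hedged PyTorch sketch of the cascade-plus-residual wiring described in the abstract: a shared BiLSTM encoder feeds three tagging heads in sequence (segment, then named entity, then slot), and each later head also sees the earlier head's output alongside the shared encoding. Layer sizes and the exact fusion are assumptions, not the paper's published configuration.

```python
import torch
import torch.nn as nn

class CascadeTagger(nn.Module):
    def __init__(self, vocab, emb=100, hid=128, n_seg=3, n_ner=5, n_slot=20):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.enc = nn.LSTM(emb, hid, batch_first=True, bidirectional=True)
        self.seg_head = nn.Linear(2 * hid, n_seg)
        self.ner_head = nn.Linear(2 * hid + n_seg, n_ner)
        self.slot_head = nn.Linear(2 * hid + n_ner, n_slot)

    def forward(self, tokens):
        h, _ = self.enc(self.emb(tokens))               # (B, T, 2*hid)
        seg = self.seg_head(h)                          # low-level task first
        ner = self.ner_head(torch.cat([h, seg], -1))    # cascade + residual h
        slot = self.slot_head(torch.cat([h, ner], -1))  # highest-level task last
        return seg, ner, slot

model = CascadeTagger(vocab=10000)
seg, ner, slot = model(torch.randint(0, 10000, (2, 12)))
print(slot.shape)  # torch.Size([2, 12, 20])
```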
Improved Sample Complexity for Stochastic Compositional Variance Reduced Gradient
Title | Improved Sample Complexity for Stochastic Compositional Variance Reduced Gradient |
Authors | Tianyi Lin, Chenyou Fan, Mengdi Wang, Michael I. Jordan |
Abstract | Convex composition optimization is an emerging topic that covers a wide range of applications arising from stochastic optimal control, reinforcement learning and multi-stage stochastic programming. Existing algorithms suffer from unsatisfactory sample complexity and practical issues because they ignore the convexity structure in the algorithmic design. In this paper, we develop a new stochastic compositional variance-reduced gradient algorithm with a sample complexity of $O((m+n)\log(1/\epsilon)+1/\epsilon^3)$, where $m+n$ is the total number of samples. Our algorithm is near-optimal, as the dependence on $m+n$ is optimal up to a logarithmic factor. Experimental results on real-world datasets demonstrate the effectiveness and efficiency of the new algorithm. |
Tasks | |
Published | 2018-06-01 |
URL | https://arxiv.org/abs/1806.00458v4 |
PDF | https://arxiv.org/pdf/1806.00458v4.pdf
PWC | https://paperswithcode.com/paper/improved-oracle-complexity-for-stochastic |
Repo | https://github.com/tyDLin/SCVRG |
Framework | none |
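A hedged NumPy sketch of variance-reduced compositional optimization on a toy problem $F(x) = f(g(x))$ with $g(x) = \frac{1}{n}\sum_i (A_i x + b_i)$ and $f(y) = \frac{1}{m}\sum_j \frac{1}{2}\|y - c_j\|^2$. The epoch/inner-loop structure mirrors SVRG-style control variates for both the inner function value and the gradient; the step size and loop counts are arbitrary choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p, q = 20, 15, 5, 4            # inner samples, outer samples, dims
A = rng.normal(size=(n, q, p)); b = rng.normal(size=(n, q))
c = rng.normal(size=(m, q))
Abar, bbar, cbar = A.mean(0), b.mean(0), c.mean(0)

def g(x):            return Abar @ x + bbar      # full inner mean
def g_i(i, x):       return A[i] @ x + b[i]
def grad_f_j(j, y):  return y - c[j]

x = np.zeros(p); eta = 0.05
for epoch in range(30):
    x_snap = x.copy()
    g_snap = g(x_snap)                           # full pass at the snapshot
    mu = Abar.T @ (g_snap - cbar)                # full gradient at the snapshot
    for _ in range(2 * n):                       # inner stochastic loop
        i, j = rng.integers(n), rng.integers(m)
        # variance-reduced estimate of the inner function value g(x)
        g_hat = g_i(i, x) - g_i(i, x_snap) + g_snap
        # variance-reduced gradient estimate (Jacobian of g_i is constant A_i)
        v = A[i].T @ grad_f_j(j, g_hat) - A[i].T @ grad_f_j(j, g_snap) + mu
        x -= eta * v

print("final objective:", 0.5 * np.mean(np.sum((g(x) - c) ** 2, axis=1)))
```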
Approximate Leave-One-Out for High-Dimensional Non-Differentiable Learning Problems
Title | Approximate Leave-One-Out for High-Dimensional Non-Differentiable Learning Problems |
Authors | Shuaiwen Wang, Wenda Zhou, Arian Maleki, Haihao Lu, Vahab Mirrokni |
Abstract | Consider the following class of learning schemes: \begin{equation} \label{eq:main-problem1} \hat{\boldsymbol{\beta}} := \underset{\boldsymbol{\beta} \in \mathcal{C}}{\arg\min} \; \sum_{j=1}^n \ell(\boldsymbol{x}_j^\top\boldsymbol{\beta}; y_j) + \lambda R(\boldsymbol{\beta}), \qquad \qquad \qquad (1) \end{equation} where $\boldsymbol{x}_i \in \mathbb{R}^p$ and $y_i \in \mathbb{R}$ denote the $i^{\rm th}$ feature and response variable respectively. Let $\ell$ and $R$ be the convex loss function and regularizer, $\boldsymbol{\beta}$ denote the unknown weights, and $\lambda$ be a regularization parameter. $\mathcal{C} \subset \mathbb{R}^{p}$ is a closed convex set. Finding the optimal choice of $\lambda$ is a challenging problem in high-dimensional regimes where both $n$ and $p$ are large. We propose three frameworks to obtain a computationally efficient approximation of the leave-one-out cross validation (LOOCV) risk for nonsmooth losses and regularizers. Our three frameworks are based on the primal, dual, and proximal formulations of (1). Each framework shows its strength in certain types of problems. We prove the equivalence of the three approaches under smoothness conditions. This equivalence enables us to justify the accuracy of the three methods under such conditions. We use our approaches to obtain a risk estimate for several standard problems, including generalized LASSO, nuclear norm regularization, and support vector machines. We empirically demonstrate the effectiveness of our results for non-differentiable cases. |
Tasks | |
Published | 2018-10-04 |
URL | http://arxiv.org/abs/1810.02716v1 |
PDF | http://arxiv.org/pdf/1810.02716v1.pdf
PWC | https://paperswithcode.com/paper/approximate-leave-one-out-for-high |
Repo | https://github.com/wendazhou/alocv-package |
Framework | none |
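A hedged illustration of why approximate leave-one-out is attractive: for the smooth ridge special case of problem (1), the LOO residual has the classical closed form $r_i / (1 - H_{ii})$, so the full $n$-fold refit is unnecessary. The paper's contribution is extending this style of approximation to nonsmooth losses and regularizers (generalized LASSO, nuclear norm, SVM), which this toy does not cover.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, lam = 50, 10, 1.0
X = rng.normal(size=(n, p)); y = rng.normal(size=n)

# full-data ridge fit and its "hat" matrix
G = X.T @ X + lam * np.eye(p)
beta = np.linalg.solve(G, X.T @ y)
H = X @ np.linalg.solve(G, X.T)
alo = (y - X @ beta) / (1.0 - np.diag(H))   # shortcut LOO residuals

# brute-force leave-one-out for comparison
loo = np.empty(n)
for i in range(n):
    mask = np.arange(n) != i
    bi = np.linalg.solve(X[mask].T @ X[mask] + lam * np.eye(p),
                         X[mask].T @ y[mask])
    loo[i] = y[i] - X[i] @ bi

print("max abs difference:", np.max(np.abs(alo - loo)))  # exact for ridge: ~0
```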
Attentive Sequence to Sequence Translation for Localizing Clips of Interest by Natural Language Descriptions
Title | Attentive Sequence to Sequence Translation for Localizing Clips of Interest by Natural Language Descriptions |
Authors | Ke Ning, Linchao Zhu, Ming Cai, Yi Yang, Di Xie, Fei Wu |
Abstract | We propose a novel attentive sequence to sequence translator (ASST) for clip localization in videos by natural language descriptions. We make two contributions. First, we propose a bi-directional Recurrent Neural Network (RNN) with a finely calibrated vision-language attentive mechanism to comprehensively understand the free-formed natural language descriptions. The RNN parses natural language descriptions in two directions, and the attentive model attends every meaningful word or phrase to each frame, thereby resulting in a more detailed understanding of video content and description semantics. Second, we design a hierarchical architecture for the network to jointly model language descriptions and video content. Given a video-description pair, the network generates a matrix representation, i.e., a sequence of vectors. Each vector in the matrix represents a video frame conditioned by the description. The 2D representation not only preserves the temporal dependencies of frames but also provides an effective way to perform frame-level video-language matching. The hierarchical architecture exploits video content with multiple granularities, ranging from subtle details to global context. Integration of the multiple granularities yields a robust representation for multi-level video-language abstraction. We validate the effectiveness of our ASST on two large-scale datasets. Our ASST outperforms the state-of-the-art by $4.28\%$ in Rank$@1$ on the DiDeMo dataset. On the Charades-STA dataset, we significantly improve the state-of-the-art by $13.41\%$ in Rank$@1,IoU=0.5$. |
Tasks | Video Description |
Published | 2018-08-27 |
URL | http://arxiv.org/abs/1808.08803v1 |
PDF | http://arxiv.org/pdf/1808.08803v1.pdf
PWC | https://paperswithcode.com/paper/attentive-sequence-to-sequence-translation |
Repo | https://github.com/NeonKrypton/ASST |
Framework | none |
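A hedged sketch of the core vision-language attention step described above: each video frame attends over the description's word features, and the attended language context is fused with the frame to give a sequence of description-conditioned frame vectors (the "matrix representation"). All dimensions and the fusion choice (concat + linear) are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d = 256
T, W = 32, 12                                 # frames, words
frames = torch.randn(1, T, d)                 # frame features
words = torch.randn(1, W, d)                  # word features (e.g., from a BiRNN)

fuse = nn.Linear(2 * d, d)
scores = frames @ words.transpose(1, 2) / d ** 0.5   # (1, T, W) affinities
attn = F.softmax(scores, dim=-1)                     # each frame attends to words
lang_ctx = attn @ words                              # (1, T, d) language context
conditioned = torch.tanh(fuse(torch.cat([frames, lang_ctx], dim=-1)))
print(conditioned.shape)  # torch.Size([1, 32, 256]) -- one vector per frame
```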
Learning to Cluster for Proposal-Free Instance Segmentation
Title | Learning to Cluster for Proposal-Free Instance Segmentation |
Authors | Yen-Chang Hsu, Zheng Xu, Zsolt Kira, Jiawei Huang |
Abstract | This work proposes a novel learning objective to train a deep neural network to perform end-to-end image pixel clustering. We apply the approach to instance segmentation, which lies at the intersection of image semantic segmentation and object detection. We utilize the most fundamental property of instance labeling – the pairwise relationship between pixels – as the supervision to formulate the learning objective, then apply it to train a fully convolutional network (FCN) to perform pixel-wise clustering. The resulting clusters can be used directly as the instance labeling. To support labeling of an unlimited number of instances, we further incorporate ideas from graph coloring theory into the proposed learning objective. The evaluation on the Cityscapes dataset demonstrates strong performance and thus serves as a proof of concept. Moreover, our approach won second place in the lane detection competition of the 2017 CVPR Autonomous Driving Challenge, and was the top performer without using external data. |
Tasks | Autonomous Driving, Instance Segmentation, Lane Detection, Object Detection, Semantic Segmentation |
Published | 2018-03-17 |
URL | http://arxiv.org/abs/1803.06459v1 |
PDF | http://arxiv.org/pdf/1803.06459v1.pdf
PWC | https://paperswithcode.com/paper/learning-to-cluster-for-proposal-free |
Repo | https://github.com/GT-RIPL/L2C |
Framework | pytorch |
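A hedged sketch of pairwise-similarity supervision: predicted same-instance probabilities for sampled pixel pairs are trained with binary cross-entropy against the ground-truth "same instance" indicator. Producing pair probabilities as inner products of per-pixel cluster softmaxes follows the general recipe; the paper's exact head and sampling differ.

```python
import torch
import torch.nn.functional as F

def pairwise_cluster_loss(logits, instance_ids):
    """logits: (N, K) per-pixel cluster scores; instance_ids: (N,) labels."""
    p = F.softmax(logits, dim=-1)                     # soft cluster assignment
    sim = p @ p.t()                                   # P(pair lands in same cluster)
    same = (instance_ids[:, None] == instance_ids[None, :]).float()
    sim = sim.clamp(1e-6, 1 - 1e-6)
    return F.binary_cross_entropy(sim, same)

logits = torch.randn(100, 8, requires_grad=True)      # 100 sampled pixels, 8 clusters
ids = torch.randint(0, 5, (100,))                     # ground-truth instance ids
loss = pairwise_cluster_loss(logits, ids)
loss.backward()
print(float(loss))
```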
Towards End-to-End Lane Detection: an Instance Segmentation Approach
Title | Towards End-to-End Lane Detection: an Instance Segmentation Approach |
Authors | Davy Neven, Bert De Brabandere, Stamatios Georgoulis, Marc Proesmans, Luc Van Gool |
Abstract | Modern cars incorporate an increasing number of driver-assist features, among which automatic lane keeping. The latter allows the car to position itself properly within the road lanes, which is also crucial for any subsequent lane departure or trajectory planning decision in fully autonomous cars. Traditional lane detection methods rely on a combination of highly specialized, hand-crafted features and heuristics, usually followed by post-processing techniques, that are computationally expensive and prone to scalability issues due to road scene variations. More recent approaches leverage deep learning models trained for pixel-wise lane segmentation, which, thanks to their large receptive field, work even when no markings are present in the image. Despite their advantages, these methods are limited to detecting a pre-defined, fixed number of lanes, e.g. ego-lanes, and cannot cope with lane changes. In this paper, we go beyond the aforementioned limitations and propose to cast the lane detection problem as an instance segmentation problem - in which each lane forms its own instance - that can be trained end-to-end. To parametrize the segmented lane instances before fitting the lane, we further propose to apply a learned perspective transformation, conditioned on the image, in contrast to a fixed “bird’s-eye view” transformation. By doing so, we ensure a lane fitting which is robust against road plane changes, unlike existing approaches that rely on a fixed, pre-defined transformation. In summary, we propose a fast lane detection algorithm, running at 50 fps, which can handle a variable number of lanes and cope with lane changes. We verify our method on the tuSimple dataset and achieve competitive results. |
Tasks | Instance Segmentation, Lane Detection, Semantic Segmentation |
Published | 2018-02-15 |
URL | http://arxiv.org/abs/1802.05591v1 |
PDF | http://arxiv.org/pdf/1802.05591v1.pdf
PWC | https://paperswithcode.com/paper/towards-end-to-end-lane-detection-an-instance |
Repo | https://github.com/harryhan618/LaneNet |
Framework | pytorch |
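A hedged sketch of the instance-embedding ("discriminative") loss commonly paired with this formulation: pixels of the same lane are pulled toward their lane's mean embedding, and different lanes' means are pushed apart, so each lane forms its own cluster. The margins are illustrative; the paper additionally learns a conditional perspective transform for robust lane fitting, which is not shown here.

```python
import torch

def discriminative_loss(emb, labels, delta_v=0.5, delta_d=3.0):
    """emb: (N, D) pixel embeddings; labels: (N,) lane ids (0 = background)."""
    lanes = [l for l in labels.unique() if l != 0]
    means, pull = [], 0.0
    for l in lanes:
        e = emb[labels == l]
        mu = e.mean(0)
        means.append(mu)
        # variance term: pull pixels to within delta_v of their lane mean
        pull += ((e - mu).norm(dim=1) - delta_v).clamp(min=0).pow(2).mean()
    push = 0.0
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            # distance term: push lane means at least delta_d apart
            d = (means[i] - means[j]).norm()
            push += (delta_d - d).clamp(min=0).pow(2)
    return pull / max(len(lanes), 1) + push

emb = torch.randn(500, 4, requires_grad=True)   # 500 pixels, 4-dim embeddings
labels = torch.randint(0, 4, (500,))            # background + up to 3 lanes
discriminative_loss(emb, labels).backward()
```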
YOLOv3: An Incremental Improvement
Title | YOLOv3: An Incremental Improvement |
Authors | Joseph Redmon, Ali Farhadi |
Abstract | We present some updates to YOLO! We made a bunch of little design changes to make it better. We also trained this new network that’s pretty swell. It’s a little bigger than last time but more accurate. It’s still fast though, don’t worry. At 320x320 YOLOv3 runs in 22 ms at 28.2 mAP, as accurate as SSD but three times faster. When we look at the old .5 IOU mAP detection metric YOLOv3 is quite good. It achieves 57.9 mAP@50 in 51 ms on a Titan X, compared to 57.5 mAP@50 in 198 ms by RetinaNet, similar performance but 3.8x faster. As always, all the code is online at https://pjreddie.com/yolo/ |
Tasks | Object Detection, Real-Time Object Detection |
Published | 2018-04-08 |
URL | http://arxiv.org/abs/1804.02767v1 |
PDF | http://arxiv.org/pdf/1804.02767v1.pdf
PWC | https://paperswithcode.com/paper/yolov3-an-incremental-improvement |
Repo | https://github.com/lmeulen/PeopleCounter |
Framework | none |
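A hedged sketch of running a pretrained YOLOv3 with OpenCV's DNN module, assuming you have downloaded yolov3.cfg and yolov3.weights from the linked project page; the file paths, thresholds, and the 416x416 input size are choices, not requirements.

```python
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
img = cv2.imread("image.jpg")
h, w = img.shape[:2]

blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())

boxes, scores = [], []
for out in outputs:                        # one output per YOLO detection scale
    for det in out:                        # det = [cx, cy, w, h, obj, class...]
        conf = det[4] * det[5:].max()      # objectness * best class probability
        if conf > 0.5:
            cx, cy, bw, bh = det[:4] * np.array([w, h, w, h])
            boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
            scores.append(float(conf))

keep = cv2.dnn.NMSBoxes(boxes, scores, 0.5, 0.4)   # non-maximum suppression
print(f"{len(keep)} detections kept")
```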
Count-Based Exploration with the Successor Representation
Title | Count-Based Exploration with the Successor Representation |
Authors | Marlos C. Machado, Marc G. Bellemare, Michael Bowling |
Abstract | In this paper we introduce a simple approach for exploration in reinforcement learning (RL) that allows us to develop theoretically justified algorithms in the tabular case but that is also extendable to settings where function approximation is required. Our approach is based on the successor representation (SR), which was originally introduced as a representation defining state generalization by the similarity of successor states. Here we show that the norm of the SR, while it is being learned, can be used as a reward bonus to incentivize exploration. In order to better understand this transient behavior of the norm of the SR we introduce the substochastic successor representation (SSR) and we show that it implicitly counts the number of times each state (or feature) has been observed. We use this result to introduce an algorithm that performs as well as some theoretically sample-efficient approaches. Finally, we extend these ideas to a deep RL algorithm and show that it achieves state-of-the-art performance in Atari 2600 games in the low sample-complexity regime. |
Tasks | Atari Games, Efficient Exploration |
Published | 2018-07-31 |
URL | https://arxiv.org/abs/1807.11622v4 |
PDF | https://arxiv.org/pdf/1807.11622v4.pdf
PWC | https://paperswithcode.com/paper/count-based-exploration-with-the-successor |
Repo | https://github.com/bonniesjli/DQN_SR |
Framework | pytorch |
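A hedged tabular sketch of the core idea: learn the successor representation with TD updates and derive an exploration bonus from the norm of the current state's SR row — while the SR is being learned, rarely visited states have small norm and thus earn a larger bonus. The environment, step sizes, and bonus scale are illustrative, not the paper's experimental setup.

```python
import numpy as np

n_states, gamma, eta, beta = 10, 0.95, 0.1, 0.05
psi = np.zeros((n_states, n_states))      # SR estimate, one row per state
rng = np.random.default_rng(0)

def sr_bonus(s):
    # smaller SR-row norm -> state looks novel -> larger bonus
    return beta / max(np.abs(psi[s]).sum(), 1e-3)

s, total_bonus = 0, 0.0
for t in range(5000):
    s_next = min(max(s + rng.choice([-1, 1]), 0), n_states - 1)  # random walk
    onehot = np.eye(n_states)[s]
    psi[s] += eta * (onehot + gamma * psi[s_next] - psi[s])      # TD update of SR
    total_bonus += sr_bonus(s_next)   # would augment the env reward in an agent
    s = s_next

print("SR row l1-norms:", np.round(np.abs(psi).sum(axis=1), 2))
```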
A Generalized Circuit for the Hamiltonian Dynamics Through the Truncated Series
Title | A Generalized Circuit for the Hamiltonian Dynamics Through the Truncated Series |
Authors | Ammar Daskin, Sabre Kais |
Abstract | In this paper, we present a method for Hamiltonian simulation in the context of eigenvalue estimation problems which improves earlier results dealing with Hamiltonian simulation through the truncated Taylor series. In particular, we present a fixed-quantum-circuit design for the simulation of the Hamiltonian dynamics, $H(t)$, through the truncated Taylor series method described by Berry et al. \cite{berry2015simulating}. The circuit is general and can be used to simulate any given matrix in the phase estimation algorithm by only changing the angle values of the quantum gates implementing the time variable $t$ in the series. The circuit complexity depends on the number of summation terms composing the Hamiltonian and requires $O(Ln)$ quantum gates for the simulation of a molecular Hamiltonian. Here, $n$ is the number of states of a spin orbital, and $L$ is the number of terms in the molecular Hamiltonian, generally bounded by $O(n^4)$. We also discuss how to use the circuit in adaptive processes and eigenvalue-related problems, along with a slightly modified version of the iterative phase estimation algorithm. In addition, a simple divide-and-conquer method is presented for mapping matrices which are not given as sums of unitary matrices into the circuit. The complexity of the circuit is directly related to the structure of the matrix and can be bounded by $O(\mathrm{poly}(n))$ for a matrix with $\mathrm{poly}(n)$-sparsity. |
Tasks | |
Published | 2018-01-29 |
URL | http://arxiv.org/abs/1801.09720v3 |
PDF | http://arxiv.org/pdf/1801.09720v3.pdf
PWC | https://paperswithcode.com/paper/a-generalized-circuit-for-the-hamiltonian |
Repo | https://github.com/adaskin/circuitforTaylorseries |
Framework | none |
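A hedged classical sketch of the truncated-Taylor-series idea the circuit implements: approximate $e^{-iHt}$ by $\sum_{k \le K} (-iHt)^k / k!$ and compare against the exact propagator. The quantum circuit realizes this sum for a Hamiltonian given as a sum of unitaries; here we only check the series numerically.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
H = (A + A.conj().T) / 2                     # random Hermitian "Hamiltonian"
t, K = 0.3, 10                               # evolution time, truncation order

U_exact = expm(-1j * H * t)
U_taylor = np.zeros_like(U_exact)
term = np.eye(8, dtype=complex)              # k = 0 term
for k in range(K + 1):
    U_taylor += term
    term = term @ (-1j * H * t) / (k + 1)    # build the next series term

print("truncation error:", np.linalg.norm(U_taylor - U_exact))
```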
Predict then Propagate: Graph Neural Networks meet Personalized PageRank
Title | Predict then Propagate: Graph Neural Networks meet Personalized PageRank |
Authors | Johannes Klicpera, Aleksandar Bojchevski, Stephan Günnemann |
Abstract | Neural message passing algorithms for semi-supervised classification on graphs have recently achieved great success. However, for classifying a node these methods only consider nodes that are a few propagation steps away, and the size of this utilized neighborhood is hard to extend. In this paper, we use the relationship between graph convolutional networks (GCN) and PageRank to derive an improved propagation scheme based on personalized PageRank. We utilize this propagation procedure to construct a simple model, personalized propagation of neural predictions (PPNP), and its fast approximation, APPNP. Our model trains as fast as or faster than previous models and uses no more parameters. It leverages a large, adjustable neighborhood for classification and can easily be combined with any neural network. We show that this model outperforms several recently proposed methods for semi-supervised classification in the most thorough study done so far for GCN-like models. Our implementation is available online. |
Tasks | Node Classification |
Published | 2018-10-14 |
URL | http://arxiv.org/abs/1810.05997v5 |
PDF | http://arxiv.org/pdf/1810.05997v5.pdf
PWC | https://paperswithcode.com/paper/predict-then-propagate-graph-neural-networks |
Repo | https://github.com/klicperajo/ppnp |
Framework | pytorch |
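A hedged NumPy sketch of the APPNP propagation described above: predictions $H$ from any base network are smoothed with a power-iteration approximation of personalized PageRank, $Z \leftarrow (1-\alpha)\hat{A}Z + \alpha H$. The toy graph, $\alpha$, and the number of steps are illustrative.

```python
import numpy as np

def appnp_propagate(A, H, alpha=0.1, steps=10):
    """A: (N, N) adjacency; H: (N, C) per-node predictions from a base model."""
    A_tilde = A + np.eye(len(A))                        # add self-loops
    d = A_tilde.sum(1)
    A_hat = A_tilde / np.sqrt(d[:, None] * d[None, :])  # symmetric normalization
    Z = H.copy()
    for _ in range(steps):
        Z = (1 - alpha) * A_hat @ Z + alpha * H         # teleport back to H
    return Z

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], float)   # path graph on 4 nodes
H = np.eye(4)[:, :2]                  # dummy 2-class predictions
print(appnp_propagate(A, H))
```

Separating prediction from propagation is the point: the neighborhood size is controlled by `alpha` and `steps`, not by stacking more network layers.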
Probabilistic Logic Programming with Beta-Distributed Random Variables
Title | Probabilistic Logic Programming with Beta-Distributed Random Variables |
Authors | Federico Cerutti, Lance Kaplan, Angelika Kimmig, Murat Sensoy |
Abstract | We enable aProbLog—a probabilistic logic programming approach—to reason in the presence of uncertain probabilities represented as Beta-distributed random variables. We achieve the same performance as state-of-the-art algorithms for highly specified and engineered domains, while maintaining the flexibility offered by aProbLog in handling complex relational domains. Our motivation is that faithfully capturing the distribution of probabilities is necessary to compute an expected utility for effective decision making under uncertainty: unfortunately, these probability distributions can be highly uncertain due to sparse data. To understand and accurately manipulate such probability distributions we need a well-defined theoretical framework, which is provided by the Beta distribution: it specifies a distribution over the possible values of a probability when the exact value is unknown. |
Tasks | Decision Making, Decision Making Under Uncertainty |
Published | 2018-09-20 |
URL | http://arxiv.org/abs/1809.07888v3 |
PDF | http://arxiv.org/pdf/1809.07888v3.pdf
PWC | https://paperswithcode.com/paper/probabilistic-logic-programming-with-beta |
Repo | https://github.com/dais-ita/SLProbLog |
Framework | none |
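A hedged sketch of the kind of arithmetic such a reasoner needs: the conjunction of two independent uncertain probabilities, each a Beta random variable, is approximated by moment matching the product back to a Beta. This illustrates manipulating Beta-distributed probabilities; it is not the aProbLog inference machinery itself.

```python
from scipy import stats

def beta_and(a1, b1, a2, b2):
    """Moment-matched Beta for the product of two independent Beta variables."""
    m1, m2 = a1 / (a1 + b1), a2 / (a2 + b2)
    v1 = a1 * b1 / ((a1 + b1) ** 2 * (a1 + b1 + 1))
    v2 = a2 * b2 / ((a2 + b2) ** 2 * (a2 + b2 + 1))
    mean = m1 * m2
    var = (v1 + m1 ** 2) * (v2 + m2 ** 2) - mean ** 2   # Var(XY), X, Y indep.
    # invert the Beta mean/variance relations to recover (alpha, beta)
    k = mean * (1 - mean) / var - 1
    return mean * k, (1 - mean) * k

a, b = beta_and(8, 2, 5, 5)     # e.g. P(p)~Beta(8,2) AND P(q)~Beta(5,5)
print(stats.beta(a, b).mean(), stats.beta(a, b).std())
```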
The 2018 PIRM Challenge on Perceptual Image Super-resolution
Title | The 2018 PIRM Challenge on Perceptual Image Super-resolution |
Authors | Yochai Blau, Roey Mechrez, Radu Timofte, Tomer Michaeli, Lihi Zelnik-Manor |
Abstract | This paper reports on the 2018 PIRM challenge on perceptual super-resolution (SR), held in conjunction with the Perceptual Image Restoration and Manipulation (PIRM) workshop at ECCV 2018. In contrast to previous SR challenges, our evaluation methodology jointly quantifies accuracy and perceptual quality, thereby enabling perceptually driven methods to compete alongside algorithms that target PSNR maximization. Twenty-one participating teams introduced algorithms which substantially improved upon the existing state of the art in perceptual SR, as confirmed by a human opinion study. We also analyze popular image quality measures and draw conclusions regarding which of them correlates best with human opinion scores. We conclude with an analysis of the current trends in perceptual SR, as reflected in the leading submissions. |
Tasks | Image Restoration, Image Super-Resolution, Super-Resolution |
Published | 2018-09-20 |
URL | http://arxiv.org/abs/1809.07517v3 |
PDF | http://arxiv.org/pdf/1809.07517v3.pdf
PWC | https://paperswithcode.com/paper/the-2018-pirm-challenge-on-perceptual-image |
Repo | https://github.com/idearibosome/tf-perceptual-eusr |
Framework | tf |
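A hedged sketch of the challenge's joint evaluation style: fidelity is measured by RMSE against the ground truth, while perceptual quality is summarized by a perceptual index combining two no-reference measures (lower is better). The actual no-reference scores (Ma et al. and NIQE) come from external implementations and are stubbed here as inputs.

```python
import numpy as np

def rmse(sr, hr):
    return float(np.sqrt(np.mean((sr.astype(float) - hr.astype(float)) ** 2)))

def perceptual_index(ma_score, niqe_score):
    # PI = 0.5 * ((10 - Ma) + NIQE); lower PI means better perceptual quality
    return 0.5 * ((10.0 - ma_score) + niqe_score)

hr = np.random.randint(0, 256, (64, 64)).astype(np.uint8)   # stand-in ground truth
sr = np.clip(hr + np.random.normal(0, 5, hr.shape), 0, 255) # stand-in SR output
print("RMSE:", rmse(sr, hr),
      "PI:", perceptual_index(ma_score=8.2, niqe_score=3.1))
```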
RDPD: Rich Data Helps Poor Data via Imitation
Title | RDPD: Rich Data Helps Poor Data via Imitation |
Authors | Shenda Hong, Cao Xiao, Trong Nghia Hoang, Tengfei Ma, Hongyan Li, Jimeng Sun |
Abstract | In many situations, we need to build and deploy separate models in related environments with different data qualities. For example, an environment with strong observation equipment (e.g., intensive care units) often provides high-quality multi-modal data, which are acquired from multiple sensory devices and have rich feature representations. On the other hand, an environment with poor observation equipment (e.g., at home) only provides low-quality, uni-modal data with poor feature representations. To deploy a competitive model in a poor-data environment without requiring direct access to the multi-modal data acquired in the rich-data environment, this paper develops and presents a knowledge distillation (KD) method (RDPD) that enhances a predictive model trained on poor data using knowledge distilled from a high-complexity model trained on rich, private data. We evaluated RDPD on three real-world datasets and showed that its distilled model consistently outperformed all baselines across all datasets, improving over a model trained only on low-quality data by 24.56% in PR-AUC and 12.21% in ROC-AUC, and over a state-of-the-art KD model by 5.91% in PR-AUC and 4.44% in ROC-AUC. |
Tasks | |
Published | 2018-09-06 |
URL | https://arxiv.org/abs/1809.01921v4 |
PDF | https://arxiv.org/pdf/1809.01921v4.pdf
PWC | https://paperswithcode.com/paper/rdpd-rich-data-helps-poor-data-via-imitation |
Repo | https://github.com/hsd1503/RDPD |
Framework | pytorch |
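A hedged sketch of the knowledge-distillation objective underlying this setup: the poor-data student matches both the true labels and the rich-data teacher's softened predictions. The temperature and weighting follow the usual KD recipe; RDPD's specific imitation scheme is more elaborate than this minimal form.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=3.0, alpha=0.5):
    hard = F.cross_entropy(student_logits, labels)       # fit the true labels
    soft = F.kl_div(                                     # imitate the teacher
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * T * T                                            # standard T^2 scaling
    return alpha * hard + (1 - alpha) * soft

s = torch.randn(16, 2, requires_grad=True)   # student sees poor, uni-modal data
t = torch.randn(16, 2)                       # teacher was trained on rich data
y = torch.randint(0, 2, (16,))
kd_loss(s, t, y).backward()
```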
Chinese Poetry Generation with a Working Memory Model
Title | Chinese Poetry Generation with a Working Memory Model |
Authors | Xiaoyuan Yi, Maosong Sun, Ruoyu Li, Zonghan Yang |
Abstract | As an exquisite and concise literary form, poetry is a gem of human culture. Automatic poetry generation is an essential step towards computer creativity. In recent years, several neural models have been designed for this task. However, coherence in meaning and topic across the lines of a whole poem remains a big challenge. In this paper, inspired by a concept from cognitive psychology, we propose a novel Working Memory model for poetry generation. Different from previous methods, our model explicitly maintains topics and a limited but informative history in a neural memory. During the generation process, our model reads the most relevant parts from the memory slots to generate the current line. After each line is generated, it writes the most salient parts of the previous line into the memory slots. By dynamically manipulating the memory, our model keeps a coherent information flow and learns to express each topic flexibly and naturally. We experiment on three different genres of Chinese poetry: quatrain, iambic and chinoiserie lyric. Both automatic and human evaluation results show that our model outperforms current state-of-the-art methods. |
Tasks | |
Published | 2018-09-12 |
URL | http://arxiv.org/abs/1809.04306v1 |
PDF | http://arxiv.org/pdf/1809.04306v1.pdf
PWC | https://paperswithcode.com/paper/chinese-poetry-generation-with-a-working |
Repo | https://github.com/xiaoyuanYi/WMPoetry |
Framework | tf |
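A hedged sketch of the working-memory read/write cycle described above: reading is attention over memory slots given the decoder state; writing copies the most salient vector of the just-finished line into a slot. The slot count, dimensions, and the salience rule (largest norm) are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

d, n_slots = 64, 6
memory = torch.randn(n_slots, d)                 # topic + history slots

def read(state):
    # attention over memory slots given the current decoder state
    w = F.softmax(memory @ state / d ** 0.5, dim=0)
    return w @ memory                            # blended memory context

def write(line_vectors, slot_idx):
    # keep the most salient (largest-norm) word vector of the finished line
    memory[slot_idx] = line_vectors[line_vectors.norm(dim=1).argmax()]

state = torch.randn(d)                           # decoder state mid-generation
ctx = read(state)                                # context used for the next token
write(torch.randn(7, d), slot_idx=2)             # after a 7-word line completes
print(ctx.shape)                                 # torch.Size([64])
```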