Paper Group AWR 264
An end-to-end TextSpotter with Explicit Alignment and Attention. Multi-focus Image Fusion using dictionary learning and Low-Rank Representation. Learning Energy Based Inpainting for Optical Flow. Neural separation of observed and unobserved distributions. Arbitrary Style Transfer with Style-Attentional Networks. Learning from Multi-domain Artistic Images for Arbitrary Style Transfer. Effective Representation for Easy-First Dependency Parsing. Estimating Train Delays in a Large Rail Network Using a Zero Shot Markov Model. XNOR Neural Engine: a Hardware Accelerator IP for 21.6 fJ/op Binary Neural Network Inference. Universal Word Segmentation: Implementation and Interpretation. Exact Low Tubal Rank Tensor Recovery from Gaussian Measurements. Tropical Geometry of Deep Neural Networks. Open-world Learning and Application to Product Classification. Understanding disentangling in $\beta$-VAE. Unsupervised Cross-dataset Person Re-identification by Transfer Learning of Spatial-Temporal Patterns.
An end-to-end TextSpotter with Explicit Alignment and Attention
Title | An end-to-end TextSpotter with Explicit Alignment and Attention |
Authors | Tong He, Zhi Tian, Weilin Huang, Chunhua Shen, Yu Qiao, Changming Sun |
Abstract | Text detection and recognition in natural images have long been considered as two separate tasks that are processed sequentially. Training of two tasks in a unified framework is non-trivial due to significant differences in optimisation difficulties. In this work, we present a conceptually simple yet efficient framework that simultaneously processes the two tasks in one shot. Our main contributions are three-fold: 1) we propose a novel text-alignment layer that allows it to precisely compute convolutional features of a text instance in arbitrary orientation, which is the key to boost the performance; 2) a character attention mechanism is introduced by using character spatial information as explicit supervision, leading to large improvements in recognition; 3) two technologies, together with a new RNN branch for word recognition, are integrated seamlessly into a single model which is end-to-end trainable. This allows the two tasks to work collaboratively by sharing convolutional features, which is critical to identify challenging text instances. Our model achieves impressive results in end-to-end recognition on the ICDAR2015 dataset, significantly advancing most recent results, with improvements of F-measure from (0.54, 0.51, 0.47) to (0.82, 0.77, 0.63), by using a strong, weak and generic lexicon respectively. Thanks to joint training, our method can also serve as a good detector by achieving a new state-of-the-art detection performance on two datasets. |
Tasks | |
Published | 2018-03-09 |
URL | http://arxiv.org/abs/1803.03474v3 |
http://arxiv.org/pdf/1803.03474v3.pdf | |
PWC | https://paperswithcode.com/paper/an-end-to-end-textspotter-with-explicit |
Repo | https://github.com/curbmap/curbmap-ml |
Framework | tf |
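The text-alignment layer is only described at a high level above. As a rough illustration of the underlying idea, the NumPy sketch below bilinearly samples a fixed grid of points inside an arbitrarily oriented box, yielding a fixed-size feature map for a rotated text instance. The grid size, sampling scheme, and all names are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly interpolate feat (H, W, C) at fractional coords (y, x)."""
    H, W, _ = feat.shape
    y, x = float(np.clip(y, 0, H - 1)), float(np.clip(x, 0, W - 1))
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feat[y0, x0]
            + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0]
            + wy * wx * feat[y1, x1])

def text_align_features(feat, center, size, angle, grid=(8, 32)):
    """Sample a fixed grid of points inside a rotated box.

    center: (cy, cx); size: (h, w) of the box; angle in radians.
    Returns (grid_h, grid_w, C): a fixed-size feature map for a text
    instance of arbitrary orientation.
    """
    gh, gw = grid
    h, w = size
    cy, cx = center
    c, s = np.cos(angle), np.sin(angle)
    out = np.zeros((gh, gw, feat.shape[2]))
    for i in range(gh):
        for j in range(gw):
            # Offsets in the box's local frame, rotated into image frame.
            dy = (i + 0.5) / gh * h - h / 2
            dx = (j + 0.5) / gw * w - w / 2
            out[i, j] = bilinear_sample(feat,
                                        cy + dx * s + dy * c,
                                        cx + dx * c - dy * s)
    return out

feat = np.random.rand(64, 64, 4)
aligned = text_align_features(feat, center=(32, 32), size=(10, 30),
                              angle=np.pi / 6)
print(aligned.shape)  # (8, 32, 4)
```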
Multi-focus Image Fusion using dictionary learning and Low-Rank Representation
Title | Multi-focus Image Fusion using dictionary learning and Low-Rank Representation |
Authors | Hui Li, Xiao-Jun Wu |
Abstract | Among representation learning methods, low-rank representation (LRR) is one of the hot research topics in many fields, especially in image processing and pattern recognition. Although LRR can capture the global structure, its ability to preserve local structure is limited because LRR lacks dictionary learning. In this paper, we propose a novel multi-focus image fusion method based on dictionary learning and LRR to obtain better performance in both global and local structure. Firstly, the source images are divided into several patches by a sliding-window technique. Then, the patches are classified according to Histogram of Oriented Gradients (HOG) features, and the sub-dictionaries of each class are learned by the K-singular value decomposition (K-SVD) algorithm. Secondly, a global dictionary is constructed by combining these sub-dictionaries. Then, we use the global dictionary in LRR to obtain the LRR coefficient vector for each patch. Finally, an $\ell_1$-norm, choose-max fusion strategy over the coefficient vectors is adopted to reconstruct the fused image from the fused LRR coefficients and the global dictionary. Experimental results demonstrate that the proposed method obtains state-of-the-art performance in both qualitative and quantitative evaluations compared with several classical and novel methods. The code of our fusion method is available at https://github.com/hli1221/imagefusion_dllrr |
Tasks | Dictionary Learning, Representation Learning |
Published | 2018-04-23 |
URL | http://arxiv.org/abs/1804.08355v2 |
http://arxiv.org/pdf/1804.08355v2.pdf | |
PWC | https://paperswithcode.com/paper/multi-focus-image-fusion-using-dictionary |
Repo | https://github.com/hli1221/imagefusion_dllrr |
Framework | none |
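A minimal sketch of the $\ell_1$-norm choose-max fusion step described in the abstract, with the dictionary learning (K-SVD) and LRR solver omitted; shapes and the toy data are illustrative assumptions:

```python
import numpy as np

def fuse_coefficients(coeffs_a, coeffs_b):
    """Choose-max fusion: for each patch, keep the LRR coefficient
    vector with the larger l1-norm (a proxy for focus/activity level).

    coeffs_a, coeffs_b: (num_atoms, num_patches) coefficient matrices
    for the two source images over the same global dictionary.
    """
    norms_a = np.abs(coeffs_a).sum(axis=0)
    norms_b = np.abs(coeffs_b).sum(axis=0)
    pick_a = norms_a >= norms_b
    return np.where(pick_a[None, :], coeffs_a, coeffs_b)

# Toy usage: reconstruct fused patches from fused coefficients.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 128))           # global dictionary (atoms as columns)
Za = rng.standard_normal((128, 100)) * 0.1   # LRR coefficients, image A
Zb = rng.standard_normal((128, 100)) * 0.1   # LRR coefficients, image B
fused = D @ fuse_coefficients(Za, Zb)        # (64, 100): one 8x8 patch per column
```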
Learning Energy Based Inpainting for Optical Flow
Title | Learning Energy Based Inpainting for Optical Flow |
Authors | Christoph Vogel, Patrick Knöbelreiter, Thomas Pock |
Abstract | Modern optical flow methods are often composed of a cascade of many independent steps or formulated as a black-box neural network that is hard to interpret and analyze. In this work we seek a plain, interpretable, but learnable solution. We propose a novel inpainting-based algorithm that approaches the problem in three steps: feature selection and matching, selection of supporting points, and energy-based inpainting. To facilitate the inference we propose an optimization layer that allows backpropagating through 10K iterations of a first-order method without any numerical or memory problems. Compared to recent state-of-the-art networks, our modular CNN is very lightweight and competitive with other, more involved, inpainting-based methods. |
Tasks | Feature Selection, Optical Flow Estimation |
Published | 2018-11-09 |
URL | http://arxiv.org/abs/1811.03721v1 |
http://arxiv.org/pdf/1811.03721v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-energy-based-inpainting-for-optical |
Repo | https://github.com/vogechri/CustomNetworkLayers |
Framework | pytorch |
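As a rough illustration of energy-based inpainting with a first-order method, the sketch below densifies sparse flow by plain gradient descent on a quadratic data-plus-smoothness energy (periodic boundaries via np.roll). The paper's optimization layer backpropagates through such iterations inside a network; this standalone NumPy version only mirrors the forward computation, and the energy weights are illustrative.

```python
import numpy as np

def inpaint_flow(sparse_flow, mask, lam=0.1, step=0.2, iters=2000):
    """Minimize sum(mask * (u - f)^2) + lam * ||grad u||^2 per channel
    by gradient descent, diffusing flow from the supporting points.

    sparse_flow: (H, W, 2), values meaningful only where mask == 1.
    """
    u = sparse_flow.copy()
    for _ in range(iters):
        # Gradient of the smoothness term: discrete 4-neighbor Laplacian.
        lap = (-4 * u
               + np.roll(u, 1, axis=0) + np.roll(u, -1, axis=0)
               + np.roll(u, 1, axis=1) + np.roll(u, -1, axis=1))
        grad = 2 * mask[..., None] * (u - sparse_flow) - 2 * lam * lap
        u -= step * grad
    return u

H, W = 32, 32
mask = (np.random.rand(H, W) < 0.05).astype(float)   # 5% supporting points
flow = np.zeros((H, W, 2))
flow[mask == 1] = np.random.randn(int(mask.sum()), 2)
dense = inpaint_flow(flow, mask)                      # dense (H, W, 2) flow
```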
Neural separation of observed and unobserved distributions
Title | Neural separation of observed and unobserved distributions |
Authors | Tavi Halperin, Ariel Ephrat, Yedid Hoshen |
Abstract | Separating mixed distributions is a long-standing challenge for machine learning and signal processing. Most current methods either rely on making strong assumptions about the source distributions or rely on having training samples of each source in the mixture. In this work, we introduce a new method, Neural Egg Separation, to tackle the scenario of extracting a signal from an unobserved distribution additively mixed with a signal from an observed distribution. Our method iteratively learns to separate the known distribution from progressively finer estimates of the unknown distribution. In some settings, Neural Egg Separation is initialization-sensitive; we therefore introduce Latent Mixture Masking, which ensures a good initialization. Extensive experiments on audio and image separation tasks show that our method outperforms current methods that use the same level of supervision, and often achieves performance similar to full supervision. |
Tasks | Speaker Separation |
Published | 2018-11-30 |
URL | https://arxiv.org/abs/1811.12739v2 |
https://arxiv.org/pdf/1811.12739v2.pdf | |
PWC | https://paperswithcode.com/paper/neural-separation-of-observed-and-unobserved |
Repo | https://github.com/tavihalperin/Neural-Egg-Seperation |
Framework | pytorch |
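The core Neural Egg Separation loop can be sketched with a deliberately simple stand-in separator: synthesize mixtures from observed samples plus the current unobserved estimates, fit the separator on them, then refine the estimates on the real mixtures. The affine least-squares "separator" and Gaussian toy data below are assumptions purely for illustrating the iteration structure; the paper trains neural networks with masking.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 16, 2000
B = rng.standard_normal((n, d))               # observed source: samples available
Y = 2.0 * rng.standard_normal((n, d)) + 3.0   # unobserved source: never seen alone
M = B + Y                                     # only mixtures of it are observed

def fit_affine(X, T):
    Xa = np.hstack([X, np.ones((len(X), 1))])
    W, *_ = np.linalg.lstsq(Xa, T, rcond=None)
    return W

def apply_affine(W, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ W

Y_est = M - B.mean(axis=0)   # crude initial estimate of the unobserved part
print("initial error:", np.abs(Y_est - Y).mean())

for _ in range(8):
    # Synthesize fresh mixtures from observed samples + current estimates,
    # fit a separator on them, then refine the estimates on real mixtures.
    M_synth = B[rng.permutation(n)] + Y_est
    W = fit_affine(M_synth, Y_est)            # stand-in for the paper's network
    Y_est = apply_affine(W, M)

print("final error:  ", np.abs(Y_est - Y).mean())
```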
Arbitrary Style Transfer with Style-Attentional Networks
Title | Arbitrary Style Transfer with Style-Attentional Networks |
Authors | Dae Young Park, Kwang Hee Lee |
Abstract | Arbitrary style transfer aims to synthesize a content image with the style of another image to create a third image that has never been seen before. Recent arbitrary style transfer algorithms find it challenging to balance the content structure and the style patterns. Moreover, simultaneously maintaining the global and local style patterns is difficult due to the patch-based mechanism. In this paper, we introduce a novel style-attentional network (SANet) that efficiently and flexibly integrates the local style patterns according to the semantic spatial distribution of the content image. A new identity loss function and multi-level feature embeddings enable our SANet and decoder to preserve the content structure as much as possible while enriching the style patterns. Experimental results demonstrate that our algorithm synthesizes stylized images in real time that are higher in quality than those produced by the state-of-the-art algorithms. |
Tasks | Style Transfer |
Published | 2018-12-06 |
URL | https://arxiv.org/abs/1812.02342v5 |
https://arxiv.org/pdf/1812.02342v5.pdf | |
PWC | https://paperswithcode.com/paper/arbitrary-style-transfer-with-style |
Repo | https://github.com/EnchanterXiao/video-style-transfer |
Framework | pytorch |
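A style-attentional step can be sketched as attention from content positions (queries) to style positions (keys/values). The mean-variance normalization standing in for learned 1x1 projections, the scaling, and the residual connection below are illustrative assumptions rather than the exact SANet:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def style_attention(content, style):
    """Each content location gathers style features whose normalized
    appearance matches its own semantics.

    content: (Nc, C) flattened content feature map; style: (Ns, C).
    """
    def norm(f):
        return (f - f.mean(axis=0)) / (f.std(axis=0) + 1e-5)

    attn = softmax(norm(content) @ norm(style).T / np.sqrt(content.shape[1]))
    # Residual: re-inject content structure alongside the attended style.
    return content + attn @ style

content = np.random.randn(32 * 32, 64)
style = np.random.randn(24 * 24, 64)
out = style_attention(content, style)   # (1024, 64)
```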
Learning from Multi-domain Artistic Images for Arbitrary Style Transfer
Title | Learning from Multi-domain Artistic Images for Arbitrary Style Transfer |
Authors | Zheng Xu, Michael Wilber, Chen Fang, Aaron Hertzmann, Hailin Jin |
Abstract | We propose a fast feed-forward network for arbitrary style transfer, which can generate stylized images for previously unseen content and style image pairs. Besides the traditional content and style representation based on deep features and statistics for textures, we use adversarial networks to regularize the generation of stylized images. Our adversarial network learns the intrinsic property of image styles from large-scale multi-domain artistic images. The adversarial training is challenging because both the input and output of our generator are diverse multi-domain images. We use a conditional generator that stylizes content by shifting the statistics of deep features, and a conditional discriminator based on the coarse category of styles. Moreover, we propose a mask module to spatially decide the stylization level and stabilize adversarial training by avoiding mode collapse. As a side effect, our trained discriminator can be applied to rank and select representative stylized images. We qualitatively and quantitatively evaluate the proposed method, and compare with recent style transfer methods. We release our code and model at https://github.com/nightldj/behance_release. |
Tasks | Style Transfer |
Published | 2018-05-25 |
URL | http://arxiv.org/abs/1805.09987v2 |
http://arxiv.org/pdf/1805.09987v2.pdf | |
PWC | https://paperswithcode.com/paper/beyond-textures-learning-from-multi-domain |
Repo | https://github.com/nightldj/behance_release |
Framework | pytorch |
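"Stylizing content by shifting the statistics of deep features" is the AdaIN-style operation; a minimal sketch of that statistic shift (per-channel mean/std transfer, assuming (C, H, W) feature maps):

```python
import numpy as np

def shift_statistics(content, style, eps=1e-5):
    """Normalize content features per channel, then re-scale and
    re-center them with the style's per-channel mean and std.

    content, style: (C, H, W) feature maps.
    """
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True)
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True)
    return s_std * (content - c_mean) / (c_std + eps) + s_mean

stylized = shift_statistics(np.random.randn(64, 32, 32),
                            np.random.randn(64, 28, 28))
```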
Effective Representation for Easy-First Dependency Parsing
Title | Effective Representation for Easy-First Dependency Parsing |
Authors | Zuchao Li, Jiaxun Cai, Hai Zhao |
Abstract | Easy-first parsing relies on subtree re-ranking to build the complete parse tree. The intermediate state of parsing is represented by various subtrees, whose internal structural information is the key signal for later parsing decisions; we therefore explore a better representation for such subtrees. In detail, this work introduces a bottom-up subtree encoding method based on the child-sum tree-LSTM. Starting from an easy-first dependency parser without other handcrafted features, we show that the effective subtree encoder does improve the parsing process, and can make a greedy-search easy-first parser achieve promising results on benchmark treebanks compared to state-of-the-art baselines. Furthermore, with the help of a current pre-trained language model, we further improve the state-of-the-art results of the easy-first approach. |
Tasks | Dependency Parsing, Language Modelling |
Published | 2018-11-08 |
URL | https://arxiv.org/abs/1811.03511v3 |
https://arxiv.org/pdf/1811.03511v3.pdf | |
PWC | https://paperswithcode.com/paper/effective-subtree-encoding-for-easy-first |
Repo | https://github.com/bcmi220/erefdp |
Framework | none |
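A child-sum tree-LSTM node update (Tai et al., 2015), the building block of the bottom-up subtree encoder, can be sketched as follows; weights are random stand-ins and dimensions are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ChildSumTreeLSTM:
    """Encodes a dependency subtree bottom-up: a node's state depends on
    its word embedding and the summed hidden states of its children."""

    def __init__(self, d_in, d_hid, rng):
        self.W = rng.standard_normal((4, d_hid, d_in)) * 0.1   # i, o, u, f
        self.U = rng.standard_normal((4, d_hid, d_hid)) * 0.1
        self.b = np.zeros((4, d_hid))

    def node(self, x, children):
        """x: (d_in,) word embedding; children: list of (h, c) pairs.
        Returns this subtree's (h, c) representation."""
        h_sum = sum((h for h, _ in children), np.zeros(self.U.shape[1]))
        i = sigmoid(self.W[0] @ x + self.U[0] @ h_sum + self.b[0])
        o = sigmoid(self.W[1] @ x + self.U[1] @ h_sum + self.b[1])
        u = np.tanh(self.W[2] @ x + self.U[2] @ h_sum + self.b[2])
        c = i * u
        # One forget gate per child, conditioned on that child's h.
        for h_k, c_k in children:
            f_k = sigmoid(self.W[3] @ x + self.U[3] @ h_k + self.b[3])
            c = c + f_k * c_k
        return o * np.tanh(c), c

rng = np.random.default_rng(0)
cell = ChildSumTreeLSTM(d_in=8, d_hid=16, rng=rng)
leaf1 = cell.node(rng.standard_normal(8), [])
leaf2 = cell.node(rng.standard_normal(8), [])
root_h, root_c = cell.node(rng.standard_normal(8), [leaf1, leaf2])
```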
Estimating Train Delays in a Large Rail Network Using a Zero Shot Markov Model
Title | Estimating Train Delays in a Large Rail Network Using a Zero Shot Markov Model |
Authors | Ramashish Gaurav, Biplav Srivastava |
Abstract | India runs the world's fourth-largest railway transport network, carrying over 8 billion passengers per year. However, the travel experience of passengers is frequently marked by delays, i.e., late arrival of trains at stations, causing inconvenience. In a first, we study the systemic delays in train arrivals using n-order Markov frameworks and experiment with two regression-based models. Using train running-status data collected for two years, we report an efficient algorithm for estimating delays at railway stations with near-accurate results. This work can help railways manage their resources, while also helping passengers and businesses served by them to efficiently plan their activities. |
Tasks | |
Published | 2018-06-07 |
URL | http://arxiv.org/abs/1806.02825v1 |
http://arxiv.org/pdf/1806.02825v1.pdf | |
PWC | https://paperswithcode.com/paper/estimating-train-delays-in-a-large-rail |
Repo | https://github.com/R-Gaurav/train-delay-estimation-ITSC2018 |
Framework | none |
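A first-order version of the Markov framing can be sketched by discretizing delays into states and estimating station-to-station transition probabilities from historical runs. The delay bins, Laplace smoothing, and toy data below are assumptions for illustration, not the paper's exact model:

```python
import numpy as np

BINS = [0, 15, 60, np.inf]   # minutes: on-time, minor delay, major delay

def to_state(delay):
    return int(np.digitize(delay, BINS[1:]))

def fit_transitions(runs, n_states=3):
    """runs: list of per-journey delay sequences (minutes at each station).
    Returns the state-transition matrix between consecutive stations."""
    counts = np.ones((n_states, n_states))   # Laplace smoothing
    for delays in runs:
        states = [to_state(d) for d in delays]
        for a, b in zip(states, states[1:]):
            counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

runs = [[0, 5, 20, 70], [10, 18, 25, 30], [0, 0, 5, 10]]
P = fit_transitions(runs)
# Distribution over delay states at the next station, given 20 min now:
print(P[to_state(20)])
```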
XNOR Neural Engine: a Hardware Accelerator IP for 21.6 fJ/op Binary Neural Network Inference
Title | XNOR Neural Engine: a Hardware Accelerator IP for 21.6 fJ/op Binary Neural Network Inference |
Authors | Francesco Conti, Pasquale Davide Schiavone, Luca Benini |
Abstract | Binary Neural Networks (BNNs) promise to deliver accuracy comparable to conventional deep neural networks at a fraction of the cost in terms of memory and energy. In this paper, we introduce the XNOR Neural Engine (XNE), a fully digital configurable hardware accelerator IP for BNNs, integrated within a microcontroller unit (MCU) equipped with an autonomous I/O subsystem and hybrid SRAM / standard cell memory. The XNE is able to fully compute convolutional and dense layers autonomously or in cooperation with the core in the MCU to realize more complex behaviors. We show post-synthesis results in 65 nm and 22 nm technology for the XNE IP and post-layout results in 22 nm for the full MCU, indicating that this system can drop the energy cost per binary operation to 21.6 fJ per operation at 0.4 V, and at the same time is flexible and performant enough to execute state-of-the-art BNN topologies such as ResNet-34 in less than 2.2 mJ per frame at 8.9 fps. |
Tasks | |
Published | 2018-07-09 |
URL | http://arxiv.org/abs/1807.03010v1 |
http://arxiv.org/pdf/1807.03010v1.pdf | |
PWC | https://paperswithcode.com/paper/xnor-neural-engine-a-hardware-accelerator-ip |
Repo | https://github.com/pulp-platform/hwpe-tb |
Framework | none |
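The arithmetic the XNE accelerates reduces a binarized dot product to XNOR plus popcount; a small software sketch of that identity, validated against plain ±1 arithmetic (the bit packing is illustrative):

```python
import numpy as np

def binary_dot(a_bits, b_bits, n):
    """Dot product of two {-1,+1} vectors packed as integer bitmasks
    (1 bit per element, bit value 1 meaning +1), over n bits.

    matches = popcount(XNOR(a, b)); dot = 2 * matches - n.
    """
    xnor = ~(a_bits ^ b_bits) & ((1 << n) - 1)
    matches = bin(xnor).count("1")
    return 2 * matches - n

rng = np.random.default_rng(0)
n = 16
a = rng.integers(0, 2, n)        # bits: 1 -> +1, 0 -> -1
b = rng.integers(0, 2, n)
a_packed = int("".join(map(str, a)), 2)
b_packed = int("".join(map(str, b)), 2)

ref = int(np.dot(2 * a - 1, 2 * b - 1))   # same product in +-1 arithmetic
assert binary_dot(a_packed, b_packed, n) == ref
```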
Universal Word Segmentation: Implementation and Interpretation
Title | Universal Word Segmentation: Implementation and Interpretation |
Authors | Yan Shao, Christian Hardmeier, Joakim Nivre |
Abstract | Word segmentation is a low-level NLP task that is non-trivial for a considerable number of languages. In this paper, we present a sequence tagging framework and apply it to word segmentation for a wide range of languages with different writing systems and typological characteristics. Additionally, we investigate the correlations between various typological factors and word segmentation accuracy. The experimental results indicate that segmentation accuracy is positively related to word boundary markers and negatively to the number of unique non-segmental terms. Based on the analysis, we design a small set of language-specific settings and extensively evaluate the segmentation system on the Universal Dependencies datasets. Our model obtains state-of-the-art accuracies on all the UD languages. It performs substantially better on languages that are non-trivial to segment, such as Chinese, Japanese, Arabic and Hebrew, when compared to previous work. |
Tasks | |
Published | 2018-07-09 |
URL | http://arxiv.org/abs/1807.02974v1 |
http://arxiv.org/pdf/1807.02974v1.pdf | |
PWC | https://paperswithcode.com/paper/universal-word-segmentation-implementation |
Repo | https://github.com/yanshao9798/segmenter |
Framework | tf |
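Sequence-tagging segmenters typically emit one boundary tag per character; assuming a standard BIES scheme (Begin / Inside / End / Single — the paper's exact tag set may differ), decoding tags back into words looks like this:

```python
def bies_to_words(chars, tags):
    """Rebuild words from per-character BIES boundary tags."""
    words, buf = [], []
    for ch, tag in zip(chars, tags):
        buf.append(ch)
        if tag in ("E", "S"):          # a word ends here
            words.append("".join(buf))
            buf = []
    if buf:                             # tolerate a truncated tag sequence
        words.append("".join(buf))
    return words

print(bies_to_words(list("下雨天地面积水"),
                    ["B", "E", "S", "B", "E", "B", "E"]))
# ['下雨', '天', '地面', '积水']
```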
Exact Low Tubal Rank Tensor Recovery from Gaussian Measurements
Title | Exact Low Tubal Rank Tensor Recovery from Gaussian Measurements |
Authors | Canyi Lu, Jiashi Feng, Zhouchen Lin, Shuicheng Yan |
Abstract | The recently proposed Tensor Nuclear Norm (TNN) [Lu et al., 2016; 2018a] is an interesting convex penalty induced by the tensor SVD [Kilmer and Martin, 2011]. It plays a similar role as the matrix nuclear norm, which is the convex surrogate of the matrix rank. Considering that the TNN based Tensor Robust PCA [Lu et al., 2018a] is an elegant extension of Robust PCA with a similar tight recovery bound, it is natural to solve other low rank tensor recovery problems extended from the matrix cases. However, the extensions and proofs are generally tedious. The general atomic norm provides a unified view of norms induced by low-complexity structures, e.g., the $\ell_1$-norm and nuclear norm. The sharp estimates of the required number of generic measurements for exact recovery based on the atomic norm are known in the literature. In this work, with a careful choice of the atomic set, we prove that TNN is a special atomic norm. Then by computing the Gaussian width of a certain cone, which is necessary for the sharp estimate, we achieve a simple bound for guaranteed low tubal rank tensor recovery from Gaussian measurements. Specifically, we show that by solving a TNN minimization problem, the underlying tensor of size $n_1\times n_2\times n_3$ with tubal rank $r$ can be exactly recovered when the given number of Gaussian measurements is $O(r(n_1+n_2-r)n_3)$. It is order optimal when comparing with the degrees of freedom $r(n_1+n_2-r)n_3$. Beyond the Gaussian mapping, we also give the recovery guarantee of tensor completion based on the uniform random mapping by TNN minimization. Numerical experiments verify our theoretical results. |
Tasks | |
Published | 2018-06-07 |
URL | http://arxiv.org/abs/1806.02511v1 |
http://arxiv.org/pdf/1806.02511v1.pdf | |
PWC | https://paperswithcode.com/paper/exact-low-tubal-rank-tensor-recovery-from |
Repo | https://github.com/canyilu/tensor-completion-tensor-recovery |
Framework | none |
Tropical Geometry of Deep Neural Networks
Title | Tropical Geometry of Deep Neural Networks |
Authors | Liwen Zhang, Gregory Naitzat, Lek-Heng Lim |
Abstract | We establish, for the first time, connections between feedforward neural networks with ReLU activation and tropical geometry — we show that the family of such neural networks is equivalent to the family of tropical rational maps. Among other things, we deduce that feedforward ReLU neural networks with one hidden layer can be characterized by zonotopes, which serve as building blocks for deeper networks; we relate decision boundaries of such neural networks to tropical hypersurfaces, a major object of study in tropical geometry; and we prove that linear regions of such neural networks correspond to vertices of polytopes associated with tropical rational functions. An insight from our tropical formulation is that a deeper network is exponentially more expressive than a shallow network. |
Tasks | |
Published | 2018-05-18 |
URL | http://arxiv.org/abs/1805.07091v1 |
http://arxiv.org/pdf/1805.07091v1.pdf | |
PWC | https://paperswithcode.com/paper/tropical-geometry-of-deep-neural-networks |
Repo | https://github.com/necoleman/relu_tropical_polynomial |
Framework | none |
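The piecewise-linear (tropical-rational) structure can be probed empirically: each ReLU activation pattern of a one-hidden-layer network identifies one linear region, so sampling patterns lower-bounds the region count. A toy sketch, with arbitrary network sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((8, 2)), rng.standard_normal(8)   # hidden layer
w2 = rng.standard_normal(8)                                    # output weights

def activation_pattern(x):
    """The set of active ReLUs identifies the linear region x lies in."""
    return tuple((W1 @ x + b1 > 0).astype(int))

xs = rng.uniform(-5, 5, size=(20000, 2))
regions = {activation_pattern(x) for x in xs}
print(len(regions))   # lower bound on linear regions (at most 2^8 here)
```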
Open-world Learning and Application to Product Classification
Title | Open-world Learning and Application to Product Classification |
Authors | Hu Xu, Bing Liu, Lei Shu, P. Yu |
Abstract | Classic supervised learning makes the closed-world assumption, meaning that classes seen in testing must have been seen in training. However, in the dynamic world, new or unseen class examples may appear constantly. A model working in such an environment must be able to reject unseen classes (not seen or used in training). If enough data is collected for the unseen classes, the system should incrementally learn to accept/classify them. This learning paradigm is called open-world learning (OWL). Existing OWL methods all need some form of re-training to accept or include the new classes in the overall model. In this paper, we propose a meta-learning approach to the problem. Its key novelty is that it only needs to train a meta-classifier, which can then continually accept new classes when they have enough labeled data for the meta-classifier to use, and also detect/reject future unseen classes. No re-training of the meta-classifier or a new overall classifier covering all old and new classes is needed. In testing, the method only uses the examples of the seen classes (including the newly added classes) on-the-fly for classification and rejection. Experimental results demonstrate the effectiveness of the new approach. |
Tasks | Meta-Learning |
Published | 2018-09-17 |
URL | http://arxiv.org/abs/1809.06004v2 |
http://arxiv.org/pdf/1809.06004v2.pdf | |
PWC | https://paperswithcode.com/paper/open-world-learning-and-application-to |
Repo | https://github.com/howardhsu/Meta-Open-World-Learning |
Framework | tf |
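The accept/reject decision can be caricatured with a fixed similarity in place of a learned meta-classifier: compare a query to the stored examples of each seen class and reject as unseen if nothing is close enough. Cosine similarity and the threshold below are stand-in assumptions; the paper learns the similarity function instead:

```python
import numpy as np

def classify_or_reject(query, class_examples, threshold=0.7):
    """Example-driven open-world decision: return the best-matching seen
    class, or reject the query as belonging to an unseen class."""
    best_label, best_sim = None, -1.0
    for label, examples in class_examples.items():
        sims = examples @ query / (
            np.linalg.norm(examples, axis=1) * np.linalg.norm(query) + 1e-9)
        if sims.max() > best_sim:
            best_label, best_sim = label, sims.max()
    return best_label if best_sim >= threshold else "REJECT_UNSEEN"

rng = np.random.default_rng(0)
classes = {c: rng.standard_normal((20, 32)) + 4 * rng.standard_normal(32)
           for c in ["shoes", "laptops"]}
print(classify_or_reject(rng.standard_normal(32), classes))
```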
Understanding disentangling in $\beta$-VAE
Title | Understanding disentangling in $\beta$-VAE |
Authors | Christopher P. Burgess, Irina Higgins, Arka Pal, Loic Matthey, Nick Watters, Guillaume Desjardins, Alexander Lerchner |
Abstract | We present new intuitions and theoretical assessments of the emergence of disentangled representation in variational autoencoders. Taking a rate-distortion theory perspective, we show the circumstances under which representations aligned with the underlying generative factors of variation of data emerge when optimising the modified ELBO bound in $\beta$-VAE, as training progresses. From these insights, we propose a modification to the training regime of $\beta$-VAE, that progressively increases the information capacity of the latent code during training. This modification facilitates the robust learning of disentangled representations in $\beta$-VAE, without the previous trade-off in reconstruction accuracy. |
Tasks | |
Published | 2018-04-10 |
URL | http://arxiv.org/abs/1804.03599v1 |
http://arxiv.org/pdf/1804.03599v1.pdf | |
PWC | https://paperswithcode.com/paper/understanding-disentangling-in-vae |
Repo | https://github.com/CocoJam/Beta_VAE |
Framework | tf |
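The modified objective is stated directly in the abstract: penalize the deviation of the KL term from a capacity C that grows during training. A sketch of that loss with illustrative hyperparameters (gamma, C_max, and the schedule length are assumptions, not the paper's values):

```python
def capacity_vae_loss(recon_nll, kl, step, gamma=100.0,
                      c_max=25.0, anneal_steps=100000):
    """Capacity-annealed beta-VAE objective:
    loss = reconstruction + gamma * |KL - C|, with C increased linearly."""
    C = min(c_max, c_max * step / anneal_steps)
    return recon_nll + gamma * abs(kl - C)

# Early in training, a large KL is heavily penalized toward a small C ...
print(capacity_vae_loss(recon_nll=120.0, kl=30.0, step=1000))
# ... later, C has grown toward the KL and the penalty shrinks.
print(capacity_vae_loss(recon_nll=120.0, kl=30.0, step=90000))
```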
Unsupervised Cross-dataset Person Re-identification by Transfer Learning of Spatial-Temporal Patterns
Title | Unsupervised Cross-dataset Person Re-identification by Transfer Learning of Spatial-Temporal Patterns |
Authors | Jianming Lv, Weihang Chen, Qing Li, Can Yang |
Abstract | Most of the proposed person re-identification algorithms conduct supervised training and testing on single labeled datasets of small size, so directly deploying these trained models to a large-scale real-world camera network may lead to poor performance due to underfitting. It is challenging to incrementally optimize the models by using the abundant unlabeled data collected from the target domain. To address this challenge, we propose an unsupervised incremental learning algorithm, TFusion, which is aided by the transfer learning of the pedestrians' spatio-temporal patterns in the target domain. Specifically, the algorithm firstly transfers the visual classifier trained from a small labeled source dataset to the unlabeled target dataset so as to learn the pedestrians' spatio-temporal patterns. Secondly, a Bayesian fusion model is proposed to combine the learned spatio-temporal patterns with visual features to achieve a significantly improved classifier. Finally, we propose a learning-to-rank based mutual promotion procedure to incrementally optimize the classifiers based on the unlabeled data in the target domain. Comprehensive experiments based on multiple real surveillance datasets are conducted, and the results show that our algorithm gains significant improvement compared with the state-of-the-art cross-dataset unsupervised person re-identification algorithms. |
Tasks | Learning-To-Rank, Person Re-Identification, Transfer Learning, Unsupervised Person Re-Identification |
Published | 2018-03-20 |
URL | http://arxiv.org/abs/1803.07293v1 |
http://arxiv.org/pdf/1803.07293v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-cross-dataset-person-re |
Repo | https://github.com/ahangchen/TFusion |
Framework | tf |
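The Bayesian fusion step combines a visual matching score with learned spatio-temporal transition statistics. The ratio-weighting below only conveys the flavor of such a fusion; the paper derives the fused posterior properly, so treat every quantity here as an illustrative assumption:

```python
def fused_score(visual_sim, p_st_match, p_st_random):
    """Toy fusion: weight the visual similarity by how much more likely
    the observed (camera-pair, time-gap) transition is for a true match
    than for a random pedestrian pair."""
    return visual_sim * p_st_match / (p_st_random + 1e-9)

# Two candidates with equal appearance, but only one plausible transition:
print(fused_score(0.8, p_st_match=0.30, p_st_random=0.05))  # boosted
print(fused_score(0.8, p_st_match=0.01, p_st_random=0.05))  # suppressed
```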