Paper Group AWR 264
An end-to-end TextSpotter with Explicit Alignment and Attention. Multi-focus Image Fusion using dictionary learning and Low-Rank Representation. Learning Energy Based Inpainting for Optical Flow. Neural separation of observed and unobserved distributions. Arbitrary Style Transfer with Style-Attentional Networks. Learning from Multi-domain Artistic Images for Arbitrary Style Transfer. Effective Representation for Easy-First Dependency Parsing. Estimating Train Delays in a Large Rail Network Using a Zero Shot Markov Model. XNOR Neural Engine: a Hardware Accelerator IP for 21.6 fJ/op Binary Neural Network Inference. Universal Word Segmentation: Implementation and Interpretation. Exact Low Tubal Rank Tensor Recovery from Gaussian Measurements. Tropical Geometry of Deep Neural Networks. Open-world Learning and Application to Product Classification. Understanding disentangling in $\beta$-VAE. Unsupervised Cross-dataset Person Re-identification by Transfer Learning of Spatial-Temporal Patterns.
An end-to-end TextSpotter with Explicit Alignment and Attention
Title | An end-to-end TextSpotter with Explicit Alignment and Attention |
Authors | Tong He, Zhi Tian, Weilin Huang, Chunhua Shen, Yu Qiao, Changming Sun |
Abstract | Text detection and recognition in natural images have long been considered as two separate tasks that are processed sequentially. Training of two tasks in a unified framework is non-trivial due to significant differences in optimisation difficulties. In this work, we present a conceptually simple yet efficient framework that simultaneously processes the two tasks in one shot. Our main contributions are three-fold: 1) we propose a novel text-alignment layer that allows it to precisely compute convolutional features of a text instance in arbitrary orientation, which is the key to boost the performance; 2) a character attention mechanism is introduced by using character spatial information as explicit supervision, leading to large improvements in recognition; 3) two technologies, together with a new RNN branch for word recognition, are integrated seamlessly into a single model which is end-to-end trainable. This allows the two tasks to work collaboratively by sharing convolutional features, which is critical to identify challenging text instances. Our model achieves impressive results in end-to-end recognition on the ICDAR2015 dataset, significantly advancing most recent results, with improvements of F-measure from (0.54, 0.51, 0.47) to (0.82, 0.77, 0.63), by using a strong, weak and generic lexicon respectively. Thanks to joint training, our method can also serve as a good detector by achieving a new state-of-the-art detection performance on two datasets. |
Tasks | |
Published | 2018-03-09 |
URL | http://arxiv.org/abs/1803.03474v3 |
http://arxiv.org/pdf/1803.03474v3.pdf | |
PWC | https://paperswithcode.com/paper/an-end-to-end-textspotter-with-explicit |
Repo | https://github.com/curbmap/curbmap-ml |
Framework | tf |
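The text-alignment layer is only described at a high level above. As a rough illustration of the underlying idea, the NumPy sketch below bilinearly samples a fixed grid of points inside an arbitrarily oriented box, yielding a fixed-size feature map for a rotated text instance. The grid size, sampling scheme, and all names are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly interpolate feat (H, W, C) at fractional coords (y, x)."""
    H, W, _ = feat.shape
    y, x = float(np.clip(y, 0, H - 1)), float(np.clip(x, 0, W - 1))
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feat[y0, x0]
            + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0]
            + wy * wx * feat[y1, x1])

def text_align_features(feat, center, size, angle, grid=(8, 32)):
    """Sample a fixed grid of points inside a rotated box.

    center: (cy, cx); size: (h, w) of the box; angle in radians.
    Returns (grid_h, grid_w, C): a fixed-size feature map for a text
    instance of arbitrary orientation.
    """
    gh, gw = grid
    h, w = size
    cy, cx = center
    c, s = np.cos(angle), np.sin(angle)
    out = np.zeros((gh, gw, feat.shape[2]))
    for i in range(gh):
        for j in range(gw):
            # Offsets in the box's local frame, rotated into image frame.
            dy = (i + 0.5) / gh * h - h / 2
            dx = (j + 0.5) / gw * w - w / 2
            out[i, j] = bilinear_sample(feat,
                                        cy + dx * s + dy * c,
                                        cx + dx * c - dy * s)
    return out

feat = np.random.rand(64, 64, 4)
aligned = text_align_features(feat, center=(32, 32), size=(10, 30),
                              angle=np.pi / 6)
print(aligned.shape)  # (8, 32, 4)
```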
Multi-focus Image Fusion using dictionary learning and Low-Rank Representation
Title | Multi-focus Image Fusion using dictionary learning and Low-Rank Representation |
Authors | Hui Li, Xiao-Jun Wu |
Abstract | Among representation learning methods, low-rank representation (LRR) is one of the hot research topics in many fields, especially in image processing and pattern recognition. Although LRR can capture the global structure, its ability to preserve local structure is limited because LRR lacks dictionary learning. In this paper, we propose a novel multi-focus image fusion method based on dictionary learning and LRR to obtain better performance in both global and local structure. Firstly, the source images are divided into several patches by a sliding-window technique. Then, the patches are classified according to Histogram of Oriented Gradients (HOG) features, and the sub-dictionaries of each class are learned by the K-singular value decomposition (K-SVD) algorithm. Secondly, a global dictionary is constructed by combining these sub-dictionaries. Then, we use the global dictionary in LRR to obtain the LRR coefficient vector for each patch. Finally, an $\ell_1$-norm, choose-max fusion strategy over the coefficient vectors is adopted to reconstruct the fused image from the fused LRR coefficients and the global dictionary. Experimental results demonstrate that the proposed method obtains state-of-the-art performance in both qualitative and quantitative evaluations compared with several classical and novel methods. The code of our fusion method is available at https://github.com/hli1221/imagefusion_dllrr |
Tasks | Dictionary Learning, Representation Learning |
Published | 2018-04-23 |
URL | http://arxiv.org/abs/1804.08355v2 |
http://arxiv.org/pdf/1804.08355v2.pdf | |
PWC | https://paperswithcode.com/paper/multi-focus-image-fusion-using-dictionary |
Repo | https://github.com/hli1221/imagefusion_dllrr |
Framework | none |
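A minimal sketch of the $\ell_1$-norm choose-max fusion step described in the abstract, with the dictionary learning (K-SVD) and LRR solver omitted; shapes and the toy data are illustrative assumptions:

```python
import numpy as np

def fuse_coefficients(coeffs_a, coeffs_b):
    """Choose-max fusion: for each patch, keep the LRR coefficient
    vector with the larger l1-norm (a proxy for focus/activity level).

    coeffs_a, coeffs_b: (num_atoms, num_patches) coefficient matrices
    for the two source images over the same global dictionary.
    """
    norms_a = np.abs(coeffs_a).sum(axis=0)
    norms_b = np.abs(coeffs_b).sum(axis=0)
    pick_a = norms_a >= norms_b
    return np.where(pick_a[None, :], coeffs_a, coeffs_b)

# Toy usage: reconstruct fused patches from fused coefficients.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 128))           # global dictionary (atoms as columns)
Za = rng.standard_normal((128, 100)) * 0.1   # LRR coefficients, image A
Zb = rng.standard_normal((128, 100)) * 0.1   # LRR coefficients, image B
fused = D @ fuse_coefficients(Za, Zb)        # (64, 100): one 8x8 patch per column
```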
Learning Energy Based Inpainting for Optical Flow
Title | Learning Energy Based Inpainting for Optical Flow |
Authors | Christoph Vogel, Patrick Knöbelreiter, Thomas Pock |
Abstract | Modern optical flow methods are often composed of a cascade of many independent steps or formulated as a black-box neural network that is hard to interpret and analyze. In this work we seek a plain, interpretable, but learnable solution. We propose a novel inpainting-based algorithm that approaches the problem in three steps: feature selection and matching, selection of supporting points, and energy-based inpainting. To facilitate the inference we propose an optimization layer that allows backpropagating through 10K iterations of a first-order method without any numerical or memory problems. Compared to recent state-of-the-art networks, our modular CNN is very lightweight and competitive with other, more involved, inpainting-based methods. |
Tasks | Feature Selection, Optical Flow Estimation |
Published | 2018-11-09 |
URL | http://arxiv.org/abs/1811.03721v1 |
http://arxiv.org/pdf/1811.03721v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-energy-based-inpainting-for-optical |
Repo | https://github.com/vogechri/CustomNetworkLayers |
Framework | pytorch |
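As a rough illustration of energy-based inpainting with a first-order method, the sketch below densifies sparse flow by plain gradient descent on a quadratic data-plus-smoothness energy (periodic boundaries via np.roll). The paper's optimization layer backpropagates through such iterations inside a network; this standalone NumPy version only mirrors the forward computation, and the energy weights are illustrative.

```python
import numpy as np

def inpaint_flow(sparse_flow, mask, lam=0.1, step=0.2, iters=2000):
    """Minimize sum(mask * (u - f)^2) + lam * ||grad u||^2 per channel
    by gradient descent, diffusing flow from the supporting points.

    sparse_flow: (H, W, 2), values meaningful only where mask == 1.
    """
    u = sparse_flow.copy()
    for _ in range(iters):
        # Gradient of the smoothness term: discrete 4-neighbor Laplacian.
        lap = (-4 * u
               + np.roll(u, 1, axis=0) + np.roll(u, -1, axis=0)
               + np.roll(u, 1, axis=1) + np.roll(u, -1, axis=1))
        grad = 2 * mask[..., None] * (u - sparse_flow) - 2 * lam * lap
        u -= step * grad
    return u

H, W = 32, 32
mask = (np.random.rand(H, W) < 0.05).astype(float)   # 5% supporting points
flow = np.zeros((H, W, 2))
flow[mask == 1] = np.random.randn(int(mask.sum()), 2)
dense = inpaint_flow(flow, mask)                      # dense (H, W, 2) flow
```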
Neural separation of observed and unobserved distributions
Title | Neural separation of observed and unobserved distributions |
Authors | Tavi Halperin, Ariel Ephrat, Yedid Hoshen |
Abstract | Separating mixed distributions is a long-standing challenge for machine learning and signal processing. Most current methods either rely on making strong assumptions about the source distributions or rely on having training samples of each source in the mixture. In this work, we introduce a new method, Neural Egg Separation, to tackle the scenario of extracting a signal from an unobserved distribution additively mixed with a signal from an observed distribution. Our method iteratively learns to separate the known distribution from progressively finer estimates of the unknown distribution. In some settings, Neural Egg Separation is initialization-sensitive; we therefore introduce Latent Mixture Masking, which ensures a good initialization. Extensive experiments on audio and image separation tasks show that our method outperforms current methods that use the same level of supervision, and often achieves performance similar to full supervision. |
Tasks | Speaker Separation |
Published | 2018-11-30 |
URL | https://arxiv.org/abs/1811.12739v2 |
https://arxiv.org/pdf/1811.12739v2.pdf | |
PWC | https://paperswithcode.com/paper/neural-separation-of-observed-and-unobserved |
Repo | https://github.com/tavihalperin/Neural-Egg-Seperation |
Framework | pytorch |
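The core Neural Egg Separation loop can be sketched with a deliberately simple stand-in separator: synthesize mixtures from observed samples plus the current unobserved estimates, fit the separator on them, then refine the estimates on the real mixtures. The affine least-squares "separator" and Gaussian toy data below are assumptions purely for illustrating the iteration structure; the paper trains neural networks with masking.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 16, 2000
B = rng.standard_normal((n, d))               # observed source: samples available
Y = 2.0 * rng.standard_normal((n, d)) + 3.0   # unobserved source: never seen alone
M = B + Y                                     # only mixtures of it are observed

def fit_affine(X, T):
    Xa = np.hstack([X, np.ones((len(X), 1))])
    W, *_ = np.linalg.lstsq(Xa, T, rcond=None)
    return W

def apply_affine(W, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ W

Y_est = M - B.mean(axis=0)   # crude initial estimate of the unobserved part
print("initial error:", np.abs(Y_est - Y).mean())

for _ in range(8):
    # Synthesize fresh mixtures from observed samples + current estimates,
    # fit a separator on them, then refine the estimates on real mixtures.
    M_synth = B[rng.permutation(n)] + Y_est
    W = fit_affine(M_synth, Y_est)            # stand-in for the paper's network
    Y_est = apply_affine(W, M)

print("final error:  ", np.abs(Y_est - Y).mean())
```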
Arbitrary Style Transfer with Style-Attentional Networks
Title | Arbitrary Style Transfer with Style-Attentional Networks |
Authors | Dae Young Park, Kwang Hee Lee |
Abstract | Arbitrary style transfer aims to synthesize a content image with the style of another image to create a third image that has never been seen before. Recent arbitrary style transfer algorithms find it challenging to balance the content structure and the style patterns. Moreover, simultaneously maintaining the global and local style patterns is difficult due to the patch-based mechanism. In this paper, we introduce a novel style-attentional network (SANet) that efficiently and flexibly integrates the local style patterns according to the semantic spatial distribution of the content image. A new identity loss function and multi-level feature embeddings enable our SANet and decoder to preserve the content structure as much as possible while enriching the style patterns. Experimental results demonstrate that our algorithm synthesizes stylized images in real time that are higher in quality than those produced by the state-of-the-art algorithms. |
Tasks | Style Transfer |
Published | 2018-12-06 |
URL | https://arxiv.org/abs/1812.02342v5 |
https://arxiv.org/pdf/1812.02342v5.pdf | |
PWC | https://paperswithcode.com/paper/arbitrary-style-transfer-with-style |
Repo | https://github.com/EnchanterXiao/video-style-transfer |
Framework | pytorch |
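A style-attentional step can be sketched as attention from content positions (queries) to style positions (keys/values). The mean-variance normalization standing in for learned 1x1 projections, the scaling, and the residual connection below are illustrative assumptions rather than the exact SANet:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def style_attention(content, style):
    """Each content location gathers style features whose normalized
    appearance matches its own semantics.

    content: (Nc, C) flattened content feature map; style: (Ns, C).
    """
    def norm(f):
        return (f - f.mean(axis=0)) / (f.std(axis=0) + 1e-5)

    attn = softmax(norm(content) @ norm(style).T / np.sqrt(content.shape[1]))
    # Residual: re-inject content structure alongside the attended style.
    return content + attn @ style

content = np.random.randn(32 * 32, 64)
style = np.random.randn(24 * 24, 64)
out = style_attention(content, style)   # (1024, 64)
```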
Learning from Multi-domain Artistic Images for Arbitrary Style Transfer
Title | Learning from Multi-domain Artistic Images for Arbitrary Style Transfer |
Authors | Zheng Xu, Michael Wilber, Chen Fang, Aaron Hertzmann, Hailin Jin |
Abstract | We propose a fast feed-forward network for arbitrary style transfer, which can generate stylized images for previously unseen content and style image pairs. Besides the traditional content and style representation based on deep features and statistics for textures, we use adversarial networks to regularize the generation of stylized images. Our adversarial network learns the intrinsic property of image styles from large-scale multi-domain artistic images. The adversarial training is challenging because both the input and output of our generator are diverse multi-domain images. We use a conditional generator that stylizes content by shifting the statistics of deep features, and a conditional discriminator based on the coarse category of styles. Moreover, we propose a mask module to spatially decide the stylization level and stabilize adversarial training by avoiding mode collapse. As a side effect, our trained discriminator can be applied to rank and select representative stylized images. We qualitatively and quantitatively evaluate the proposed method, and compare with recent style transfer methods. We release our code and model at https://github.com/nightldj/behance_release. |
Tasks | Style Transfer |
Published | 2018-05-25 |
URL | http://arxiv.org/abs/1805.09987v2 |
http://arxiv.org/pdf/1805.09987v2.pdf | |
PWC | https://paperswithcode.com/paper/beyond-textures-learning-from-multi-domain |
Repo | https://github.com/nightldj/behance_release |
Framework | pytorch |
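"Stylizing content by shifting the statistics of deep features" is the AdaIN-style operation; a minimal sketch of that statistic shift (per-channel mean/std transfer, assuming (C, H, W) feature maps):

```python
import numpy as np

def shift_statistics(content, style, eps=1e-5):
    """Normalize content features per channel, then re-scale and
    re-center them with the style's per-channel mean and std.

    content, style: (C, H, W) feature maps.
    """
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True)
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True)
    return s_std * (content - c_mean) / (c_std + eps) + s_mean

stylized = shift_statistics(np.random.randn(64, 32, 32),
                            np.random.randn(64, 28, 28))
```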
Effective Representation for Easy-First Dependency Parsing
Title | Effective Representation for Easy-First Dependency Parsing |
Authors | Zuchao Li, Jiaxun Cai, Hai Zhao |
Abstract | Easy-first parsing relies on subtree re-ranking to build the complete parse tree. The intermediate state of parsing is represented by various subtrees, whose internal structural information is the key signal for later parsing decisions; we therefore explore a better representation for such subtrees. In detail, this work introduces a bottom-up subtree encoding method based on the child-sum tree-LSTM. Starting from an easy-first dependency parser without other handcrafted features, we show that the effective subtree encoder does improve the parsing process, and can make a greedy-search easy-first parser achieve promising results on benchmark treebanks compared to state-of-the-art baselines. Furthermore, with the help of a current pre-trained language model, we further improve the state-of-the-art results of the easy-first approach. |
Tasks | Dependency Parsing, Language Modelling |
Published | 2018-11-08 |
URL | https://arxiv.org/abs/1811.03511v3 |
https://arxiv.org/pdf/1811.03511v3.pdf | |
PWC | https://paperswithcode.com/paper/effective-subtree-encoding-for-easy-first |
Repo | https://github.com/bcmi220/erefdp |
Framework | none |
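A child-sum tree-LSTM node update (Tai et al., 2015), the building block of the bottom-up subtree encoder, can be sketched as follows; weights are random stand-ins and dimensions are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ChildSumTreeLSTM:
    """Encodes a dependency subtree bottom-up: a node's state depends on
    its word embedding and the summed hidden states of its children."""

    def __init__(self, d_in, d_hid, rng):
        self.W = rng.standard_normal((4, d_hid, d_in)) * 0.1   # i, o, u, f
        self.U = rng.standard_normal((4, d_hid, d_hid)) * 0.1
        self.b = np.zeros((4, d_hid))

    def node(self, x, children):
        """x: (d_in,) word embedding; children: list of (h, c) pairs.
        Returns this subtree's (h, c) representation."""
        h_sum = sum((h for h, _ in children), np.zeros(self.U.shape[1]))
        i = sigmoid(self.W[0] @ x + self.U[0] @ h_sum + self.b[0])
        o = sigmoid(self.W[1] @ x + self.U[1] @ h_sum + self.b[1])
        u = np.tanh(self.W[2] @ x + self.U[2] @ h_sum + self.b[2])
        c = i * u
        # One forget gate per child, conditioned on that child's h.
        for h_k, c_k in children:
            f_k = sigmoid(self.W[3] @ x + self.U[3] @ h_k + self.b[3])
            c = c + f_k * c_k
        return o * np.tanh(c), c

rng = np.random.default_rng(0)
cell = ChildSumTreeLSTM(d_in=8, d_hid=16, rng=rng)
leaf1 = cell.node(rng.standard_normal(8), [])
leaf2 = cell.node(rng.standard_normal(8), [])
root_h, root_c = cell.node(rng.standard_normal(8), [leaf1, leaf2])
```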
Estimating Train Delays in a Large Rail Network Using a Zero Shot Markov Model
Title | Estimating Train Delays in a Large Rail Network Using a Zero Shot Markov Model |
Authors | Ramashish Gaurav, Biplav Srivastava |
Abstract | India runs the world's fourth-largest railway transport network, carrying over 8 billion passengers per year. However, the travel experience of passengers is frequently marked by delays, i.e., late arrival of trains at stations, causing inconvenience. In a first, we study the systemic delays in train arrivals using n-order Markov frameworks and experiment with two regression-based models. Using train running-status data collected for two years, we report an efficient algorithm for estimating delays at railway stations with near-accurate results. This work can help railways manage their resources, while also helping passengers and businesses served by them to efficiently plan their activities. |
Tasks | |
Published | 2018-06-07 |
URL | http://arxiv.org/abs/1806.02825v1 |
http://arxiv.org/pdf/1806.02825v1.pdf | |
PWC | https://paperswithcode.com/paper/estimating-train-delays-in-a-large-rail |
Repo | https://github.com/R-Gaurav/train-delay-estimation-ITSC2018 |
Framework | none |
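A first-order version of the Markov framing can be sketched by discretizing delays into states and estimating station-to-station transition probabilities from historical runs. The delay bins, Laplace smoothing, and toy data below are assumptions for illustration, not the paper's exact model:

```python
import numpy as np

BINS = [0, 15, 60, np.inf]   # minutes: on-time, minor delay, major delay

def to_state(delay):
    return int(np.digitize(delay, BINS[1:]))

def fit_transitions(runs, n_states=3):
    """runs: list of per-journey delay sequences (minutes at each station).
    Returns the state-transition matrix between consecutive stations."""
    counts = np.ones((n_states, n_states))   # Laplace smoothing
    for delays in runs:
        states = [to_state(d) for d in delays]
        for a, b in zip(states, states[1:]):
            counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

runs = [[0, 5, 20, 70], [10, 18, 25, 30], [0, 0, 5, 10]]
P = fit_transitions(runs)
# Distribution over delay states at the next station, given 20 min now:
print(P[to_state(20)])
```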
XNOR Neural Engine: a Hardware Accelerator IP for 21.6 fJ/op Binary Neural Network Inference
Title | XNOR Neural Engine: a Hardware Accelerator IP for 21.6 fJ/op Binary Neural Network Inference |
Authors | Francesco Conti, Pasquale Davide Schiavone, Luca Benini |
Abstract | Binary Neural Networks (BNNs) promise to deliver accuracy comparable to conventional deep neural networks at a fraction of the cost in terms of memory and energy. In this paper, we introduce the XNOR Neural Engine (XNE), a fully digital configurable hardware accelerator IP for BNNs, integrated within a microcontroller unit (MCU) equipped with an autonomous I/O subsystem and hybrid SRAM / standard cell memory. The XNE is able to fully compute convolutional and dense layers autonomously or in cooperation with the core in the MCU to realize more complex behaviors. We show post-synthesis results in 65 nm and 22 nm technology for the XNE IP and post-layout results in 22 nm for the full MCU, indicating that this system can drop the energy cost per binary operation to 21.6 fJ per operation at 0.4 V, and at the same time is flexible and performant enough to execute state-of-the-art BNN topologies such as ResNet-34 in less than 2.2 mJ per frame at 8.9 fps. |
Tasks | |
Published | 2018-07-09 |
URL | http://arxiv.org/abs/1807.03010v1 |
http://arxiv.org/pdf/1807.03010v1.pdf | |
PWC | https://paperswithcode.com/paper/xnor-neural-engine-a-hardware-accelerator-ip |
Repo | https://github.com/pulp-platform/hwpe-tb |
Framework | none |
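The arithmetic the XNE accelerates reduces a binarized dot product to XNOR plus popcount; a small software sketch of that identity, validated against plain ±1 arithmetic (the bit packing is illustrative):

```python
import numpy as np

def binary_dot(a_bits, b_bits, n):
    """Dot product of two {-1,+1} vectors packed as integer bitmasks
    (1 bit per element, bit value 1 meaning +1), over n bits.

    matches = popcount(XNOR(a, b)); dot = 2 * matches - n.
    """
    xnor = ~(a_bits ^ b_bits) & ((1 << n) - 1)
    matches = bin(xnor).count("1")
    return 2 * matches - n

rng = np.random.default_rng(0)
n = 16
a = rng.integers(0, 2, n)        # bits: 1 -> +1, 0 -> -1
b = rng.integers(0, 2, n)
a_packed = int("".join(map(str, a)), 2)
b_packed = int("".join(map(str, b)), 2)

ref = int(np.dot(2 * a - 1, 2 * b - 1))   # same product in +-1 arithmetic
assert binary_dot(a_packed, b_packed, n) == ref
```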
Universal Word Segmentation: Implementation and Interpretation
Title | Universal Word Segmentation: Implementation and Interpretation |
Authors | Yan Shao, Christian Hardmeier, Joakim Nivre |
Abstract | Word segmentation is a low-level NLP task that is non-trivial for a considerable number of languages. In this paper, we present a sequence tagging framework and apply it to word segmentation for a wide range of languages with different writing systems and typological characteristics. Additionally, we investigate the correlations between various typological factors and word segmentation accuracy. The experimental results indicate that segmentation accuracy is positively related to word boundary markers and negatively to the number of unique non-segmental terms. Based on the analysis, we design a small set of language-specific settings and extensively evaluate the segmentation system on the Universal Dependencies datasets. Our model obtains state-of-the-art accuracies on all the UD languages. It performs substantially better on languages that are non-trivial to segment, such as Chinese, Japanese, Arabic and Hebrew, when compared to previous work. |
Tasks | |
Published | 2018-07-09 |
URL | http://arxiv.org/abs/1807.02974v1 |
http://arxiv.org/pdf/1807.02974v1.pdf | |
PWC | https://paperswithcode.com/paper/universal-word-segmentation-implementation |
Repo | https://github.com/yanshao9798/segmenter |
Framework | tf |
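Sequence-tagging segmenters typically emit one boundary tag per character; assuming a standard BIES scheme (Begin / Inside / End / Single — the paper's exact tag set may differ), decoding tags back into words looks like this:

```python
def bies_to_words(chars, tags):
    """Rebuild words from per-character BIES boundary tags."""
    words, buf = [], []
    for ch, tag in zip(chars, tags):
        buf.append(ch)
        if tag in ("E", "S"):          # a word ends here
            words.append("".join(buf))
            buf = []
    if buf:                             # tolerate a truncated tag sequence
        words.append("".join(buf))
    return words

print(bies_to_words(list("下雨天地面积水"),
                    ["B", "E", "S", "B", "E", "B", "E"]))
# ['下雨', '天', '地面', '积水']
```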
Exact Low Tubal Rank Tensor Recovery from Gaussian Measurements
Title | Exact Low Tubal Rank Tensor Recovery from Gaussian Measurements |
Authors | Canyi Lu, Jiashi Feng, Zhouchen Lin, Shuicheng Yan |
Abstract | The recently proposed Tensor Nuclear Norm (TNN) [Lu et al., 2016; 2018a] is an interesting convex penalty induced by the tensor SVD [Kilmer and Martin, 2011]. It plays a similar role as the matrix nuclear norm, which is the convex surrogate of the matrix rank. Considering that the TNN based Tensor Robust PCA [Lu et al., 2018a] is an elegant extension of Robust PCA with a similar tight recovery bound, it is natural to solve other low rank tensor recovery problems extended from the matrix cases. However, the extensions and proofs are generally tedious. The general atomic norm provides a unified view of norms induced by low-complexity structures, e.g., the $\ell_1$-norm and nuclear norm. The sharp estimates of the required number of generic measurements for exact recovery based on the atomic norm are known in the literature. In this work, with a careful choice of the atomic set, we prove that TNN is a special atomic norm. Then by computing the Gaussian width of a certain cone, which is necessary for the sharp estimate, we achieve a simple bound for guaranteed low tubal rank tensor recovery from Gaussian measurements. Specifically, we show that by solving a TNN minimization problem, the underlying tensor of size $n_1\times n_2\times n_3$ with tubal rank $r$ can be exactly recovered when the given number of Gaussian measurements is $O(r(n_1+n_2-r)n_3)$. It is order optimal when comparing with the degrees of freedom $r(n_1+n_2-r)n_3$. Beyond the Gaussian mapping, we also give the recovery guarantee of tensor completion based on the uniform random mapping by TNN minimization. Numerical experiments verify our theoretical results. |
Tasks | |
Published | 2018-06-07 |
URL | http://arxiv.org/abs/1806.02511v1 |
http://arxiv.org/pdf/1806.02511v1.pdf | |
PWC | https://paperswithcode.com/paper/exact-low-tubal-rank-tensor-recovery-from |
Repo | https://github.com/canyilu/tensor-completion-tensor-recovery |
Framework | none |
Tropical Geometry of Deep Neural Networks
Title | Tropical Geometry of Deep Neural Networks |
Authors | Liwen Zhang, Gregory Naitzat, Lek-Heng Lim |
Abstract | We establish, for the first time, connections between feedforward neural networks with ReLU activation and tropical geometry — we show that the family of such neural networks is equivalent to the family of tropical rational maps. Among other things, we deduce that feedforward ReLU neural networks with one hidden layer can be characterized by zonotopes, which serve as building blocks for deeper networks; we relate decision boundaries of such neural networks to tropical hypersurfaces, a major object of study in tropical geometry; and we prove that linear regions of such neural networks correspond to vertices of polytopes associated with tropical rational functions. An insight from our tropical formulation is that a deeper network is exponentially more expressive than a shallow network. |
Tasks | |
Published | 2018-05-18 |
URL | http://arxiv.org/abs/1805.07091v1 |
http://arxiv.org/pdf/1805.07091v1.pdf | |
PWC | https://paperswithcode.com/paper/tropical-geometry-of-deep-neural-networks |
Repo | https://github.com/necoleman/relu_tropical_polynomial |
Framework | none |
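The piecewise-linear (tropical-rational) structure can be probed empirically: each ReLU activation pattern of a one-hidden-layer network identifies one linear region, so sampling patterns lower-bounds the region count. A toy sketch, with arbitrary network sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((8, 2)), rng.standard_normal(8)   # hidden layer
w2 = rng.standard_normal(8)                                    # output weights

def activation_pattern(x):
    """The set of active ReLUs identifies the linear region x lies in."""
    return tuple((W1 @ x + b1 > 0).astype(int))

xs = rng.uniform(-5, 5, size=(20000, 2))
regions = {activation_pattern(x) for x in xs}
print(len(regions))   # lower bound on linear regions (at most 2^8 here)
```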
Open-world Learning and Application to Product Classification
Title | Open-world Learning and Application to Product Classification |
Authors | Hu Xu, Bing Liu, Lei Shu, P. Yu |
Abstract | Classic supervised learning makes the closed-world assumption, meaning that classes seen in testing must have been seen in training. However, in the dynamic world, new or unseen class examples may appear constantly. A model working in such an environment must be able to reject unseen classes (not seen or used in training). If enough data is collected for the unseen classes, the system should incrementally learn to accept/classify them. This learning paradigm is called open-world learning (OWL). Existing OWL methods all need some form of re-training to accept or include the new classes in the overall model. In this paper, we propose a meta-learning approach to the problem. Its key novelty is that it only needs to train a meta-classifier, which can then continually accept new classes when they have enough labeled data for the meta-classifier to use, and also detect/reject future unseen classes. No re-training of the meta-classifier or a new overall classifier covering all old and new classes is needed. In testing, the method only uses the examples of the seen classes (including the newly added classes) on-the-fly for classification and rejection. Experimental results demonstrate the effectiveness of the new approach. |
Tasks | Meta-Learning |
Published | 2018-09-17 |
URL | http://arxiv.org/abs/1809.06004v2 |
http://arxiv.org/pdf/1809.06004v2.pdf | |
PWC | https://paperswithcode.com/paper/open-world-learning-and-application-to |
Repo | https://github.com/howardhsu/Meta-Open-World-Learning |
Framework | tf |
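The accept/reject decision can be caricatured with a fixed similarity in place of a learned meta-classifier: compare a query to the stored examples of each seen class and reject as unseen if nothing is close enough. Cosine similarity and the threshold below are stand-in assumptions; the paper learns the similarity function instead:

```python
import numpy as np

def classify_or_reject(query, class_examples, threshold=0.7):
    """Example-driven open-world decision: return the best-matching seen
    class, or reject the query as belonging to an unseen class."""
    best_label, best_sim = None, -1.0
    for label, examples in class_examples.items():
        sims = examples @ query / (
            np.linalg.norm(examples, axis=1) * np.linalg.norm(query) + 1e-9)
        if sims.max() > best_sim:
            best_label, best_sim = label, sims.max()
    return best_label if best_sim >= threshold else "REJECT_UNSEEN"

rng = np.random.default_rng(0)
classes = {c: rng.standard_normal((20, 32)) + 4 * rng.standard_normal(32)
           for c in ["shoes", "laptops"]}
print(classify_or_reject(rng.standard_normal(32), classes))
```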
Understanding disentangling in $\beta$-VAE
Title | Understanding disentangling in $\beta$-VAE |
Authors | Christopher P. Burgess, Irina Higgins, Arka Pal, Loic Matthey, Nick Watters, Guillaume Desjardins, Alexander Lerchner |
Abstract | We present new intuitions and theoretical assessments of the emergence of disentangled representation in variational autoencoders. Taking a rate-distortion theory perspective, we show the circumstances under which representations aligned with the underlying generative factors of variation of data emerge when optimising the modified ELBO bound in $\beta$-VAE, as training progresses. From these insights, we propose a modification to the training regime of $\beta$-VAE, that progressively increases the information capacity of the latent code during training. This modification facilitates the robust learning of disentangled representations in $\beta$-VAE, without the previous trade-off in reconstruction accuracy. |
Tasks | |
Published | 2018-04-10 |
URL | http://arxiv.org/abs/1804.03599v1 |
http://arxiv.org/pdf/1804.03599v1.pdf | |
PWC | https://paperswithcode.com/paper/understanding-disentangling-in-vae |
Repo | https://github.com/CocoJam/Beta_VAE |
Framework | tf |
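The modified objective is stated directly in the abstract: penalize the deviation of the KL term from a capacity C that grows during training. A sketch of that loss with illustrative hyperparameters (gamma, C_max, and the schedule length are assumptions, not the paper's values):

```python
def capacity_vae_loss(recon_nll, kl, step, gamma=100.0,
                      c_max=25.0, anneal_steps=100000):
    """Capacity-annealed beta-VAE objective:
    loss = reconstruction + gamma * |KL - C|, with C increased linearly."""
    C = min(c_max, c_max * step / anneal_steps)
    return recon_nll + gamma * abs(kl - C)

# Early in training, a large KL is heavily penalized toward a small C ...
print(capacity_vae_loss(recon_nll=120.0, kl=30.0, step=1000))
# ... later, C has grown toward the KL and the penalty shrinks.
print(capacity_vae_loss(recon_nll=120.0, kl=30.0, step=90000))
```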
Unsupervised Cross-dataset Person Re-identification by Transfer Learning of Spatial-Temporal Patterns
Title | Unsupervised Cross-dataset Person Re-identification by Transfer Learning of Spatial-Temporal Patterns |
Authors | Jianming Lv, Weihang Chen, Qing Li, Can Yang |
Abstract | Most of the proposed person re-identification algorithms conduct supervised training and testing on single labeled datasets of small size, so directly deploying these trained models to a large-scale real-world camera network may lead to poor performance due to underfitting. It is challenging to incrementally optimize the models by using the abundant unlabeled data collected from the target domain. To address this challenge, we propose an unsupervised incremental learning algorithm, TFusion, which is aided by the transfer learning of the pedestrians' spatio-temporal patterns in the target domain. Specifically, the algorithm firstly transfers the visual classifier trained from a small labeled source dataset to the unlabeled target dataset so as to learn the pedestrians' spatio-temporal patterns. Secondly, a Bayesian fusion model is proposed to combine the learned spatio-temporal patterns with visual features to achieve a significantly improved classifier. Finally, we propose a learning-to-rank based mutual promotion procedure to incrementally optimize the classifiers based on the unlabeled data in the target domain. Comprehensive experiments based on multiple real surveillance datasets are conducted, and the results show that our algorithm gains significant improvement compared with the state-of-the-art cross-dataset unsupervised person re-identification algorithms. |
Tasks | Learning-To-Rank, Person Re-Identification, Transfer Learning, Unsupervised Person Re-Identification |
Published | 2018-03-20 |
URL | http://arxiv.org/abs/1803.07293v1 |
http://arxiv.org/pdf/1803.07293v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-cross-dataset-person-re |
Repo | https://github.com/ahangchen/TFusion |
Framework | tf |
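The Bayesian fusion step combines a visual matching score with learned spatio-temporal transition statistics. The ratio-weighting below only conveys the flavor of such a fusion; the paper derives the fused posterior properly, so treat every quantity here as an illustrative assumption:

```python
def fused_score(visual_sim, p_st_match, p_st_random):
    """Toy fusion: weight the visual similarity by how much more likely
    the observed (camera-pair, time-gap) transition is for a true match
    than for a random pedestrian pair."""
    return visual_sim * p_st_match / (p_st_random + 1e-9)

# Two candidates with equal appearance, but only one plausible transition:
print(fused_score(0.8, p_st_match=0.30, p_st_random=0.05))  # boosted
print(fused_score(0.8, p_st_match=0.01, p_st_random=0.05))  # suppressed
```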