January 30, 2020

3250 words 16 mins read

Paper Group ANR 438

A Model for Learned Bloom Filters, and Optimizing by Sandwiching. 360-Degree Textures of People in Clothing from a Single Image. SPA-GAN: Spatial Attention GAN for Image-to-Image Translation. Improving Scientific Article Visibility by Neural Title Simplification. Towards Robust Curve Text Detection with Conditional Spatial Expansion. A Tour of Conv …

A Model for Learned Bloom Filters, and Optimizing by Sandwiching

Title A Model for Learned Bloom Filters, and Optimizing by Sandwiching
Authors Michael Mitzenmacher
Abstract Recent work has suggested enhancing Bloom filters by using a pre-filter, based on applying machine learning to determine a function that models the data set the Bloom filter is meant to represent. Here we model such learned Bloom filters, with the following outcomes: (1) we clarify what guarantees can and cannot be associated with such a structure; (2) we show how to estimate what size the learning function must achieve in order to obtain improved performance; (3) we provide a simple method, sandwiching, for optimizing learned Bloom filters; and (4) we propose a design and analysis approach for a learned Bloomier filter, based on our modeling approach.
Tasks
Published 2019-01-03
URL http://arxiv.org/abs/1901.00902v1
PDF http://arxiv.org/pdf/1901.00902v1.pdf
PWC https://paperswithcode.com/paper/a-model-for-learned-bloom-filters-and
Repo
Framework
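
A rough illustration of the sandwiching idea described in the abstract above, not the paper's code: an initial Bloom filter, a learned pre-filter (any scoring function with a threshold), and a backup Bloom filter that stores only the keys the learned model misses. All class and parameter names are hypothetical.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter with k hash functions over m bits."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = [False] * m

    def _hashes(self, item):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for idx in self._hashes(item):
            self.bits[idx] = True

    def __contains__(self, item):
        return all(self.bits[idx] for idx in self._hashes(item))


class SandwichedLearnedBloomFilter:
    """Initial filter -> learned model -> backup filter for the model's false negatives."""

    def __init__(self, keys, score, threshold, m_initial, m_backup, k):
        self.score, self.threshold = score, threshold
        self.initial = BloomFilter(m_initial, k)
        self.backup = BloomFilter(m_backup, k)
        for key in keys:
            self.initial.add(key)
            if score(key) < threshold:   # learned model would wrongly reject this key,
                self.backup.add(key)     # so the backup filter must catch it

    def __contains__(self, item):
        if item not in self.initial:               # definite negative
            return False
        if self.score(item) >= self.threshold:     # learned model says "present"
            return True                            # (may still be a false positive)
        return item in self.backup                 # backup recovers learned false negatives


# Toy usage with a hypothetical score function (e.g. the output of a trained classifier).
keys = [f"key-{i}" for i in range(1000)]
score = lambda s: 1.0 if s.startswith("key-") else 0.0
lbf = SandwichedLearnedBloomFilter(keys, score, threshold=0.5,
                                   m_initial=8000, m_backup=2000, k=3)
print("key-42" in lbf, "nonsense" in lbf)
```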

360-Degree Textures of People in Clothing from a Single Image

Title 360-Degree Textures of People in Clothing from a Single Image
Authors Verica Lazova, Eldar Insafutdinov, Gerard Pons-Moll
Abstract In this paper we predict a full 3D avatar of a person from a single image. We infer texture and geometry in the UV-space of the SMPL model using an image-to-image translation method. Given partial texture and segmentation layout maps derived from the input view, our model predicts the complete segmentation map, the complete texture map, and a displacement map. The predicted maps can be applied to the SMPL model in order to naturally generalize to novel poses, shapes, and even new clothing. In order to learn our model in a common UV-space, we non-rigidly register the SMPL model to thousands of 3D scans, effectively encoding textures and geometries as images in correspondence. This turns a difficult 3D inference task into a simpler image-to-image translation one. Results on rendered scans of people and images from the DeepFashion dataset demonstrate that our method can reconstruct plausible 3D avatars from a single image. We further use our model to digitally change pose, shape, swap garments between people and edit clothing. To encourage research in this direction we will make the source code available for research purposes.
Tasks Image-to-Image Translation
Published 2019-08-20
URL https://arxiv.org/abs/1908.07117v1
PDF https://arxiv.org/pdf/1908.07117v1.pdf
PWC https://paperswithcode.com/paper/360-degree-textures-of-people-in-clothing
Repo
Framework

SPA-GAN: Spatial Attention GAN for Image-to-Image Translation

Title SPA-GAN: Spatial Attention GAN for Image-to-Image Translation
Authors Hajar Emami, Majid Moradi Aliabadi, Ming Dong, Ratna Babu Chinnam
Abstract Image-to-image translation aims to learn a mapping between images from a source domain and images from a target domain. In this paper, we introduce the attention mechanism directly into the generative adversarial network (GAN) architecture and propose a novel spatial attention GAN model (SPA-GAN) for image-to-image translation tasks. SPA-GAN computes the attention in its discriminator and uses it to help the generator focus more on the most discriminative regions between the source and target domains, leading to more realistic output images. We also find it helpful to introduce an additional feature map loss in SPA-GAN training to preserve domain-specific features during translation. Compared with existing attention-guided GAN models, SPA-GAN is a lightweight model that does not need additional attention networks or supervision. Qualitative and quantitative comparison against state-of-the-art methods on benchmark datasets demonstrates the superior performance of SPA-GAN.
Tasks Image-to-Image Translation
Published 2019-08-19
URL https://arxiv.org/abs/1908.06616v1
PDF https://arxiv.org/pdf/1908.06616v1.pdf
PWC https://paperswithcode.com/paper/spa-gan-spatial-attention-gan-for-image-to
Repo
Framework
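
A hedged PyTorch sketch of the mechanism as the abstract describes it: the discriminator exposes a spatial attention map, the generator's input is reweighted by that map, and an additional L1 feature-map loss on intermediate discriminator features helps preserve domain-specific content. The module names, the way the attention map is derived from activations, and the exact pairing used in the feature-map loss are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionDiscriminator(nn.Module):
    """Toy patch discriminator that also returns a spatial attention map."""

    def __init__(self, in_ch=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        self.classifier = nn.Conv2d(64, 1, 3, padding=1)

    def forward(self, x):
        feat = self.features(x)                        # (B, 64, H/4, W/4)
        logits = self.classifier(feat)                 # patch-level real/fake scores
        attn = feat.abs().mean(dim=1, keepdim=True)    # attention from activation magnitude
        attn = attn / (attn.amax(dim=(2, 3), keepdim=True) + 1e-8)
        attn = F.interpolate(attn, size=x.shape[2:], mode="bilinear", align_corners=False)
        return logits, attn, feat


def generator_step(G, D, x_src, lambda_feat=1.0):
    """One generator update: attention-weighted input plus a feature-map loss (sketch)."""
    with torch.no_grad():
        _, attn, feat_src = D(x_src)                   # attention computed in the discriminator
    x_fake = G(x_src * attn)                           # focus G on the most discriminative regions
    logits_fake, _, feat_fake = D(x_fake)
    adv_loss = F.binary_cross_entropy_with_logits(logits_fake,
                                                  torch.ones_like(logits_fake))
    feat_loss = F.l1_loss(feat_fake, feat_src)         # keep domain-specific features aligned
    return adv_loss + lambda_feat * feat_loss
```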

Improving Scientific Article Visibility by Neural Title Simplification

Title Improving Scientific Article Visibility by Neural Title Simplification
Authors Alexander Shvets
Abstract The rapidly growing amount of data that scientific content providers must deliver to a user forces them to create effective recommendation tools. The title of an article is often the only element shown to attract people’s attention. We offer an approach to automatically generating titles with various levels of informativeness to benefit different categories of users. Statistics from ResearchGate are used to bias the training datasets, and a specially designed post-processing step applied to neural sequence-to-sequence models allows us to reach the desired variety of simplified titles and strike a trade-off between the attractiveness and transparency of recommendations.
Tasks
Published 2019-04-05
URL http://arxiv.org/abs/1904.03172v1
PDF http://arxiv.org/pdf/1904.03172v1.pdf
PWC https://paperswithcode.com/paper/improving-scientific-article-visibility-by
Repo
Framework

Towards Robust Curve Text Detection with Conditional Spatial Expansion

Title Towards Robust Curve Text Detection with Conditional Spatial Expansion
Authors Zichuan Liu, Guosheng Lin, Sheng Yang, Fayao Liu, Weisi Lin, Wang Ling Goh
Abstract It is challenging to detect curve texts due to their irregular shapes and varying sizes. In this paper, we first investigate the deficiency of the existing curve detection methods and then propose a novel Conditional Spatial Expansion (CSE) mechanism to improve the performance of curve text detection. Instead of regarding curve text detection as a polygon regression or a segmentation problem, we treat it as a region expansion process. Our CSE starts with a seed arbitrarily initialized within a text region and progressively merges neighborhood regions based on local features extracted by a CNN and contextual information from the merged regions. The CSE is highly parameterized and can be seamlessly integrated into existing object detection frameworks. Enhanced by the data-dependent CSE mechanism, our curve text detection system provides robust instance-level text region extraction with minimal post-processing. The analysis experiment shows that our CSE can handle texts with various shapes, sizes, and orientations, and can effectively suppress false positives coming from text-like textures or unexpected texts included in the same RoI. Compared with the existing curve text detection algorithms, our method is more robust and enjoys a simpler processing flow. It also sets a new state-of-the-art performance on curve text benchmarks with an F-score of up to 78.4%.
Tasks Object Detection
Published 2019-03-21
URL http://arxiv.org/abs/1903.08836v1
PDF http://arxiv.org/pdf/1903.08836v1.pdf
PWC https://paperswithcode.com/paper/towards-robust-curve-text-detection-with
Repo
Framework

A Tour of Convolutional Networks Guided by Linear Interpreters

Title A Tour of Convolutional Networks Guided by Linear Interpreters
Authors Pablo Navarrete Michelini, Hanwen Liu, Yunhua Lu, Xingqun Jiang
Abstract Convolutional networks are large linear systems divided into layers and connected by non-linear units. These units are the “articulations” that allow the network to adapt to the input. To understand how a network manages to solve a problem we must look at the articulated decisions in their entirety. If we could capture the actions of the non-linear units for a particular input, we would be able to replay the whole system back and forth as if it were always linear. It would also reveal the actions of the non-linearities because the resulting linear system, a Linear Interpreter, depends on the input image. We introduce a hooking layer, called a LinearScope, which allows us to run the network and the linear interpreter in parallel. Its implementation is simple, flexible and efficient. From here we can make many curious inquiries: what do these linear systems look like? When the rows and columns of the transformation matrix are images, what do they look like? What type of basis do these linear transformations rely on? The answers depend on the problems presented, through which we take a tour of some popular architectures used for classification, super-resolution (SR) and image-to-image translation (I2I). For classification we observe that popular networks use a pixel-wise vote per class strategy and heavily rely on bias parameters. For SR and I2I we find that CNNs use a wavelet-type basis similar to the human visual system. For I2I we reveal copy-move and template-creation strategies to generate outputs.
Tasks Image-to-Image Translation, Super-Resolution
Published 2019-08-14
URL https://arxiv.org/abs/1908.05168v1
PDF https://arxiv.org/pdf/1908.05168v1.pdf
PWC https://paperswithcode.com/paper/a-tour-of-convolutional-networks-guided-by
Repo
Framework
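
One way to make the "replay the network as if it were linear" idea concrete, sketched here in PyTorch rather than with the paper's actual LinearScope layer: for a piecewise-linear network, record each ReLU's 0/1 mask for a reference input and then freeze those masks, so the frozen network becomes the input-dependent linear system the abstract calls a Linear Interpreter. The RecordingReLU name and the tiny network are hypothetical.

```python
import torch
import torch.nn as nn


class RecordingReLU(nn.Module):
    """ReLU that can freeze the 0/1 mask it produced for a reference input."""

    def __init__(self):
        super().__init__()
        self.mask, self.frozen = None, False

    def forward(self, x):
        if self.frozen:
            return x * self.mask        # replay: linear map defined by the stored mask
        self.mask = (x > 0).float()     # record which units were active
        return x * self.mask            # identical to ReLU(x)


net = nn.Sequential(nn.Linear(8, 16), RecordingReLU(),
                    nn.Linear(16, 4), RecordingReLU(),
                    nn.Linear(4, 2))

x_ref = torch.randn(1, 8)
_ = net(x_ref)                          # pass 1: record masks for this input
for m in net.modules():
    if isinstance(m, RecordingReLU):
        m.frozen = True

# With masks frozen, the network is an affine map; its Jacobian gives the
# linear-interpreter matrix whose rows and columns the paper visualizes.
W = torch.autograd.functional.jacobian(lambda z: net(z), x_ref)
print(W.shape)                          # (1, 2, 1, 8): a 2x8 linear transformation
```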

Surface Defect Classification in Real-Time Using Convolutional Neural Networks

Title Surface Defect Classification in Real-Time Using Convolutional Neural Networks
Authors Selim Arikan, Kiran Varanasi, Didier Stricker
Abstract Surface inspection systems are an important application domain for computer vision, as they are used for defect detection and classification in the manufacturing industry. Existing systems use hand-crafted features which require extensive domain knowledge to create. Even though convolutional neural networks (CNNs) have proven successful in many large-scale challenges, industrial inspection systems have so far barely realized their potential due to two significant challenges: real-time processing speed requirements and specialized narrow domain-specific datasets which are sometimes limited in size. In this paper, we propose CNN models that are specifically designed to handle the capacity and real-time speed requirements of surface inspection systems. To train and evaluate our network models, we created a surface image dataset containing more than 22,000 labeled images with many types of surface materials, and achieved 98.0% accuracy in binary defect classification. To solve the class imbalance problem in our datasets, we introduce neural data augmentation methods which are also applicable to similar domains that suffer from the same problem. Our results show that deep learning based methods are feasible for use in surface inspection systems and outperform traditional methods in accuracy and inference time by considerable margins.
Tasks Data Augmentation
Published 2019-04-07
URL http://arxiv.org/abs/1904.04671v1
PDF http://arxiv.org/pdf/1904.04671v1.pdf
PWC https://paperswithcode.com/paper/surface-defect-classification-in-real-time
Repo
Framework

Learning Object-specific Distance from a Monocular Image

Title Learning Object-specific Distance from a Monocular Image
Authors Jing Zhu, Yi Fang, Husam Abu-Haimed, Kuo-Chin Lien, Dongdong Fu, Junli Gu
Abstract Environment perception, including object detection and distance estimation, is one of the most crucial tasks for autonomous driving. Much attention has been paid to the object detection task, but distance estimation has aroused little interest in the computer vision community. Observing that the traditional inverse perspective mapping algorithm performs poorly for objects far away from the camera or on a curved road, in this paper we address the challenging distance estimation problem by developing the first end-to-end learning-based model to directly predict distances for given objects in the images. Besides the introduction of a learning-based base model, we further design an enhanced model with a keypoint regressor, where a projection loss is defined to enforce a better distance estimation, especially for objects close to the camera. To facilitate research on this task, we construct the extended KITTI and nuScenes (mini) object detection datasets with a distance for each object. Our experiments demonstrate that our proposed methods outperform alternative approaches (e.g., the traditional IPM, SVR) on object-specific distance estimation, particularly for the challenging cases in which objects are on a curved road. Moreover, the performance margin demonstrates the effectiveness of our enhanced method.
Tasks Autonomous Driving, Object Detection
Published 2019-09-09
URL https://arxiv.org/abs/1909.04182v1
PDF https://arxiv.org/pdf/1909.04182v1.pdf
PWC https://paperswithcode.com/paper/learning-object-specific-distance-from-a
Repo
Framework

Single-modal and Multi-modal False Arrhythmia Alarm Reduction using Attention-based Convolutional and Recurrent Neural Networks

Title Single-modal and Multi-modal False Arrhythmia Alarm Reduction using Attention-based Convolutional and Recurrent Neural Networks
Authors Sajad Mousavi, Atiyeh Fotoohinasab, Fatemeh Afghah
Abstract This study proposes a deep learning model that effectively suppresses false alarms in intensive care units (ICUs) without ignoring true alarms, using single- and multi-modal biosignals. Most of the current work in the literature is either rule-based, requiring prior knowledge of arrhythmia analysis to build rules, or based on classical machine learning approaches that depend on hand-engineered features. In this work, we apply convolutional neural networks to automatically extract time-invariant features, an attention mechanism to put more emphasis on the important regions of the input segmented signal(s) that are more likely to contribute to an alarm, and long short-term memory units to capture the temporal information present in the signal segments. We trained our method efficiently using a two-step training algorithm (i.e., pre-training and fine-tuning the proposed network) on the dataset provided by the PhysioNet Computing in Cardiology Challenge 2015. The evaluation results demonstrate that the proposed method obtains better results compared to other existing algorithms for the false alarm reduction task in ICUs. The proposed method achieves a sensitivity of 93.88% and a specificity of 92.05% for alarm classification when considering three different signals. In addition, our experiments on 5 separate alarm types lead to significant results where we consider just a single-lead ECG (e.g., a sensitivity of 90.71%, a specificity of 88.30%, and an AUC of 89.51 for the Ventricular Tachycardia alarm type).
Tasks
Published 2019-09-25
URL https://arxiv.org/abs/1909.11791v1
PDF https://arxiv.org/pdf/1909.11791v1.pdf
PWC https://paperswithcode.com/paper/single-modal-and-multi-modal-false-arrhythmia
Repo
Framework
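
A rough PyTorch sketch of the architecture family described in the abstract: 1D convolutions extract time-invariant features from a signal segment, an attention layer re-weights the resulting time steps, an LSTM captures temporal structure, and a final layer scores the alarm as true or false. The layer sizes, the attention form, and the three-channel input are assumptions, not the authors' configuration; the two-step pre-training/fine-tuning procedure is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AlarmNet(nn.Module):
    """CNN -> temporal attention -> LSTM -> true/false alarm classifier (sketch)."""

    def __init__(self, in_channels=3, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.attn = nn.Linear(64, 1)               # score each time step
        self.lstm = nn.LSTM(64, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):                          # x: (batch, channels, samples)
        feats = self.conv(x).transpose(1, 2)       # (batch, time, 64)
        weights = torch.softmax(self.attn(feats), dim=1)
        feats = feats * weights                    # emphasize alarm-relevant regions
        _, (h, _) = self.lstm(feats)
        return self.out(h[-1]).squeeze(-1)         # logit: true alarm vs. false alarm


signals = torch.randn(4, 3, 1000)                  # e.g. ECG + ABP + PPG segments (assumed)
logits = AlarmNet()(signals)
loss = F.binary_cross_entropy_with_logits(logits, torch.tensor([1., 0., 1., 0.]))
```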

Predicting Consumer Default: A Deep Learning Approach

Title Predicting Consumer Default: A Deep Learning Approach
Authors Stefania Albanesi, Domonkos F. Vamossy
Abstract We develop a model to predict consumer default based on deep learning. We show that the model consistently outperforms standard credit scoring models, even though it uses the same data. Our model is interpretable and is able to provide a score to a larger class of borrowers relative to standard credit scoring models while accurately tracking variations in systemic risk. We argue that these properties can provide valuable insights for the design of policies targeted at reducing consumer default and alleviating its burden on borrowers and lenders, as well as macroprudential regulation.
Tasks
Published 2019-08-30
URL https://arxiv.org/abs/1908.11498v2
PDF https://arxiv.org/pdf/1908.11498v2.pdf
PWC https://paperswithcode.com/paper/predicting-consumer-default-a-deep-learning
Repo
Framework

Vector representations of text data in deep learning

Title Vector representations of text data in deep learning
Authors Karol Grzegorczyk
Abstract In this dissertation we report results of our research on dense distributed representations of text data. We propose two novel neural models for learning such representations. The first model learns representations at the document level, while the second model learns word-level representations. For document-level representations we propose Binary Paragraph Vector: a neural network model for learning binary representations of text documents, which can be used for fast document retrieval. We provide a thorough evaluation of these models and demonstrate that they outperform the seminal method in the field on the information retrieval task. We also report strong results in transfer learning settings, where our models are trained on a generic text corpus and then used to infer codes for documents from a domain-specific dataset. In contrast to previously proposed approaches, Binary Paragraph Vector models learn embeddings directly from raw text data. For word-level representations we propose Disambiguated Skip-gram: a neural network model for learning multi-sense word embeddings. Representations learned by this model can be used in downstream tasks, like part-of-speech tagging or identification of semantic relations. In the word sense induction task Disambiguated Skip-gram outperforms state-of-the-art models on three out of four benchmark datasets. Our model has an elegant probabilistic interpretation. Furthermore, unlike previous models of this kind, it is differentiable with respect to all its parameters and can be trained with backpropagation. In addition to quantitative results, we present a qualitative evaluation of Disambiguated Skip-gram, including two-dimensional visualisations of selected word-sense embeddings.
Tasks Information Retrieval, Part-Of-Speech Tagging, Transfer Learning, Word Embeddings, Word Sense Induction
Published 2019-01-07
URL http://arxiv.org/abs/1901.01695v1
PDF http://arxiv.org/pdf/1901.01695v1.pdf
PWC https://paperswithcode.com/paper/vector-representations-of-text-data-in-deep
Repo
Framework

A PCA-like Autoencoder

Title A PCA-like Autoencoder
Authors Saïd Ladjal, Alasdair Newson, Chi-Hieu Pham
Abstract An autoencoder is a neural network which projects data to and from a lower-dimensional latent space, where the data is easier to understand and model. The autoencoder consists of two sub-networks, the encoder and the decoder, which carry out these transformations. The neural network is trained such that the output is as close to the input as possible, the data having gone through an information bottleneck: the latent space. This tool bears significant resemblance to Principal Component Analysis (PCA), with two main differences. Firstly, the autoencoder is a non-linear transformation, contrary to PCA, which makes the autoencoder more flexible and powerful. Secondly, the axes found by a PCA are orthogonal, and are ordered in terms of the amount of variability which the data presents along these axes. This makes the interpretability of the PCA much greater than that of the autoencoder, which does not have these attributes. Ideally, then, we would like an autoencoder whose latent space consists of independent components, ordered by decreasing importance to the data. In this paper, we propose an algorithm to create such a network. We create an iterative algorithm which progressively increases the size of the latent space, learning a new dimension at each step. We also propose a covariance loss term to add to the standard autoencoder loss function, as well as a normalisation layer just before the latent space, which encourages the latent space components to be statistically independent. We demonstrate the results of this autoencoder on simple geometric shapes, and find that the algorithm indeed finds a meaningful representation in the latent space. This means that subsequent interpolation in the latent space has meaning with respect to the geometric properties of the images.
Tasks
Published 2019-04-02
URL http://arxiv.org/abs/1904.01277v1
PDF http://arxiv.org/pdf/1904.01277v1.pdf
PWC https://paperswithcode.com/paper/a-pca-like-autoencoder
Repo
Framework
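
The covariance loss term is the most mechanical part of the abstract above and easy to sketch: penalize the off-diagonal entries of the sample covariance of the latent codes so that the components become decorrelated (a necessary, though not sufficient, condition for statistical independence). A minimal PyTorch sketch under assumed shapes; the iterative dimension-growing schedule and the normalisation layer are omitted.

```python
import torch


def covariance_loss(z):
    """Penalize off-diagonal covariance of latent codes z of shape (batch, dim)."""
    z = z - z.mean(dim=0, keepdim=True)              # center each latent component
    cov = (z.T @ z) / (z.shape[0] - 1)               # (dim, dim) sample covariance
    off_diag = cov - torch.diag(torch.diag(cov))     # zero out the variances
    return (off_diag ** 2).sum()


# Used alongside the usual reconstruction loss, e.g.:
#   loss = mse(x_hat, x) + lambda_cov * covariance_loss(z)
z = torch.randn(128, 8)
print(covariance_loss(z))                            # near zero for independent Gaussian codes
```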

Image-Based Place Recognition on Bucolic Environment Across Seasons From Semantic Edge Description

Title Image-Based Place Recognition on Bucolic Environment Across Seasons From Semantic Edge Description
Authors Assia Benbihi, Stéphanie Arravechia, Matthieu Geist, Cédric Pradalier
Abstract Most of the research effort on image-based place recognition is designed for urban environments. In bucolic environments such as natural scenes with low texture and little semantic content, the main challenge is to handle the variations in visual appearance across time such as illumination, weather, vegetation state or viewpoints. The nature of the variations is different and this leads to a different approach to describing a bucolic scene. We introduce a global image descriptor computed from its semantic and topological information. It is built from the wavelet transforms of the image semantic edges. Matching two images is then equivalent to matching their semantic edge descriptors. We show that this method reaches state-of-the-art image retrieval performance on two multi-season environment-monitoring datasets: the CMU-Seasons and the Symphony Lake dataset. It also generalises to urban scenes on which it is on par with the current baselines NetVLAD and DELF.
Tasks Image Retrieval
Published 2019-10-28
URL https://arxiv.org/abs/1910.12468v3
PDF https://arxiv.org/pdf/1910.12468v3.pdf
PWC https://paperswithcode.com/paper/image-based-place-recognition-on-bucolic
Repo
Framework
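
A sketch of the descriptor idea as stated in the abstract: take a semantic edge map, compute a 2D wavelet decomposition, and keep per-sub-band statistics as a global descriptor, so that matching two images reduces to comparing descriptors. Uses PyWavelets and NumPy; the wavelet family, decomposition level, energy pooling, and random stand-in edge maps are assumptions rather than the paper's configuration.

```python
import numpy as np
import pywt


def edge_wavelet_descriptor(edge_map, wavelet="haar", level=3):
    """Global descriptor: energy of each wavelet sub-band of a semantic edge map."""
    coeffs = pywt.wavedec2(edge_map.astype(float), wavelet, level=level)
    feats = [np.mean(coeffs[0] ** 2)]                 # approximation-band energy
    for (cH, cV, cD) in coeffs[1:]:                   # detail sub-bands at each level
        feats.extend([np.mean(cH ** 2), np.mean(cV ** 2), np.mean(cD ** 2)])
    v = np.array(feats)
    return v / (np.linalg.norm(v) + 1e-12)            # L2-normalize for cosine matching


def match_score(desc_a, desc_b):
    return float(desc_a @ desc_b)                      # cosine similarity of two descriptors


edges_a = (np.random.rand(256, 256) > 0.95).astype(float)   # stand-in edge maps
edges_b = (np.random.rand(256, 256) > 0.95).astype(float)
print(match_score(edge_wavelet_descriptor(edges_a), edge_wavelet_descriptor(edges_b)))
```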

Analyzing User Activities Using Vector Space Model in Online Social Networks

Title Analyzing User Activities Using Vector Space Model in Online Social Networks
Authors Dhrubasish Sarkar, Premananda Jana
Abstract The increasing popularity of the internet, wireless technologies and mobile devices has led to the birth of mass connectivity and online interaction through Online Social Networks (OSNs) and similar environments. An OSN reflects a social structure consisting of a set of individuals and different types of ties among them, such as connections, relationships and interactions, and helps its users to connect with their friends and common-interest groups, share views and pass information. Nowadays users choose OSN sites as their most preferred place for sharing updates, views and photographs, making them available for others to view, rate and comment on. The current paper aims to explore and analyze the association between the objects (like photographs, posts etc.) and their viewers (friends, acquaintances etc.) for a given user and to find activity relationships among them by using the TF-IDF scheme of the Vector Space Model. After vectorization, the vector data is presented through a weighted graph with various properties.
Tasks
Published 2019-10-13
URL https://arxiv.org/abs/1910.05691v1
PDF https://arxiv.org/pdf/1910.05691v1.pdf
PWC https://paperswithcode.com/paper/analyzing-user-activities-using-vector-space
Repo
Framework
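
As background for the TF-IDF step mentioned above, a minimal scikit-learn sketch (assumed, not the authors' code): each object for a given user is represented by the viewers who interacted with it, TF-IDF turns these lists into vectors, and pairwise cosine similarities can then serve as edge weights in the weighted graph the abstract mentions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical data: for each object (photo/post), the users who viewed or commented on it.
objects = {
    "photo_1": "alice bob carol bob",
    "photo_2": "bob carol dave",
    "post_1":  "alice eve",
}

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(objects.values())   # objects x viewers TF-IDF matrix

# Pairwise similarities can serve as edge weights in a weighted activity graph.
sim = cosine_similarity(tfidf)
names = list(objects)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        if sim[i, j] > 0:
            print(f"{names[i]} -- {names[j]}: weight {sim[i, j]:.2f}")
```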

Simple and optimal high-probability bounds for strongly-convex stochastic gradient descent

Title Simple and optimal high-probability bounds for strongly-convex stochastic gradient descent
Authors Nicholas J. A. Harvey, Christopher Liaw, Sikander Randhawa
Abstract We consider stochastic gradient descent algorithms for minimizing a non-smooth, strongly-convex function. Several forms of this algorithm, including suffix averaging, are known to achieve the optimal $O(1/T)$ convergence rate in expectation. We consider a simple, non-uniform averaging strategy of Lacoste-Julien et al. (2011) and prove that it achieves the optimal $O(1/T)$ convergence rate with high probability. Our proof uses a recently developed generalization of Freedman’s inequality. Finally, we compare several of these algorithms experimentally and show that this non-uniform averaging strategy outperforms many standard techniques, and with smaller variance.
Tasks
Published 2019-09-02
URL https://arxiv.org/abs/1909.00843v1
PDF https://arxiv.org/pdf/1909.00843v1.pdf
PWC https://paperswithcode.com/paper/simple-and-optimal-high-probability-bounds
Repo
Framework
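
For reference, a sketch of the kind of non-uniform averaging the abstract refers to (attributed to Lacoste-Julien et al.): run SGD with step size $\eta_t = 1/(\lambda t)$ on a $\lambda$-strongly-convex objective and return an average that weights iterate $t$ proportionally to $t$, i.e. $\bar{w}_T = \frac{2}{T(T+1)}\sum_{t=1}^{T} t\, w_t$. The toy non-smooth objective below is an assumption for illustration only.

```python
import numpy as np


def weighted_avg_sgd(grad, w0, lam, T, rng):
    """SGD with step size 1/(lam*t) and averaging weights proportional to t (sketch)."""
    w = w0.copy()
    w_bar = np.zeros_like(w0)
    weight_sum = 0.0
    for t in range(1, T + 1):
        g = grad(w, rng)                            # stochastic (sub)gradient at w
        w = w - (1.0 / (lam * t)) * g
        weight_sum += t
        w_bar += (t / weight_sum) * (w - w_bar)     # running average weighted by t
    return w_bar


# Toy strongly-convex, non-smooth problem (assumed): f(w) = lam/2 ||w||^2 + E_x ||w - x||_1
lam = 0.1

def noisy_grad(w, rng):
    x = rng.normal(loc=1.0, scale=1.0, size=w.shape)   # one noisy data sample
    return lam * w + np.sign(w - x)                    # subgradient of the sampled objective

rng = np.random.default_rng(0)
print(weighted_avg_sgd(noisy_grad, np.zeros(5), lam, T=10000, rng=rng))
```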