February 2, 2020

Paper Group AWR 50

Creativity Inspired Zero-Shot Learning. Towards Finding Longer Proofs. Learning to Cluster Faces on an Affinity Graph. DensePoint: Learning Densely Contextual Representation for Efficient Point Cloud Processing. Guiding Theorem Proving by Recurrent Neural Networks. UWGAN: Underwater GAN for Real-world Underwater Color Restoration and Dehazing. Video Face Clustering with Unknown Number of Clusters …

Creativity Inspired Zero-Shot Learning

Title Creativity Inspired Zero-Shot Learning
Authors Mohamed Elhoseiny, Mohamed Elfeki
Abstract Zero-shot learning (ZSL) aims at understanding unseen categories, with no training examples, from class-level descriptions. To improve the discriminative power of zero-shot learning, we model the visual learning process of unseen categories with inspiration from the psychology of human creativity for producing novel art. We relate ZSL to human creativity by observing that zero-shot learning is about recognizing the unseen, while creativity is about creating a likable unseen. We introduce a learning signal inspired by the creativity literature that explores the unseen space with hallucinated class descriptions and encourages careful deviation of their generated visual features from seen classes, while still allowing knowledge transfer from seen to unseen classes. Empirically, we show a consistent improvement of several percent over the state of the art on the largest available benchmarks for the challenging task we focus on, generalized ZSL from noisy text, using the CUB and NABirds datasets. We also show the advantage of our approach for attribute-based ZSL on three additional datasets (AwA2, aPY, and SUN). Code is available.
Tasks Transfer Learning, Zero-Shot Learning
Published 2019-04-01
URL https://arxiv.org/abs/1904.01109v6
PDF https://arxiv.org/pdf/1904.01109v6.pdf
PWC https://paperswithcode.com/paper/creativity-inspired-zero-shot-learning
Repo https://github.com/mhelhoseiny/CIZSL
Framework pytorch
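
A minimal sketch of the creativity-style signal, assuming a conditional feature `generator` and a `seen_classifier` (both `nn.Module`s) and a simple interpolation scheme for hallucinating class descriptions; this illustrates the entropy-based deviation idea, not the authors' exact CIZSL objective:

```python
import torch
import torch.nn.functional as F

def creativity_loss(generator, seen_classifier, seen_texts, noise_dim=100):
    """Push features generated from hallucinated descriptions away from
    confident seen-class predictions by maximizing classifier entropy."""
    B = seen_texts.size(0)
    # Hallucinate unseen-like descriptions by mixing pairs of seen texts.
    perm = torch.randperm(B)
    alpha = torch.rand(B, 1)
    hallucinated = alpha * seen_texts + (1 - alpha) * seen_texts[perm]
    z = torch.randn(B, noise_dim)
    fake_feats = generator(torch.cat([hallucinated, z], dim=1))
    probs = F.softmax(seen_classifier(fake_feats), dim=1)
    # Negative entropy: minimizing it spreads probability mass over the
    # seen classes, i.e. the generations carefully deviate from all of them.
    return (probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
```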

Towards Finding Longer Proofs

Title Towards Finding Longer Proofs
Authors Zsolt Zombori, Adrián Csiszárik, Henryk Michalewski, Cezary Kaliszyk, Josef Urban
Abstract We present a reinforcement learning (RL) based guidance system for automated theorem proving geared towards Finding Longer Proofs (FLoP). FLoP focuses on generalizing from short proofs to longer ones of similar structure. To achieve that, FLoP uses state-of-the-art RL approaches that were previously not applied in theorem proving. In particular, we show that curriculum learning significantly outperforms previous learning-based proof guidance on a synthetic dataset of increasingly difficult arithmetic problems.
Tasks Automated Theorem Proving
Published 2019-05-30
URL https://arxiv.org/abs/1905.13100v1
PDF https://arxiv.org/pdf/1905.13100v1.pdf
PWC https://paperswithcode.com/paper/towards-finding-longer-proofs
Repo https://github.com/atpcurr/atpcurr
Framework none
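
The curriculum idea can be sketched as a reverse curriculum over a known short proof: the agent first learns the final steps, then starts episodes progressively earlier. The `env`/`agent` interface below is an assumption for illustration, not the FLoP codebase API:

```python
def curriculum_train(env, agent, known_proof, episodes_per_stage=100):
    """Reverse-curriculum RL sketch: reset the prover to ever-earlier
    prefixes of a known proof, so behavior learned on the short tail of
    the proof generalizes to the full, longer one."""
    for start in range(len(known_proof) - 1, -1, -1):
        for _ in range(episodes_per_stage):
            state = env.reset_to(known_proof[:start])  # replay a proof prefix
            done = False
            while not done:
                action = agent.act(state)
                state, reward, done = env.step(action)
                agent.learn(state, action, reward, done)
```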

Learning to Cluster Faces on an Affinity Graph

Title Learning to Cluster Faces on an Affinity Graph
Authors Lei Yang, Xiaohang Zhan, Dapeng Chen, Junjie Yan, Chen Change Loy, Dahua Lin
Abstract Face recognition has seen remarkable progress in recent years, and its performance has reached a very high level. Taking it to the next level requires substantially larger data, which would involve prohibitive annotation cost. Hence, exploiting unlabeled data becomes an appealing alternative. Recent works have shown that clustering unlabeled faces is a promising approach, often leading to notable performance gains. Yet how to cluster effectively, especially on a large-scale (i.e. million-level or above) dataset, remains an open question. A key challenge lies in the complex variations of cluster patterns, which make it difficult for conventional clustering methods to meet the needed accuracy. This work explores a novel approach, namely learning to cluster instead of relying on hand-crafted criteria. Specifically, we propose a framework based on graph convolutional networks, which combines a detection and a segmentation module to pinpoint face clusters. Experiments show that our method yields significantly more accurate face clusters, which, as a result, also lead to a further performance gain in face recognition.
Tasks Face Recognition
Published 2019-04-04
URL https://arxiv.org/abs/1904.02749v2
PDF https://arxiv.org/pdf/1904.02749v2.pdf
PWC https://paperswithcode.com/paper/learning-to-cluster-faces-on-an-affinity
Repo https://github.com/yl-1993/learn-to-cluster
Framework pytorch
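
At its core the pipeline runs graph convolutions over a kNN affinity graph of face features; cluster proposals are then scored by a detection module and refined by a segmentation module. A minimal sketch of one such graph convolution (the modules in the repo are deeper and operate on proposal subgraphs):

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """Mean-aggregation graph convolution over an affinity graph: each
    face feature is updated from itself plus a weighted average of its
    neighbors' features."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(2 * in_dim, out_dim)

    def forward(self, x, adj):
        # x: (N, D) face features; adj: (N, N) row-normalized affinities.
        agg = adj @ x
        return torch.relu(self.linear(torch.cat([x, agg], dim=1)))
```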

DensePoint: Learning Densely Contextual Representation for Efficient Point Cloud Processing

Title DensePoint: Learning Densely Contextual Representation for Efficient Point Cloud Processing
Authors Yongcheng Liu, Bin Fan, Gaofeng Meng, Jiwen Lu, Shiming Xiang, Chunhong Pan
Abstract Point cloud processing is very challenging, as the diverse shapes formed by irregular points are often indistinguishable. A thorough grasp of an elusive shape requires sufficiently contextual semantic information, yet few works are devoted to this. Here we propose DensePoint, a general architecture for learning a densely contextual representation for point cloud processing. Technically, it extends the regular grid CNN to irregular point configurations by generalizing the convolution operator, which preserves the permutation invariance of points and achieves efficient inductive learning of local patterns. Architecturally, it draws inspiration from the dense connection mode, repeatedly aggregating multi-level and multi-scale semantics in a deep hierarchy. As a result, densely contextual information, along with rich semantics, can be acquired by DensePoint in an organic manner, making it highly effective. Extensive experiments on challenging benchmarks across four tasks, as well as thorough model analysis, verify that DensePoint achieves state-of-the-art performance.
Tasks
Published 2019-09-09
URL https://arxiv.org/abs/1909.03669v1
PDF https://arxiv.org/pdf/1909.03669v1.pdf
PWC https://paperswithcode.com/paper/densepoint-learning-densely-contextual
Repo https://github.com/Yochengliu/DensePoint
Framework pytorch
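
The dense-connection idea can be sketched with per-point feature layers whose input is the concatenation of all earlier outputs; the real DensePoint operator additionally aggregates over local point neighborhoods, which this simplified sketch omits:

```python
import torch
import torch.nn as nn

class DenseStage(nn.Module):
    """DenseNet-style stage over per-point features: every layer consumes
    the concatenation of all previous feature maps, so late layers mix
    multi-level, multi-scale semantics."""
    def __init__(self, in_dim, growth=24, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        dim = in_dim
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.Conv1d(dim, growth, 1),   # point-wise stand-in for PConv
                nn.BatchNorm1d(growth),
                nn.ReLU()))
            dim += growth

    def forward(self, x):                    # x: (B, C, N) point features
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)
```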

Guiding Theorem Proving by Recurrent Neural Networks

Title Guiding Theorem Proving by Recurrent Neural Networks
Authors Bartosz Piotrowski, Josef Urban
Abstract We describe two theorem proving tasks – premise selection and internal guidance – for which machine learning has recently been used with some success. We argue, however, that the existing methods do not correspond to the way humans approach these tasks. In particular, the existing methods so far lack the notion of a state that is updated each time a choice in the reasoning process is made. To address that, we propose an analogy with tasks such as machine translation, where stateful architectures such as recurrent neural networks have recently been very successful. We then develop and publish a series of sequence-to-sequence datasets that correspond to the theorem proving tasks under several encodings, and provide the first experimental evaluation of the performance of recurrent neural networks on such tasks.
Tasks Automated Theorem Proving, Machine Translation
Published 2019-05-20
URL https://arxiv.org/abs/1905.07961v1
PDF https://arxiv.org/pdf/1905.07961v1.pdf
PWC https://paperswithcode.com/paper/guiding-theorem-proving-by-recurrent-neural
Repo https://github.com/BartoszPiotrowski/rnn-for-proving-data
Framework none
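
A minimal GRU encoder-decoder of the kind the paper evaluates: the encoder reads a tokenized proof state, and its final hidden state conditions the decoder's token-by-token output. Vocabulary size and tokenization are assumptions here:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, vocab_size, dim=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.enc = nn.GRU(dim, dim, batch_first=True)
        self.dec = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, src, tgt):
        # src: (B, T_src) encoded proof state; tgt: (B, T_tgt) target tokens.
        _, h = self.enc(self.emb(src))    # h is the stateful summary
        y, _ = self.dec(self.emb(tgt), h)
        return self.out(y)                # (B, T_tgt, vocab) next-token logits
```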

UWGAN: Underwater GAN for Real-world Underwater Color Restoration and Dehazing

Title UWGAN: Underwater GAN for Real-world Underwater Color Restoration and Dehazing
Authors Nan Wang, Yabin Zhou, Fenglei Han, Haitao Zhu, Yaojing Zheng
Abstract In real-world underwater environments, the exploration of seabed resources, underwater archaeology, and underwater fishing rely on a variety of sensors; the vision sensor is the most important one due to its high information content and its non-intrusive, passive nature. However, wavelength-dependent light attenuation and back-scattering result in color distortion and a haze effect, which degrade the visibility of images. To address this problem, we first propose an unsupervised generative adversarial network (GAN) for generating realistic underwater images (with color distortion and haze effect) from in-air image and depth map pairs, based on an improved underwater imaging model. Second, a U-Net, trained efficiently on the synthetic underwater dataset, is adopted for color restoration and dehazing. Our model directly reconstructs clear underwater images using end-to-end autoencoder networks, while maintaining the structural similarity of scene content. The results obtained by our method were compared with existing methods qualitatively and quantitatively. Experimental results demonstrate that the proposed model performs well on open real-world underwater datasets, and the processing speed can reach up to 125 FPS on one NVIDIA 1060 GPU. Source code and sample datasets are made publicly available at https://github.com/infrontofme/UWGAN_UIE.
Tasks
Published 2019-12-21
URL https://arxiv.org/abs/1912.10269v1
PDF https://arxiv.org/pdf/1912.10269v1.pdf
PWC https://paperswithcode.com/paper/uwgan-underwater-gan-for-real-world-1
Repo https://github.com/infrontofme/UWGAN_UIE
Framework tf
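
The synthesis side rests on a standard underwater image-formation model: a clear image is attenuated by the water column and mixed with back-scattered light. A sketch with illustrative (not the paper's) per-channel coefficients:

```python
import numpy as np

def synthesize_underwater(J, depth, beta=(0.35, 0.15, 0.05), B=(0.05, 0.3, 0.4)):
    """I_c = J_c * exp(-beta_c * d) + B_c * (1 - exp(-beta_c * d)).
    J: (H, W, 3) in-air image in [0, 1]; depth: (H, W) in meters;
    beta: per-channel attenuation (red decays fastest underwater);
    B: per-channel background light."""
    I = np.empty_like(J)
    for c in range(3):
        t = np.exp(-beta[c] * depth)      # transmission for channel c
        I[..., c] = J[..., c] * t + B[c] * (1 - t)
    return I
```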

Video Face Clustering with Unknown Number of Clusters

Title Video Face Clustering with Unknown Number of Clusters
Authors Makarand Tapaswi, Marc T. Law, Sanja Fidler
Abstract Understanding videos such as TV series and movies requires analyzing who the characters are and what they are doing. We address the challenging problem of clustering face tracks based on their identity. Different from previous work in this area, we choose to operate in a realistic and difficult setting where: (i) the number of characters is not known a priori; and (ii) face tracks belonging to minor or background characters are not discarded. To this end, we propose Ball Cluster Learning (BCL), a supervised approach to carve the embedding space into balls of equal size, one for each cluster. The learned ball radius is easily translated to a stopping criterion for iterative merging algorithms. This gives BCL the ability to estimate the number of clusters as well as their assignment, achieving promising results on commonly used datasets. We also present a thorough discussion of how existing metric learning literature can be adapted for this task.
Tasks Metric Learning
Published 2019-08-09
URL https://arxiv.org/abs/1908.03381v2
PDF https://arxiv.org/pdf/1908.03381v2.pdf
PWC https://paperswithcode.com/paper/video-face-clustering-with-unknown-number-of
Repo https://github.com/makarandtapaswi/BallClustering_ICCV2019
Framework pytorch
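
A hedged sketch of the ball idea (not the exact BCL objective): pull embeddings inside a ball of learned radius around their cluster centroid and push different centroids more than a ball diameter apart; at test time the radius translates directly into the stopping threshold of agglomerative merging:

```python
import torch
import torch.nn.functional as F

def ball_cluster_loss(emb, labels, radius, margin=0.1):
    """emb: (N, D) embeddings; labels: (N,) cluster ids; radius: scalar
    tensor (learnable). Assumes at least two clusters in the batch."""
    centroids, pull = [], emb.new_zeros(())
    for c in labels.unique():
        pts = emb[labels == c]
        mu = pts.mean(dim=0)
        centroids.append(mu)
        # Penalize points lying outside the ball of the given radius.
        pull = pull + F.relu(pts.sub(mu).norm(dim=1) - radius).mean()
    C = torch.stack(centroids)
    off_diag = ~torch.eye(len(C), dtype=torch.bool, device=C.device)
    # Keep centroids at least one ball diameter plus a margin apart.
    push = F.relu(2 * radius + margin - torch.cdist(C, C)[off_diag]).mean()
    return pull + push
```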

Metric Learning With HORDE: High-Order Regularizer for Deep Embeddings

Title Metric Learning With HORDE: High-Order Regularizer for Deep Embeddings
Authors Pierre Jacob, David Picard, Aymeric Histace, Edouard Klein
Abstract Learning an effective similarity measure between image representations is key to the success of recent advances in visual search tasks (e.g. verification or zero-shot learning). Although the metric learning part is well addressed, this metric is usually computed over the average of the extracted deep features, and this representation is then trained to be discriminative. However, these deep features tend to be scattered across the feature space. Consequently, the representations are not robust to outliers, object occlusions, background variations, etc. In this paper, we tackle this scattering problem with a distribution-aware regularization named HORDE. This regularizer enforces visually close images to have deep features with the same distribution, well localized in the feature space. We provide a theoretical analysis supporting this regularization effect. We also show the effectiveness of our approach by obtaining state-of-the-art results on four well-known datasets (CUB-200-2011, Cars-196, Stanford Online Products, and In-Shop Clothes Retrieval).
Tasks Image Retrieval, Metric Learning
Published 2019-08-07
URL https://arxiv.org/abs/1908.02735v1
PDF https://arxiv.org/pdf/1908.02735v1.pdf
PWC https://paperswithcode.com/paper/metric-learning-with-horde-high-order
Repo https://github.com/pierre-jacob/ICCV2019-Horde
Framework tf
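
The high-order statistics can be sketched as cascaded elementwise products of projected local features, each averaged over locations to estimate one moment order; matching these statistics between visually close images gives the HORDE-style regularization. Projection sizes and the use of learned (rather than random) projections are assumptions:

```python
import torch
import torch.nn as nn

class HighOrderHeads(nn.Module):
    """Approximate the K-th order moments of a set of local deep features
    via cascaded elementwise products of projections, pooled over
    locations; each returned statistic gets its own metric-learning loss."""
    def __init__(self, in_dim, proj_dim=512, orders=3):
        super().__init__()
        self.projs = nn.ModuleList(nn.Linear(in_dim, proj_dim)
                                   for _ in range(orders))

    def forward(self, local_feats):          # (B, L, D) local features
        outs, acc = [], None
        for proj in self.projs:
            p = proj(local_feats)
            acc = p if acc is None else acc * p   # raises the order by one
            outs.append(acc.mean(dim=1))          # moment estimate, (B, proj_dim)
        return outs
```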

A Crowd-based Evaluation of Abuse Response Strategies in Conversational Agents

Title A Crowd-based Evaluation of Abuse Response Strategies in Conversational Agents
Authors Amanda Cercas Curry, Verena Rieser
Abstract How should conversational agents respond to verbal abuse from users? To answer this question, we conduct a large-scale crowd-sourced evaluation of the abuse response strategies employed by current state-of-the-art systems. Our results show that some strategies, such as “polite refusal”, score highly across the board, while for other strategies demographic factors, such as age, as well as the severity of the preceding abuse, influence the user’s perception of which response is appropriate. In addition, we find that most data-driven models lag behind rule-based or commercial systems in terms of their perceived appropriateness.
Tasks
Published 2019-09-10
URL https://arxiv.org/abs/1909.04387v1
PDF https://arxiv.org/pdf/1909.04387v1.pdf
PWC https://paperswithcode.com/paper/a-crowd-based-evaluation-of-abuse-response
Repo https://github.com/amandacurry/metoo_corpus
Framework none

A Wrapped Normal Distribution on Hyperbolic Space for Gradient-Based Learning

Title A Wrapped Normal Distribution on Hyperbolic Space for Gradient-Based Learning
Authors Yoshihiro Nagano, Shoichiro Yamaguchi, Yasuhiro Fujita, Masanori Koyama
Abstract Hyperbolic space is a geometry known to be well-suited for representation learning on data with an underlying hierarchical structure. In this paper, we present a novel hyperbolic distribution called the *pseudo-hyperbolic Gaussian*, a Gaussian-like distribution on hyperbolic space whose density can be evaluated analytically and differentiated with respect to its parameters. Our distribution enables gradient-based learning of probabilistic models on hyperbolic space that could not have been considered before. Moreover, we can sample from this hyperbolic probability distribution without resorting to auxiliary means like rejection sampling. As applications of our distribution, we develop a hyperbolic analogue of the variational autoencoder and a method for probabilistic word embedding on hyperbolic space. We demonstrate the efficacy of our distribution on various datasets including MNIST, Atari 2600 Breakout, and WordNet.
Tasks Representation Learning
Published 2019-02-08
URL https://arxiv.org/abs/1902.02992v2
PDF https://arxiv.org/pdf/1902.02992v2.pdf
PWC https://paperswithcode.com/paper/a-differentiable-gaussian-like-distribution
Repo https://github.com/muupan/resume
Framework none
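
The construction is concrete enough to sketch end to end in the Lorentz model: draw a Euclidean Gaussian sample in the tangent space at the hyperboloid origin, parallel-transport it to mu, then apply the exponential map. The density then follows by change of variables, with the log-determinant of the exp-map Jacobian available in closed form. A NumPy sketch of the sampler following that recipe:

```python
import numpy as np

def sample_wrapped_normal(mu, sigma, n_samples):
    """mu: point on the hyperboloid {x : <x,x>_L = -1}, shape (d+1,);
    sigma: (d, d) covariance of the tangent-space Gaussian."""
    d = len(mu) - 1
    mu0 = np.zeros(d + 1)
    mu0[0] = 1.0                                      # hyperboloid origin

    def lorentz(x, y):                                # <x, y>_L inner product
        return -x[..., 0] * y[..., 0] + (x[..., 1:] * y[..., 1:]).sum(-1)

    # 1) Gaussian sample in the tangent space at the origin: v = (0, v_tilde).
    v = np.random.multivariate_normal(np.zeros(d), sigma, n_samples)
    v = np.concatenate([np.zeros((n_samples, 1)), v], axis=1)
    # 2) Parallel transport from mu0 to mu.
    alpha = -lorentz(mu0, mu)
    u = v + (lorentz(mu - alpha * mu0, v) / (alpha + 1))[:, None] * (mu0 + mu)
    # 3) Exponential map at mu.
    r = np.sqrt(np.clip(lorentz(u, u), 1e-12, None))[:, None]
    return np.cosh(r) * mu + np.sinh(r) * u / r
```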

RGB-Infrared Cross-Modality Person Re-Identification via Joint Pixel and Feature Alignment

Title RGB-Infrared Cross-Modality Person Re-Identification via Joint Pixel and Feature Alignment
Authors Guan’an Wang, Tianzhu Zhang, Jian Cheng, Si Liu, Yang Yang, Zengguang Hou
Abstract RGB-Infrared (IR) person re-identification is an important and challenging task due to the large cross-modality variations between RGB and IR images. Most conventional approaches aim to bridge the cross-modality gap by aligning features through feature representation learning. Different from existing methods, in this paper we propose a novel, end-to-end Alignment Generative Adversarial Network (AlignGAN) for the RGB-IR Re-ID task. The proposed model enjoys several merits. First, it can exploit pixel alignment and feature alignment jointly. To the best of our knowledge, this is the first work to model the two alignment strategies jointly for the RGB-IR Re-ID problem. Second, the proposed model consists of a pixel generator, a feature generator, and a joint discriminator. By playing a min-max game among the three components, our model is able not only to alleviate cross-modality and intra-modality variations but also to learn identity-consistent features. Extensive experimental results on two standard benchmarks demonstrate that the proposed model performs favorably against state-of-the-art methods. In particular, on the SYSU-MM01 dataset our model achieves absolute gains of 15.4% and 12.9% in terms of Rank-1 and mAP, respectively.
Tasks Person Re-Identification, Representation Learning
Published 2019-10-13
URL https://arxiv.org/abs/1910.05839v2
PDF https://arxiv.org/pdf/1910.05839v2.pdf
PWC https://paperswithcode.com/paper/rgb-infrared-cross-modality-person-re-1
Repo https://github.com/wangguanan/AlignGAN
Framework pytorch
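
The three-component min-max can be sketched at the loss level; the module interfaces below are assumptions (the repo's actual components also include identity losses and classifiers omitted here):

```python
def joint_discriminator_loss(D, pixel_G, feat_G, rgb_imgs, ir_imgs):
    """The pixel generator translates RGB to fake IR (pixel alignment);
    the feature generator embeds images (feature alignment); one joint
    discriminator judges (image, feature) pairs, coupling the two.
    Least-squares GAN form; the generators minimize the mirrored loss."""
    fake_ir = pixel_G(rgb_imgs).detach()             # frozen for the D step
    real_score = D(ir_imgs, feat_G(ir_imgs))        # real IR with its feature
    fake_score = D(fake_ir, feat_G(fake_ir))        # translated pair
    return ((real_score - 1) ** 2).mean() + (fake_score ** 2).mean()
```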

Local Relation Networks for Image Recognition

Title Local Relation Networks for Image Recognition
Authors Han Hu, Zheng Zhang, Zhenda Xie, Stephen Lin
Abstract The convolution layer has been the dominant feature extractor in computer vision for years. However, spatial aggregation in convolution is basically a pattern-matching process that applies fixed filters, which are inefficient at modeling visual elements with varying spatial distributions. This paper presents a new image feature extractor, called the local relation layer, that adaptively determines aggregation weights based on the compositional relationships of local pixel pairs. With this relational approach, it can compose visual elements into higher-level entities in a more efficient manner, benefiting semantic inference. A network built with local relation layers, called the Local Relation Network (LR-Net), is found to provide greater modeling capacity than its counterpart built with regular convolution on large-scale recognition tasks such as ImageNet classification.
Tasks
Published 2019-04-25
URL http://arxiv.org/abs/1904.11491v1
PDF http://arxiv.org/pdf/1904.11491v1.pdf
PWC https://paperswithcode.com/paper/190411491
Repo https://github.com/gan3sh500/local-relational-nets
Framework pytorch
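
A simplified single-head local relation layer, as a hedged sketch: aggregation weights over each k x k window come from a softmax over query-key appearance affinity plus a learned geometric prior per relative position (the paper's version adds multiple heads and channel transforms):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalRelation(nn.Module):
    def __init__(self, dim, k=7):
        super().__init__()
        self.k = k
        self.q = nn.Conv2d(dim, dim, 1)              # query projection
        self.kproj = nn.Conv2d(dim, dim, 1)          # key projection
        self.geo = nn.Parameter(torch.zeros(k * k))  # geometric prior

    def forward(self, x):                            # x: (B, C, H, W)
        B, C, H, W = x.shape
        pad, kk = self.k // 2, self.k * self.k
        q = self.q(x).view(B, C, 1, H * W)
        keys = F.unfold(self.kproj(x), self.k, padding=pad).view(B, C, kk, H * W)
        affinity = (q * keys).sum(1) / C ** 0.5      # (B, k*k, H*W)
        w = F.softmax(affinity + self.geo[None, :, None], dim=1)
        vals = F.unfold(x, self.k, padding=pad).view(B, C, kk, H * W)
        out = (vals * w.unsqueeze(1)).sum(2)         # adaptive aggregation
        return out.view(B, C, H, W)
```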

PerspectroScope: A Window to the World of Diverse Perspectives

Title PerspectroScope: A Window to the World of Diverse Perspectives
Authors Sihao Chen, Daniel Khashabi, Chris Callison-Burch, Dan Roth
Abstract This work presents PerspectroScope, a web-based system which lets users query a discussion-worthy natural language claim, and extract and visualize various perspectives in support or against the claim, along with evidence supporting each perspective. The system thus lets users explore various perspectives that could touch upon aspects of the issue at hand.The system is built as a combination of retrieval engines and learned textual-entailment-like classifiers built using a few recent developments in natural language understanding. To make the system more adaptive, expand its coverage, and improve its decisions over time, our platform employs various mechanisms to get corrections from the users. PerspectroScope is available at github.com/CogComp/perspectroscope.
Tasks Natural Language Inference
Published 2019-06-11
URL https://arxiv.org/abs/1906.04761v1
PDF https://arxiv.org/pdf/1906.04761v1.pdf
PWC https://paperswithcode.com/paper/perspectroscope-a-window-to-the-world-of
Repo https://github.com/CogComp/perspectroscope
Framework none

Homogeneous Vector Capsules Enable Adaptive Gradient Descent in Convolutional Neural Networks

Title Homogeneous Vector Capsules Enable Adaptive Gradient Descent in Convolutional Neural Networks
Authors Adam Byerly, Tatiana Kalganova
Abstract “Capsules” is the name given by Geoffrey Hinton to vector-valued neurons. Neural networks traditionally produce a scalar value for an activated neuron; capsules, on the other hand, produce a vector of values, which Hinton argues corresponds to a single composite feature, wherein the values of the components indicate properties of the feature such as transformation or contrast. We present a new way of parameterizing and training capsules that we refer to as homogeneous vector capsules (HVCs). We demonstrate experimentally that altering a convolutional neural network (CNN) to use HVCs can achieve superior classification accuracy without increasing the number of parameters or operations in its architecture, as compared to a CNN using a single final fully connected layer. Additionally, the introduction of HVCs enables the use of adaptive gradient descent, reducing the dependence of a model’s achievable accuracy on the finely tuned hyperparameters of a non-adaptive optimizer. We demonstrate our method and results using two neural network architectures: first, a very simple monolithic CNN, which when using HVCs achieved a 63% improvement in top-1 classification accuracy and a 35% improvement in top-5 classification accuracy over the baseline architecture; second, the CNN architecture referred to as Inception v3, which achieved similar accuracies both with and without HVCs. Additionally, the simple monolithic CNN showed no overfitting after more than 300 epochs when using HVCs, whereas the baseline showed overfitting after 30 epochs. We use the ImageNet ILSVRC 2012 classification challenge dataset with both networks.
Tasks
Published 2019-06-20
URL https://arxiv.org/abs/1906.08676v1
PDF https://arxiv.org/pdf/1906.08676v1.pdf
PWC https://paperswithcode.com/paper/homogeneous-vector-capsules-enable-adaptive
Repo https://github.com/AdamByerly/HVCsEnableAGD
Framework tf
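
The parameterization can be sketched directly: unlike matrix capsules, each weight acts elementwise (a Hadamard product), so capsule components never mix, which is what makes the vectors "homogeneous". Capsule count, size, and the final reduction to logits are assumptions here:

```python
import torch
import torch.nn as nn

class HVCHead(nn.Module):
    """Homogeneous-vector-capsule sketch replacing a final FC layer."""
    def __init__(self, n_caps, cap_dim, n_classes):
        super().__init__()
        # One elementwise weight vector per (input capsule, class) pair.
        self.w = nn.Parameter(torch.randn(n_caps, n_classes, cap_dim) * 0.01)

    def forward(self, feats):
        # feats: (B, n_caps, cap_dim), e.g. a reshaped final feature map.
        class_caps = (feats.unsqueeze(2) * self.w).sum(dim=1)  # (B, n_classes, cap_dim)
        return class_caps.sum(dim=-1)    # simple reduction to class logits
```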

LakhNES: Improving multi-instrumental music generation with cross-domain pre-training

Title LakhNES: Improving multi-instrumental music generation with cross-domain pre-training
Authors Chris Donahue, Huanru Henry Mao, Yiting Ethan Li, Garrison W. Cottrell, Julian McAuley
Abstract We are interested in the task of generating multi-instrumental music scores. The Transformer architecture has recently shown great promise for the task of piano score generation; here we adapt it to the multi-instrumental setting. Transformers are complex, high-dimensional language models which are capable of capturing long-term structure in sequence data, but require large amounts of data to fit. Their success on piano score generation is partially explained by the large volumes of symbolic data readily available for that domain. We leverage the recently-introduced NES-MDB dataset of four-instrument scores from an early video game sound synthesis chip (the NES), which we find to be well-suited to training with the Transformer architecture. To further improve the performance of our model, we propose a pre-training technique to leverage the information in a large collection of heterogeneous music, namely the Lakh MIDI dataset. Despite differences between the two corpora, we find that this transfer learning procedure improves both quantitative and qualitative performance for our primary task.
Tasks Music Generation, Transfer Learning
Published 2019-07-10
URL https://arxiv.org/abs/1907.04868v1
PDF https://arxiv.org/pdf/1907.04868v1.pdf
PWC https://paperswithcode.com/paper/lakhnes-improving-multi-instrumental-music
Repo https://github.com/chrisdonahue/LakhNES
Framework pytorch
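
The transfer recipe itself is simple to sketch: pre-train a causal language model on event tokens derived from Lakh MIDI, then fine-tune the same weights on NES-MDB tokens (the paper uses Transformer-XL; the batch iterators and learning rates below are assumptions):

```python
import torch
import torch.nn as nn

def pretrain_then_finetune(model, lakh_batches, nes_batches, lr=1e-4):
    """`*_batches` yield (inputs, targets) of tokenized musical events;
    `model(inputs)` is assumed to return (B, T, vocab) next-token logits."""
    loss_fn = nn.CrossEntropyLoss()
    for stage, batches in (("pretrain", lakh_batches), ("finetune", nes_batches)):
        # A smaller learning rate for fine-tuning helps preserve what was
        # learned from the larger, heterogeneous corpus.
        opt = torch.optim.Adam(model.parameters(),
                               lr=lr if stage == "pretrain" else lr / 10)
        for inputs, targets in batches:
            logits = model(inputs)
            loss = loss_fn(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```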