Paper Group AWR 50
Creativity Inspired Zero-Shot Learning. Towards Finding Longer Proofs. Learning to Cluster Faces on an Affinity Graph. DensePoint: Learning Densely Contextual Representation for Efficient Point Cloud Processing. Guiding Theorem Proving by Recurrent Neural Networks. UWGAN: Underwater GAN for Real-world Underwater Color Restoration and Dehazing. Video Face Clustering with Unknown Number of Clusters. Metric Learning With HORDE: High-Order Regularizer for Deep Embeddings. A Crowd-based Evaluation of Abuse Response Strategies in Conversational Agents. A Wrapped Normal Distribution on Hyperbolic Space for Gradient-Based Learning. RGB-Infrared Cross-Modality Person Re-Identification via Joint Pixel and Feature Alignment. Local Relation Networks for Image Recognition. PerspectroScope: A Window to the World of Diverse Perspectives. Homogeneous Vector Capsules Enable Adaptive Gradient Descent in Convolutional Neural Networks. LakhNES: Improving multi-instrumental music generation with cross-domain pre-training.
Creativity Inspired Zero-Shot Learning
Title | Creativity Inspired Zero-Shot Learning |
Authors | Mohamed Elhoseiny, Mohamed Elfeki |
Abstract | Zero-shot learning (ZSL) aims at understanding unseen categories, with no training examples, from class-level descriptions. To improve the discriminative power of zero-shot learning, we model the visual learning process of unseen categories with inspiration from the psychology of human creativity for producing novel art. We relate ZSL to human creativity by observing that zero-shot learning is about recognizing the unseen, and creativity is about creating a likable unseen. We introduce a learning signal inspired by the creativity literature that explores the unseen space with hallucinated class descriptions and encourages the generated visual features to deviate carefully from seen classes while still allowing knowledge transfer from seen to unseen classes. Empirically, we show consistent improvement over the state of the art by several percent on the largest available benchmarks for the challenging task we focus on, generalized ZSL from noisy text, using the CUB and NABirds datasets. We also show the advantage of our approach on attribute-based ZSL on three additional datasets (AwA2, aPY, and SUN). Code is available. |
Tasks | Transfer Learning, Zero-Shot Learning |
Published | 2019-04-01 |
URL | https://arxiv.org/abs/1904.01109v6 |
https://arxiv.org/pdf/1904.01109v6.pdf | |
PWC | https://paperswithcode.com/paper/creativity-inspired-zero-shot-learning |
Repo | https://github.com/mhelhoseiny/CIZSL |
Framework | pytorch |
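The creativity-inspired learning signal described in the abstract can be sketched as an entropy term on features generated from hallucinated class descriptions: if the seen-class classifier is maximally uncertain about a generated feature, that feature has deviated from the seen classes. The PyTorch sketch below is only an illustration of that idea; the `generator` and `seen_classifier` modules and the convex-interpolation scheme for hallucinating descriptions are assumptions, not the released CIZSL code.

```python
import torch
import torch.nn.functional as F

def creativity_loss(generator, seen_classifier, text_a, text_b, z):
    """Push features generated from hallucinated (interpolated) class
    descriptions away from all seen classes by maximizing the entropy of
    the seen-class posterior; the usual seen-class GAN losses (omitted)
    preserve knowledge transfer."""
    alpha = torch.rand(text_a.size(0), 1)              # random mixing weights
    text_hall = alpha * text_a + (1 - alpha) * text_b  # hallucinated description
    x_fake = generator(z, text_hall)                   # generated visual features
    log_p = F.log_softmax(seen_classifier(x_fake), dim=1)
    entropy = -(log_p.exp() * log_p).sum(dim=1)        # high entropy = unseen-like
    return -entropy.mean()                             # minimize the negative
```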
Towards Finding Longer Proofs
Title | Towards Finding Longer Proofs |
Authors | Zsolt Zombori, Adrián Csiszárik, Henryk Michalewski, Cezary Kaliszyk, Josef Urban |
Abstract | We present a reinforcement learning (RL) based guidance system for automated theorem proving geared towards Finding Longer Proofs (FLoP). FLoP focuses on generalizing from short proofs to longer ones of similar structure. To achieve that, FLoP uses state-of-the-art RL approaches that were previously not applied in theorem proving. In particular, we show that curriculum learning significantly outperforms previous learning-based proof guidance on a synthetic dataset of increasingly difficult arithmetic problems. |
Tasks | Automated Theorem Proving |
Published | 2019-05-30 |
URL | https://arxiv.org/abs/1905.13100v1 |
https://arxiv.org/pdf/1905.13100v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-finding-longer-proofs |
Repo | https://github.com/atpcurr/atpcurr |
Framework | none |
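FLoP's central ingredient, curriculum learning over proof length, can be illustrated independently of the prover. The toy below is a self-contained sketch assuming nothing from the atpcurr code base: a tabular Q-learner masters short "proof chains" before the horizon is doubled, and the shared Q-table carries value estimates from short proofs over to structurally similar longer ones.

```python
import random

def q_learning_chain(L, Q, episodes=300, eps=0.2, alpha=0.5, gamma=0.99):
    """Tabular Q-learning on a length-L 'proof chain': from each state the
    correct action advances one step, the wrong one fails the attempt, and
    reward 1 is given only on completing all L steps."""
    for _ in range(episodes):
        s = 0
        while True:
            if random.random() < eps:
                a = random.randrange(2)                # explore
            else:
                a = max((0, 1), key=lambda b: Q.get((s, b), 0.0))
            if a == 0:
                s2, r, done = s + 1, 0.0, False
                if s2 == L:
                    r, done = 1.0, True                # proof completed
            else:
                s2, r, done = s, 0.0, True             # wrong step: episode ends
            best_next = 0.0 if done else max(Q.get((s2, b), 0.0) for b in (0, 1))
            Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (
                r + gamma * best_next - Q.get((s, a), 0.0))
            if done:
                break
            s = s2

# Curriculum: master short chains before longer ones. The Q-table is shared,
# so value learned on short 'proofs' bootstraps the longer ones.
Q = {}
for L in (2, 4, 8, 16):
    q_learning_chain(L, Q)
```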
Learning to Cluster Faces on an Affinity Graph
Title | Learning to Cluster Faces on an Affinity Graph |
Authors | Lei Yang, Xiaohang Zhan, Dapeng Chen, Junjie Yan, Chen Change Loy, Dahua Lin |
Abstract | Face recognition has seen remarkable progress in recent years, and its performance has reached a very high level. Taking it to the next level requires substantially larger data, which would involve prohibitive annotation cost. Hence, exploiting unlabeled data becomes an appealing alternative. Recent works have shown that clustering unlabeled faces is a promising approach, often leading to notable performance gains. Yet how to cluster effectively, especially on a large-scale (i.e., million-level or above) dataset, remains an open question. A key challenge lies in the complex variations of cluster patterns, which make it difficult for conventional clustering methods to meet the needed accuracy. This work explores a novel approach, namely learning to cluster instead of relying on hand-crafted criteria. Specifically, we propose a framework based on graph convolutional networks, which combines a detection module and a segmentation module to pinpoint face clusters. Experiments show that our method yields significantly more accurate face clusters, which in turn lead to a further performance gain in face recognition. |
Tasks | Face Recognition |
Published | 2019-04-04 |
URL | https://arxiv.org/abs/1904.02749v2 |
https://arxiv.org/pdf/1904.02749v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-cluster-faces-on-an-affinity |
Repo | https://github.com/yl-1993/learn-to-cluster |
Framework | pytorch |
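The backbone operation, graph convolution over a kNN affinity graph of face embeddings, can be sketched in a few lines. This is a minimal mean-aggregation GCN step for illustration only; the paper's cluster-proposal generation and its detection and segmentation heads (GCN-D and GCN-S) are omitted.

```python
import numpy as np

def knn_affinity(X, k=5):
    """Cosine-similarity kNN affinity graph over face embeddings X (n, d)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T
    A = np.zeros_like(S)
    idx = np.argsort(-S, axis=1)[:, 1:k + 1]   # k nearest neighbours, excluding self
    for i, nbrs in enumerate(idx):
        A[i, nbrs] = S[i, nbrs]
    return np.maximum(A, A.T)                  # symmetrize

def gcn_layer(A, X, W):
    """One mean-aggregation graph-convolution step: X' = relu(D^-1 (A+I) X W)."""
    A_hat = A + np.eye(len(A))
    D_inv = 1.0 / A_hat.sum(axis=1, keepdims=True)
    return np.maximum(D_inv * (A_hat @ X) @ W, 0.0)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 64))                 # stand-in face embeddings
W = rng.normal(size=(64, 32)) * 0.1
H = gcn_layer(knn_affinity(X), X, W)           # node features for proposal scoring
```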
DensePoint: Learning Densely Contextual Representation for Efficient Point Cloud Processing
Title | DensePoint: Learning Densely Contextual Representation for Efficient Point Cloud Processing |
Authors | Yongcheng Liu, Bin Fan, Gaofeng Meng, Jiwen Lu, Shiming Xiang, Chunhong Pan |
Abstract | Point cloud processing is very challenging, as the diverse shapes formed by irregular points are often indistinguishable. A thorough grasp of an elusive shape requires sufficiently contextual semantic information, yet few works are devoted to this. Here we propose DensePoint, a general architecture for learning densely contextual representations for point cloud processing. Technically, it extends regular grid CNNs to irregular point configurations by generalizing the convolution operator, which preserves the permutation invariance of points and achieves efficient inductive learning of local patterns. Architecturally, it takes inspiration from dense connectivity to repeatedly aggregate multi-level and multi-scale semantics in a deep hierarchy. As a result, densely contextual information, along with rich semantics, can be acquired by DensePoint in an organic manner, making it highly effective. Extensive experiments on challenging benchmarks across four tasks, as well as thorough model analysis, verify that DensePoint achieves state-of-the-art performance. |
Tasks | |
Published | 2019-09-09 |
URL | https://arxiv.org/abs/1909.03669v1 |
https://arxiv.org/pdf/1909.03669v1.pdf | |
PWC | https://paperswithcode.com/paper/densepoint-learning-densely-contextual |
Repo | https://github.com/Yochengliu/DensePoint |
Framework | pytorch |
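The two ingredients named in the abstract, a permutation-invariant generalized convolution and DenseNet-style aggregation across levels, can be sketched as follows. Neighbourhood grouping (ball query or kNN) and the full hierarchy are omitted, and the module names are ours, not the released implementation's.

```python
import torch
import torch.nn as nn

class PointAggregation(nn.Module):
    """Permutation-invariant 'convolution' on points: a shared MLP on each
    neighbour's feature followed by max-pooling over the neighbourhood."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(c_in, c_out), nn.ReLU())

    def forward(self, neigh_feats):                     # (B, N, K, c_in)
        return self.mlp(neigh_feats).max(dim=2).values  # (B, N, c_out)

class DenseStage(nn.Module):
    """Dense connectivity: every layer consumes the concatenation of all
    previous layers' outputs, aggregating multi-level context."""
    def __init__(self, c_in, growth, layers=3):
        super().__init__()
        self.layers = nn.ModuleList(
            PointAggregation(c_in + i * growth, growth) for i in range(layers))

    def forward(self, neigh_feats):                     # (B, N, K, c_in)
        feats = neigh_feats
        for layer in self.layers:
            new = layer(feats)                          # (B, N, growth)
            # broadcast new per-point features back to each neighbour slot
            new_k = new.unsqueeze(2).expand(-1, -1, feats.size(2), -1)
            feats = torch.cat([feats, new_k], dim=-1)
        return feats.max(dim=2).values

out = DenseStage(c_in=16, growth=32)(torch.randn(2, 128, 8, 16))
```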
Guiding Theorem Proving by Recurrent Neural Networks
Title | Guiding Theorem Proving by Recurrent Neural Networks |
Authors | Bartosz Piotrowski, Josef Urban |
Abstract | We describe two theorem proving tasks – premise selection and internal guidance – for which machine learning has recently been used with some success. We argue, however, that the existing methods do not correspond to the way humans approach these tasks. In particular, the existing methods so far lack the notion of a state that is updated each time a choice in the reasoning process is made. To address that, we propose an analogy with tasks such as machine translation, where stateful architectures such as recurrent neural networks have recently been very successful. We then develop and publish a series of sequence-to-sequence datasets that correspond to the theorem proving tasks using several encodings, and provide the first experimental evaluation of the performance of recurrent neural networks on such tasks. |
Tasks | Automated Theorem Proving, Machine Translation |
Published | 2019-05-20 |
URL | https://arxiv.org/abs/1905.07961v1 |
https://arxiv.org/pdf/1905.07961v1.pdf | |
PWC | https://paperswithcode.com/paper/guiding-theorem-proving-by-recurrent-neural |
Repo | https://github.com/BartoszPiotrowski/rnn-for-proving-data |
Framework | none |
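A minimal stateful seq2seq model of the kind the paper evaluates might look as follows; the vocabulary size, tokenization of proving states, and hyperparameters are placeholders, since the paper's contribution is the datasets and encodings rather than a specific architecture.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal GRU encoder-decoder: the encoder consumes a tokenized proving
    state (or conjecture) and the decoder emits the next step, e.g. an
    encoded premise or clause."""
    def __init__(self, vocab, hidden=256, emb=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, src, tgt):                  # token-id tensors (B, T)
        _, state = self.encoder(self.embed(src))  # state carries proof context
        dec, _ = self.decoder(self.embed(tgt), state)
        return self.out(dec)                      # logits over next tokens

model = Seq2Seq(vocab=1000)
logits = model(torch.randint(1000, (4, 20)), torch.randint(1000, (4, 10)))
```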
UWGAN: Underwater GAN for Real-world Underwater Color Restoration and Dehazing
Title | UWGAN: Underwater GAN for Real-world Underwater Color Restoration and Dehazing |
Authors | Nan Wang, Yabin Zhou, Fenglei Han, Haitao Zhu, Yaojing Zheng |
Abstract | In real-world underwater environments, exploration of seabed resources, underwater archaeology, and underwater fishing rely on a variety of sensors, among which the vision sensor is the most important due to its high information content and its non-intrusive, passive nature. However, wavelength-dependent light attenuation and back-scattering result in color distortion and a haze effect, which degrade the visibility of images. To address this problem, we first propose an unsupervised generative adversarial network (GAN) for generating realistic underwater images (with color distortion and haze effect) from in-air image and depth map pairs, based on an improved underwater imaging model. Second, a U-Net, trained efficiently on the synthetic underwater dataset, is adopted for color restoration and dehazing. Our model directly reconstructs clear underwater images using end-to-end autoencoder networks while maintaining the structural similarity of scene content. The results obtained by our method were compared with existing methods qualitatively and quantitatively. Experimental results demonstrate that the proposed model performs well on open real-world underwater datasets, and the processing speed can reach up to 125 FPS on a single NVIDIA 1060 GPU. Source code and sample datasets are made publicly available at https://github.com/infrontofme/UWGAN_UIE. |
Tasks | |
Published | 2019-12-21 |
URL | https://arxiv.org/abs/1912.10269v1 |
https://arxiv.org/pdf/1912.10269v1.pdf | |
PWC | https://paperswithcode.com/paper/uwgan-underwater-gan-for-real-world-1 |
Repo | https://github.com/infrontofme/UWGAN_UIE |
Framework | tf |
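The underwater imaging model that drives the synthetic data generation boils down to per-channel attenuation plus veiling light: I_c = J_c * t_c + B_c * (1 - t_c) with transmission t_c = exp(-beta_c * d). A simplified NumPy rendering, assuming illustrative coefficients rather than the paper's GAN-produced parameters:

```python
import numpy as np

def synthesize_underwater(J, depth, beta=(0.40, 0.10, 0.05), B=(0.15, 0.45, 0.55)):
    """Simplified underwater image formation: I_c = J_c*t_c + B_c*(1 - t_c),
    with per-channel transmission t_c = exp(-beta_c * d). beta (attenuation,
    red strongest) and B (veiling light) are illustrative values only."""
    J = J.astype(np.float32) / 255.0                   # in-air RGB image (H, W, 3)
    t = np.exp(-np.asarray(beta) * depth[..., None])   # per-pixel transmission
    I = J * t + np.asarray(B) * (1.0 - t)              # direct signal + back-scatter
    return (np.clip(I, 0, 1) * 255).astype(np.uint8)

rgb = np.full((4, 4, 3), 200, dtype=np.uint8)          # flat gray test image
out = synthesize_underwater(rgb, depth=np.full((4, 4), 5.0))  # depth in meters
```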
Video Face Clustering with Unknown Number of Clusters
Title | Video Face Clustering with Unknown Number of Clusters |
Authors | Makarand Tapaswi, Marc T. Law, Sanja Fidler |
Abstract | Understanding videos such as TV series and movies requires analyzing who the characters are and what they are doing. We address the challenging problem of clustering face tracks based on their identity. Different from previous work in this area, we choose to operate in a realistic and difficult setting where: (i) the number of characters is not known a priori; and (ii) face tracks belonging to minor or background characters are not discarded. To this end, we propose Ball Cluster Learning (BCL), a supervised approach to carve the embedding space into balls of equal size, one for each cluster. The learned ball radius is easily translated to a stopping criterion for iterative merging algorithms. This gives BCL the ability to estimate the number of clusters as well as their assignment, achieving promising results on commonly used datasets. We also present a thorough discussion of how existing metric learning literature can be adapted for this task. |
Tasks | Metric Learning |
Published | 2019-08-09 |
URL | https://arxiv.org/abs/1908.03381v2 |
https://arxiv.org/pdf/1908.03381v2.pdf | |
PWC | https://paperswithcode.com/paper/video-face-clustering-with-unknown-number-of |
Repo | https://github.com/makarandtapaswi/BallClustering_ICCV2019 |
Framework | pytorch |
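The practical payoff of Ball Cluster Learning is that the learned ball radius converts directly into a merge threshold, so off-the-shelf agglomerative clustering can stop on its own without being told the number of clusters. A sketch under stated assumptions: random stand-in embeddings, an assumed radius, and complete linkage as one reasonable (not necessarily the paper's) choice.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Two points inside the same ball of radius r are at most 2*r apart, so 2*r
# becomes the stopping threshold for iterative merging.
rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 128)).astype(np.float32)   # stand-in track embeddings
radius = 0.7                                           # stand-in learned radius

clusterer = AgglomerativeClustering(
    n_clusters=None, distance_threshold=2 * radius, linkage="complete")
labels = clusterer.fit_predict(emb)
print("estimated number of clusters:", labels.max() + 1)
```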
Metric Learning With HORDE: High-Order Regularizer for Deep Embeddings
Title | Metric Learning With HORDE: High-Order Regularizer for Deep Embeddings |
Authors | Pierre Jacob, David Picard, Aymeric Histace, Edouard Klein |
Abstract | Learning an effective similarity measure between image representations is key to the success of recent advances in visual search tasks (e.g., verification or zero-shot learning). Although the metric learning part is well addressed, this metric is usually computed over the average of the extracted deep features. This representation is then trained to be discriminative. However, these deep features tend to be scattered across the feature space. Consequently, the representations are not robust to outliers, object occlusions, background variations, etc. In this paper, we tackle this scattering problem with a distribution-aware regularization named HORDE. This regularizer enforces visually-close images to have deep features with the same distribution, well localized in the feature space. We provide a theoretical analysis supporting this regularization effect. We also show the effectiveness of our approach by obtaining state-of-the-art results on four well-known datasets (CUB-200-2011, Cars-196, Stanford Online Products and In-Shop Clothes Retrieval). |
Tasks | Image Retrieval, Metric Learning |
Published | 2019-08-07 |
URL | https://arxiv.org/abs/1908.02735v1 |
https://arxiv.org/pdf/1908.02735v1.pdf | |
PWC | https://paperswithcode.com/paper/metric-learning-with-horde-high-order |
Repo | https://github.com/pierre-jacob/ICCV2019-Horde |
Framework | tf |
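The regularizer's intent, matching the distributions of deep features between visually-close images rather than just their means, can be illustrated with a literal moment-matching penalty. HORDE itself uses efficient high-order moment approximations; the naive version below is a conceptual stand-in only.

```python
import torch

def moment_matching_penalty(feats_a, feats_b, orders=(1, 2, 3)):
    """Encourage two sets of local deep features (e.g. from two visually-close
    images) to agree in distribution by matching their first k raw moments
    per dimension. A simplified stand-in for HORDE's high-order estimators."""
    loss = 0.0
    for k in orders:
        m_a = feats_a.pow(k).mean(dim=0)   # k-th raw moment per dimension
        m_b = feats_b.pow(k).mean(dim=0)
        loss = loss + (m_a - m_b).pow(2).sum()
    return loss

# feats_*: (num_local_features, dim) deep features of two matching images
penalty = moment_matching_penalty(torch.randn(49, 128), torch.randn(49, 128))
```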
A Crowd-based Evaluation of Abuse Response Strategies in Conversational Agents
Title | A Crowd-based Evaluation of Abuse Response Strategies in Conversational Agents |
Authors | Amanda Cercas Curry, Verena Rieser |
Abstract | How should conversational agents respond to verbal abuse from the user? To answer this question, we conduct a large-scale crowd-sourced evaluation of abuse response strategies employed by current state-of-the-art systems. Our results show that some strategies, such as “polite refusal”, score highly across the board, while for other strategies, demographic factors such as age, as well as the severity of the preceding abuse, influence the user’s perception of which response is appropriate. In addition, we find that most data-driven models lag behind rule-based or commercial systems in terms of their perceived appropriateness. |
Tasks | |
Published | 2019-09-10 |
URL | https://arxiv.org/abs/1909.04387v1 |
https://arxiv.org/pdf/1909.04387v1.pdf | |
PWC | https://paperswithcode.com/paper/a-crowd-based-evaluation-of-abuse-response |
Repo | https://github.com/amandacurry/metoo_corpus |
Framework | none |
A Wrapped Normal Distribution on Hyperbolic Space for Gradient-Based Learning
Title | A Wrapped Normal Distribution on Hyperbolic Space for Gradient-Based Learning |
Authors | Yoshihiro Nagano, Shoichiro Yamaguchi, Yasuhiro Fujita, Masanori Koyama |
Abstract | Hyperbolic space is a geometry known to be well-suited for representation learning of data with an underlying hierarchical structure. In this paper, we present a novel hyperbolic distribution called the pseudo-hyperbolic Gaussian, a Gaussian-like distribution on hyperbolic space whose density can be evaluated analytically and differentiated with respect to the parameters. Our distribution enables gradient-based learning of probabilistic models on hyperbolic space that could not previously be considered. We can also sample from this hyperbolic probability distribution without resorting to auxiliary means like rejection sampling. As applications of our distribution, we develop a hyperbolic analogue of the variational autoencoder and a method for probabilistic word embedding on hyperbolic space. We demonstrate the efficacy of our distribution on various datasets including MNIST, Atari 2600 Breakout, and WordNet. |
Tasks | Representation Learning |
Published | 2019-02-08 |
URL | https://arxiv.org/abs/1902.02992v2 |
https://arxiv.org/pdf/1902.02992v2.pdf | |
PWC | https://paperswithcode.com/paper/a-differentiable-gaussian-like-distribution |
Repo | https://github.com/muupan/resume |
Framework | none |
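Sampling from the pseudo-hyperbolic Gaussian follows the paper's construction: draw a vector in the tangent space at the hyperboloid origin, parallel-transport it to the tangent space at mu, then apply the exponential map. A NumPy sketch of the sampler on the Lorentz model (the analytic density, obtained via the log-determinant of this transform, is omitted):

```python
import numpy as np

def lorentz_dot(x, y):
    """Lorentzian inner product <x, y>_L = -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + x[1:] @ y[1:]

def sample_wrapped_normal(mu, cov, rng):
    """Sample z ~ G(mu, cov) on the hyperboloid: (1) v ~ N(0, cov) in the
    tangent space at the origin mu0, (2) parallel transport of v to mu,
    (3) exponential map onto the manifold."""
    n = len(mu) - 1
    mu0 = np.zeros(n + 1); mu0[0] = 1.0              # hyperboloid origin
    v = np.zeros(n + 1)
    v[1:] = rng.multivariate_normal(np.zeros(n), cov)
    alpha = -lorentz_dot(mu0, mu)
    u = v + lorentz_dot(mu - alpha * mu0, v) / (alpha + 1.0) * (mu0 + mu)
    norm_u = np.sqrt(max(lorentz_dot(u, u), 1e-12))  # Lorentz norm of tangent vec
    return np.cosh(norm_u) * mu + np.sinh(norm_u) * u / norm_u

rng = np.random.default_rng(0)
mu = np.array([np.sqrt(2.0), 1.0, 0.0])              # satisfies <mu, mu>_L = -1
z = sample_wrapped_normal(mu, 0.1 * np.eye(2), rng)
print(lorentz_dot(z, z))                             # ~ -1: z lies on the manifold
```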
RGB-Infrared Cross-Modality Person Re-Identification via Joint Pixel and Feature Alignment
Title | RGB-Infrared Cross-Modality Person Re-Identification via Joint Pixel and Feature Alignment |
Authors | Guan’an Wang, Tianzhu Zhang, Jian Cheng, Si Liu, Yang Yang, Zengguang Hou |
Abstract | RGB-Infrared (IR) person re-identification is an important and challenging task due to large cross-modality variations between RGB and IR images. Most conventional approaches aim to bridge the cross-modality gap through feature alignment, via feature representation learning. Different from existing methods, in this paper we propose a novel, end-to-end Alignment Generative Adversarial Network (AlignGAN) for the RGB-IR Re-ID task. The proposed model enjoys several merits. First, it can exploit pixel alignment and feature alignment jointly. To the best of our knowledge, this is the first work to model the two alignment strategies jointly for the RGB-IR Re-ID problem. Second, the proposed model consists of a pixel generator, a feature generator, and a joint discriminator. By playing a min-max game among the three components, our model is able not only to alleviate the cross-modality and intra-modality variations but also to learn identity-consistent features. Extensive experimental results on two standard benchmarks demonstrate that the proposed model performs favorably against state-of-the-art methods. In particular, on the SYSU-MM01 dataset, our model achieves absolute gains of 15.4% and 12.9% in terms of Rank-1 and mAP, respectively. |
Tasks | Person Re-Identification, Representation Learning |
Published | 2019-10-13 |
URL | https://arxiv.org/abs/1910.05839v2 |
https://arxiv.org/pdf/1910.05839v2.pdf | |
PWC | https://paperswithcode.com/paper/rgb-infrared-cross-modality-person-re-1 |
Repo | https://github.com/wangguanan/AlignGAN |
Framework | pytorch |
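The three-player structure (pixel generator, feature generator, joint discriminator) can be sketched compactly. The tiny networks below are placeholders for illustration, not the paper's architectures, and the re-ID losses are only indicated in comments.

```python
import torch
import torch.nn as nn

# Pixel generator: translates RGB images into the IR modality (pixel alignment).
pixel_G = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.Tanh())
# Feature generator: embeds any image into an identity-feature space.
feat_G = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
# Joint discriminator: scores (image, feature) pairs rather than images alone,
# coupling pixel alignment and feature alignment in one min-max game.
joint_D = nn.Sequential(nn.Linear(3 * 32 * 32 + 128, 1))

def joint_score(img, feat):
    return joint_D(torch.cat([img.flatten(1), feat], dim=1))

rgb = torch.randn(8, 3, 32, 32)
ir = torch.randn(8, 3, 32, 32)
fake_ir = pixel_G(rgb)                          # pixel alignment step
d_real = joint_score(ir, feat_G(ir))            # real IR with its feature
d_fake = joint_score(fake_ir, feat_G(fake_ir))  # generated pair
# D maximizes d_real - d_fake; both generators minimize it, alongside
# identity-consistency losses (e.g. ID classification) omitted here.
```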
Local Relation Networks for Image Recognition
Title | Local Relation Networks for Image Recognition |
Authors | Han Hu, Zheng Zhang, Zhenda Xie, Stephen Lin |
Abstract | The convolution layer has been the dominant feature extractor in computer vision for years. However, spatial aggregation in convolution is basically a pattern-matching process that applies fixed filters, which are inefficient at modeling visual elements with varying spatial distributions. This paper presents a new image feature extractor, called the local relation layer, that adaptively determines aggregation weights based on the compositional relationship of local pixel pairs. With this relational approach, it can compose visual elements into higher-level entities in a more efficient manner, which benefits semantic inference. A network built with local relation layers, called the Local Relation Network (LR-Net), is found to provide greater modeling capacity than its counterpart built with regular convolution on large-scale recognition tasks such as ImageNet classification. |
Tasks | |
Published | 2019-04-25 |
URL | http://arxiv.org/abs/1904.11491v1 |
http://arxiv.org/pdf/1904.11491v1.pdf | |
PWC | https://paperswithcode.com/paper/190411491 |
Repo | https://github.com/gan3sh500/local-relational-nets |
Framework | pytorch |
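A minimal local-relation-style layer can be written with `F.unfold`: the aggregation weights for each window are computed from query/key compatibility of pixel pairs instead of being fixed filter coefficients. The paper additionally uses a learned geometric prior over relative positions, which this sketch omits.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalRelation(nn.Module):
    """Minimal local-relation-style layer: per-window aggregation weights come
    from query/key compatibility of pixel pairs (appearance composability);
    the geometric prior term of the paper is omitted."""
    def __init__(self, channels, k=3):
        super().__init__()
        self.k = k
        self.query = nn.Conv2d(channels, channels, 1)
        self.key = nn.Conv2d(channels, channels, 1)

    def forward(self, x):                          # (B, C, H, W)
        B, C, H, W = x.shape
        q = self.query(x).view(B, C, 1, H * W)
        k_feat = F.unfold(self.key(x), self.k, padding=self.k // 2)
        k_feat = k_feat.view(B, C, self.k * self.k, H * W)
        v = F.unfold(x, self.k, padding=self.k // 2)
        v = v.view(B, C, self.k * self.k, H * W)
        att = (q * k_feat).sum(1, keepdim=True)    # pairwise compatibility
        att = F.softmax(att / C ** 0.5, dim=2)     # normalize over the window
        out = (att * v).sum(dim=2)                 # adaptive aggregation
        return out.view(B, C, H, W)

y = LocalRelation(16)(torch.randn(2, 16, 8, 8))
```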
PerspectroScope: A Window to the World of Diverse Perspectives
Title | PerspectroScope: A Window to the World of Diverse Perspectives |
Authors | Sihao Chen, Daniel Khashabi, Chris Callison-Burch, Dan Roth |
Abstract | This work presents PerspectroScope, a web-based system that lets users query a discussion-worthy natural-language claim, and then extracts and visualizes various perspectives in support of or against the claim, along with evidence supporting each perspective. The system thus lets users explore various perspectives that could touch upon aspects of the issue at hand. It is built as a combination of retrieval engines and learned textual-entailment-like classifiers, using several recent developments in natural language understanding. To make the system more adaptive, expand its coverage, and improve its decisions over time, our platform employs various mechanisms to get corrections from users. PerspectroScope is available at github.com/CogComp/perspectroscope. |
Tasks | Natural Language Inference |
Published | 2019-06-11 |
URL | https://arxiv.org/abs/1906.04761v1 |
https://arxiv.org/pdf/1906.04761v1.pdf | |
PWC | https://paperswithcode.com/paper/perspectroscope-a-window-to-the-world-of |
Repo | https://github.com/CogComp/perspectroscope |
Framework | none |
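The system architecture is retrieve-then-classify: a retrieval engine proposes candidate perspective sentences for a claim, and an entailment-like classifier labels the stance of each. The toy pipeline below mirrors only that control flow; the hash-based embedding and the constant stance stub are placeholders for the system's actual retrieval engines and trained classifiers.

```python
import numpy as np

def embed(text, dim=64):
    """Deterministic stand-in embedding (hash-seeded); not a real encoder."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

def retrieve(claim, corpus, top_k=2):
    """Rank candidate perspective sentences by cosine similarity to the claim."""
    scores = [(embed(claim) @ embed(s), s) for s in corpus]
    return [s for _, s in sorted(scores, reverse=True)[:top_k]]

def stance(claim, perspective):
    return "support"   # placeholder for the entailment-like stance classifier

claim = "Self-driving cars should be legal."
corpus = ["They reduce accidents.", "Liability is unclear.", "Cats are great."]
for p in retrieve(claim, corpus):
    print(stance(claim, p), "|", p)
```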
Homogeneous Vector Capsules Enable Adaptive Gradient Descent in Convolutional Neural Networks
Title | Homogeneous Vector Capsules Enable Adaptive Gradient Descent in Convolutional Neural Networks |
Authors | Adam Byerly, Tatiana Kalganova |
Abstract | Capsules are the name given by Geoffrey Hinton to vector-valued neurons. Neural networks traditionally produce a scalar value for an activated neuron. Capsules, by contrast, produce a vector of values, which Hinton argues corresponds to a single composite feature, wherein the values of the vector's components indicate properties of the feature such as transformation or contrast. We present a new way of parameterizing and training capsules that we refer to as homogeneous vector capsules (HVCs). We demonstrate experimentally that altering a convolutional neural network (CNN) to use HVCs can achieve superior classification accuracy without increasing the number of parameters or operations in its architecture, as compared to a CNN using a single final fully connected layer. Additionally, the introduction of HVCs enables the use of adaptive gradient descent, reducing the dependence of a model's achievable accuracy on the finely tuned hyperparameters of a non-adaptive optimizer. We demonstrate our method and results using two neural network architectures. The first, a very simple monolithic CNN, achieved a 63% improvement in top-1 classification accuracy and a 35% improvement in top-5 classification accuracy over the baseline architecture when using HVCs. The second, the CNN architecture known as Inception v3, achieved similar accuracies both with and without HVCs. Additionally, when using HVCs, the simple monolithic CNN showed no overfitting after more than 300 epochs, whereas the baseline showed overfitting after 30 epochs. We use the ImageNet ILSVRC 2012 classification challenge dataset for both networks. |
Tasks | |
Published | 2019-06-20 |
URL | https://arxiv.org/abs/1906.08676v1 |
https://arxiv.org/pdf/1906.08676v1.pdf | |
PWC | https://paperswithcode.com/paper/homogeneous-vector-capsules-enable-adaptive |
Repo | https://github.com/AdamByerly/HVCsEnableAGD |
Framework | tf |
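A heavily hedged sketch of what a vector-capsule classifier head in this spirit might look like: features are reshaped into capsules, each class's output capsule is formed by element-wise (Hadamard) weighting rather than a full matrix product, and the logit is read out from the capsule vector. The exact wiring in the paper may differ; this only illustrates the vector-valued readout that replaces the final fully connected layer.

```python
import torch
import torch.nn as nn

class HVCStyleHead(nn.Module):
    """Hedged sketch of a homogeneous-vector-capsule-style head: per-class
    element-wise weights over input capsules instead of a dense matrix,
    with the class logit as the sum over the output capsule's components."""
    def __init__(self, n_caps, cap_dim, n_classes):
        super().__init__()
        self.w = nn.Parameter(torch.randn(n_classes, n_caps, cap_dim) * 0.05)

    def forward(self, feats):                    # (B, n_caps * cap_dim)
        u = feats.view(feats.size(0), 1, *self.w.shape[1:])  # (B,1,caps,dim)
        class_caps = (u * self.w).sum(dim=2)     # (B, n_classes, cap_dim)
        return class_caps.sum(dim=-1)            # logits (B, n_classes)

logits = HVCStyleHead(n_caps=8, cap_dim=16, n_classes=10)(torch.randn(4, 128))
```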
LakhNES: Improving multi-instrumental music generation with cross-domain pre-training
Title | LakhNES: Improving multi-instrumental music generation with cross-domain pre-training |
Authors | Chris Donahue, Huanru Henry Mao, Yiting Ethan Li, Garrison W. Cottrell, Julian McAuley |
Abstract | We are interested in the task of generating multi-instrumental music scores. The Transformer architecture has recently shown great promise for the task of piano score generation; here we adapt it to the multi-instrumental setting. Transformers are complex, high-dimensional language models which are capable of capturing long-term structure in sequence data, but require large amounts of data to fit. Their success on piano score generation is partially explained by the large volumes of symbolic data readily available for that domain. We leverage the recently-introduced NES-MDB dataset of four-instrument scores from an early video game sound synthesis chip (the NES), which we find to be well-suited to training with the Transformer architecture. To further improve the performance of our model, we propose a pre-training technique to leverage the information in a large collection of heterogeneous music, namely the Lakh MIDI dataset. Despite differences between the two corpora, we find that this transfer learning procedure improves both quantitative and qualitative performance for our primary task. |
Tasks | Music Generation, Transfer Learning |
Published | 2019-07-10 |
URL | https://arxiv.org/abs/1907.04868v1 |
https://arxiv.org/pdf/1907.04868v1.pdf | |
PWC | https://paperswithcode.com/paper/lakhnes-improving-multi-instrumental-music |
Repo | https://github.com/chrisdonahue/LakhNES |
Framework | pytorch |
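The transfer-learning recipe is the standard one: pre-train on the large, heterogeneous corpus (Lakh MIDI mapped to the NES event vocabulary), then fine-tune on the small target domain (NES-MDB). A generic PyTorch skeleton with a stand-in model and placeholder loaders, not the paper's Transformer setup:

```python
import torch

def train(model, loader, optimizer, epochs,
          loss_fn=torch.nn.functional.cross_entropy):
    """Generic next-token training loop over (tokens, targets) batches."""
    for _ in range(epochs):
        for tokens, targets in loader:
            optimizer.zero_grad()
            loss_fn(model(tokens), targets).backward()
            optimizer.step()

# Stand-in for the paper's Transformer language model over musical events.
model = torch.nn.Sequential(torch.nn.Embedding(512, 64), torch.nn.Flatten(),
                            torch.nn.Linear(64 * 32, 512))

# 1) pre-train on the large heterogeneous corpus, 2) fine-tune on the small
#    target set with a lower learning rate so transferred weights survive.
#    `lakh_loader` / `nesmdb_loader` are placeholders for real DataLoaders.
# train(model, lakh_loader, torch.optim.Adam(model.parameters(), lr=3e-4), 10)
# train(model, nesmdb_loader, torch.optim.Adam(model.parameters(), lr=1e-4), 5)
```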