Paper Group AWR 50
Creativity Inspired Zero-Shot Learning. Towards Finding Longer Proofs. Learning to Cluster Faces on an Affinity Graph. DensePoint: Learning Densely Contextual Representation for Efficient Point Cloud Processing. Guiding Theorem Proving by Recurrent Neural Networks. UWGAN: Underwater GAN for Real-world Underwater Color Restoration and Dehazing. Video Face Clustering with Unknown Number of Clusters. Metric Learning With HORDE: High-Order Regularizer for Deep Embeddings. A Crowd-based Evaluation of Abuse Response Strategies in Conversational Agents. A Wrapped Normal Distribution on Hyperbolic Space for Gradient-Based Learning. RGB-Infrared Cross-Modality Person Re-Identification via Joint Pixel and Feature Alignment. Local Relation Networks for Image Recognition. PerspectroScope: A Window to the World of Diverse Perspectives. Homogeneous Vector Capsules Enable Adaptive Gradient Descent in Convolutional Neural Networks. LakhNES: Improving multi-instrumental music generation with cross-domain pre-training.
Creativity Inspired Zero-Shot Learning
Title | Creativity Inspired Zero-Shot Learning |
Authors | Mohamed Elhoseiny, Mohamed Elfeki |
Abstract | Zero-shot learning (ZSL) aims at understanding unseen categories, with no training examples, from class-level descriptions. To improve the discriminative power of zero-shot learning, we model the visual learning process of unseen categories with inspiration from the psychology of human creativity for producing novel art. We relate ZSL to human creativity by observing that zero-shot learning is about recognizing the unseen, and creativity is about creating a likable unseen. We introduce a learning signal inspired by the creativity literature that explores the unseen space with hallucinated class descriptions and encourages the generated visual features to deviate carefully from seen classes while still allowing knowledge transfer from seen to unseen classes. Empirically, we show consistent improvement over the state of the art by several percent on the largest available benchmarks for the challenging task we focus on, generalized ZSL from noisy text, using the CUB and NABirds datasets. We also show the advantage of our approach on attribute-based ZSL on three additional datasets (AwA2, aPY, and SUN). Code is available. |
Tasks | Transfer Learning, Zero-Shot Learning |
Published | 2019-04-01 |
URL | https://arxiv.org/abs/1904.01109v6 |
https://arxiv.org/pdf/1904.01109v6.pdf | |
PWC | https://paperswithcode.com/paper/creativity-inspired-zero-shot-learning |
Repo | https://github.com/mhelhoseiny/CIZSL |
Framework | pytorch |
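The creativity-inspired learning signal described in the abstract can be sketched as an entropy term on features generated from hallucinated class descriptions: if the seen-class classifier is maximally uncertain about a generated feature, that feature has deviated from the seen classes. The PyTorch sketch below is only an illustration of that idea; the `generator` and `seen_classifier` modules and the convex-interpolation scheme for hallucinating descriptions are assumptions, not the released CIZSL code.

```python
import torch
import torch.nn.functional as F

def creativity_loss(generator, seen_classifier, text_a, text_b, z):
    """Push features generated from hallucinated (interpolated) class
    descriptions away from all seen classes by maximizing the entropy of
    the seen-class posterior; the usual seen-class GAN losses (omitted)
    preserve knowledge transfer."""
    alpha = torch.rand(text_a.size(0), 1)              # random mixing weights
    text_hall = alpha * text_a + (1 - alpha) * text_b  # hallucinated description
    x_fake = generator(z, text_hall)                   # generated visual features
    log_p = F.log_softmax(seen_classifier(x_fake), dim=1)
    entropy = -(log_p.exp() * log_p).sum(dim=1)        # high entropy = unseen-like
    return -entropy.mean()                             # minimize the negative
```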
Towards Finding Longer Proofs
Title | Towards Finding Longer Proofs |
Authors | Zsolt Zombori, Adrián Csiszárik, Henryk Michalewski, Cezary Kaliszyk, Josef Urban |
Abstract | We present a reinforcement learning (RL) based guidance system for automated theorem proving geared towards Finding Longer Proofs (FLoP). FLoP focuses on generalizing from short proofs to longer ones of similar structure. To achieve that, FLoP uses state-of-the-art RL approaches that were previously not applied in theorem proving. In particular, we show that curriculum learning significantly outperforms previous learning-based proof guidance on a synthetic dataset of increasingly difficult arithmetic problems. |
Tasks | Automated Theorem Proving |
Published | 2019-05-30 |
URL | https://arxiv.org/abs/1905.13100v1 |
https://arxiv.org/pdf/1905.13100v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-finding-longer-proofs |
Repo | https://github.com/atpcurr/atpcurr |
Framework | none |
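FLoP's central ingredient, curriculum learning over proof length, can be illustrated independently of the prover. The toy below is a self-contained sketch assuming nothing from the atpcurr code base: a tabular Q-learner masters short "proof chains" before the horizon is doubled, and the shared Q-table carries value estimates from short proofs over to structurally similar longer ones.

```python
import random

def q_learning_chain(L, Q, episodes=300, eps=0.2, alpha=0.5, gamma=0.99):
    """Tabular Q-learning on a length-L 'proof chain': from each state the
    correct action advances one step, the wrong one fails the attempt, and
    reward 1 is given only on completing all L steps."""
    for _ in range(episodes):
        s = 0
        while True:
            if random.random() < eps:
                a = random.randrange(2)                # explore
            else:
                a = max((0, 1), key=lambda b: Q.get((s, b), 0.0))
            if a == 0:
                s2, r, done = s + 1, 0.0, False
                if s2 == L:
                    r, done = 1.0, True                # proof completed
            else:
                s2, r, done = s, 0.0, True             # wrong step: episode ends
            best_next = 0.0 if done else max(Q.get((s2, b), 0.0) for b in (0, 1))
            Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (
                r + gamma * best_next - Q.get((s, a), 0.0))
            if done:
                break
            s = s2

# Curriculum: master short chains before longer ones. The Q-table is shared,
# so value learned on short 'proofs' bootstraps the longer ones.
Q = {}
for L in (2, 4, 8, 16):
    q_learning_chain(L, Q)
```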
Learning to Cluster Faces on an Affinity Graph
Title | Learning to Cluster Faces on an Affinity Graph |
Authors | Lei Yang, Xiaohang Zhan, Dapeng Chen, Junjie Yan, Chen Change Loy, Dahua Lin |
Abstract | Face recognition has seen remarkable progress in recent years, and its performance has reached a very high level. Taking it to the next level requires substantially larger data, which would involve prohibitive annotation cost. Hence, exploiting unlabeled data becomes an appealing alternative. Recent works have shown that clustering unlabeled faces is a promising approach, often leading to notable performance gains. Yet how to cluster effectively, especially on a large-scale (i.e., million-level or above) dataset, remains an open question. A key challenge lies in the complex variations of cluster patterns, which make it difficult for conventional clustering methods to meet the needed accuracy. This work explores a novel approach, namely learning to cluster instead of relying on hand-crafted criteria. Specifically, we propose a framework based on graph convolutional networks, which combines a detection module and a segmentation module to pinpoint face clusters. Experiments show that our method yields significantly more accurate face clusters, which in turn lead to a further performance gain in face recognition. |
Tasks | Face Recognition |
Published | 2019-04-04 |
URL | https://arxiv.org/abs/1904.02749v2 |
https://arxiv.org/pdf/1904.02749v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-cluster-faces-on-an-affinity |
Repo | https://github.com/yl-1993/learn-to-cluster |
Framework | pytorch |
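The backbone operation, graph convolution over a kNN affinity graph of face embeddings, can be sketched in a few lines. This is a minimal mean-aggregation GCN step for illustration only; the paper's cluster-proposal generation and its detection and segmentation heads (GCN-D and GCN-S) are omitted.

```python
import numpy as np

def knn_affinity(X, k=5):
    """Cosine-similarity kNN affinity graph over face embeddings X (n, d)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T
    A = np.zeros_like(S)
    idx = np.argsort(-S, axis=1)[:, 1:k + 1]   # k nearest neighbours, excluding self
    for i, nbrs in enumerate(idx):
        A[i, nbrs] = S[i, nbrs]
    return np.maximum(A, A.T)                  # symmetrize

def gcn_layer(A, X, W):
    """One mean-aggregation graph-convolution step: X' = relu(D^-1 (A+I) X W)."""
    A_hat = A + np.eye(len(A))
    D_inv = 1.0 / A_hat.sum(axis=1, keepdims=True)
    return np.maximum(D_inv * (A_hat @ X) @ W, 0.0)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 64))                 # stand-in face embeddings
W = rng.normal(size=(64, 32)) * 0.1
H = gcn_layer(knn_affinity(X), X, W)           # node features for proposal scoring
```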
DensePoint: Learning Densely Contextual Representation for Efficient Point Cloud Processing
Title | DensePoint: Learning Densely Contextual Representation for Efficient Point Cloud Processing |
Authors | Yongcheng Liu, Bin Fan, Gaofeng Meng, Jiwen Lu, Shiming Xiang, Chunhong Pan |
Abstract | Point cloud processing is very challenging, as the diverse shapes formed by irregular points are often indistinguishable. A thorough grasp of an elusive shape requires sufficiently contextual semantic information, yet few works are devoted to this. Here we propose DensePoint, a general architecture for learning densely contextual representations for point cloud processing. Technically, it extends regular grid CNNs to irregular point configurations by generalizing the convolution operator, which preserves the permutation invariance of points and achieves efficient inductive learning of local patterns. Architecturally, it takes inspiration from dense connectivity to repeatedly aggregate multi-level and multi-scale semantics in a deep hierarchy. As a result, densely contextual information, along with rich semantics, can be acquired by DensePoint in an organic manner, making it highly effective. Extensive experiments on challenging benchmarks across four tasks, as well as thorough model analysis, verify that DensePoint achieves state-of-the-art performance. |
Tasks | |
Published | 2019-09-09 |
URL | https://arxiv.org/abs/1909.03669v1 |
https://arxiv.org/pdf/1909.03669v1.pdf | |
PWC | https://paperswithcode.com/paper/densepoint-learning-densely-contextual |
Repo | https://github.com/Yochengliu/DensePoint |
Framework | pytorch |
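The two ingredients named in the abstract, a permutation-invariant generalized convolution and DenseNet-style aggregation across levels, can be sketched as follows. Neighbourhood grouping (ball query or kNN) and the full hierarchy are omitted, and the module names are ours, not the released implementation's.

```python
import torch
import torch.nn as nn

class PointAggregation(nn.Module):
    """Permutation-invariant 'convolution' on points: a shared MLP on each
    neighbour's feature followed by max-pooling over the neighbourhood."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(c_in, c_out), nn.ReLU())

    def forward(self, neigh_feats):                     # (B, N, K, c_in)
        return self.mlp(neigh_feats).max(dim=2).values  # (B, N, c_out)

class DenseStage(nn.Module):
    """Dense connectivity: every layer consumes the concatenation of all
    previous layers' outputs, aggregating multi-level context."""
    def __init__(self, c_in, growth, layers=3):
        super().__init__()
        self.layers = nn.ModuleList(
            PointAggregation(c_in + i * growth, growth) for i in range(layers))

    def forward(self, neigh_feats):                     # (B, N, K, c_in)
        feats = neigh_feats
        for layer in self.layers:
            new = layer(feats)                          # (B, N, growth)
            # broadcast new per-point features back to each neighbour slot
            new_k = new.unsqueeze(2).expand(-1, -1, feats.size(2), -1)
            feats = torch.cat([feats, new_k], dim=-1)
        return feats.max(dim=2).values

out = DenseStage(c_in=16, growth=32)(torch.randn(2, 128, 8, 16))
```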
Guiding Theorem Proving by Recurrent Neural Networks
Title | Guiding Theorem Proving by Recurrent Neural Networks |
Authors | Bartosz Piotrowski, Josef Urban |
Abstract | We describe two theorem proving tasks – premise selection and internal guidance – for which machine learning has recently been used with some success. We argue, however, that the existing methods do not correspond to the way humans approach these tasks. In particular, the existing methods so far lack the notion of a state that is updated each time a choice in the reasoning process is made. To address that, we propose an analogy with tasks such as machine translation, where stateful architectures such as recurrent neural networks have recently been very successful. We then develop and publish a series of sequence-to-sequence datasets that correspond to the theorem proving tasks using several encodings, and provide the first experimental evaluation of the performance of recurrent neural networks on such tasks. |
Tasks | Automated Theorem Proving, Machine Translation |
Published | 2019-05-20 |
URL | https://arxiv.org/abs/1905.07961v1 |
https://arxiv.org/pdf/1905.07961v1.pdf | |
PWC | https://paperswithcode.com/paper/guiding-theorem-proving-by-recurrent-neural |
Repo | https://github.com/BartoszPiotrowski/rnn-for-proving-data |
Framework | none |
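A minimal stateful seq2seq model of the kind the paper evaluates might look as follows; the vocabulary size, tokenization of proving states, and hyperparameters are placeholders, since the paper's contribution is the datasets and encodings rather than a specific architecture.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal GRU encoder-decoder: the encoder consumes a tokenized proving
    state (or conjecture) and the decoder emits the next step, e.g. an
    encoded premise or clause."""
    def __init__(self, vocab, hidden=256, emb=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, src, tgt):                  # token-id tensors (B, T)
        _, state = self.encoder(self.embed(src))  # state carries proof context
        dec, _ = self.decoder(self.embed(tgt), state)
        return self.out(dec)                      # logits over next tokens

model = Seq2Seq(vocab=1000)
logits = model(torch.randint(1000, (4, 20)), torch.randint(1000, (4, 10)))
```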
UWGAN: Underwater GAN for Real-world Underwater Color Restoration and Dehazing
Title | UWGAN: Underwater GAN for Real-world Underwater Color Restoration and Dehazing |
Authors | Nan Wang, Yabin Zhou, Fenglei Han, Haitao Zhu, Yaojing Zheng |
Abstract | In real-world underwater environments, exploration of seabed resources, underwater archaeology, and underwater fishing rely on a variety of sensors, among which the vision sensor is the most important due to its high information content and its non-intrusive, passive nature. However, wavelength-dependent light attenuation and back-scattering result in color distortion and a haze effect, which degrade the visibility of images. To address this problem, we first propose an unsupervised generative adversarial network (GAN) for generating realistic underwater images (with color distortion and haze effect) from in-air image and depth map pairs, based on an improved underwater imaging model. Second, a U-Net, trained efficiently on the synthetic underwater dataset, is adopted for color restoration and dehazing. Our model directly reconstructs clear underwater images using end-to-end autoencoder networks while maintaining the structural similarity of scene content. The results obtained by our method were compared with existing methods qualitatively and quantitatively. Experimental results demonstrate that the proposed model performs well on open real-world underwater datasets, and the processing speed can reach up to 125 FPS on a single NVIDIA 1060 GPU. Source code and sample datasets are made publicly available at https://github.com/infrontofme/UWGAN_UIE. |
Tasks | |
Published | 2019-12-21 |
URL | https://arxiv.org/abs/1912.10269v1 |
https://arxiv.org/pdf/1912.10269v1.pdf | |
PWC | https://paperswithcode.com/paper/uwgan-underwater-gan-for-real-world-1 |
Repo | https://github.com/infrontofme/UWGAN_UIE |
Framework | tf |
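The underwater imaging model that drives the synthetic data generation boils down to per-channel attenuation plus veiling light: I_c = J_c * t_c + B_c * (1 - t_c) with transmission t_c = exp(-beta_c * d). A simplified NumPy rendering, assuming illustrative coefficients rather than the paper's GAN-produced parameters:

```python
import numpy as np

def synthesize_underwater(J, depth, beta=(0.40, 0.10, 0.05), B=(0.15, 0.45, 0.55)):
    """Simplified underwater image formation: I_c = J_c*t_c + B_c*(1 - t_c),
    with per-channel transmission t_c = exp(-beta_c * d). beta (attenuation,
    red strongest) and B (veiling light) are illustrative values only."""
    J = J.astype(np.float32) / 255.0                   # in-air RGB image (H, W, 3)
    t = np.exp(-np.asarray(beta) * depth[..., None])   # per-pixel transmission
    I = J * t + np.asarray(B) * (1.0 - t)              # direct signal + back-scatter
    return (np.clip(I, 0, 1) * 255).astype(np.uint8)

rgb = np.full((4, 4, 3), 200, dtype=np.uint8)          # flat gray test image
out = synthesize_underwater(rgb, depth=np.full((4, 4), 5.0))  # depth in meters
```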
Video Face Clustering with Unknown Number of Clusters
Title | Video Face Clustering with Unknown Number of Clusters |
Authors | Makarand Tapaswi, Marc T. Law, Sanja Fidler |
Abstract | Understanding videos such as TV series and movies requires analyzing who the characters are and what they are doing. We address the challenging problem of clustering face tracks based on their identity. Different from previous work in this area, we choose to operate in a realistic and difficult setting where: (i) the number of characters is not known a priori; and (ii) face tracks belonging to minor or background characters are not discarded. To this end, we propose Ball Cluster Learning (BCL), a supervised approach to carve the embedding space into balls of equal size, one for each cluster. The learned ball radius is easily translated to a stopping criterion for iterative merging algorithms. This gives BCL the ability to estimate the number of clusters as well as their assignment, achieving promising results on commonly used datasets. We also present a thorough discussion of how existing metric learning literature can be adapted for this task. |
Tasks | Metric Learning |
Published | 2019-08-09 |
URL | https://arxiv.org/abs/1908.03381v2 |
https://arxiv.org/pdf/1908.03381v2.pdf | |
PWC | https://paperswithcode.com/paper/video-face-clustering-with-unknown-number-of |
Repo | https://github.com/makarandtapaswi/BallClustering_ICCV2019 |
Framework | pytorch |
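The practical payoff of Ball Cluster Learning is that the learned ball radius converts directly into a merge threshold, so off-the-shelf agglomerative clustering can stop on its own without being told the number of clusters. A sketch under stated assumptions: random stand-in embeddings, an assumed radius, and complete linkage as one reasonable (not necessarily the paper's) choice.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Two points inside the same ball of radius r are at most 2*r apart, so 2*r
# becomes the stopping threshold for iterative merging.
rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 128)).astype(np.float32)   # stand-in track embeddings
radius = 0.7                                           # stand-in learned radius

clusterer = AgglomerativeClustering(
    n_clusters=None, distance_threshold=2 * radius, linkage="complete")
labels = clusterer.fit_predict(emb)
print("estimated number of clusters:", labels.max() + 1)
```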
Metric Learning With HORDE: High-Order Regularizer for Deep Embeddings
Title | Metric Learning With HORDE: High-Order Regularizer for Deep Embeddings |
Authors | Pierre Jacob, David Picard, Aymeric Histace, Edouard Klein |
Abstract | Learning an effective similarity measure between image representations is key to the success of recent advances in visual search tasks (e.g., verification or zero-shot learning). Although the metric learning part is well addressed, this metric is usually computed over the average of the extracted deep features. This representation is then trained to be discriminative. However, these deep features tend to be scattered across the feature space. Consequently, the representations are not robust to outliers, object occlusions, background variations, etc. In this paper, we tackle this scattering problem with a distribution-aware regularization named HORDE. This regularizer enforces visually-close images to have deep features with the same distribution, well localized in the feature space. We provide a theoretical analysis supporting this regularization effect. We also show the effectiveness of our approach by obtaining state-of-the-art results on four well-known datasets (CUB-200-2011, Cars-196, Stanford Online Products and In-Shop Clothes Retrieval). |
Tasks | Image Retrieval, Metric Learning |
Published | 2019-08-07 |
URL | https://arxiv.org/abs/1908.02735v1 |
https://arxiv.org/pdf/1908.02735v1.pdf | |
PWC | https://paperswithcode.com/paper/metric-learning-with-horde-high-order |
Repo | https://github.com/pierre-jacob/ICCV2019-Horde |
Framework | tf |
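The regularizer's intent, matching the distributions of deep features between visually-close images rather than just their means, can be illustrated with a literal moment-matching penalty. HORDE itself uses efficient high-order moment approximations; the naive version below is a conceptual stand-in only.

```python
import torch

def moment_matching_penalty(feats_a, feats_b, orders=(1, 2, 3)):
    """Encourage two sets of local deep features (e.g. from two visually-close
    images) to agree in distribution by matching their first k raw moments
    per dimension. A simplified stand-in for HORDE's high-order estimators."""
    loss = 0.0
    for k in orders:
        m_a = feats_a.pow(k).mean(dim=0)   # k-th raw moment per dimension
        m_b = feats_b.pow(k).mean(dim=0)
        loss = loss + (m_a - m_b).pow(2).sum()
    return loss

# feats_*: (num_local_features, dim) deep features of two matching images
penalty = moment_matching_penalty(torch.randn(49, 128), torch.randn(49, 128))
```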
A Crowd-based Evaluation of Abuse Response Strategies in Conversational Agents
Title | A Crowd-based Evaluation of Abuse Response Strategies in Conversational Agents |
Authors | Amanda Cercas Curry, Verena Rieser |
Abstract | How should conversational agents respond to verbal abuse from the user? To answer this question, we conduct a large-scale crowd-sourced evaluation of abuse response strategies employed by current state-of-the-art systems. Our results show that some strategies, such as “polite refusal”, score highly across the board, while for other strategies, demographic factors such as age, as well as the severity of the preceding abuse, influence the user’s perception of which response is appropriate. In addition, we find that most data-driven models lag behind rule-based or commercial systems in terms of their perceived appropriateness. |
Tasks | |
Published | 2019-09-10 |
URL | https://arxiv.org/abs/1909.04387v1 |
https://arxiv.org/pdf/1909.04387v1.pdf | |
PWC | https://paperswithcode.com/paper/a-crowd-based-evaluation-of-abuse-response |
Repo | https://github.com/amandacurry/metoo_corpus |
Framework | none |
A Wrapped Normal Distribution on Hyperbolic Space for Gradient-Based Learning
Title | A Wrapped Normal Distribution on Hyperbolic Space for Gradient-Based Learning |
Authors | Yoshihiro Nagano, Shoichiro Yamaguchi, Yasuhiro Fujita, Masanori Koyama |
Abstract | Hyperbolic space is a geometry known to be well-suited for representation learning of data with an underlying hierarchical structure. In this paper, we present a novel hyperbolic distribution called the pseudo-hyperbolic Gaussian, a Gaussian-like distribution on hyperbolic space whose density can be evaluated analytically and differentiated with respect to the parameters. Our distribution enables gradient-based learning of probabilistic models on hyperbolic space that could not previously be considered. We can also sample from this hyperbolic probability distribution without resorting to auxiliary means like rejection sampling. As applications of our distribution, we develop a hyperbolic analogue of the variational autoencoder and a method for probabilistic word embedding on hyperbolic space. We demonstrate the efficacy of our distribution on various datasets including MNIST, Atari 2600 Breakout, and WordNet. |
Tasks | Representation Learning |
Published | 2019-02-08 |
URL | https://arxiv.org/abs/1902.02992v2 |
https://arxiv.org/pdf/1902.02992v2.pdf | |
PWC | https://paperswithcode.com/paper/a-differentiable-gaussian-like-distribution |
Repo | https://github.com/muupan/resume |
Framework | none |
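Sampling from the pseudo-hyperbolic Gaussian follows the paper's construction: draw a vector in the tangent space at the hyperboloid origin, parallel-transport it to the tangent space at mu, then apply the exponential map. A NumPy sketch of the sampler on the Lorentz model (the analytic density, obtained via the log-determinant of this transform, is omitted):

```python
import numpy as np

def lorentz_dot(x, y):
    """Lorentzian inner product <x, y>_L = -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + x[1:] @ y[1:]

def sample_wrapped_normal(mu, cov, rng):
    """Sample z ~ G(mu, cov) on the hyperboloid: (1) v ~ N(0, cov) in the
    tangent space at the origin mu0, (2) parallel transport of v to mu,
    (3) exponential map onto the manifold."""
    n = len(mu) - 1
    mu0 = np.zeros(n + 1); mu0[0] = 1.0              # hyperboloid origin
    v = np.zeros(n + 1)
    v[1:] = rng.multivariate_normal(np.zeros(n), cov)
    alpha = -lorentz_dot(mu0, mu)
    u = v + lorentz_dot(mu - alpha * mu0, v) / (alpha + 1.0) * (mu0 + mu)
    norm_u = np.sqrt(max(lorentz_dot(u, u), 1e-12))  # Lorentz norm of tangent vec
    return np.cosh(norm_u) * mu + np.sinh(norm_u) * u / norm_u

rng = np.random.default_rng(0)
mu = np.array([np.sqrt(2.0), 1.0, 0.0])              # satisfies <mu, mu>_L = -1
z = sample_wrapped_normal(mu, 0.1 * np.eye(2), rng)
print(lorentz_dot(z, z))                             # ~ -1: z lies on the manifold
```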
RGB-Infrared Cross-Modality Person Re-Identification via Joint Pixel and Feature Alignment
Title | RGB-Infrared Cross-Modality Person Re-Identification via Joint Pixel and Feature Alignment |
Authors | Guan’an Wang, Tianzhu Zhang, Jian Cheng, Si Liu, Yang Yang, Zengguang Hou |
Abstract | RGB-Infrared (IR) person re-identification is an important and challenging task due to large cross-modality variations between RGB and IR images. Most conventional approaches aim to bridge the cross-modality gap through feature alignment, via feature representation learning. Different from existing methods, in this paper we propose a novel, end-to-end Alignment Generative Adversarial Network (AlignGAN) for the RGB-IR Re-ID task. The proposed model enjoys several merits. First, it can exploit pixel alignment and feature alignment jointly. To the best of our knowledge, this is the first work to model the two alignment strategies jointly for the RGB-IR Re-ID problem. Second, the proposed model consists of a pixel generator, a feature generator, and a joint discriminator. By playing a min-max game among the three components, our model is able not only to alleviate the cross-modality and intra-modality variations but also to learn identity-consistent features. Extensive experimental results on two standard benchmarks demonstrate that the proposed model performs favorably against state-of-the-art methods. In particular, on the SYSU-MM01 dataset, our model achieves absolute gains of 15.4% and 12.9% in terms of Rank-1 and mAP, respectively. |
Tasks | Person Re-Identification, Representation Learning |
Published | 2019-10-13 |
URL | https://arxiv.org/abs/1910.05839v2 |
https://arxiv.org/pdf/1910.05839v2.pdf | |
PWC | https://paperswithcode.com/paper/rgb-infrared-cross-modality-person-re-1 |
Repo | https://github.com/wangguanan/AlignGAN |
Framework | pytorch |
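The three-player structure (pixel generator, feature generator, joint discriminator) can be sketched compactly. The tiny networks below are placeholders for illustration, not the paper's architectures, and the re-ID losses are only indicated in comments.

```python
import torch
import torch.nn as nn

# Pixel generator: translates RGB images into the IR modality (pixel alignment).
pixel_G = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.Tanh())
# Feature generator: embeds any image into an identity-feature space.
feat_G = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
# Joint discriminator: scores (image, feature) pairs rather than images alone,
# coupling pixel alignment and feature alignment in one min-max game.
joint_D = nn.Sequential(nn.Linear(3 * 32 * 32 + 128, 1))

def joint_score(img, feat):
    return joint_D(torch.cat([img.flatten(1), feat], dim=1))

rgb = torch.randn(8, 3, 32, 32)
ir = torch.randn(8, 3, 32, 32)
fake_ir = pixel_G(rgb)                          # pixel alignment step
d_real = joint_score(ir, feat_G(ir))            # real IR with its feature
d_fake = joint_score(fake_ir, feat_G(fake_ir))  # generated pair
# D maximizes d_real - d_fake; both generators minimize it, alongside
# identity-consistency losses (e.g. ID classification) omitted here.
```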
Local Relation Networks for Image Recognition
Title | Local Relation Networks for Image Recognition |
Authors | Han Hu, Zheng Zhang, Zhenda Xie, Stephen Lin |
Abstract | The convolution layer has been the dominant feature extractor in computer vision for years. However, spatial aggregation in convolution is basically a pattern-matching process that applies fixed filters, which are inefficient at modeling visual elements with varying spatial distributions. This paper presents a new image feature extractor, called the local relation layer, that adaptively determines aggregation weights based on the compositional relationship of local pixel pairs. With this relational approach, it can compose visual elements into higher-level entities in a more efficient manner, which benefits semantic inference. A network built with local relation layers, called the Local Relation Network (LR-Net), is found to provide greater modeling capacity than its counterpart built with regular convolution on large-scale recognition tasks such as ImageNet classification. |
Tasks | |
Published | 2019-04-25 |
URL | http://arxiv.org/abs/1904.11491v1 |
http://arxiv.org/pdf/1904.11491v1.pdf | |
PWC | https://paperswithcode.com/paper/190411491 |
Repo | https://github.com/gan3sh500/local-relational-nets |
Framework | pytorch |
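A minimal local-relation-style layer can be written with `F.unfold`: the aggregation weights for each window are computed from query/key compatibility of pixel pairs instead of being fixed filter coefficients. The paper additionally uses a learned geometric prior over relative positions, which this sketch omits.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalRelation(nn.Module):
    """Minimal local-relation-style layer: per-window aggregation weights come
    from query/key compatibility of pixel pairs (appearance composability);
    the geometric prior term of the paper is omitted."""
    def __init__(self, channels, k=3):
        super().__init__()
        self.k = k
        self.query = nn.Conv2d(channels, channels, 1)
        self.key = nn.Conv2d(channels, channels, 1)

    def forward(self, x):                          # (B, C, H, W)
        B, C, H, W = x.shape
        q = self.query(x).view(B, C, 1, H * W)
        k_feat = F.unfold(self.key(x), self.k, padding=self.k // 2)
        k_feat = k_feat.view(B, C, self.k * self.k, H * W)
        v = F.unfold(x, self.k, padding=self.k // 2)
        v = v.view(B, C, self.k * self.k, H * W)
        att = (q * k_feat).sum(1, keepdim=True)    # pairwise compatibility
        att = F.softmax(att / C ** 0.5, dim=2)     # normalize over the window
        out = (att * v).sum(dim=2)                 # adaptive aggregation
        return out.view(B, C, H, W)

y = LocalRelation(16)(torch.randn(2, 16, 8, 8))
```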
PerspectroScope: A Window to the World of Diverse Perspectives
Title | PerspectroScope: A Window to the World of Diverse Perspectives |
Authors | Sihao Chen, Daniel Khashabi, Chris Callison-Burch, Dan Roth |
Abstract | This work presents PerspectroScope, a web-based system that lets users query a discussion-worthy natural-language claim, and then extracts and visualizes various perspectives in support of or against the claim, along with evidence supporting each perspective. The system thus lets users explore various perspectives that could touch upon aspects of the issue at hand. It is built as a combination of retrieval engines and learned textual-entailment-like classifiers, using several recent developments in natural language understanding. To make the system more adaptive, expand its coverage, and improve its decisions over time, our platform employs various mechanisms to get corrections from users. PerspectroScope is available at github.com/CogComp/perspectroscope. |
Tasks | Natural Language Inference |
Published | 2019-06-11 |
URL | https://arxiv.org/abs/1906.04761v1 |
https://arxiv.org/pdf/1906.04761v1.pdf | |
PWC | https://paperswithcode.com/paper/perspectroscope-a-window-to-the-world-of |
Repo | https://github.com/CogComp/perspectroscope |
Framework | none |
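The system architecture is retrieve-then-classify: a retrieval engine proposes candidate perspective sentences for a claim, and an entailment-like classifier labels the stance of each. The toy pipeline below mirrors only that control flow; the hash-based embedding and the constant stance stub are placeholders for the system's actual retrieval engines and trained classifiers.

```python
import numpy as np

def embed(text, dim=64):
    """Deterministic stand-in embedding (hash-seeded); not a real encoder."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

def retrieve(claim, corpus, top_k=2):
    """Rank candidate perspective sentences by cosine similarity to the claim."""
    scores = [(embed(claim) @ embed(s), s) for s in corpus]
    return [s for _, s in sorted(scores, reverse=True)[:top_k]]

def stance(claim, perspective):
    return "support"   # placeholder for the entailment-like stance classifier

claim = "Self-driving cars should be legal."
corpus = ["They reduce accidents.", "Liability is unclear.", "Cats are great."]
for p in retrieve(claim, corpus):
    print(stance(claim, p), "|", p)
```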
Homogeneous Vector Capsules Enable Adaptive Gradient Descent in Convolutional Neural Networks
Title | Homogeneous Vector Capsules Enable Adaptive Gradient Descent in Convolutional Neural Networks |
Authors | Adam Byerly, Tatiana Kalganova |
Abstract | Capsules are the name given by Geoffrey Hinton to vector-valued neurons. Neural networks traditionally produce a scalar value for an activated neuron. Capsules, by contrast, produce a vector of values, which Hinton argues corresponds to a single composite feature, wherein the values of the vector's components indicate properties of the feature such as transformation or contrast. We present a new way of parameterizing and training capsules that we refer to as homogeneous vector capsules (HVCs). We demonstrate experimentally that altering a convolutional neural network (CNN) to use HVCs can achieve superior classification accuracy without increasing the number of parameters or operations in its architecture, as compared to a CNN using a single final fully connected layer. Additionally, the introduction of HVCs enables the use of adaptive gradient descent, reducing the dependence of a model's achievable accuracy on the finely tuned hyperparameters of a non-adaptive optimizer. We demonstrate our method and results using two neural network architectures. The first, a very simple monolithic CNN, achieved a 63% improvement in top-1 classification accuracy and a 35% improvement in top-5 classification accuracy over the baseline architecture when using HVCs. The second, the CNN architecture known as Inception v3, achieved similar accuracies both with and without HVCs. Additionally, when using HVCs, the simple monolithic CNN showed no overfitting after more than 300 epochs, whereas the baseline showed overfitting after 30 epochs. We use the ImageNet ILSVRC 2012 classification challenge dataset for both networks. |
Tasks | |
Published | 2019-06-20 |
URL | https://arxiv.org/abs/1906.08676v1 |
https://arxiv.org/pdf/1906.08676v1.pdf | |
PWC | https://paperswithcode.com/paper/homogeneous-vector-capsules-enable-adaptive |
Repo | https://github.com/AdamByerly/HVCsEnableAGD |
Framework | tf |
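A heavily hedged sketch of what a vector-capsule classifier head in this spirit might look like: features are reshaped into capsules, each class's output capsule is formed by element-wise (Hadamard) weighting rather than a full matrix product, and the logit is read out from the capsule vector. The exact wiring in the paper may differ; this only illustrates the vector-valued readout that replaces the final fully connected layer.

```python
import torch
import torch.nn as nn

class HVCStyleHead(nn.Module):
    """Hedged sketch of a homogeneous-vector-capsule-style head: per-class
    element-wise weights over input capsules instead of a dense matrix,
    with the class logit as the sum over the output capsule's components."""
    def __init__(self, n_caps, cap_dim, n_classes):
        super().__init__()
        self.w = nn.Parameter(torch.randn(n_classes, n_caps, cap_dim) * 0.05)

    def forward(self, feats):                    # (B, n_caps * cap_dim)
        u = feats.view(feats.size(0), 1, *self.w.shape[1:])  # (B,1,caps,dim)
        class_caps = (u * self.w).sum(dim=2)     # (B, n_classes, cap_dim)
        return class_caps.sum(dim=-1)            # logits (B, n_classes)

logits = HVCStyleHead(n_caps=8, cap_dim=16, n_classes=10)(torch.randn(4, 128))
```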
LakhNES: Improving multi-instrumental music generation with cross-domain pre-training
Title | LakhNES: Improving multi-instrumental music generation with cross-domain pre-training |
Authors | Chris Donahue, Huanru Henry Mao, Yiting Ethan Li, Garrison W. Cottrell, Julian McAuley |
Abstract | We are interested in the task of generating multi-instrumental music scores. The Transformer architecture has recently shown great promise for the task of piano score generation; here we adapt it to the multi-instrumental setting. Transformers are complex, high-dimensional language models which are capable of capturing long-term structure in sequence data, but require large amounts of data to fit. Their success on piano score generation is partially explained by the large volumes of symbolic data readily available for that domain. We leverage the recently-introduced NES-MDB dataset of four-instrument scores from an early video game sound synthesis chip (the NES), which we find to be well-suited to training with the Transformer architecture. To further improve the performance of our model, we propose a pre-training technique to leverage the information in a large collection of heterogeneous music, namely the Lakh MIDI dataset. Despite differences between the two corpora, we find that this transfer learning procedure improves both quantitative and qualitative performance for our primary task. |
Tasks | Music Generation, Transfer Learning |
Published | 2019-07-10 |
URL | https://arxiv.org/abs/1907.04868v1 |
https://arxiv.org/pdf/1907.04868v1.pdf | |
PWC | https://paperswithcode.com/paper/lakhnes-improving-multi-instrumental-music |
Repo | https://github.com/chrisdonahue/LakhNES |
Framework | pytorch |
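The transfer-learning recipe is the standard one: pre-train on the large, heterogeneous corpus (Lakh MIDI mapped to the NES event vocabulary), then fine-tune on the small target domain (NES-MDB). A generic PyTorch skeleton with a stand-in model and placeholder loaders, not the paper's Transformer setup:

```python
import torch

def train(model, loader, optimizer, epochs,
          loss_fn=torch.nn.functional.cross_entropy):
    """Generic next-token training loop over (tokens, targets) batches."""
    for _ in range(epochs):
        for tokens, targets in loader:
            optimizer.zero_grad()
            loss_fn(model(tokens), targets).backward()
            optimizer.step()

# Stand-in for the paper's Transformer language model over musical events.
model = torch.nn.Sequential(torch.nn.Embedding(512, 64), torch.nn.Flatten(),
                            torch.nn.Linear(64 * 32, 512))

# 1) pre-train on the large heterogeneous corpus, 2) fine-tune on the small
#    target set with a lower learning rate so transferred weights survive.
#    `lakh_loader` / `nesmdb_loader` are placeholders for real DataLoaders.
# train(model, lakh_loader, torch.optim.Adam(model.parameters(), lr=3e-4), 10)
# train(model, nesmdb_loader, torch.optim.Adam(model.parameters(), lr=1e-4), 5)
```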