Paper Group AWR 99
BiasedWalk: Biased Sampling for Representation Learning on Graphs. Stacked Cross Attention for Image-Text Matching. Music Mood Detection Based On Audio And Lyrics With Deep Neural Net. Response Ranking with Deep Matching Networks and External Knowledge in Information-seeking Conversation Systems. Big-Little Net: An Efficient Multi-Scale Feature Representation for Visual and Speech Recognition …
BiasedWalk: Biased Sampling for Representation Learning on Graphs
Title | BiasedWalk: Biased Sampling for Representation Learning on Graphs |
Authors | Duong Nguyen, Fragkiskos D. Malliaros |
Abstract | Network embedding algorithms are able to learn latent feature representations of nodes, transforming networks into lower-dimensional vector representations. Typical key applications, which have effectively been addressed using network embeddings, include link prediction, multi-label classification and community detection. In this paper, we propose BiasedWalk, a scalable, unsupervised feature learning algorithm that is based on biased random walks to sample context information about each node in the network. Our random-walk-based sampling can behave like Breadth-First Search (BFS) and Depth-First Search (DFS) sampling, with the goal of capturing homophily and role equivalence between the nodes in the network. We have performed a detailed experimental evaluation comparing the performance of the proposed algorithm against various baseline methods, on several datasets and learning tasks. The experimental results show that the proposed method outperforms the baseline ones in most of the tasks and datasets. |
Tasks | Community Detection, Link Prediction, Network Embedding, Node Classification, Representation Learning |
Published | 2018-09-07 |
URL | http://arxiv.org/abs/1809.02482v1 |
PDF | http://arxiv.org/pdf/1809.02482v1.pdf |
PWC | https://paperswithcode.com/paper/biasedwalk-biased-sampling-for-representation |
Repo | https://github.com/duong18/BiasedWalk |
Framework | none |
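To make the BFS/DFS interpolation concrete, here is a minimal sketch of a distance-biased random walk; the exact bias used in the paper differs, and `alpha`, `dfs_like`, and the toy graph are our illustrative choices. The sampled walks would then feed a skip-gram model, as in DeepWalk/node2vec.

```python
import random
from collections import deque

def biased_walk(adj, start, length, alpha=0.5, dfs_like=True):
    """Sample one walk whose next-step probabilities are biased by BFS
    distance from the start node (a sketch of BiasedWalk's idea, not the
    authors' exact weighting).  With dfs_like=True, nodes farther from
    the start are up-weighted (DFS-like); otherwise nearer nodes win."""
    # Precompute BFS distances from the start node.
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)

    walk = [start]
    while len(walk) < length:
        u = walk[-1]
        nbrs = list(adj[u])
        if not nbrs:
            break
        # Exponential bias in the BFS distance of each candidate.
        weights = [alpha ** (-dist[v] if dfs_like else dist[v]) for v in nbrs]
        walk.append(random.choices(nbrs, weights=weights, k=1)[0])
    return walk

# Toy graph: a triangle with a tail.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
print(biased_walk(adj, start=0, length=6, alpha=0.5, dfs_like=True))
```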
Stacked Cross Attention for Image-Text Matching
Title | Stacked Cross Attention for Image-Text Matching |
Authors | Kuang-Huei Lee, Xi Chen, Gang Hua, Houdong Hu, Xiaodong He |
Abstract | In this paper, we study the problem of image-text matching. Inferring the latent semantic alignment between objects or other salient stuff (e.g. snow, sky, lawn) and the corresponding words in sentences makes it possible to capture the fine-grained interplay between vision and language, and makes image-text matching more interpretable. Prior work either simply aggregates the similarity of all possible pairs of regions and words without attending differentially to more and less important words or regions, or uses a multi-step attentional process to capture a limited number of semantic alignments, which is less interpretable. In this paper, we present Stacked Cross Attention to discover the full latent alignments using both image regions and words in a sentence as context and infer image-text similarity. Our approach achieves the state-of-the-art results on the MS-COCO and Flickr30K datasets. On Flickr30K, our approach outperforms the current best methods by 22.1% relatively in text retrieval from image query, and 18.2% relatively in image retrieval with text query (based on Recall@1). On MS-COCO, our approach improves sentence retrieval by 17.8% relatively and image retrieval by 16.6% relatively (based on Recall@1 using the 5K test set). Code has been made available at: https://github.com/kuanghuei/SCAN. |
Tasks | Image Retrieval, Text Matching |
Published | 2018-03-21 |
URL | http://arxiv.org/abs/1803.08024v2 |
PDF | http://arxiv.org/pdf/1803.08024v2.pdf |
PWC | https://paperswithcode.com/paper/stacked-cross-attention-for-image-text |
Repo | https://github.com/kuanghuei/SCAN |
Framework | pytorch |
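The text-to-image variant of the attention can be sketched in a few lines; the thresholding, normalization, and LogSumExp pooling below loosely follow one of the paper's formulations, and the feature dimensions are placeholders.

```python
import torch
import torch.nn.functional as F

def stacked_cross_attention(regions, words, smooth=9.0):
    """Text-to-image stacked cross attention (a sketch, not the exact
    SCAN recipe).  regions: (R, d) image-region features from a detector,
    words: (W, d) word features from a sentence encoder."""
    regions = F.normalize(regions, dim=-1)
    words = F.normalize(words, dim=-1)
    # Region-word cosine similarities, thresholded at zero.
    sim = F.relu(words @ regions.t())              # (W, R)
    sim = F.normalize(sim, dim=-1)
    # Attend image regions with respect to each word.
    attn = F.softmax(smooth * sim, dim=-1)         # (W, R)
    attended = attn @ regions                      # (W, d): one visual vector per word
    # Relevance of each word to its attended visual context.
    rel = F.cosine_similarity(words, attended, dim=-1)   # (W,)
    # Pool word relevances into one image-sentence similarity (LogSumExp).
    return torch.logsumexp(smooth * rel, dim=0) / smooth

regions = torch.randn(36, 256)   # e.g. 36 detected region features
words = torch.randn(12, 256)     # 12 word embeddings
print(stacked_cross_attention(regions, words).item())
```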
Music Mood Detection Based On Audio And Lyrics With Deep Neural Net
Title | Music Mood Detection Based On Audio And Lyrics With Deep Neural Net |
Authors | Rémi Delbouys, Romain Hennequin, Francesco Piccoli, Jimena Royo-Letelier, Manuel Moussallam |
Abstract | We consider the task of multimodal music mood prediction based on the audio signal and the lyrics of a track. We reproduce the implementation of traditional feature-engineering-based approaches and propose a new model based on deep learning. We compare the performance of both approaches on a database containing 18,000 tracks with associated valence and arousal values and show that our approach outperforms classical models on the arousal detection task, and that both approaches perform equally well on the valence prediction task. We also compare a posteriori (late) fusion with fusion of modalities optimized jointly with each unimodal model, and observe a significant improvement in valence prediction. We release part of our database for comparison purposes. |
Tasks | Multimodal Emotion Recognition, Music Emotion Recognition |
Published | 2018-09-19 |
URL | http://arxiv.org/abs/1809.07276v1 |
PDF | http://arxiv.org/pdf/1809.07276v1.pdf |
PWC | https://paperswithcode.com/paper/music-mood-detection-based-on-audio-and |
Repo | https://github.com/Dohppak/Music-Emotion-Recognition-Classification |
Framework | pytorch |
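A minimal sketch of the mid-level fusion setting the paper compares against late fusion: both branches and the fusion head are optimized jointly. All layer sizes are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

class BimodalMoodNet(nn.Module):
    """Mid-level fusion sketch: audio and lyrics branches are trained
    jointly with the fusion layers (the setting the paper finds helps
    valence), rather than fusing unimodal predictions a posteriori."""
    def __init__(self, audio_dim=128, lyrics_dim=300, hidden=64):
        super().__init__()
        self.audio_branch = nn.Sequential(nn.Linear(audio_dim, hidden), nn.ReLU())
        self.lyrics_branch = nn.Sequential(nn.Linear(lyrics_dim, hidden), nn.ReLU())
        # Joint head regressing valence and arousal from fused features.
        self.head = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 2))  # (valence, arousal)

    def forward(self, audio, lyrics):
        fused = torch.cat([self.audio_branch(audio), self.lyrics_branch(lyrics)], dim=-1)
        return self.head(fused)

model = BimodalMoodNet()
va = model(torch.randn(4, 128), torch.randn(4, 300))  # batch of 4 tracks
print(va.shape)  # torch.Size([4, 2])
```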
Response Ranking with Deep Matching Networks and External Knowledge in Information-seeking Conversation Systems
Title | Response Ranking with Deep Matching Networks and External Knowledge in Information-seeking Conversation Systems |
Authors | Liu Yang, Minghui Qiu, Chen Qu, Jiafeng Guo, Yongfeng Zhang, W. Bruce Croft, Jun Huang, Haiqing Chen |
Abstract | Intelligent personal assistant systems with either text-based or voice-based conversational interfaces are becoming increasingly popular around the world. Retrieval-based conversation models have the advantage of returning fluent and informative responses. Most existing studies in this area are on open-domain “chit-chat” conversations or task/transaction-oriented conversations. More research is needed for information-seeking conversations. There is also a lack of modeling external knowledge beyond the dialog utterances in current conversational models. In this paper, we propose a learning framework on top of deep neural matching networks that leverages external knowledge for response ranking in information-seeking conversation systems. We incorporate external knowledge into deep neural models with pseudo-relevance feedback and QA correspondence knowledge distillation. Extensive experiments with three information-seeking conversation datasets, including both open benchmarks and commercial data, show that our methods outperform various baseline methods, including several deep text matching models and the state-of-the-art method on response selection in multi-turn conversations. We also perform analysis over different response types, model variations and ranking examples. Our models and research findings provide new insights into how to utilize external knowledge with deep neural models for response selection and have implications for the design of the next generation of information-seeking conversation systems. |
Tasks | Text Matching |
Published | 2018-05-01 |
URL | http://arxiv.org/abs/1805.00188v3 |
PDF | http://arxiv.org/pdf/1805.00188v3.pdf |
PWC | https://paperswithcode.com/paper/response-ranking-with-deep-matching-networks |
Repo | https://github.com/yangliuy/NeuralResponseRanking |
Framework | tf |
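The pseudo-relevance feedback ingredient can be sketched with a toy term-overlap retriever; this is our simplification of how responses are enriched with external knowledge, not the paper's exact pipeline.

```python
from collections import Counter

def prf_expand(query_terms, corpus, k=3, n_terms=5):
    """Pseudo-relevance feedback sketch: score documents by term overlap
    with the query, then add the most frequent terms of the top-k
    documents to the query representation.  The scoring function and
    term selection here are toy stand-ins."""
    scored = sorted(corpus, key=lambda doc: -len(set(query_terms) & set(doc)))
    feedback = Counter(t for doc in scored[:k] for t in doc)
    expansion = [t for t, _ in feedback.most_common() if t not in query_terms][:n_terms]
    return list(query_terms) + expansion

corpus = [["reset", "password", "account"], ["reset", "router", "wifi"],
          ["password", "expired", "login"]]
print(prf_expand(["password", "reset"], corpus))
```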
Big-Little Net: An Efficient Multi-Scale Feature Representation for Visual and Speech Recognition
Title | Big-Little Net: An Efficient Multi-Scale Feature Representation for Visual and Speech Recognition |
Authors | Chun-Fu Chen, Quanfu Fan, Neil Mallinar, Tom Sercu, Rogerio Feris |
Abstract | In this paper, we propose a novel Convolutional Neural Network (CNN) architecture for learning multi-scale feature representations with good tradeoffs between speed and accuracy. This is achieved by using a multi-branch network, which has different computational complexity at different branches. Through frequent merging of features from branches at distinct scales, our model obtains multi-scale features while using less computation. The proposed approach demonstrates improvement of model efficiency and performance on both object recognition and speech recognition tasks, using popular architectures including ResNet and ResNeXt. For object recognition, our approach reduces computation by 33% while improving accuracy by 0.9%. Furthermore, our model surpasses state-of-the-art CNN acceleration approaches by a large margin in accuracy and FLOPs reduction. On the task of speech recognition, our proposed multi-scale CNNs save 30% FLOPs with slightly better word error rates, showing good generalization across domains. The code is available at https://github.com/IBM/BigLittleNet |
Tasks | Object Recognition, Speech Recognition |
Published | 2018-07-10 |
URL | https://arxiv.org/abs/1807.03848v3 |
PDF | https://arxiv.org/pdf/1807.03848v3.pdf |
PWC | https://paperswithcode.com/paper/big-little-net-an-efficient-multi-scale |
Repo | https://github.com/k0pch4/big-little-net |
Framework | pytorch |
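One Big-Little merge can be sketched as follows: the "big" branch runs at half resolution (cheap despite more channels) while the "little" branch keeps full resolution with fewer channels, and the two are merged. Channel and scale choices below are illustrative, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BigLittleBlock(nn.Module):
    """Sketch of one Big-Little merge (illustrative channel/scale choices)."""
    def __init__(self, channels):
        super().__init__()
        self.big = nn.Conv2d(channels, channels, 3, padding=1)          # low-res branch
        self.little = nn.Conv2d(channels, channels // 4, 3, padding=1)  # high-res branch
        self.little_proj = nn.Conv2d(channels // 4, channels, 1)        # align channels

    def forward(self, x):
        big = self.big(F.avg_pool2d(x, 2))                  # compute at 1/2 resolution
        big = F.interpolate(big, scale_factor=2, mode="bilinear", align_corners=False)
        little = self.little_proj(self.little(x))           # full resolution, fewer channels
        return F.relu(big + little)                         # frequent merging of scales

x = torch.randn(1, 64, 32, 32)
print(BigLittleBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```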
NSCaching: Simple and Efficient Negative Sampling for Knowledge Graph Embedding
Title | NSCaching: Simple and Efficient Negative Sampling for Knowledge Graph Embedding |
Authors | Yongqi Zhang, Quanming Yao, Yingxia Shao, Lei Chen |
Abstract | Knowledge Graph (KG) embedding is a fundamental problem in data mining research with many real-world applications. It aims to encode the entities and relations in the graph into a low-dimensional vector space, which can be used by subsequent algorithms. Negative sampling, which samples negative triplets from the non-observed ones in the training data, is an important step in KG embedding. Recently, generative adversarial networks (GANs) have been introduced in negative sampling. By sampling negative triplets with large scores, these methods avoid the problem of vanishing gradients and thus obtain better performance. However, using a GAN makes the original model more complex and harder to train, as reinforcement learning must be used. In this paper, motivated by the observation that negative triplets with large scores are important but rare, we propose to directly keep track of them with a cache. However, how to sample from and how to update the cache are two important questions. We carefully design solutions that are not only efficient but also achieve a good balance between exploration and exploitation. In this way, our method acts as a “distilled” version of previous GAN-based methods, which does not waste training time on additional parameters to fit the full distribution of negative triplets. Extensive experiments show that our method yields significant improvements in various KG embedding models, and outperforms the state-of-the-art negative sampling methods based on GANs. |
Tasks | Graph Embedding, Knowledge Graph Embedding |
Published | 2018-12-16 |
URL | http://arxiv.org/abs/1812.06410v2 |
PDF | http://arxiv.org/pdf/1812.06410v2.pdf |
PWC | https://paperswithcode.com/paper/nscaching-simple-and-efficient-negative |
Repo | https://github.com/yzhangee/NSCaching |
Framework | pytorch |
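The cache mechanism is easy to sketch: sample a negative from the cache (exploitation), then refresh the cache with freshly scored candidates so rare large-score negatives keep entering it (exploration). The uniform sampling and top-k refresh below are our stand-ins for the paper's carefully designed schemes.

```python
import numpy as np

def nscaching_step(cache, score_fn, sample_pool, cache_size=30, refresh=10, rng=None):
    """One NSCaching-style update (a sketch of the cache idea, not the
    authors' exact schedule): draw a training negative from the cache,
    then refresh the cache by re-scoring it together with fresh candidates."""
    rng = rng or np.random.default_rng()
    negative = cache[rng.integers(len(cache))]        # train on this negative
    # Refresh: score fresh candidates together with the current cache
    # and keep the top-scoring entries.
    candidates = np.concatenate([cache, rng.choice(sample_pool, size=refresh)])
    scores = np.array([score_fn(c) for c in candidates])
    keep = np.argsort(-scores)[:cache_size]
    return negative, candidates[keep]

# Toy example: "triplets" are ints, higher score = closer to 50.
pool = np.arange(100)
cache = np.arange(30)
neg, cache = nscaching_step(cache, lambda c: -abs(c - 50), pool)
print(neg, sorted(cache)[:5])
```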
Learning End-to-end Autonomous Driving using Guided Auxiliary Supervision
Title | Learning End-to-end Autonomous Driving using Guided Auxiliary Supervision |
Authors | Ashish Mehta, Adithya Subramanian, Anbumani Subramanian |
Abstract | Learning to drive faithfully in highly stochastic urban settings remains an open problem. To that end, we propose a Multi-task Learning from Demonstration (MT-LfD) framework which uses supervised auxiliary task prediction to guide the main task of predicting the driving commands. Our framework involves an end-to-end trainable network for imitating the expert demonstrator’s driving commands. The network intermediately predicts visual affordances and action primitives through direct supervision, which provides the aforementioned auxiliary supervised guidance. We demonstrate that such joint learning and supervised guidance facilitates hierarchical task decomposition, assisting the agent to learn faster, achieve better driving performance and increase the transparency of the otherwise black-box end-to-end network. We run our experiments to validate the MT-LfD framework in CARLA, an open-source urban driving simulator. We introduce multiple non-player agents in CARLA and induce temporal noise in them for realistic stochasticity. |
Tasks | Autonomous Driving, Multi-Task Learning |
Published | 2018-08-30 |
URL | http://arxiv.org/abs/1808.10393v1 |
PDF | http://arxiv.org/pdf/1808.10393v1.pdf |
PWC | https://paperswithcode.com/paper/learning-end-to-end-autonomous-driving-using |
Repo | https://github.com/AshishMehtaIO/MTLfD-CARLA |
Framework | tf |
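A minimal sketch of the guided-auxiliary-supervision architecture: a shared encoder feeds a main control head plus directly supervised auxiliary heads. Head sizes, the dummy loss, and the input resolution are placeholders, not the paper's exact outputs.

```python
import torch
import torch.nn as nn

class MTLfDNet(nn.Module):
    """Sketch of guided auxiliary supervision: a shared encoder feeds a
    main head (driving commands) and auxiliary heads (visual affordances,
    action primitives) that receive direct supervision."""
    def __init__(self, feat=128, n_affordances=6, n_primitives=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat), nn.ReLU())
        self.control = nn.Linear(feat, 3)           # steer, throttle, brake
        self.affordances = nn.Linear(feat, n_affordances)
        self.primitives = nn.Linear(feat, n_primitives)

    def forward(self, img):
        z = self.encoder(img)
        return self.control(z), self.affordances(z), self.primitives(z)

# Joint loss: imitation of expert commands plus auxiliary guidance terms
# (dummy all-zero targets here, just to show the weighted sum).
model = MTLfDNet()
ctrl, aff, prim = model(torch.randn(2, 3, 88, 200))
loss = ctrl.pow(2).mean() + 0.5 * aff.pow(2).mean() + 0.5 * prim.pow(2).mean()
print(loss.item())
```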
A Brief Review of Real-World Color Image Denoising
Title | A Brief Review of Real-World Color Image Denoising |
Authors | Zhaoming Kong, Xiaowei Yang |
Abstract | Filtering real-world color images is challenging due to the complexity of noise that cannot be formulated as following a certain distribution. However, the rapid development of camera lenses poses greater demands on image denoising in terms of both efficiency and effectiveness. Currently, the most widely accepted framework employs a combination of transform-domain techniques and the nonlocal similarity characteristics of natural images. Based on this framework, many competitive methods model the correlation of the R, G, B channels with pre-defined or adaptively learned transforms. In this chapter, a brief review of related methods and publicly available datasets is presented; moreover, a new dataset that includes more natural outdoor scenes is introduced. Extensive experiments are performed, and a discussion on visual effect enhancement is included. |
Tasks | Denoising, Image Denoising |
Published | 2018-09-10 |
URL | http://arxiv.org/abs/1809.03298v1 |
PDF | http://arxiv.org/pdf/1809.03298v1.pdf |
PWC | https://paperswithcode.com/paper/a-brief-review-of-real-world-color-image |
Repo | https://github.com/ZhaomingKong/Pure_Image |
Framework | none |
Deep Priority Hashing
Title | Deep Priority Hashing |
Authors | Zhangjie Cao, Ziping Sun, Mingsheng Long, Jianmin Wang, Philip S. Yu |
Abstract | Deep hashing enables image retrieval by end-to-end learning of deep representations and hash codes from training data with pairwise similarity information. Subject to the distribution skewness underlying the similarity information, most existing deep hashing methods may underperform for imbalanced data due to misspecified loss functions. This paper presents Deep Priority Hashing (DPH), an end-to-end architecture that generates compact and balanced hash codes in a Bayesian learning framework. The main idea is to reshape the standard cross-entropy loss for similarity-preserving learning such that it down-weighs the loss associated with highly confident pairs. This idea leads to a novel priority cross-entropy loss, which prioritizes training on uncertain pairs over confident pairs. We also propose a priority quantization loss, which prioritizes hard-to-quantize examples for the generation of nearly lossless hash codes. Extensive experiments demonstrate that DPH can generate high-quality hash codes and yields state-of-the-art image retrieval results on three datasets: ImageNet, NUS-WIDE, and MS-COCO. |
Tasks | Image Retrieval, Quantization |
Published | 2018-09-04 |
URL | http://arxiv.org/abs/1809.01238v1 |
PDF | http://arxiv.org/pdf/1809.01238v1.pdf |
PWC | https://paperswithcode.com/paper/deep-priority-hashing |
Repo | https://github.com/thuml/DPH |
Framework | none |
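The priority cross-entropy idea resembles a focal-style reweighting; the sketch below down-weighs confident pairs so that uncertain pairs dominate training. `gamma` and this exact weighting are our stand-ins for the paper's priority weights.

```python
import torch

def priority_cross_entropy(logits, labels, gamma=2.0):
    """Focal-style sketch of DPH's priority cross-entropy: the standard
    pairwise loss is reshaped so highly confident pairs are down-weighed."""
    p = torch.sigmoid(logits)                      # P(similar) for each pair
    p_true = torch.where(labels == 1, p, 1 - p)    # confidence in the true label
    ce = -torch.log(p_true.clamp_min(1e-8))        # standard cross-entropy
    return ((1 - p_true) ** gamma * ce).mean()     # prioritize uncertain pairs

logits = torch.tensor([4.0, 0.1, -3.0])   # pairwise similarity scores
labels = torch.tensor([1, 1, 0])          # 1 = similar pair, 0 = dissimilar
print(priority_cross_entropy(logits, labels).item())
```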
Fully Convolutional Pixel Adaptive Image Denoiser
Title | Fully Convolutional Pixel Adaptive Image Denoiser |
Authors | Sungmin Cha, Taesup Moon |
Abstract | We propose a new image denoising algorithm, dubbed the Fully Convolutional Adaptive Image DEnoiser (FC-AIDE), that can learn from an offline supervised training set with a fully convolutional neural network as well as adaptively fine-tune the supervised model for each given noisy image. We significantly extend the framework of the recently proposed Neural AIDE, which formulates the denoiser as context-based pixelwise mappings and utilizes an unbiased estimator of the MSE for such denoisers. The two main contributions we make are: 1) implementing a novel fully convolutional architecture that boosts the base supervised model, and 2) introducing regularization methods for the adaptive fine-tuning such that stronger and more robust adaptivity can be attained. As a result, FC-AIDE is shown to possess many desirable features; it outperforms recent CNN-based state-of-the-art denoisers on all of the benchmark datasets we tested, and is particularly strong in various challenging scenarios, e.g., with mismatched image/noise characteristics or with scarce supervised training data. The source code of our algorithm is available at https://github.com/csm9493/FC-AIDE-Keras. |
Tasks | Denoising, Image Denoising |
Published | 2018-07-19 |
URL | https://arxiv.org/abs/1807.07569v4 |
PDF | https://arxiv.org/pdf/1807.07569v4.pdf |
PWC | https://paperswithcode.com/paper/fully-convolutional-pixel-adaptive-image |
Repo | https://github.com/csm9493/FC-AIDE |
Framework | tf |
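The unbiased MSE estimator that enables the adaptive fine-tuning is worth spelling out. For a pixelwise affine denoiser x̂ = a·z + b whose coefficients are predicted from the pixel's context only (hence independent of the center pixel's noise), the quantity (x̂ − z)² + σ²(2a − 1) has the same expectation as the true squared error (x̂ − x)², so the network can be tuned on the noisy image alone. A minimal numerical check, assuming Gaussian noise:

```python
import torch

def unbiased_mse_estimate(a, b, noisy, sigma):
    """Unbiased estimate of the true MSE for a pixelwise affine denoiser
    x_hat = a * z + b, with (a, b) predicted from the pixel's context only
    (the Neural AIDE estimator FC-AIDE builds on).  Because (a, b) are
    independent of the center pixel's noise,
    E[(x_hat - z)^2 + sigma^2 * (2a - 1)] = E[(x_hat - x_clean)^2]."""
    x_hat = a * noisy + b
    return ((x_hat - noisy) ** 2 + sigma ** 2 * (2 * a - 1)).mean()

# Check: for the identity denoiser (a=1, b=0) on a zero clean image,
# the estimate is sigma^2 = 0.01, exactly the true MSE.
sigma = 0.1
noisy = torch.randn(10000) * sigma
a, b = torch.ones(10000), torch.zeros(10000)
print(unbiased_mse_estimate(a, b, noisy, sigma).item())  # 0.01
```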
Super-Resolution via Image-Adapted Denoising CNNs: Incorporating External and Internal Learning
Title | Super-Resolution via Image-Adapted Denoising CNNs: Incorporating External and Internal Learning |
Authors | Tom Tirer, Raja Giryes |
Abstract | While deep neural networks exhibit state-of-the-art results in the task of image super-resolution (SR) with a fixed known acquisition process (e.g., a bicubic downscaling kernel), they experience a huge performance loss when the real observation model mismatches the one used in training. Recently, two different techniques have been suggested to mitigate this deficiency, i.e., to enjoy the advantages of deep learning without being restricted by the training phase. The first follows the plug-and-play (P&P) approach, which solves general inverse problems (e.g., SR) by using Gaussian denoisers to handle the prior term in model-based optimization schemes. The second builds on the internal recurrence of information inside a single image, and trains a super-resolver network at test time on examples synthesized from the low-resolution image. Our work incorporates these two independent strategies, enjoying the impressive generalization capabilities of deep learning captured by the first, and further improving it through internal learning at test time. First, we apply a recent P&P strategy to SR. Then, we show how it may become image-adaptive at test time. This technique outperforms the above two strategies on popular datasets and gives better results than other state-of-the-art methods in practical cases where the observation model is inexact or unknown in advance. |
Tasks | Denoising, Image Super-Resolution, Super-Resolution |
Published | 2018-11-30 |
URL | https://arxiv.org/abs/1811.12866v3 |
PDF | https://arxiv.org/pdf/1811.12866v3.pdf |
PWC | https://paperswithcode.com/paper/super-resolution-based-on-image-adapted-cnn |
Repo | https://github.com/tomtirer/IDBP-CNN-IA |
Framework | none |
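The P&P ingredient can be sketched generically: alternate a data-fidelity step against the low-resolution observation with a denoiser acting as the prior. The paper's IDBP back-projection details and the image-adapted fine-tuning of the denoiser on examples synthesized from the input are omitted; the 1-D toy operators below are ours.

```python
import numpy as np

def pnp_super_resolution(y, downscale, upscale, denoise, iters=30):
    """Generic plug-and-play sketch: alternate a data-fidelity correction
    (agree with the low-res observation y) with a denoiser playing the
    role of the image prior."""
    x = upscale(y)                              # initial guess
    for _ in range(iters):
        # Data-fidelity step: correct x with the back-projected residual.
        x = x + upscale(y - downscale(x))
        # Prior step: the denoiser acts as the regularizer.
        x = denoise(x)
    return x

# Toy 1-D example with 2x decimation and a smoothing "denoiser".
downscale = lambda x: x[::2]
upscale = lambda y: np.repeat(y, 2)
denoise = lambda x: np.convolve(x, np.ones(3) / 3, mode="same")
y = np.sin(np.linspace(0, 3, 32))[::2]
print(pnp_super_resolution(y, downscale, upscale, denoise).shape)  # (32,)
```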
Improved Deep Spectral Convolution Network For Hyperspectral Unmixing With Multinomial Mixture Kernel and Endmember Uncertainty
Title | Improved Deep Spectral Convolution Network For Hyperspectral Unmixing With Multinomial Mixture Kernel and Endmember Uncertainty |
Authors | Savas Ozkan, Gozde Bozdagi Akar |
Abstract | In this study, we propose a novel framework for hyperspectral unmixing by using an improved deep spectral convolution network (DSCN++) combined with endmember uncertainty. DSCN++ is used to compute high-level representations, which are further modeled with a Multinomial Mixture Model to estimate abundance maps. In the reconstruction step, a new trainable uncertainty term based on a nonlinear neural network model is introduced to provide robustness to endmember uncertainty. To optimize the coefficients of the multinomial model and the uncertainty term, a Wasserstein Generative Adversarial Network (WGAN) is exploited to improve stability and to capture uncertainty. Experiments are performed on both real and synthetic datasets. The results validate that the proposed method obtains state-of-the-art hyperspectral unmixing performance, particularly on the real datasets, compared to the baseline techniques. |
Tasks | Hyperspectral Unmixing |
Published | 2018-08-03 |
URL | https://arxiv.org/abs/1808.01104v4 |
PDF | https://arxiv.org/pdf/1808.01104v4.pdf |
PWC | https://paperswithcode.com/paper/improved-deep-spectral-convolution-network |
Repo | https://github.com/savasozkan/dscn |
Framework | tf |
Online Multiclass Boosting with Bandit Feedback
Title | Online Multiclass Boosting with Bandit Feedback |
Authors | Daniel T. Zhang, Young Hun Jung, Ambuj Tewari |
Abstract | We present online boosting algorithms for multiclass classification with bandit feedback, where the learner only receives feedback about the correctness of its prediction. We propose an unbiased estimate of the loss using a randomized prediction, allowing the model to update its weak learners with limited information. Using the unbiased estimate, we extend two full-information boosting algorithms (Jung et al., 2017) to the bandit setting. We prove that the asymptotic error bounds of the bandit algorithms exactly match their full-information counterparts. The cost of restricted feedback is reflected in a larger sample complexity. Experimental results also support our theoretical findings, and the performance of the proposed models is comparable to that of an existing bandit boosting algorithm, which is limited to using binary weak learners. |
Tasks | |
Published | 2018-10-11 |
URL | http://arxiv.org/abs/1810.05290v2 |
PDF | http://arxiv.org/pdf/1810.05290v2.pdf |
PWC | https://paperswithcode.com/paper/online-multiclass-boosting-with-bandit |
Repo | https://github.com/pi224/banditboosting |
Framework | none |
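The unbiased loss estimate is the key trick and is easy to verify numerically: predict a label sampled from a distribution p, observe only the correctness bit, and importance-weight it by 1/p. The sketch below (with our toy distribution) recovers the full 0-1 loss vector in expectation.

```python
import numpy as np

def bandit_loss_estimate(p, true_label, rng=None):
    """Sketch of the unbiased loss estimate: sample a prediction from p,
    observe only whether it was correct, and importance-weight that single
    bit; E[estimate] equals the true 0-1 loss vector over all classes."""
    rng = rng or np.random.default_rng()
    k = len(p)
    pred = rng.choice(k, p=p)
    correct = float(pred == true_label)            # the only feedback observed
    loss_hat = np.zeros(k)
    # Loss of "predicting class pred" is 1 - correctness; dividing by
    # p[pred] makes the estimate unbiased.
    loss_hat[pred] = (1.0 - correct) / p[pred]
    return loss_hat

# Averaging many estimates recovers the true 0-1 loss vector.
p = np.array([0.5, 0.3, 0.2])
est = np.mean([bandit_loss_estimate(p, true_label=1) for _ in range(20000)], axis=0)
print(np.round(est, 2))  # ~ [1, 0, 1]
```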
Label-Noise Robust Generative Adversarial Networks
Title | Label-Noise Robust Generative Adversarial Networks |
Authors | Takuhiro Kaneko, Yoshitaka Ushiku, Tatsuya Harada |
Abstract | Generative adversarial networks (GANs) are a framework that learns a generative distribution through adversarial training. Recently, their class-conditional extensions (e.g., conditional GAN (cGAN) and auxiliary classifier GAN (AC-GAN)) have attracted much attention owing to their ability to learn disentangled representations and to improve training stability. However, their training requires the availability of large-scale, accurately class-labeled data, which are often laborious or impractical to collect in a real-world scenario. To remedy this, we propose a novel family of GANs called label-noise robust GANs (rGANs) which, by incorporating a noise transition model, can learn a clean-label conditional generative distribution even when training labels are noisy. In particular, we propose two variants: rAC-GAN, which is a bridging model between AC-GAN and the label-noise robust classification model, and rcGAN, which is an extension of cGAN and solves this problem without relying on any classifier. In addition to providing the theoretical background, we demonstrate the effectiveness of our models through extensive experiments using diverse GAN configurations, various noise settings, and multiple evaluation metrics (in which we tested 402 conditions in total). Our code is available at https://github.com/takuhirok/rGAN/. |
Tasks | |
Published | 2018-11-27 |
URL | https://arxiv.org/abs/1811.11165v2 |
PDF | https://arxiv.org/pdf/1811.11165v2.pdf |
PWC | https://paperswithcode.com/paper/label-noise-robust-generative-adversarial |
Repo | https://github.com/takuhirok/NR-GAN |
Framework | pytorch |
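The noise transition idea behind rAC-GAN can be sketched in a few lines: the auxiliary classifier models the clean-label posterior, and a transition matrix T maps it into noisy-label space, where the loss is computed. The symmetric-noise T below is our illustrative choice.

```python
import torch

def noisy_label_posterior(clean_logits, T):
    """Sketch of the noise transition model: the classifier predicts the
    *clean* label distribution, and T (T[i, j] = P(noisy=j | clean=i))
    maps it to the distribution of the *noisy* labels actually observed,
    so training matches noisy data while conditioning stays clean."""
    p_clean = torch.softmax(clean_logits, dim=-1)   # (batch, K)
    return p_clean @ T                              # (batch, K) noisy-label probs

K = 4
# Symmetric noise: correct label kept with prob 0.7, rest spread uniformly.
T = torch.full((K, K), 0.1)
T.fill_diagonal_(0.7)
logits = torch.randn(2, K)
print(noisy_label_posterior(logits, T).sum(dim=-1))  # rows sum to 1
```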
Deep Spectral Convolution Network for HyperSpectral Unmixing
Title | Deep Spectral Convolution Network for HyperSpectral Unmixing |
Authors | Savas Ozkan, Gozde Bozdagi Akar |
Abstract | In this paper, we propose a novel hyperspectral unmixing technique based on deep spectral convolution networks (DSCN). In particular, three important contributions are presented throughout this paper. First, the fully-connected linear operation is replaced with spectral convolutions to extract local spectral characteristics from hyperspectral signatures with a deeper network architecture. Second, instead of batch normalization, we propose a spectral normalization layer which improves the selectivity of filters by normalizing their spectral responses. Third, we introduce two fusion configurations that produce ideal abundance maps by using the abstract representations computed in previous layers. In our experiments, we use two real datasets to evaluate the performance of our method against other baseline techniques. The experimental results validate that the proposed method outperforms the baselines in terms of Root Mean Square Error (RMSE). |
Tasks | Hyperspectral Unmixing |
Published | 2018-06-22 |
URL | http://arxiv.org/abs/1806.08562v1 |
PDF | http://arxiv.org/pdf/1806.08562v1.pdf |
PWC | https://paperswithcode.com/paper/deep-spectral-convolution-network-for |
Repo | https://github.com/savasozkan/dscn |
Framework | tf |
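A minimal sketch of the pipeline: 1-D convolutions along the spectral axis replace fully-connected layers, responses are normalized across the spectrum (a simple stand-in for the paper's spectral normalization layer), and a softmax head yields abundances that sum to one. Band count, kernel sizes, and endmember count are placeholders.

```python
import torch
import torch.nn as nn

class SpectralUnmixNet(nn.Module):
    """Sketch of a DSCN-style unmixing network (our simplified layers)."""
    def __init__(self, n_bands=200, n_endmembers=4):
        super().__init__()
        self.spectral_conv = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=11, padding=5), nn.ReLU(),
            nn.Conv1d(8, 16, kernel_size=11, padding=5), nn.ReLU())
        self.head = nn.Linear(16 * n_bands, n_endmembers)

    def forward(self, spectra):                             # (batch, n_bands)
        h = self.spectral_conv(spectra.unsqueeze(1))        # (batch, 16, n_bands)
        h = h / (h.norm(dim=-1, keepdim=True) + 1e-8)       # normalize spectral responses
        return torch.softmax(self.head(h.flatten(1)), -1)   # per-pixel abundances

net = SpectralUnmixNet()
abund = net(torch.randn(5, 200))
print(abund.shape, abund.sum(dim=-1))  # (5, 4), each row sums to 1
```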