October 21, 2019

3071 words 15 mins read

Paper Group AWR 99



BiasedWalk: Biased Sampling for Representation Learning on Graphs

Title BiasedWalk: Biased Sampling for Representation Learning on Graphs
Authors Duong Nguyen, Fragkiskos D. Malliaros
Abstract Network embedding algorithms are able to learn latent feature representations of nodes, transforming networks into lower dimensional vector representations. Typical key applications, which have effectively been addressed using network embeddings, include link prediction, multilabel classification and community detection. In this paper, we propose BiasedWalk, a scalable, unsupervised feature learning algorithm that is based on biased random walks to sample context information about each node in the network. Our random-walk based sampling can behave like Breadth-First-Search (BFS) and Depth-First-Search (DFS) sampling, with the goal of capturing homophily and role equivalence between the nodes in the network. We have performed a detailed experimental evaluation comparing the performance of the proposed algorithm against various baseline methods, on several datasets and learning tasks. The experimental results show that the proposed method outperforms the baselines on most of the tasks and datasets.
Tasks Community Detection, Link Prediction, Network Embedding, Node Classification, Representation Learning
Published 2018-09-07
URL http://arxiv.org/abs/1809.02482v1
PDF http://arxiv.org/pdf/1809.02482v1.pdf
PWC https://paperswithcode.com/paper/biasedwalk-biased-sampling-for-representation
Repo https://github.com/duong18/BiasedWalk
Framework none
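
To make the sampling idea concrete, here is a minimal Python sketch of a distance-biased random walk over a simple adjacency-list graph. The decay parameter `beta` and the exact weighting scheme are illustrative stand-ins for the paper's biasing strategy, not the authors' implementation (see the linked repo for that).

```python
import random
from collections import deque

def bfs_distances(adj, source):
    """Hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def biased_walk(adj, source, length, beta=0.5):
    """One walk from `source`; a neighbour at hop distance d from the source
    gets weight beta**d, so beta < 1 keeps the walk near the source
    (BFS-like, homophily) and beta > 1 pushes it outward (DFS-like)."""
    dist = bfs_distances(adj, source)
    walk, current = [source], source
    for _ in range(length - 1):
        nbrs = adj[current]
        weights = [beta ** dist[v] for v in nbrs]
        current = random.choices(nbrs, weights=weights, k=1)[0]
        walk.append(current)
    return walk

# Toy graph given as an adjacency list.
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
print(biased_walk(adj, 0, length=6, beta=0.5))
```

Walks generated this way would then feed a skip-gram model (as in DeepWalk and node2vec) to produce the node embeddings.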

Stacked Cross Attention for Image-Text Matching

Title Stacked Cross Attention for Image-Text Matching
Authors Kuang-Huei Lee, Xi Chen, Gang Hua, Houdong Hu, Xiaodong He
Abstract In this paper, we study the problem of image-text matching. Inferring the latent semantic alignment between objects or other salient stuff (e.g. snow, sky, lawn) and the corresponding words in sentences makes it possible to capture fine-grained interplay between vision and language, and makes image-text matching more interpretable. Prior work either simply aggregates the similarity of all possible pairs of regions and words without attending differentially to more and less important words or regions, or uses a multi-step attentional process to capture a limited number of semantic alignments, which is less interpretable. In this paper, we present Stacked Cross Attention to discover the full latent alignments using both image regions and words in a sentence as context and infer image-text similarity. Our approach achieves the state-of-the-art results on the MS-COCO and Flickr30K datasets. On Flickr30K, our approach outperforms the current best methods by 22.1% relatively in text retrieval from image query, and 18.2% relatively in image retrieval with text query (based on Recall@1). On MS-COCO, our approach improves sentence retrieval by 17.8% relatively and image retrieval by 16.6% relatively (based on Recall@1 using the 5K test set). Code has been made available at: https://github.com/kuanghuei/SCAN.
Tasks Image Retrieval, Text Matching
Published 2018-03-21
URL http://arxiv.org/abs/1803.08024v2
PDF http://arxiv.org/pdf/1803.08024v2.pdf
PWC https://paperswithcode.com/paper/stacked-cross-attention-for-image-text
Repo https://github.com/kuanghuei/SCAN
Framework pytorch
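
As a rough illustration of the text-to-image attention direction, the numpy sketch below attends each word over all image regions and pools the word-level cosine scores with LogSumExp. The temperature `lam` and the pooling choice mirror the paper's formulation only loosely; feature extraction and normalization details are omitted.

```python
import numpy as np

def cosine(a, b):
    a = a / (np.linalg.norm(a, axis=-1, keepdims=True) + 1e-8)
    b = b / (np.linalg.norm(b, axis=-1, keepdims=True) + 1e-8)
    return a @ b.T

def stacked_cross_attention(words, regions, lam=9.0):
    """words: (n_words, d); regions: (n_regions, d) -> scalar similarity."""
    s = np.clip(cosine(words, regions), 0, None)   # word-region affinities
    alpha = np.exp(lam * s)
    alpha /= alpha.sum(axis=1, keepdims=True)      # each word attends over regions
    attended = alpha @ regions                     # one visual vector per word
    num = np.sum(words * attended, axis=1)
    den = np.linalg.norm(words, axis=1) * np.linalg.norm(attended, axis=1) + 1e-8
    r = num / den                                  # word-level relevance scores
    return np.log(np.exp(lam * r).sum()) / lam     # LogSumExp pooling

words = np.random.randn(5, 64)     # stand-in for GRU word features
regions = np.random.randn(36, 64)  # stand-in for bottom-up region features
print(stacked_cross_attention(words, regions))
```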

Music Mood Detection Based On Audio And Lyrics With Deep Neural Net

Title Music Mood Detection Based On Audio And Lyrics With Deep Neural Net
Authors Rémi Delbouys, Romain Hennequin, Francesco Piccoli, Jimena Royo-Letelier, Manuel Moussallam
Abstract We consider the task of multimodal music mood prediction based on the audio signal and the lyrics of a track. We reproduce the implementation of traditional approaches based on feature engineering and propose a new model based on deep learning. We compare the performance of both approaches on a database containing 18,000 tracks with associated valence and arousal values and show that our approach outperforms classical models on the arousal detection task, and that both approaches perform equally on the valence prediction task. We also compare a posteriori (late) fusion with fusion of modalities optimized jointly with the unimodal models, and observe a significant improvement in valence prediction. We release part of our database for comparison purposes.
Tasks Multimodal Emotion Recognition, Music Emotion Recognition
Published 2018-09-19
URL http://arxiv.org/abs/1809.07276v1
PDF http://arxiv.org/pdf/1809.07276v1.pdf
PWC https://paperswithcode.com/paper/music-mood-detection-based-on-audio-and
Repo https://github.com/Dohppak/Music-Emotion-Recognition-Classification
Framework pytorch
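
The jointly optimized fusion that the abstract contrasts with a posteriori fusion can be sketched as a single network with two branches and one shared regression head. The branch architectures and dimensions below are placeholders, not the paper's model.

```python
import torch
import torch.nn as nn

class BimodalMoodNet(nn.Module):
    def __init__(self, audio_dim=128, lyrics_dim=300, hidden=64):
        super().__init__()
        self.audio_branch = nn.Sequential(nn.Linear(audio_dim, hidden), nn.ReLU())
        self.lyrics_branch = nn.Sequential(nn.Linear(lyrics_dim, hidden), nn.ReLU())
        # A single head on the concatenated features predicts (valence, arousal),
        # so both modalities are optimized jointly rather than fused post hoc.
        self.head = nn.Linear(2 * hidden, 2)

    def forward(self, audio, lyrics):
        fused = torch.cat([self.audio_branch(audio), self.lyrics_branch(lyrics)], dim=-1)
        return self.head(fused)

model = BimodalMoodNet()
pred = model(torch.randn(8, 128), torch.randn(8, 300))   # (batch, 2)
loss = nn.functional.mse_loss(pred, torch.zeros(8, 2))   # toy regression target
```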

Response Ranking with Deep Matching Networks and External Knowledge in Information-seeking Conversation Systems

Title Response Ranking with Deep Matching Networks and External Knowledge in Information-seeking Conversation Systems
Authors Liu Yang, Minghui Qiu, Chen Qu, Jiafeng Guo, Yongfeng Zhang, W. Bruce Croft, Jun Huang, Haiqing Chen
Abstract Intelligent personal assistant systems with either text-based or voice-based conversational interfaces are becoming increasingly popular around the world. Retrieval-based conversation models have the advantages of returning fluent and informative responses. Most existing studies in this area are on open domain “chit-chat” conversations or task- or transaction-oriented conversations. More research is needed for information-seeking conversations. There is also a lack of modeling external knowledge beyond the dialog utterances among current conversational models. In this paper, we propose a learning framework on top of deep neural matching networks that leverages external knowledge for response ranking in information-seeking conversation systems. We incorporate external knowledge into deep neural models with pseudo-relevance feedback and QA correspondence knowledge distillation. Extensive experiments with three information-seeking conversation datasets, including both open benchmarks and commercial data, show that our methods outperform various baseline methods, including several deep text matching models and the state-of-the-art method on response selection in multi-turn conversations. We also perform analysis over different response types, model variations and ranking examples. Our models and research findings provide new insights into how to utilize external knowledge with deep neural models for response selection and have implications for the design of the next generation of information-seeking conversation systems.
Tasks Text Matching
Published 2018-05-01
URL http://arxiv.org/abs/1805.00188v3
PDF http://arxiv.org/pdf/1805.00188v3.pdf
PWC https://paperswithcode.com/paper/response-ranking-with-deep-matching-networks
Repo https://github.com/yangliuy/NeuralResponseRanking
Framework tf
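
The pseudo-relevance feedback component can be illustrated with a toy term-overlap retriever: retrieve top documents for the dialog context, then expand the context with their most salient unseen terms before matching. This is only the PRF idea in miniature, not the paper's pipeline or its QA correspondence knowledge distillation.

```python
from collections import Counter

def retrieve(query_terms, corpus, k=2):
    """Toy retriever: rank documents by term overlap with the query."""
    scores = [(len(set(query_terms) & set(doc)), doc) for doc in corpus]
    return [doc for score, doc in sorted(scores, key=lambda x: -x[0])[:k]]

def prf_expand(query_terms, corpus, k=2, n_terms=3):
    """Expand the dialog context with salient terms from top-ranked documents."""
    top_docs = retrieve(query_terms, corpus, k)
    counts = Counter(t for doc in top_docs for t in doc if t not in query_terms)
    return query_terms + [t for t, _ in counts.most_common(n_terms)]

corpus = [["reset", "password", "account", "email"],
          ["password", "expired", "login", "error"],
          ["shipping", "delay", "order"]]
print(prf_expand(["password", "login"], corpus))
```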

Big-Little Net: An Efficient Multi-Scale Feature Representation for Visual and Speech Recognition

Title Big-Little Net: An Efficient Multi-Scale Feature Representation for Visual and Speech Recognition
Authors Chun-Fu Chen, Quanfu Fan, Neil Mallinar, Tom Sercu, Rogerio Feris
Abstract In this paper, we propose a novel Convolutional Neural Network (CNN) architecture for learning multi-scale feature representations with good tradeoffs between speed and accuracy. This is achieved by using a multi-branch network, which has different computational complexity at different branches. Through frequent merging of features from branches at distinct scales, our model obtains multi-scale features while using less computation. The proposed approach demonstrates improvement of model efficiency and performance on both object recognition and speech recognition tasks, using popular architectures including ResNet and ResNeXt. For object recognition, our approach reduces computation by 33% while improving accuracy by 0.9%. Furthermore, our model surpasses state-of-the-art CNN acceleration approaches by a large margin in accuracy and FLOPs reduction. On the task of speech recognition, our proposed multi-scale CNNs save 30% FLOPs with slightly better word error rates, showing good generalization across domains. The code is available at https://github.com/IBM/BigLittleNet
Tasks Object Recognition, Speech Recognition
Published 2018-07-10
URL https://arxiv.org/abs/1807.03848v3
PDF https://arxiv.org/pdf/1807.03848v3.pdf
PWC https://paperswithcode.com/paper/big-little-net-an-efficient-multi-scale
Repo https://github.com/k0pch4/big-little-net
Framework pytorch
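
One Big-Little block can be sketched as a "big" branch at reduced resolution with full width and a "little" branch at full resolution with a fraction of the width, merged by upsampling and addition. The ratios `alpha` and `beta` and the layer counts below are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BigLittleBlock(nn.Module):
    def __init__(self, channels, alpha=2, beta=4):
        super().__init__()
        # Big branch: full width, but runs at 1/alpha of the input resolution.
        self.big = nn.Conv2d(channels, channels, 3, stride=alpha, padding=1)
        # Little branch: full resolution, but only channels/beta width.
        self.little = nn.Conv2d(channels, channels // beta, 3, padding=1)
        self.little_proj = nn.Conv2d(channels // beta, channels, 1)

    def forward(self, x):
        big = self.big(x)
        big = F.interpolate(big, size=x.shape[-2:], mode="bilinear",
                            align_corners=False)   # upsample before merging
        little = self.little_proj(self.little(x))
        return F.relu(big + little)                # frequent multi-scale merge

x = torch.randn(1, 64, 32, 32)
print(BigLittleBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```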

NSCaching: Simple and Efficient Negative Sampling for Knowledge Graph Embedding

Title NSCaching: Simple and Efficient Negative Sampling for Knowledge Graph Embedding
Authors Yongqi Zhang, Quanming Yao, Yingxia Shao, Lei Chen
Abstract Knowledge Graph (KG) embedding is a fundamental problem in data mining research with many real-world applications. It aims to encode the entities and relations in the graph into low dimensional vector space, which can be used by subsequent algorithms. Negative sampling, which samples negative triplets from non-observed ones in the training data, is an important step in KG embedding. Recently, the generative adversarial network (GAN) has been introduced into negative sampling. By sampling negative triplets with large scores, these methods avoid the problem of vanishing gradient and thus obtain better performance. However, using a GAN makes the original model more complex and hard to train, since reinforcement learning must be used. In this paper, motivated by the observation that negative triplets with large scores are important but rare, we propose to directly keep track of them with a cache. However, how to sample from and update the cache are two important questions. We carefully design the solutions, which are not only efficient but also achieve a good balance between exploration and exploitation. In this way, our method acts as a “distilled” version of previous GAN-based methods, which does not waste training time on additional parameters to fit the full distribution of negative triplets. The extensive experiments show that our method can gain significant improvement in various KG embedding models, and outperform the state-of-the-art negative sampling methods based on GAN.
Tasks Graph Embedding, Knowledge Graph Embedding
Published 2018-12-16
URL http://arxiv.org/abs/1812.06410v2
PDF http://arxiv.org/pdf/1812.06410v2.pdf
PWC https://paperswithcode.com/paper/nscaching-simple-and-efficient-negative
Repo https://github.com/yzhangee/NSCaching
Framework pytorch
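
The cache mechanics can be sketched in a few lines: lazily refresh a cache with randomly drawn candidates ranked by the current model's score, and sample negatives from the cache with score-proportional probability. `score_fn` below is a toy stand-in for any KG embedding scoring function, and the update rule simplifies the paper's exploration/exploitation scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def refresh_cache(cache, score_fn, n_entities, n_candidates=32):
    """Mix random candidates into the cache; keep the top-|cache| by score."""
    candidates = np.concatenate([cache, rng.integers(0, n_entities, n_candidates)])
    return candidates[np.argsort(-score_fn(candidates))[:len(cache)]]

def sample_negative(cache, score_fn):
    """Sample from the cache with score-proportional probability
    (balancing exploitation of hard negatives with exploration)."""
    p = np.exp(score_fn(cache))
    return rng.choice(cache, p=p / p.sum())

# Toy score function standing in for a KG embedding model's triplet score.
score_fn = lambda ents: -np.abs(ents - 500.0) / 100.0
cache = rng.integers(0, 1000, 30)
for _ in range(10):                    # training would interleave these steps
    cache = refresh_cache(cache, score_fn, n_entities=1000)
print(sample_negative(cache, score_fn))
```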

Learning End-to-end Autonomous Driving using Guided Auxiliary Supervision

Title Learning End-to-end Autonomous Driving using Guided Auxiliary Supervision
Authors Ashish Mehta, Adithya Subramanian, Anbumani Subramanian
Abstract Learning to drive faithfully in highly stochastic urban settings remains an open problem. To that end, we propose a Multi-task Learning from Demonstration (MT-LfD) framework which uses supervised auxiliary task prediction to guide the main task of predicting the driving commands. Our framework involves an end-to-end trainable network for imitating the expert demonstrator’s driving commands. The network intermediately predicts visual affordances and action primitives through direct supervision, which provides the aforementioned auxiliary supervised guidance. We demonstrate that such joint learning and supervised guidance facilitates hierarchical task decomposition, helping the agent learn faster, achieve better driving performance, and increasing the transparency of the otherwise black-box end-to-end network. We run our experiments to validate the MT-LfD framework in CARLA, an open-source urban driving simulator. We introduce multiple non-player agents in CARLA and induce temporal noise in them for realistic stochasticity.
Tasks Autonomous Driving, Multi-Task Learning
Published 2018-08-30
URL http://arxiv.org/abs/1808.10393v1
PDF http://arxiv.org/pdf/1808.10393v1.pdf
PWC https://paperswithcode.com/paper/learning-end-to-end-autonomous-driving-using
Repo https://github.com/AshishMehtaIO/MTLfD-CARLA
Framework tf
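
The guided-auxiliary-supervision setup amounts to a shared encoder with a main driving-command head plus directly supervised auxiliary heads. A minimal PyTorch sketch follows; the encoder, head sizes, and loss weights are assumed placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

class MTLfDNet(nn.Module):
    def __init__(self, feat_dim=256, n_affordances=6, n_primitives=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 16, feat_dim), nn.ReLU())
        self.command = nn.Linear(feat_dim, 3)             # steer, throttle, brake
        self.affordance = nn.Linear(feat_dim, n_affordances)
        self.primitive = nn.Linear(feat_dim, n_primitives)

    def forward(self, img):
        h = self.encoder(img)
        return self.command(h), self.affordance(h), self.primitive(h)

model = MTLfDNet()
cmd, aff, prim = model(torch.randn(2, 3, 128, 128))
# Main imitation loss plus directly supervised auxiliary losses (weights assumed).
loss = (nn.functional.mse_loss(cmd, torch.zeros_like(cmd))
        + 0.5 * nn.functional.mse_loss(aff, torch.zeros_like(aff))
        + 0.5 * nn.functional.cross_entropy(prim, torch.zeros(2, dtype=torch.long)))
```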

A Brief Review of Real-World Color Image Denoising

Title A Brief Review of Real-World Color Image Denoising
Authors Zhaoming Kong, Xiaowei Yang
Abstract Filtering real-world color images is challenging due to the complexity of noise that cannot be formulated as a specific distribution. However, the rapid development of camera lenses poses greater demands on image denoising in terms of both efficiency and effectiveness. Currently, the most widely accepted framework employs the combination of transform domain techniques and nonlocal similarity characteristics of natural images. Based on this framework, many competitive methods model the correlation of R, G, B channels with pre-defined or adaptively learned transforms. In this chapter, a brief review of related methods and publicly available datasets is presented; moreover, a new dataset that includes more natural outdoor scenes is introduced. Extensive experiments are performed, and a discussion on visual effect enhancement is included.
Tasks Denoising, Image Denoising
Published 2018-09-10
URL http://arxiv.org/abs/1809.03298v1
PDF http://arxiv.org/pdf/1809.03298v1.pdf
PWC https://paperswithcode.com/paper/a-brief-review-of-real-world-color-image
Repo https://github.com/ZhaomingKong/Pure_Image
Framework none

Deep Priority Hashing

Title Deep Priority Hashing
Authors Zhangjie Cao, Ziping Sun, Mingsheng Long, Jianmin Wang, Philip S. Yu
Abstract Deep hashing enables image retrieval by end-to-end learning of deep representations and hash codes from training data with pairwise similarity information. Subject to the distribution skewness underlying the similarity information, most existing deep hashing methods may underperform for imbalanced data due to misspecified loss functions. This paper presents Deep Priority Hashing (DPH), an end-to-end architecture that generates compact and balanced hash codes in a Bayesian learning framework. The main idea is to reshape the standard cross-entropy loss for similarity-preserving learning such that it down-weights the loss associated with highly confident pairs. This idea leads to a novel priority cross-entropy loss, which prioritizes the training on uncertain pairs over confident pairs. Also, we propose another priority quantization loss, which prioritizes hard-to-quantize examples for the generation of nearly lossless hash codes. Extensive experiments demonstrate that DPH can generate high-quality hash codes and yield state-of-the-art image retrieval results on three datasets: ImageNet, NUS-WIDE, and MS-COCO.
Tasks Image Retrieval, Quantization
Published 2018-09-04
URL http://arxiv.org/abs/1809.01238v1
PDF http://arxiv.org/pdf/1809.01238v1.pdf
PWC https://paperswithcode.com/paper/deep-priority-hashing
Repo https://github.com/thuml/DPH
Framework none
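
The priority weighting can be sketched as a focal-style reweighting of the pairwise cross-entropy: confident pairs are down-weighted so training focuses on uncertain ones. The exponent `gamma` is an assumed hyperparameter; the paper derives its priority weights differently, within a Bayesian framework.

```python
import torch

def priority_cross_entropy(q, sim, gamma=2.0):
    """q: (n,) predicted similarity probabilities; sim: (n,) 0/1 pair labels."""
    p_correct = torch.where(sim.bool(), q, 1 - q)   # confidence on the true label
    weight = (1 - p_correct) ** gamma               # uncertain pairs weigh more
    return -(weight * torch.log(p_correct + 1e-8)).mean()

q = torch.tensor([0.95, 0.60, 0.10])
sim = torch.tensor([1.0, 1.0, 0.0])
print(priority_cross_entropy(q, sim))  # confident pairs contribute very little
```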

Fully Convolutional Pixel Adaptive Image Denoiser

Title Fully Convolutional Pixel Adaptive Image Denoiser
Authors Sungmin Cha, Taesup Moon
Abstract We propose a new image denoising algorithm, dubbed Fully Convolutional Adaptive Image DEnoiser (FC-AIDE), that can learn from an offline supervised training set with a fully convolutional neural network as well as adaptively fine-tune the supervised model for each given noisy image. We significantly extend the framework of the recently proposed Neural AIDE, which formulates the denoiser as context-based pixelwise mappings and utilizes the unbiased estimator of MSE for such denoisers. The two main contributions we make are: 1) implementing a novel fully convolutional architecture that boosts the base supervised model, and 2) introducing regularization methods for the adaptive fine-tuning such that a stronger and more robust adaptivity can be attained. As a result, FC-AIDE is shown to possess many desirable features; it outperforms the recent CNN-based state-of-the-art denoisers on all of the benchmark datasets we tested, and is particularly strong in various challenging scenarios, e.g., with mismatched image/noise characteristics or with scarce supervised training data. The source code of our algorithm is available at https://github.com/csm9493/FC-AIDE-Keras.
Tasks Denoising, Image Denoising
Published 2018-07-19
URL https://arxiv.org/abs/1807.07569v4
PDF https://arxiv.org/pdf/1807.07569v4.pdf
PWC https://paperswithcode.com/paper/fully-convolutional-pixel-adaptive-image
Repo https://github.com/csm9493/FC-AIDE
Framework tf
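
The adaptive fine-tuning principle can be sketched with the unbiased MSE estimate for pixelwise affine denoisers: for x_hat = a*z + b under Gaussian noise of known variance sigma^2, the quantity (z - x_hat)^2 + sigma^2 * (2a - 1) is an unbiased estimate of the true MSE, so it can be minimized on the noisy image itself with no clean target. The toy network below ignores FC-AIDE's requirement that (a, b) depend only on each pixel's context, so treat it as a sketch of the objective, not the method.

```python
import torch
import torch.nn as nn

sigma = 0.1                            # noise level, assumed known
net = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 2, 3, padding=1))   # predicts (a, b) maps
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

z = torch.rand(1, 1, 32, 32) + sigma * torch.randn(1, 1, 32, 32)  # noisy input
for _ in range(50):                    # fine-tune on the noisy image itself
    a, b = net(z).chunk(2, dim=1)
    x_hat = a * z + b
    # Unbiased estimate of the MSE -- no clean target needed. NOTE: FC-AIDE
    # requires (a, b) to depend only on each pixel's context, which this toy
    # convolution violates; the estimator is exact only under that restriction.
    est_mse = ((z - x_hat) ** 2 + sigma ** 2 * (2 * a - 1)).mean()
    opt.zero_grad(); est_mse.backward(); opt.step()

with torch.no_grad():
    a, b = net(z).chunk(2, dim=1)
    denoised = a * z + b
```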

Super-Resolution via Image-Adapted Denoising CNNs: Incorporating External and Internal Learning

Title Super-Resolution via Image-Adapted Denoising CNNs: Incorporating External and Internal Learning
Authors Tom Tirer, Raja Giryes
Abstract While deep neural networks exhibit state-of-the-art results in the task of image super-resolution (SR) with a fixed known acquisition process (e.g., a bicubic downscaling kernel), they experience a huge performance loss when the real observation model mismatches the one used in training. Recently, two different techniques were suggested to mitigate this deficiency, i.e., to enjoy the advantages of deep learning without being restricted by the training phase. The first one follows the plug-and-play (P&P) approach that solves general inverse problems (e.g., SR) by using Gaussian denoisers for handling the prior term in model-based optimization schemes. The second builds on internal recurrence of information inside a single image, and trains a super-resolver network at test time on examples synthesized from the low-resolution image. Our work incorporates these two independent strategies, enjoying the impressive generalization capabilities of deep learning, captured by the first, and further improving it through internal learning at test time. First, we apply a recent P&P strategy to SR. Then, we show how it may become image-adaptive at test time. This technique outperforms the above two strategies on popular datasets and gives better results than other state-of-the-art methods in practical cases where the observation model is inexact or unknown in advance.
Tasks Denoising, Image Super-Resolution, Super-Resolution
Published 2018-11-30
URL https://arxiv.org/abs/1811.12866v3
PDF https://arxiv.org/pdf/1811.12866v3.pdf
PWC https://paperswithcode.com/paper/super-resolution-based-on-image-adapted-cnn
Repo https://github.com/tomtirer/IDBP-CNN-IA
Framework none
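
The plug-and-play pattern the paper builds on alternates a data-fidelity step for the known downscaling model with a denoising step that acts as the prior. The sketch below uses a box downscaler and a Gaussian blur as stand-ins for the real observation model and CNN denoiser; the paper's contribution additionally fine-tunes the denoiser on the input image itself.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def A(x, s=2):
    """Observation model: s-fold box downscaling."""
    h, w = x.shape
    return x.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def At(y, s=2):
    """Adjoint of A (up to scaling): pixel replication."""
    return np.kron(y, np.ones((s, s))) / (s * s)

y = np.random.rand(16, 16)             # observed low-resolution image
x = zoom(y, 2, order=1)                # bilinear initialization
for _ in range(30):
    x = x - 1.0 * At(A(x) - y)         # data-fidelity gradient step
    x = gaussian_filter(x, sigma=0.5)  # plug in a denoiser as the prior step
```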

Improved Deep Spectral Convolution Network For Hyperspectral Unmixing With Multinomial Mixture Kernel and Endmember Uncertainty

Title Improved Deep Spectral Convolution Network For Hyperspectral Unmixing With Multinomial Mixture Kernel and Endmember Uncertainty
Authors Savas Ozkan, Gozde Bozdagi Akar
Abstract In this study, we propose a novel framework for hyperspectral unmixing by using an improved deep spectral convolution network (DSCN++) combined with endmember uncertainty. DSCN++ is used to compute high-level representations which are further modeled with Multinomial Mixture Model to estimate abundance maps. In the reconstruction step, a new trainable uncertainty term based on a nonlinear neural network model is introduced to provide robustness to endmember uncertainty. For the optimization of the coefficients of the multinomial model and the uncertainty term, Wasserstein Generative Adversarial Network (WGAN) is exploited to improve stability and to capture uncertainty. Experiments are performed on both real and synthetic datasets. The results validate that the proposed method obtains state-of-the-art hyperspectral unmixing performance particularly on the real datasets compared to the baseline techniques.
Tasks Hyperspectral Unmixing
Published 2018-08-03
URL https://arxiv.org/abs/1808.01104v4
PDF https://arxiv.org/pdf/1808.01104v4.pdf
PWC https://paperswithcode.com/paper/improved-deep-spectral-convolution-network
Repo https://github.com/savasozkan/dscn
Framework tf

Online Multiclass Boosting with Bandit Feedback

Title Online Multiclass Boosting with Bandit Feedback
Authors Daniel T. Zhang, Young Hun Jung, Ambuj Tewari
Abstract We present online boosting algorithms for multiclass classification with bandit feedback, where the learner only receives feedback about the correctness of its prediction. We propose an unbiased estimate of the loss using a randomized prediction, allowing the model to update its weak learners with limited information. Using the unbiased estimate, we extend two full information boosting algorithms (Jung et al., 2017) to the bandit setting. We prove that the asymptotic error bounds of the bandit algorithms exactly match their full information counterparts. The cost of restricted feedback is reflected in the larger sample complexity. Experimental results also support our theoretical findings, and the performance of the proposed models is comparable to that of an existing bandit boosting algorithm, which is limited to binary weak learners.
Tasks
Published 2018-10-11
URL http://arxiv.org/abs/1810.05290v2
PDF http://arxiv.org/pdf/1810.05290v2.pdf
PWC https://paperswithcode.com/paper/online-multiclass-boosting-with-bandit
Repo https://github.com/pi224/banditboosting
Framework none
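
The unbiased-estimate trick can be verified numerically: sample the prediction from a distribution p, observe only its correctness, and importance-weight that single bit into an estimate of the full zero-one loss vector. This is a hedged illustration of the estimator alone; the booster update itself is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

def bandit_loss_estimate(p, true_label):
    """p: prediction distribution over k labels. Returns (prediction,
    importance-weighted estimate of the zero-one loss for every label)."""
    k = len(p)
    pred = rng.choice(k, p=p)
    correct = float(pred == true_label)       # the only feedback we receive
    est_correct = np.zeros(k)
    est_correct[pred] = correct / p[pred]     # E[est_correct[l]] = 1[l == y]
    return pred, 1.0 - est_correct

p = np.array([0.5, 0.3, 0.2])
est = [bandit_loss_estimate(p, true_label=0)[1] for _ in range(20000)]
print(np.mean(est, axis=0))  # approx [0, 1, 1]: unbiased despite bandit feedback
```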

Label-Noise Robust Generative Adversarial Networks

Title Label-Noise Robust Generative Adversarial Networks
Authors Takuhiro Kaneko, Yoshitaka Ushiku, Tatsuya Harada
Abstract Generative adversarial networks (GANs) are a framework that learns a generative distribution through adversarial training. Recently, their class-conditional extensions (e.g., conditional GAN (cGAN) and auxiliary classifier GAN (AC-GAN)) have attracted much attention owing to their ability to learn disentangled representations and to improve training stability. However, their training requires the availability of large-scale accurate class-labeled data, which are often laborious or impractical to collect in a real-world scenario. To remedy this, we propose a novel family of GANs called label-noise robust GANs (rGANs), which, by incorporating a noise transition model, can learn a clean label conditional generative distribution even when training labels are noisy. In particular, we propose two variants: rAC-GAN, which is a bridging model between AC-GAN and the label-noise robust classification model, and rcGAN, which is an extension of cGAN and solves this problem with no reliance on any classifier. In addition to providing the theoretical background, we demonstrate the effectiveness of our models through extensive experiments using diverse GAN configurations, various noise settings, and multiple evaluation metrics (in which we tested 402 conditions in total). Our code is available at https://github.com/takuhirok/rGAN/.
Tasks
Published 2018-11-27
URL https://arxiv.org/abs/1811.11165v2
PDF https://arxiv.org/pdf/1811.11165v2.pdf
PWC https://paperswithcode.com/paper/label-noise-robust-generative-adversarial
Repo https://github.com/takuhirok/NR-GAN
Framework pytorch
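
The noise-transition idea behind rAC-GAN can be sketched directly: the auxiliary classifier predicts clean-label probabilities, which a transition matrix T (with T[i][j] = p(noisy label j | clean label i)) maps to noisy-label probabilities before the usual cross-entropy against the observed labels. The symmetric-noise T below is an assumed example, not the paper's estimated matrix.

```python
import torch

def noisy_label_loss(clean_logits, noisy_labels, T):
    """T[i, j] = p(noisy label j | clean label i), assumed known or estimated."""
    p_clean = torch.softmax(clean_logits, dim=1)   # classifier's clean posterior
    p_noisy = p_clean @ T                          # push through the noise channel
    return torch.nn.functional.nll_loss(torch.log(p_noisy + 1e-8), noisy_labels)

k = 4
flip = 0.1                                         # symmetric label-noise rate
T = (1 - flip) * torch.eye(k) + (flip / (k - 1)) * (torch.ones(k, k) - torch.eye(k))
loss = noisy_label_loss(torch.randn(8, k), torch.randint(0, k, (8,)), T)
```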

Deep Spectral Convolution Network for HyperSpectral Unmixing

Title Deep Spectral Convolution Network for HyperSpectral Unmixing
Authors Savas Ozkan, Gozde Bozdagi Akar
Abstract In this paper, we propose a novel hyperspectral unmixing technique based on deep spectral convolution networks (DSCN). Particularly, three important contributions are presented throughout this paper. First, the fully-connected linear operation is replaced with spectral convolutions to extract local spectral characteristics from hyperspectral signatures with a deeper network architecture. Second, instead of batch normalization, we propose a spectral normalization layer which improves the selectivity of filters by normalizing their spectral responses. Third, we introduce two fusion configurations that produce ideal abundance maps by using the abstract representations computed from previous layers. In experiments, we use two real datasets to evaluate the performance of our method against other baseline techniques. The experimental results validate that the proposed method outperforms the baselines based on Root Mean Square Error (RMSE).
Tasks Hyperspectral Unmixing
Published 2018-06-22
URL http://arxiv.org/abs/1806.08562v1
PDF http://arxiv.org/pdf/1806.08562v1.pdf
PWC https://paperswithcode.com/paper/deep-spectral-convolution-network-for
Repo https://github.com/savasozkan/dscn
Framework tf
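
The core "spectral convolution" idea treats each pixel's hyperspectral signature as a 1-D sequence and convolves along the spectral axis, instead of mixing all bands with a fully-connected layer. A minimal sketch with illustrative sizes:

```python
import torch
import torch.nn as nn

bands, n_pixels = 200, 1024
signatures = torch.randn(n_pixels, 1, bands)   # one 1-D signature per pixel
spectral_conv = nn.Conv1d(1, 16, kernel_size=5, padding=2)
features = spectral_conv(signatures)           # local spectral characteristics
print(features.shape)                          # torch.Size([1024, 16, 200])
```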