Paper Group AWR 49
Learning Spatial Fusion for Single-Shot Object Detection
Title | Learning Spatial Fusion for Single-Shot Object Detection |
Authors | Songtao Liu, Di Huang, Yunhong Wang |
Abstract | Pyramidal feature representation is the common practice to address the challenge of scale variation in object detection. However, the inconsistency across different feature scales is a primary limitation for single-shot detectors based on feature pyramids. In this work, we propose a novel, data-driven strategy for pyramidal feature fusion, referred to as adaptively spatial feature fusion (ASFF). It learns to spatially filter conflicting information to suppress the inconsistency, thus improving the scale-invariance of features, while introducing nearly free inference overhead. With the ASFF strategy and a solid baseline of YOLOv3, we achieve the best speed-accuracy trade-off on the MS COCO dataset, reporting 38.1% AP at 60 FPS, 42.4% AP at 45 FPS and 43.9% AP at 29 FPS. The code is available at https://github.com/ruinmessi/ASFF |
Tasks | Object Detection |
Published | 2019-11-21 |
URL | https://arxiv.org/abs/1911.09516v2 |
PDF | https://arxiv.org/pdf/1911.09516v2.pdf |
PWC | https://paperswithcode.com/paper/learning-spatial-fusion-for-single-shot |
Repo | https://github.com/ruinmessi/ASFF |
Framework | pytorch |
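To make the fusion idea concrete, here is a minimal PyTorch sketch of adaptive spatial fusion: each pyramid level contributes a learned per-pixel weight, normalised with a softmax, before the levels are summed. It assumes the three inputs were already resized to a common resolution and channel count; the module and layer choices are illustrative, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASFFBlock(nn.Module):
    """Minimal adaptive spatial feature fusion over three pyramid levels.

    Assumes the inputs already share spatial resolution and channel count
    (the paper handles resizing with strided/interpolated convs); only the
    learned per-pixel fusion weights are shown here.
    """
    def __init__(self, channels):
        super().__init__()
        # 1x1 convs produce one spatial weight map per level
        self.weight_convs = nn.ModuleList(
            [nn.Conv2d(channels, 1, kernel_size=1) for _ in range(3)]
        )
        self.post = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, f0, f1, f2):
        feats = [f0, f1, f2]
        logits = torch.cat([conv(f) for conv, f in zip(self.weight_convs, feats)], dim=1)
        weights = F.softmax(logits, dim=1)            # (B, 3, H, W), sums to 1 per pixel
        fused = sum(weights[:, i:i + 1] * feats[i] for i in range(3))
        return self.post(fused)

# usage
asff = ASFFBlock(channels=256)
x = [torch.randn(2, 256, 32, 32) for _ in range(3)]
out = asff(*x)  # (2, 256, 32, 32)
```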
On the Importance of Word Boundaries in Character-level Neural Machine Translation
Title | On the Importance of Word Boundaries in Character-level Neural Machine Translation |
Authors | Duygu Ataman, Orhan Firat, Mattia A. Di Gangi, Marcello Federico, Alexandra Birch |
Abstract | Neural Machine Translation (NMT) models generally perform translation using a fixed-size lexical vocabulary, which is an important bottleneck on their generalization capability and overall translation quality. The standard approach to overcome this limitation is to segment words into subword units, typically using some external tools with arbitrary heuristics, resulting in vocabulary units not optimized for the translation task. Recent studies have shown that the same approach can be extended to perform NMT directly at the level of characters, which can deliver translation accuracy on par with subword-based models; on the other hand, this requires relatively deeper networks. In this paper, we propose a more computationally-efficient solution for character-level NMT which implements a hierarchical decoding architecture where translations are subsequently generated at the level of words and characters. We evaluate different methods for open-vocabulary NMT in the machine translation task from English into five languages with distinct morphological typology, and show that the hierarchical decoding model can reach higher translation accuracy than the subword-level NMT model using significantly fewer parameters, while demonstrating better capacity in learning longer-distance contextual and grammatical dependencies than the standard character-level NMT model. |
Tasks | Machine Translation |
Published | 2019-10-15 |
URL | https://arxiv.org/abs/1910.06753v2 |
PDF | https://arxiv.org/pdf/1910.06753v2.pdf |
PWC | https://paperswithcode.com/paper/on-the-importance-of-word-boundaries-in |
Repo | https://github.com/d-ataman/Char-NMT |
Framework | pytorch |
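A rough sketch of the hierarchical decoding idea: a word-level RNN emits one hidden state per target word, and a character-level RNN spells out that word conditioned on the state. The dimensions, the GRU cells, and the teacher forcing below are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class HierarchicalDecoder(nn.Module):
    """Toy two-level decoder: a word-level RNN emits one state per target
    word; a character-level RNN spells out each word conditioned on it."""
    def __init__(self, char_vocab, d=256):
        super().__init__()
        self.word_rnn = nn.GRUCell(d, d)
        self.char_emb = nn.Embedding(char_vocab, d)
        self.char_rnn = nn.GRUCell(d, d)
        self.char_out = nn.Linear(d, char_vocab)

    def forward(self, context, char_targets):
        # context: (B, d) sentence/attention context (assumed given)
        # char_targets: (B, n_words, n_chars) gold characters for teacher forcing
        B, n_words, n_chars = char_targets.shape
        word_state = context
        logits = []
        for w in range(n_words):
            word_state = self.word_rnn(context, word_state)      # next word state
            char_state = word_state
            step_logits = []
            for c in range(n_chars):
                inp = self.char_emb(char_targets[:, w, c])
                char_state = self.char_rnn(inp, char_state)
                step_logits.append(self.char_out(char_state))
            logits.append(torch.stack(step_logits, dim=1))
        return torch.stack(logits, dim=1)  # (B, n_words, n_chars, vocab)

dec = HierarchicalDecoder(char_vocab=40)
out = dec(torch.randn(2, 256), torch.randint(0, 40, (2, 5, 8)))
```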
Putting An End to End-to-End: Gradient-Isolated Learning of Representations
Title | Putting An End to End-to-End: Gradient-Isolated Learning of Representations |
Authors | Sindy Löwe, Peter O’Connor, Bastiaan S. Veeling |
Abstract | We propose a novel deep learning method for local self-supervised representation learning that does not require labels nor end-to-end backpropagation but exploits the natural order in data instead. Inspired by the observation that biological neural networks appear to learn without backpropagating a global error signal, we split a deep neural network into a stack of gradient-isolated modules. Each module is trained to maximally preserve the information of its inputs using the InfoNCE bound from Oord et al. [2018]. Despite this greedy training, we demonstrate that each module improves upon the output of its predecessor, and that the representations created by the top module yield highly competitive results on downstream classification tasks in the audio and visual domain. The proposal enables optimizing modules asynchronously, allowing large-scale distributed training of very deep neural networks on unlabelled datasets. |
Tasks | Representation Learning |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11786v3 |
PDF | https://arxiv.org/pdf/1905.11786v3.pdf |
PWC | https://paperswithcode.com/paper/greedy-infomax-for-biologically-plausible |
Repo | https://github.com/loeweX/Greedy_InfoMax |
Framework | pytorch |
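The following is a minimal sketch of gradient-isolated training: each module is optimised with its own InfoNCE-style loss and receives only the detached output of its predecessor, so no gradients cross module boundaries. The bilinear scorer, layer sizes, and "current vs. future" inputs are simplifications of the paper's setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def info_nce(z, context, W):
    """Toy InfoNCE: score each (context, positive) pair against in-batch
    negatives. z, context: (B, d); W: bilinear scoring matrix."""
    scores = context @ W @ z.t()                 # (B, B), diagonal = positives
    labels = torch.arange(z.size(0), device=z.device)
    return F.cross_entropy(scores, labels)

# A stack of gradient-isolated modules: each module sees only the *detached*
# output of its predecessor, so no gradient flows across module boundaries.
modules = nn.ModuleList([nn.Sequential(nn.Linear(128, 128), nn.ReLU()) for _ in range(3)])
scorers = nn.ParameterList([nn.Parameter(torch.randn(128, 128) * 0.01) for _ in range(3)])
opts = [torch.optim.Adam(list(m.parameters()) + [w], lr=1e-3) for m, w in zip(modules, scorers)]

x_t, x_tk = torch.randn(16, 128), torch.randn(16, 128)   # "current" and "future" inputs
for m, w, opt in zip(modules, scorers, opts):
    z_t, z_tk = m(x_t), m(x_tk)
    loss = info_nce(z_tk, z_t, w)                          # local, per-module objective
    opt.zero_grad(); loss.backward(); opt.step()
    x_t, x_tk = z_t.detach(), z_tk.detach()                # gradient isolation
```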
Discriminative Neural Clustering for Speaker Diarisation
Title | Discriminative Neural Clustering for Speaker Diarisation |
Authors | Qiujia Li, Florian L. Kreyssig, Chao Zhang, Philip C. Woodland |
Abstract | This paper proposes a novel method for supervised data clustering. The clustering procedure is modelled by a discriminative sequence-to-sequence neural network that learns from examples. The effectiveness of the Transformer-based Discriminative Neural Clustering (DNC) model is validated on a speaker diarisation task using the challenging AMI data set, where audio segments need to be clustered into an unknown number of speakers. The AMI corpus contains only 147 meetings as training examples for the DNC model, which is very limited for training an encoder-decoder neural network. Data scarcity is mitigated through three data augmentation schemes proposed in this paper, including Diaconis Augmentation, a novel technique proposed for discriminative embeddings trained using cosine similarities. A comparison between DNC and the commonly used spectral clustering algorithm for speaker diarisation shows that the DNC approach outperforms its unsupervised counterpart by 29.4% relative. Furthermore, DNC requires no explicit definition of a similarity measure between samples, which is a significant advantage considering that such a measure might be difficult to specify. |
Tasks | Data Augmentation |
Published | 2019-10-22 |
URL | https://arxiv.org/abs/1910.09703v1 |
PDF | https://arxiv.org/pdf/1910.09703v1.pdf |
PWC | https://paperswithcode.com/paper/discriminative-neural-clustering-for-speaker |
Repo | https://github.com/FlorianKrey/DNC |
Framework | pytorch |
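A minimal sketch of the sequence-to-sequence clustering formulation: a Transformer encodes a sequence of segment embeddings and autoregressively decodes one cluster index per segment. The configuration below is illustrative and omits the paper's data augmentation schemes.

```python
import torch
import torch.nn as nn

class ToyDNC(nn.Module):
    """Minimal seq2seq clustering sketch: encode a sequence of segment
    embeddings and decode one cluster index per segment. Only illustrates
    the supervised sequence-to-sequence formulation, not the paper's exact
    Transformer configuration."""
    def __init__(self, d_in=32, d_model=64, max_speakers=4):
        super().__init__()
        self.proj = nn.Linear(d_in, d_model)
        self.label_emb = nn.Embedding(max_speakers + 1, d_model)  # +1 for a start token
        self.transformer = nn.Transformer(d_model=d_model, nhead=4,
                                          num_encoder_layers=2, num_decoder_layers=2,
                                          batch_first=True)
        self.out = nn.Linear(d_model, max_speakers)

    def forward(self, segs, prev_labels):
        # segs: (B, T, d_in) speaker-embedding segments
        # prev_labels: (B, T) shifted gold cluster ids (teacher forcing)
        mask = nn.Transformer.generate_square_subsequent_mask(prev_labels.size(1))
        h = self.transformer(self.proj(segs), self.label_emb(prev_labels), tgt_mask=mask)
        return self.out(h)   # (B, T, max_speakers) per-segment cluster logits

model = ToyDNC()
logits = model(torch.randn(2, 10, 32), torch.randint(0, 5, (2, 10)))
```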
Learning Raw Image Denoising with Bayer Pattern Unification and Bayer Preserving Augmentation
Title | Learning Raw Image Denoising with Bayer Pattern Unification and Bayer Preserving Augmentation |
Authors | Jiaming Liu, Chi-Hao Wu, Yuzhi Wang, Qin Xu, Yuqian Zhou, Haibin Huang, Chuan Wang, Shaofan Cai, Yifan Ding, Haoqiang Fan, Jue Wang |
Abstract | In this paper, we present new data pre-processing and augmentation techniques for DNN-based raw image denoising. Compared with traditional RGB image denoising, performing this task on direct camera sensor readings presents new challenges such as how to effectively handle various Bayer patterns from different data sources, and subsequently how to perform valid data augmentation with raw images. To address the first problem, we propose a Bayer pattern unification (BayerUnify) method to unify different Bayer patterns. This allows us to fully utilize a heterogeneous dataset to train a single denoising model instead of training one model for each pattern. Furthermore, while it is essential to augment the dataset to improve model generalization and performance, we discovered that it is error-prone to modify raw images by adapting augmentation methods designed for RGB images. Towards this end, we present a Bayer preserving augmentation (BayerAug) method as an effective approach for raw image augmentation. Combining these data processing techniques with a modified U-Net, our method achieves a PSNR of 52.11 and an SSIM of 0.9969 in the NTIRE 2019 Real Image Denoising Challenge, demonstrating state-of-the-art performance. Our code is available at https://github.com/Jiaming-Liu/BayerUnifyAug. |
Tasks | Data Augmentation, Denoising, Image Augmentation, Image Denoising |
Published | 2019-04-29 |
URL | https://arxiv.org/abs/1904.12945v2 |
PDF | https://arxiv.org/pdf/1904.12945v2.pdf |
PWC | https://paperswithcode.com/paper/learning-raw-image-denoising-with-bayer |
Repo | https://github.com/Jiaming-Liu/BayerUnifyAug |
Framework | none |
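A rough NumPy sketch of the two ideas on a single-channel mosaic: unifying a Bayer pattern by shifting (here, cropping) the mosaic so it reads as RGGB, and a pattern-preserving horizontal flip that shifts by one column after flipping. The authors' code pads rather than crops and handles more cases; treat this only as an illustration.

```python
import numpy as np

def unify_to_rggb(raw, pattern):
    """Crop one row and/or column so that any 2x2 Bayer pattern reads as
    RGGB (the paper pads instead of cropping to keep all pixels)."""
    row = 1 if pattern in ("GBRG", "BGGR") else 0
    col = 1 if pattern in ("GRBG", "BGGR") else 0
    out = raw[row:, col:]
    h, w = out.shape
    return out[:h - h % 2, :w - w % 2]        # keep an even-sized mosaic

def bayer_preserving_hflip(raw):
    """Horizontal flip followed by a one-column shift so the 2x2 colour
    layout of the mosaic is unchanged."""
    flipped = raw[:, ::-1]
    return flipped[:, 1:-1]

raw = np.random.rand(64, 64).astype(np.float32)
unified = unify_to_rggb(raw, "BGGR")
aug = bayer_preserving_hflip(unified)
```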
Generating Reliable Friends via Adversarial Training to Improve Social Recommendation
Title | Generating Reliable Friends via Adversarial Training to Improve Social Recommendation |
Authors | Junliang Yu, Min Gao, Hongzhi Yin, Jundong Li, Chongming Gao, Qinyong Wang |
Abstract | Most of the recent studies of social recommendation assume that people share similar preferences with their friends and the online social relations are helpful in improving traditional recommender systems. However, this assumption is often untenable as the online social networks are quite sparse and a majority of users only have a small number of friends. Besides, explicit friends may not share similar interests because of the randomness in the process of building social networks. Therefore, discovering a number of reliable friends for each user plays an important role in advancing social recommendation. Unlike other studies which focus on extracting valuable explicit social links, our work pays attention to identifying reliable friends in both the observed and unobserved social networks. Concretely, in this paper, we propose an end-to-end social recommendation framework based on Generative Adversarial Nets (GAN). The framework is composed of two blocks: a generator that is used to produce friends that can possibly enhance the social recommendation model, and a discriminator that is responsible for assessing these generated friends and ranking the items according to both the current user and her friends' preferences. With the competition between the generator and the discriminator, our framework can dynamically and adaptively generate reliable friends who can perfectly predict the current user's preference at a specific time. As a result, the sparsity and unreliability problems of explicit social relations can be mitigated and the social recommendation performance is significantly improved. Experimental studies on real-world datasets demonstrate the superiority of our framework and verify the positive effects of the generated reliable friends. |
Tasks | Recommendation Systems |
Published | 2019-09-08 |
URL | https://arxiv.org/abs/1909.03529v1 |
PDF | https://arxiv.org/pdf/1909.03529v1.pdf |
PWC | https://paperswithcode.com/paper/generating-reliable-friends-via-adversarial |
Repo | https://github.com/Coder-Yu/RecQ |
Framework | tf |
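To illustrate the adversarial setup, the sketch below (in PyTorch, with made-up sizes and a simplified reward) has a generator that samples candidate friends for a user and a discriminator that ranks items with a friend-enhanced user representation via a BPR-style loss; the generator is updated with a policy-gradient-style signal. It is not the paper's exact framework.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_users, n_items, d = 100, 200, 32
user_emb = nn.Embedding(n_users, d)
item_emb = nn.Embedding(n_items, d)
gen = nn.Linear(d, n_users)            # generator: user -> distribution over candidate friends

def friend_enhanced_score(u, friends_idx, items_idx):
    """Discriminator side: score items for a user blended with friend embeddings."""
    rep = 0.5 * user_emb(u) + 0.5 * user_emb(friends_idx).mean(dim=1)
    return (rep.unsqueeze(1) * item_emb(items_idx)).sum(-1)

u = torch.randint(0, n_users, (8,))
pos = torch.randint(0, n_items, (8, 1))
neg = torch.randint(0, n_items, (8, 1))

# Generator samples reliable-friend candidates for each user
probs = F.softmax(gen(user_emb(u)), dim=-1)
friends = torch.multinomial(probs, num_samples=5)          # (8, 5)

# Discriminator: BPR-style ranking loss using the generated friends
d_loss = -F.logsigmoid(friend_enhanced_score(u, friends, pos)
                       - friend_enhanced_score(u, friends, neg)).mean()

# Generator: policy-gradient-style update using the discriminator's score as reward
reward = friend_enhanced_score(u, friends, pos).detach()
g_loss = -(torch.log(probs.gather(1, friends) + 1e-8) * reward).mean()
# In practice d_loss and g_loss update the two blocks alternately; optimiser steps omitted.
```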
Session-based Social Recommendation via Dynamic Graph Attention Networks
Title | Session-based Social Recommendation via Dynamic Graph Attention Networks |
Authors | Weiping Song, Zhiping Xiao, Yifan Wang, Laurent Charlin, Ming Zhang, Jian Tang |
Abstract | Online communities such as Facebook and Twitter are enormously popular and have become an essential part of the daily life of many of their users. Through these platforms, users can discover and create information that others will then consume. In that context, recommending relevant information to users becomes critical for viability. However, recommendation in online communities is a challenging problem: 1) users’ interests are dynamic, and 2) users are influenced by their friends. Moreover, the influencers may be context-dependent. That is, different friends may be relied upon for different topics. Modeling both signals is therefore essential for recommendations. We propose a recommender system for online communities based on a dynamic-graph-attention neural network. We model dynamic user behaviors with a recurrent neural network, and context-dependent social influence with a graph-attention neural network, which dynamically infers the influencers based on users’ current interests. The whole model can be efficiently fit on large-scale data. Experimental results on several real-world data sets demonstrate the effectiveness of our proposed approach over several competitive baselines including state-of-the-art models. |
Tasks | Recommendation Systems |
Published | 2019-02-25 |
URL | http://arxiv.org/abs/1902.09362v2 |
PDF | http://arxiv.org/pdf/1902.09362v2.pdf |
PWC | https://paperswithcode.com/paper/session-based-social-recommendation-via |
Repo | https://github.com/Pblamichha42/DynamicGAT |
Framework | tf |
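A compact sketch of the two modelled signals: a GRU summarises the user's current session into a dynamic interest vector, and attention over friend representations selects context-dependent influencers. All sizes and the scoring head are illustrative, and it is written in PyTorch for brevity although the released code uses TensorFlow.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToySocialAttention(nn.Module):
    """Sketch: an RNN models the user's dynamic interest; attention over
    friend representations picks context-dependent influencers."""
    def __init__(self, n_items, d=64):
        super().__init__()
        self.item_emb = nn.Embedding(n_items, d)
        self.rnn = nn.GRU(d, d, batch_first=True)
        self.att = nn.Linear(2 * d, 1)
        self.score = nn.Linear(2 * d, n_items)

    def forward(self, session, friend_states):
        # session: (B, T) item ids; friend_states: (B, F, d) friends' interest vectors
        _, h = self.rnn(self.item_emb(session))           # h: (1, B, d)
        u = h.squeeze(0)                                   # (B, d) dynamic user interest
        q = u.unsqueeze(1).expand(-1, friend_states.size(1), -1)
        a = F.softmax(self.att(torch.cat([q, friend_states], -1)).squeeze(-1), dim=-1)
        social = (a.unsqueeze(-1) * friend_states).sum(1)  # (B, d) context-dependent influence
        return self.score(torch.cat([u, social], -1))      # (B, n_items) item logits

model = ToySocialAttention(n_items=1000)
logits = model(torch.randint(0, 1000, (4, 10)), torch.randn(4, 6, 64))
```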
KGAT: Knowledge Graph Attention Network for Recommendation
Title | KGAT: Knowledge Graph Attention Network for Recommendation |
Authors | Xiang Wang, Xiangnan He, Yixin Cao, Meng Liu, Tat-Seng Chua |
Abstract | To provide more accurate, diverse, and explainable recommendation, it is compulsory to go beyond modeling user-item interactions and take side information into account. Traditional methods like factorization machine (FM) cast it as a supervised learning problem, which assumes each interaction as an independent instance with side information encoded. Due to the overlook of the relations among instances or items (e.g., the director of a movie is also an actor of another movie), these methods are insufficient to distill the collaborative signal from the collective behaviors of users. In this work, we investigate the utility of knowledge graph (KG), which breaks down the independent interaction assumption by linking items with their attributes. We argue that in such a hybrid structure of KG and user-item graph, high-order relations, which connect two items with one or multiple linked attributes, are an essential factor for successful recommendation. We propose a new method named Knowledge Graph Attention Network (KGAT) which explicitly models the high-order connectivities in KG in an end-to-end fashion. It recursively propagates the embeddings from a node's neighbors (which can be users, items, or attributes) to refine the node's embedding, and employs an attention mechanism to discriminate the importance of the neighbors. Our KGAT is conceptually advantageous to existing KG-based recommendation methods, which either exploit high-order relations by extracting paths or implicitly model them with regularization. Empirical results on three public benchmarks show that KGAT significantly outperforms state-of-the-art methods like Neural FM and RippleNet. Further studies verify the efficacy of embedding propagation for high-order relation modeling and the interpretability benefits brought by the attention mechanism. |
Tasks | Knowledge Graphs, Recommendation Systems |
Published | 2019-05-20 |
URL | https://arxiv.org/abs/1905.07854v2 |
PDF | https://arxiv.org/pdf/1905.07854v2.pdf |
PWC | https://paperswithcode.com/paper/kgat-knowledge-graph-attention-network-for |
Repo | https://github.com/LunaBlack/KGAT-pytorch |
Framework | pytorch |
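One attentive propagation step can be sketched as follows: the attention score of a triple follows the form (W_r e_t)^T tanh(W_r e_h + e_r), is softmax-normalised over the neighbourhood, and the weighted neighbour embeddings are aggregated with the ego embedding. Shapes are for a single head node, and the simple additive aggregation below stands in for the paper's bi-interaction aggregator.

```python
import torch
import torch.nn.functional as F

# One attentive propagation step for a single head node (illustrative shapes;
# the paper stacks several such layers over the whole graph).
d = 16
e_head = torch.randn(d)
e_rel = torch.randn(5, d)        # relation embeddings of the head's 5 outgoing triples
e_tail = torch.randn(5, d)       # tail-entity embeddings of those triples
W_r = torch.randn(5, d, d)       # per-relation transforms (assumed given)

# attention score pi(h, r, t) = (W_r e_t)^T tanh(W_r e_h + e_r)
Wh = torch.einsum('rij,j->ri', W_r, e_head)
Wt = torch.einsum('rij,rj->ri', W_r, e_tail)
pi = (Wt * torch.tanh(Wh + e_rel)).sum(-1)
alpha = F.softmax(pi, dim=0)                     # normalised over the neighbourhood

# neighbourhood message and a simple aggregation with the ego embedding
e_neigh = (alpha.unsqueeze(-1) * e_tail).sum(0)
e_new = F.leaky_relu(e_head + e_neigh)           # stand-in for bi-interaction aggregation
```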
Neural Graph Collaborative Filtering
Title | Neural Graph Collaborative Filtering |
Authors | Xiang Wang, Xiangnan He, Meng Wang, Fuli Feng, Tat-Seng Chua |
Abstract | Learning vector representations (aka. embeddings) of users and items lies at the core of modern recommender systems. Ranging from early matrix factorization to recently emerged deep learning based methods, existing efforts typically obtain a user's (or an item's) embedding by mapping from pre-existing features that describe the user (or the item), such as ID and attributes. We argue that an inherent drawback of such methods is that the collaborative signal, which is latent in user-item interactions, is not encoded in the embedding process. As such, the resultant embeddings may not be sufficient to capture the collaborative filtering effect. In this work, we propose to integrate the user-item interactions (more specifically, the bipartite graph structure) into the embedding process. We develop a new recommendation framework Neural Graph Collaborative Filtering (NGCF), which exploits the user-item graph structure by propagating embeddings on it. This leads to the expressive modeling of high-order connectivity in the user-item graph, effectively injecting the collaborative signal into the embedding process in an explicit manner. We conduct extensive experiments on three public benchmarks, demonstrating significant improvements over several state-of-the-art models like HOP-Rec and Collaborative Memory Network. Further analysis verifies the importance of embedding propagation for learning better user and item representations, justifying the rationality and effectiveness of NGCF. Codes are available at https://github.com/xiangwang1223/neural_graph_collaborative_filtering. |
Tasks | Recommendation Systems |
Published | 2019-05-20 |
URL | https://arxiv.org/abs/1905.08108v1 |
PDF | https://arxiv.org/pdf/1905.08108v1.pdf |
PWC | https://paperswithcode.com/paper/neural-graph-collaborative-filtering |
Repo | https://github.com/huangtinglin/NGCF-PyTorch |
Framework | pytorch |
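A single embedding-propagation step can be sketched as below: messages from interacted items take the form W1·e_i + W2·(e_i ⊙ e_u), are degree-normalised and summed, then passed through a nonlinearity. The layer is shown for one user against its interaction row; multi-layer stacking and the exact graph normalisation are simplified.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyNGCFLayer(nn.Module):
    """One embedding-propagation step on the user-item graph, following the
    message form W1*e_i + W2*(e_i ⊙ e_u); normalisation is simplified."""
    def __init__(self, d):
        super().__init__()
        self.W1 = nn.Linear(d, d, bias=False)
        self.W2 = nn.Linear(d, d, bias=False)

    def forward(self, e_user, e_items, adj_row):
        # e_user: (d,), e_items: (N, d), adj_row: (N,) 0/1 interactions of this user
        norm = adj_row / adj_row.sum().clamp(min=1)
        msgs = self.W1(e_items) + self.W2(e_items * e_user)   # (N, d) messages
        agg = (norm.unsqueeze(-1) * msgs).sum(0)
        return F.leaky_relu(self.W1(e_user) + agg)            # refined user embedding

layer = ToyNGCFLayer(d=32)
e_u, e_i = torch.randn(32), torch.randn(200, 32)
adj = (torch.rand(200) < 0.05).float()
e_u_next = layer(e_u, e_i, adj)
```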
Genie: A Generator of Natural Language Semantic Parsers for Virtual Assistant Commands
Title | Genie: A Generator of Natural Language Semantic Parsers for Virtual Assistant Commands |
Authors | Giovanni Campagna, Silei Xu, Mehrad Moradshahi, Richard Socher, Monica S. Lam |
Abstract | To understand diverse natural language commands, virtual assistants today are trained with numerous labor-intensive, manually annotated sentences. This paper presents a methodology and the Genie toolkit that can handle new compound commands with significantly less manual effort. We advocate formalizing the capability of virtual assistants with a Virtual Assistant Programming Language (VAPL) and using a neural semantic parser to translate natural language into VAPL code. Genie needs only a small realistic set of input sentences for validating the neural model. Developers write templates to synthesize data; Genie uses crowdsourced paraphrases and data augmentation, along with the synthesized data, to train a semantic parser. We also propose design principles that make VAPL languages amenable to natural language translation. We apply these principles to revise ThingTalk, the language used by the Almond virtual assistant. We use Genie to build the first semantic parser that can support compound virtual assistant commands with unquoted free-form parameters. Genie achieves a 62% accuracy on realistic user inputs. We demonstrate Genie's generality by showing a 19% and 31% improvement over the previous state of the art on a music skill, aggregate functions, and access control. |
Tasks | Data Augmentation |
Published | 2019-04-18 |
URL | http://arxiv.org/abs/1904.09020v1 |
PDF | http://arxiv.org/pdf/1904.09020v1.pdf |
PWC | https://paperswithcode.com/paper/genie-a-generator-of-natural-language |
Repo | https://github.com/stanford-oval/genie-toolkit |
Framework | none |
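The data-synthesis step can be pictured with a toy example: developers provide templates pairing utterance patterns with programs, and synthetic (sentence, program) pairs are expanded from slot values. The template syntax and program strings below are invented for illustration; they are not ThingTalk or the Genie toolkit API.

```python
import itertools
import random

# Toy illustration of template-driven data synthesis: each template pairs a
# natural-language pattern with a VAPL-like program string.
templates = [
    ("play {song} by {artist}", "@music.play(song={song!r}, artist={artist!r})"),
    ("remind me to {task} at {time}", "@reminder.set(task={task!r}, time={time!r})"),
]
slots = {
    "song": ["Hey Jude", "Clocks"], "artist": ["The Beatles", "Coldplay"],
    "task": ["buy milk", "call mom"], "time": ["5pm", "noon"],
}

def synthesize(n=4):
    pairs = []
    for text, prog in templates:
        names = [s for s in slots if "{" + s in text]
        for values in itertools.product(*(slots[s] for s in names)):
            fill = dict(zip(names, values))
            pairs.append((text.format(**fill), prog.format(**fill)))
    return random.sample(pairs, min(n, len(pairs)))

for sent, program in synthesize():
    print(sent, "->", program)
```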
HBONet: Harmonious Bottleneck on Two Orthogonal Dimensions
Title | HBONet: Harmonious Bottleneck on Two Orthogonal Dimensions |
Authors | Duo Li, Aojun Zhou, Anbang Yao |
Abstract | MobileNets, a class of top-performing convolutional neural network architectures in terms of accuracy and efficiency trade-off, are increasingly used in many resource-aware vision applications. In this paper, we present Harmonious Bottleneck on two Orthogonal dimensions (HBO), a novel architecture unit, specially tailored to boost the accuracy of extremely lightweight MobileNets at the level of less than 40 MFLOPs. Unlike existing bottleneck designs that mainly focus on exploring the interdependencies among the channels of either groupwise or depthwise convolutional features, our HBO improves bottleneck representation while maintaining similar complexity via jointly encoding the feature interdependencies across both spatial and channel dimensions. It has two reciprocal components, namely spatial contraction-expansion and channel expansion-contraction, nested in a bilaterally symmetric structure. The combination of two interdependent transformations performing on orthogonal dimensions of feature maps enhances the representation and generalization ability of our proposed module, guaranteeing compelling performance with limited computational resource and power. By replacing the original bottlenecks in MobileNetV2 backbone with HBO modules, we construct HBONets which are evaluated on ImageNet classification, PASCAL VOC object detection and Market-1501 person re-identification. Extensive experiments show that with the severe constraint of computational budget our models outperform MobileNetV2 counterparts by remarkable margins of at most 6.6%, 6.3% and 5.0% on the above benchmarks respectively. Code and pretrained models are available at https://github.com/d-li14/HBONet. |
Tasks | Object Detection, Person Re-Identification |
Published | 2019-08-11 |
URL | https://arxiv.org/abs/1908.03888v1 |
PDF | https://arxiv.org/pdf/1908.03888v1.pdf |
PWC | https://paperswithcode.com/paper/hbonet-harmonious-bottleneck-on-two |
Repo | https://github.com/d-li14/HBONet |
Framework | pytorch |
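A very rough sketch of the harmonious bottleneck idea: a spatial contraction-expansion path (downsample, process, upsample) nested with a channel expansion-contraction path (pointwise expand, depthwise, pointwise contract), plus a residual so it can stand in for a MobileNetV2 bottleneck. The real HBO unit differs in detail; this is an assumption-laden toy module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyHBO(nn.Module):
    """Toy harmonious bottleneck: spatial contraction-expansion nested with
    channel expansion-contraction (not the paper's exact unit)."""
    def __init__(self, c, expand=4):
        super().__init__()
        self.down = nn.Conv2d(c, c, 3, stride=2, padding=1, groups=c)      # spatial contraction
        self.pw_expand = nn.Conv2d(c, c * expand, 1)                       # channel expansion
        self.dw = nn.Conv2d(c * expand, c * expand, 3, padding=1, groups=c * expand)
        self.pw_contract = nn.Conv2d(c * expand, c, 1)                     # channel contraction
        self.bn = nn.BatchNorm2d(c)

    def forward(self, x):
        h = self.down(x)
        h = F.relu6(self.pw_expand(h))
        h = F.relu6(self.dw(h))
        h = self.pw_contract(h)
        h = F.interpolate(h, size=x.shape[-2:], mode='bilinear',
                          align_corners=False)                             # spatial expansion
        return x + self.bn(h)   # residual keeps it drop-in for a bottleneck block

m = ToyHBO(32)
y = m(torch.randn(1, 32, 56, 56))   # same shape out
```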
Discriminative Joint Probability Maximum Mean Discrepancy (DJP-MMD) for Domain Adaptation
Title | Discriminative Joint Probability Maximum Mean Discrepancy (DJP-MMD) for Domain Adaptation |
Authors | Wen Zhang, Dongrui Wu |
Abstract | Maximum mean discrepancy (MMD) has been widely adopted in domain adaptation to measure the discrepancy between the source and target domain distributions. Many existing domain adaptation approaches are based on the joint MMD, which is computed as the (weighted) sum of the marginal distribution discrepancy and the conditional distribution discrepancy; however, a more natural metric may be their joint probability distribution discrepancy. Additionally, most metrics only aim to increase the transferability between domains, but ignore the discriminability between different classes, which may result in insufficient classification performance. To address these issues, discriminative joint probability MMD (DJP-MMD) is proposed in this paper to replace the frequently-used joint MMD in domain adaptation. It has two desirable properties: 1) it provides a new theoretical basis for computing the distribution discrepancy, which is simpler and more accurate; 2) it increases the transferability and discriminability simultaneously. We validate its performance by embedding it into a joint probability domain adaptation framework. Experiments on six image classification datasets demonstrated that the proposed DJP-MMD can outperform traditional MMDs. |
Tasks | Domain Adaptation, Image Classification, Transfer Learning |
Published | 2019-12-01 |
URL | https://arxiv.org/abs/1912.00320v3 |
PDF | https://arxiv.org/pdf/1912.00320v3.pdf |
PWC | https://paperswithcode.com/paper/transferability-versus-discriminability-joint |
Repo | https://github.com/chamwen/JPDA |
Framework | none |
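The intent of the metric can be illustrated with a simplified, linear-kernel sketch: same-class cross-domain means are pulled together (transferability) while different-class cross-domain means are pushed apart (discriminability), with a trade-off weight. This is not the exact kernelised DJP-MMD formula; the class means, the weight mu, and the use of target pseudo-labels are illustrative assumptions.

```python
import numpy as np

def djp_mmd_sketch(Xs, ys, Xt, yt_pseudo, mu=0.1):
    """Simplified linear-kernel illustration: pull same-class means of the
    two domains together (transferability) while pushing different-class
    cross-domain means apart (discriminability)."""
    classes = np.unique(ys)
    same, diff = 0.0, 0.0
    for c in classes:
        ms = Xs[ys == c].mean(axis=0)
        mt = Xt[yt_pseudo == c].mean(axis=0)
        same += np.sum((ms - mt) ** 2)
        for c2 in classes:
            if c2 != c:
                mt2 = Xt[yt_pseudo == c2].mean(axis=0)
                diff += np.sum((ms - mt2) ** 2)
    return same - mu * diff    # smaller = more transferable and more discriminative

Xs, ys = np.random.randn(100, 10), np.random.randint(0, 3, 100)
Xt, yt = np.random.randn(80, 10), np.random.randint(0, 3, 80)
print(djp_mmd_sketch(Xs, ys, Xt, yt))
```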
LaSO: Label-Set Operations networks for multi-label few-shot learning
Title | LaSO: Label-Set Operations networks for multi-label few-shot learning |
Authors | Amit Alfassy, Leonid Karlinsky, Amit Aides, Joseph Shtok, Sivan Harary, Rogerio Feris, Raja Giryes, Alex M. Bronstein |
Abstract | Example synthesis is one of the leading methods to tackle the problem of few-shot learning, where only a small number of samples per class are available. However, current synthesis approaches only address the scenario of a single category label per image. In this work, we propose a novel technique for synthesizing samples with multiple labels for the (yet unhandled) multi-label few-shot classification scenario. We propose to combine pairs of given examples in feature space, so that the resulting synthesized feature vectors will correspond to examples whose label sets are obtained through certain set operations on the label sets of the corresponding input pairs. Thus, our method is capable of producing a sample containing the intersection, union or set-difference of labels present in two input samples. As we show, these set operations generalize to labels unseen during training. This enables performing augmentation on examples of novel categories, thus, facilitating multi-label few-shot classifier learning. We conduct numerous experiments showing promising results for the label-set manipulation capabilities of the proposed approach, both directly (using the classification and retrieval metrics), and in the context of performing data augmentation for multi-label few-shot learning. We propose a benchmark for this new and challenging task and show that our method compares favorably to all the common baselines. |
Tasks | Data Augmentation, Few-Shot Learning |
Published | 2019-02-26 |
URL | http://arxiv.org/abs/1902.09811v1 |
PDF | http://arxiv.org/pdf/1902.09811v1.pdf |
PWC | https://paperswithcode.com/paper/laso-label-set-operations-networks-for-multi |
Repo | https://github.com/nganltp/admicro-LaSO |
Framework | pytorch |
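A toy sketch of the label-set operations: three small networks map a pair of feature vectors to a synthetic feature whose predicted label set should be the union, intersection, or difference of the inputs' label sets, trained through a multi-label classifier with BCE. The sizes, random features, and the (here randomly initialised) shared classifier are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d, n_labels = 128, 20

def mlp():
    return nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(), nn.Linear(d, d))

net_union, net_inter, net_diff = mlp(), mlp(), mlp()
classifier = nn.Linear(d, n_labels)            # stand-in for a pre-trained multi-label classifier

fa, fb = torch.randn(8, d), torch.randn(8, d)                   # feature pairs
ya = (torch.rand(8, n_labels) < 0.2).float()                    # multi-hot label sets
yb = (torch.rand(8, n_labels) < 0.2).float()
pair = torch.cat([fa, fb], dim=-1)

targets = {net_union: torch.clamp(ya + yb, max=1),              # A ∪ B
           net_inter: ya * yb,                                  # A ∩ B
           net_diff: torch.clamp(ya - yb, min=0)}               # A \ B

loss = sum(F.binary_cross_entropy_with_logits(classifier(net(pair)), t)
           for net, t in targets.items())
loss.backward()   # synthesized features can then augment few-shot classes
```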
Joint High Dynamic Range Imaging and Super-Resolution from a Single Image
Title | Joint High Dynamic Range Imaging and Super-Resolution from a Single Image |
Authors | Jae Woong Soh, Jae Sung Park, Nam Ik Cho |
Abstract | This paper presents a new framework for jointly enhancing the resolution and the dynamic range of an image, i.e., simultaneous super-resolution (SR) and high dynamic range imaging (HDRI), based on a convolutional neural network (CNN). From the common trends of both tasks, we train a CNN for the joint HDRI and SR by focusing on the reconstruction of high-frequency details. Specifically, the high-frequency component in our work is the reflectance component according to the Retinex-based image decomposition, and only the reflectance component is manipulated by the CNN while another component (illumination) is processed in a conventional way. In training the CNN, we devise an appropriate loss function that contributes to the naturalness quality of resulting images. Experiments show that our algorithm outperforms the cascade implementation of CNN-based SR and HDRI. |
Tasks | Super-Resolution |
Published | 2019-05-02 |
URL | https://arxiv.org/abs/1905.00933v1 |
PDF | https://arxiv.org/pdf/1905.00933v1.pdf |
PWC | https://paperswithcode.com/paper/joint-high-dynamic-range-imaging-and-super |
Repo | https://github.com/JWSoh/HDRI-SR |
Framework | tf |
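A crude sketch of the Retinex-based split: a heavy blur stands in for the illumination, the ratio image is the reflectance, only the reflectance goes through a small CNN with pixel-shuffle upsampling, and the illumination is upsampled and tone-adjusted conventionally before recombination. The blur-based decomposition, toy network, and gamma value are illustrative assumptions (written in PyTorch, although the released code is TensorFlow).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def retinex_split(img, ksize=15):
    """Crude Retinex-style decomposition: a heavy blur approximates the
    illumination, and the reflectance is the ratio image."""
    pad = ksize // 2
    illum = F.avg_pool2d(F.pad(img, [pad] * 4, mode='reflect'), ksize, stride=1)
    return img / (illum + 1e-6), illum

class ToyReflectanceCNN(nn.Module):
    """Stand-in for the paper's network; only the reflectance is processed."""
    def __init__(self, scale=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale))
    def forward(self, r):
        return self.body(r)

scale = 2
ldr = torch.rand(1, 3, 64, 64)
refl, illum = retinex_split(ldr)
refl_hr = ToyReflectanceCNN(scale)(refl)                               # learned high-frequency part
illum_hr = F.interpolate(illum, scale_factor=scale,
                         mode='bicubic').clamp(min=1e-6) ** 0.7        # conventional tone/upsample
hdr_sr = refl_hr * illum_hr
```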
Learning Loss for Active Learning
Title | Learning Loss for Active Learning |
Authors | Donggeun Yoo, In So Kweon |
Abstract | The performance of deep neural networks improves with more annotated data. The problem is that the budget for annotation is limited. One solution to this is active learning, where a model asks a human to annotate data that it perceives as uncertain. A variety of recent methods have been proposed to apply active learning to deep networks, but most of them are either designed specifically for their target tasks or computationally inefficient for large networks. In this paper, we propose a novel active learning method that is simple but task-agnostic, and works efficiently with deep networks. We attach a small parametric module, named "loss prediction module," to a target network, and train it to predict the target losses of unlabeled inputs. Then, this module can suggest data on which the target model is likely to produce a wrong prediction. This method is task-agnostic as networks are learned from a single loss regardless of target tasks. We rigorously validate our method through image classification, object detection, and human pose estimation, with recent network architectures. The results demonstrate that our method consistently outperforms previous methods across the tasks. |
Tasks | Active Learning, Image Classification, Object Detection, Pose Estimation |
Published | 2019-05-09 |
URL | https://arxiv.org/abs/1905.03677v1 |
PDF | https://arxiv.org/pdf/1905.03677v1.pdf |
PWC | https://paperswithcode.com/paper/190503677 |
Repo | https://github.com/Mephisto405/Learning-Loss-for-Active-Learning |
Framework | pytorch |
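A small sketch of the loss-prediction idea: a tiny head pools a few intermediate feature maps and regresses a scalar "predicted loss", trained with a pairwise margin ranking objective so it only needs to order samples correctly; at acquisition time, the unlabeled samples with the highest predicted loss are queried. Feature sizes and the head design are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LossPredModule(nn.Module):
    """Tiny loss-prediction head: pools a few intermediate feature maps,
    projects each, and regresses a scalar predicted loss."""
    def __init__(self, channels=(64, 128, 256), d=128):
        super().__init__()
        self.fcs = nn.ModuleList([nn.Linear(c, d) for c in channels])
        self.out = nn.Linear(d * len(channels), 1)

    def forward(self, feats):
        h = [F.relu(fc(f.mean(dim=(2, 3)))) for fc, f in zip(self.fcs, feats)]
        return self.out(torch.cat(h, dim=-1)).squeeze(-1)

def pairwise_ranking_loss(pred, target, margin=1.0):
    """Pair up the batch and ask the module only to order losses correctly
    (a margin ranking loss rather than direct regression)."""
    p1, p2 = pred[0::2], pred[1::2]
    t1, t2 = target[0::2], target[1::2]
    sign = torch.sign(t1 - t2)
    return F.relu(margin - sign * (p1 - p2)).mean()

feats = [torch.randn(8, c, 16, 16) for c in (64, 128, 256)]
target_losses = torch.rand(8)               # per-sample losses of the target network (detached)
pred = LossPredModule()(feats)
loss = pairwise_ranking_loss(pred, target_losses)
# At acquisition time, unlabeled samples with the highest predicted loss are queried.
```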