Paper Group AWR 43
ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks
Title | ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks |
Authors | Xiaohan Ding, Yuchen Guo, Guiguang Ding, Jungong Han |
Abstract | As designing appropriate Convolutional Neural Network (CNN) architecture in the context of a given application usually involves heavy human works or numerous GPU hours, the research community is soliciting the architecture-neutral CNN structures, which can be easily plugged into multiple mature architectures to improve the performance on our real-world applications. We propose Asymmetric Convolution Block (ACB), an architecture-neutral structure as a CNN building block, which uses 1D asymmetric convolutions to strengthen the square convolution kernels. For an off-the-shelf architecture, we replace the standard square-kernel convolutional layers with ACBs to construct an Asymmetric Convolutional Network (ACNet), which can be trained to reach a higher level of accuracy. After training, we equivalently convert the ACNet into the same original architecture, thus requiring no extra computations anymore. We have observed that ACNet can improve the performance of various models on CIFAR and ImageNet by a clear margin. Through further experiments, we attribute the effectiveness of ACB to its capability of enhancing the model’s robustness to rotational distortions and strengthening the central skeleton parts of square convolution kernels. |
Tasks | |
Published | 2019-08-11 |
URL | https://arxiv.org/abs/1908.03930v3 |
PDF | https://arxiv.org/pdf/1908.03930v3.pdf |
PWC | https://paperswithcode.com/paper/acnet-strengthening-the-kernel-skeletons-for |
Repo | https://github.com/ShawnDing1994/ACNet |
Framework | pytorch |
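The conversion step described in the abstract relies on the additivity of convolution: outputs of parallel branches that share the same input and stride can be summed, and the same result is obtained from a single kernel in which the 1D kernels are added onto the central row and column of the square kernel. The following is a minimal sketch of that idea in plain Python, not the authors' implementation. It assumes a single channel, no padding, and no batch normalization (the function names `conv2d`, `acb_forward`, and `fuse_acb_kernels` are illustrative, not taken from the repository).

```python
def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation on nested lists (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def acb_forward(image, square, horizontal, vertical):
    """Training-time view: sum the outputs of the 3x3, 1x3 and 3x1 branches."""
    out_sq = conv2d(image, square)
    # Crop the 1D branches so they align with the 3x3 output grid.
    out_hor = conv2d(image, horizontal)[1:-1]                 # drop first/last row
    out_ver = [row[1:-1] for row in conv2d(image, vertical)]  # drop first/last column
    return [
        [out_sq[i][j] + out_hor[i][j] + out_ver[i][j]
         for j in range(len(out_sq[0]))]
        for i in range(len(out_sq))
    ]


def fuse_acb_kernels(square, horizontal, vertical):
    """Inference-time view: fold the 1D kernels into the square kernel's skeleton."""
    fused = [row[:] for row in square]
    for b in range(3):
        fused[1][b] += horizontal[0][b]  # 1x3 kernel lands on the middle row
    for a in range(3):
        fused[a][1] += vertical[a][0]    # 3x1 kernel lands on the middle column
    return fused


# Illustrative integer data so the equality check is exact.
image = [[1, 2, 0, 3], [4, 1, 2, 0], [0, 3, 1, 2], [2, 0, 4, 1]]
square = [[1, 0, -1], [2, 1, 0], [0, -1, 1]]
horizontal = [[1, 2, 1]]
vertical = [[1], [0], [-1]]

fused = fuse_acb_kernels(square, horizontal, vertical)
print(acb_forward(image, square, horizontal, vertical) == conv2d(image, fused))
```

In the paper each branch also carries its own batch normalization, which would have to be folded into the kernels and biases before the addition; that step is omitted here.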
Deep Iterative and Adaptive Learning for Graph Neural Networks
Title | Deep Iterative and Adaptive Learning for Graph Neural Networks |
Authors | Yu Chen, Lingfei Wu, Mohammed J. Zaki |
Abstract | In this paper, we propose an end-to-end graph learning framework, namely Deep Iterative and Adaptive Learning for Graph Neural Networks (DIAL-GNN), for jointly learning the graph structure and graph embeddings simultaneously. We first cast the graph structure learning problem as a similarity metric learning problem and leverage an adapted graph regularization for controlling smoothness, connectivity and sparsity of the generated graph. We further propose a novel iterative method for searching for a hidden graph structure that augments the initial graph structure. Our iterative method dynamically stops when the learned graph structure approaches close enough to the optimal graph. Our extensive experiments demonstrate that the proposed DIAL-GNN model can consistently outperform or match state-of-the-art baselines in terms of both downstream task performance and computational time. The proposed approach can cope with both transductive learning and inductive learning. |
Tasks | Metric Learning |
Published | 2019-12-17 |
URL | https://arxiv.org/abs/1912.07832v1 |
PDF | https://arxiv.org/pdf/1912.07832v1.pdf |
PWC | https://paperswithcode.com/paper/deep-iterative-and-adaptive-learning-for |
Repo | https://github.com/hugochan/IDGL |
Framework | none |
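To make the "graph structure learning as similarity metric learning" idea concrete, here is a rough sketch that builds an adjacency matrix from pairwise cosine similarity of node features and sparsifies it with a threshold. It is an illustration only: the abstract describes a learned metric and an iterative refinement, whereas this sketch uses a fixed, unweighted cosine similarity and a hypothetical threshold parameter `epsilon`.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def similarity_graph(features, epsilon):
    """Dense adjacency matrix that keeps only similarities >= epsilon."""
    n = len(features)
    adjacency = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = cosine_similarity(features[i], features[j])
            adjacency[i][j] = s if s >= epsilon else 0.0  # thresholding gives sparsity
    return adjacency


# Three toy nodes: the first two point in similar directions, the third does not.
features = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
for row in similarity_graph(features, epsilon=0.5):
    print(row)
```

Raising `epsilon` yields a sparser graph; in a learned variant the similarity function would carry trainable weights updated together with the GNN.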
Rethinking Softmax Cross-Entropy Loss for Adversarial Robustness
Title | Rethinking Softmax Cross-Entropy Loss for Adversarial Robustness |
Authors | Tianyu Pang, Kun Xu, Yinpeng Dong, Chao Du, Ning Chen, Jun Zhu |
Abstract | Previous work shows that adversarially robust generalization requires larger sample complexity, and the same dataset, e.g., CIFAR-10, which enables good standard accuracy may not suffice to train robust models. Since collecting new training data could be costly, we focus on better utilizing the given data by inducing the regions with high sample density in the feature space, which could lead to locally sufficient samples for robust learning. We first formally show that the softmax cross-entropy (SCE) loss and its variants convey inappropriate supervisory signals, which encourage the learned feature points to spread over the space sparsely in training. This inspires us to propose the Max-Mahalanobis center (MMC) loss to explicitly induce dense feature regions in order to benefit robustness. Namely, the MMC loss encourages the model to concentrate on learning ordered and compact representations, which gather around the preset optimal centers for different classes. We empirically demonstrate that applying the MMC loss can significantly improve robustness even under strong adaptive attacks, while keeping state-of-the-art accuracy on clean inputs with little extra computation compared to the SCE loss. |
Tasks | |
Published | 2019-05-25 |
URL | https://arxiv.org/abs/1905.10626v3 |
PDF | https://arxiv.org/pdf/1905.10626v3.pdf |
PWC | https://paperswithcode.com/paper/rethinking-softmax-cross-entropy-loss-for |
Repo | https://github.com/P2333/Max-Mahalanobis-Training |
Framework | tf |
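The idea of pulling features toward preset class centers can be sketched as a squared-distance loss. The example below is a simplified illustration, not the authors' code: the centers are hand-picked placeholders rather than the Max-Mahalanobis construction from the paper, and the function names are hypothetical.

```python
def center_loss(feature, label, centers):
    """Half the squared Euclidean distance between a feature and its class center."""
    center = centers[label]
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))


def predict(feature, centers):
    """Assign the class whose preset center is closest to the feature."""
    return min(range(len(centers)), key=lambda k: center_loss(feature, k, centers))


# Placeholder centers for three classes in a 2D feature space (an assumption).
centers = [[1.0, 0.0], [-0.5, 0.866], [-0.5, -0.866]]

feature = [0.9, 0.1]
print(center_loss(feature, 0, centers))  # small: the feature is near center 0
print(predict(feature, centers))
```

Because the centers are fixed before training, minimizing this loss encourages features of the same class to cluster tightly, which is the dense-region behavior the abstract argues for.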
MONet: Unsupervised Scene Decomposition and Representation
Title | MONet: Unsupervised Scene Decomposition and Representation |
Authors | Christopher P. Burgess, Loic Matthey, Nicholas Watters, Rishabh Kabra, Irina Higgins, Matt Botvinick, Alexander Lerchner |
Abstract | The ability to decompose scenes in terms of abstract building blocks is crucial for general intelligence. Where those basic building blocks share meaningful properties, interactions and other regularities across scenes, such decompositions can simplify reasoning and facilitate imagination of novel scenarios. In particular, representing perceptual observations in terms of entities should improve data efficiency and transfer performance on a wide range of tasks. Thus we need models capable of discovering useful decompositions of scenes by identifying units with such regularities and representing them in a common format. To address this problem, we have developed the Multi-Object Network (MONet). In this model, a VAE is trained end-to-end together with a recurrent attention network – in a purely unsupervised manner – to provide attention masks around, and reconstructions of, regions of images. We show that this model is capable of learning to decompose and represent challenging 3D scenes into semantically meaningful components, such as objects and background elements. |
Tasks | |
Published | 2019-01-22 |
URL | http://arxiv.org/abs/1901.11390v1 |
PDF | http://arxiv.org/pdf/1901.11390v1.pdf |
PWC | https://paperswithcode.com/paper/monet-unsupervised-scene-decomposition-and |
Repo | https://github.com/deepmind/multi_object_datasets |
Framework | tf |
ChromaGAN: Adversarial Picture Colorization with Semantic Class Distribution
Title | ChromaGAN: Adversarial Picture Colorization with Semantic Class Distribution |
Authors | Patricia Vitoria, Lara Raad, Coloma Ballester |
Abstract | The colorization of grayscale images is an ill-posed problem, with multiple correct solutions. In this paper, we propose an adversarial learning colorization approach coupled with semantic information. A generative network is used to infer the chromaticity of a given grayscale image conditioned to semantic clues. This network is framed in an adversarial model that learns to colorize by incorporating perceptual and semantic understanding of color and class distributions. The model is trained via a fully self-supervised strategy. Qualitative and quantitative results show the capacity of the proposed method to colorize images in a realistic way achieving state-of-the-art results. |
Tasks | Colorization |
Published | 2019-07-23 |
URL | https://arxiv.org/abs/1907.09837v2 |
PDF | https://arxiv.org/pdf/1907.09837v2.pdf |
PWC | https://paperswithcode.com/paper/chromagan-an-adversarial-approach-for-picture |
Repo | https://github.com/pvitoria/ChromaGAN |
Framework | tf |
Combining Generative and Discriminative Models for Hybrid Inference
Title | Combining Generative and Discriminative Models for Hybrid Inference |
Authors | Victor Garcia Satorras, Zeynep Akata, Max Welling |
Abstract | A graphical model is a structured representation of the data generating process. The traditional method to reason over random variables is to perform inference in this graphical model. However, in many cases the generating process is only a poor approximation of the much more complex true data generating process, leading to suboptimal estimation. The subtleties of the generative process are however captured in the data itself and we can ‘learn to infer’, that is, learn a direct mapping from observations to explanatory latent variables. In this work we propose a hybrid model that combines graphical inference with a learned inverse model, which we structure as in a graph neural network, while the iterative algorithm as a whole is formulated as a recurrent neural network. By using cross-validation we can automatically balance the amount of work performed by graphical inference versus learned inference. We apply our ideas to the Kalman filter, a Gaussian hidden Markov model for time sequences, and show, among other things, that our model can estimate the trajectory of a noisy chaotic Lorenz Attractor much more accurately than either the learned or graphical inference run in isolation. |
Tasks | |
Published | 2019-06-06 |
URL | https://arxiv.org/abs/1906.02547v4 |
PDF | https://arxiv.org/pdf/1906.02547v4.pdf |
PWC | https://paperswithcode.com/paper/combining-generative-and-discriminative-3 |
Repo | https://github.com/vgsatorras/hybrid-inference |
Framework | pytorch |
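For readers unfamiliar with the graphical-inference half of the hybrid model, a scalar Kalman filter step can be sketched as follows. This is a textbook predict/update cycle under an assumed random-walk state model, not the paper's hybrid algorithm; the learned graph-neural-network correction is not shown, and all parameter names are illustrative.

```python
def kalman_step(mean, var, measurement, process_var, measurement_var):
    """One predict/update cycle of a 1D Kalman filter with a random-walk state."""
    # Predict: the state is assumed to stay put, so only the uncertainty grows.
    pred_mean = mean
    pred_var = var + process_var
    # Update: blend prediction and measurement according to their variances.
    gain = pred_var / (pred_var + measurement_var)
    new_mean = pred_mean + gain * (measurement - pred_mean)
    new_var = (1.0 - gain) * pred_var
    return new_mean, new_var


# Filter a short, made-up sequence of noisy observations.
mean, var = 0.0, 1.0
for z in [1.2, 0.9, 1.1, 1.0]:
    mean, var = kalman_step(mean, var, z, process_var=0.01, measurement_var=0.5)
    print(round(mean, 3), round(var, 3))
```

When the assumed transition and noise models are poor approximations of the true process, these estimates degrade, which is the gap the learned component in the paper is meant to close.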
Improving Relation Extraction by Pre-trained Language Representations
Title | Improving Relation Extraction by Pre-trained Language Representations |
Authors | Christoph Alt, Marc Hübner, Leonhard Hennig |
Abstract | Current state-of-the-art relation extraction methods typically rely on a set of lexical, syntactic, and semantic features, explicitly computed in a pre-processing step. Training feature extraction models requires additional annotated language resources, which severely restricts the applicability and portability of relation extraction to novel languages. Similarly, pre-processing introduces an additional source of error. To address these limitations, we introduce TRE, a Transformer for Relation Extraction, extending the OpenAI Generative Pre-trained Transformer [Radford et al., 2018]. Unlike previous relation extraction models, TRE uses pre-trained deep language representations instead of explicit linguistic features to inform the relation classification and combines it with the self-attentive Transformer architecture to effectively model long-range dependencies between entity mentions. TRE allows us to learn implicit linguistic features solely from plain text corpora by unsupervised pre-training, before fine-tuning the learned language representations on the relation extraction task. TRE obtains a new state-of-the-art result on the TACRED and SemEval 2010 Task 8 datasets, achieving a test F1 of 67.4 and 87.1, respectively. Furthermore, we observe a significant increase in sample efficiency. With only 20% of the training examples, TRE matches the performance of our baselines and our model trained from scratch on 100% of the TACRED dataset. We open-source our trained models, experiments, and source code. |
Tasks | Relation Extraction |
Published | 2019-06-07 |
URL | https://arxiv.org/abs/1906.03088v1 |
PDF | https://arxiv.org/pdf/1906.03088v1.pdf |
PWC | https://paperswithcode.com/paper/improving-relation-extraction-by-pre-trained-1 |
Repo | https://github.com/DFKI-NLP/TRE |
Framework | pytorch |
Multi-Source Transfer Learning for Non-Stationary Environments
Title | Multi-Source Transfer Learning for Non-Stationary Environments |
Authors | Honghui Du, Leandro L. Minku, Huiyu Zhou |
Abstract | In data stream mining, predictive models typically suffer drops in predictive performance due to concept drift. As enough data representing the new concept must be collected for the new concept to be well learnt, the predictive performance of existing models usually takes some time to recover from concept drift. To speed up recovery from concept drift and improve predictive performance in data stream mining, this work proposes a novel approach called Multi-sourcE onLine TrAnsfer learning for Non-statIonary Environments (Melanie). Melanie is the first approach able to transfer knowledge between multiple data streaming sources in non-stationary environments. It creates several sub-classifiers to learn different aspects from different source and target concepts over time. The sub-classifiers that match the current target concept well are identified, and used to compose an ensemble for predicting examples from the target concept. We evaluate Melanie on several synthetic data streams containing different types of concept drift and on real world data streams. The results indicate that Melanie can deal with a variety of drifts and improve predictive performance over existing data stream learning algorithms by making use of multiple sources. |
Tasks | Transfer Learning |
Published | 2019-01-07 |
URL | http://arxiv.org/abs/1901.02052v2 |
PDF | http://arxiv.org/pdf/1901.02052v2.pdf |
PWC | https://paperswithcode.com/paper/multi-source-transfer-learning-for-non |
Repo | https://github.com/nino2222/Melanie |
Framework | none |
Rule Applicability on RDF Triplestore Schemas
Title | Rule Applicability on RDF Triplestore Schemas |
Authors | Paolo Pareti, George Konstantinidis, Timothy J. Norman, Murat Şensoy |
Abstract | Rule-based systems play a critical role in health and safety, where policies created by experts are usually formalised as rules. When dealing with increasingly large and dynamic sources of data, as in the case of Internet of Things (IoT) applications, it becomes important not only to efficiently apply rules, but also to reason about their applicability on datasets confined by a certain schema. In this paper we define the notion of a triplestore schema which models a set of RDF graphs. Given a set of rules and such a schema as input we propose a method to determine rule applicability and produce output schemas. Output schemas model the graphs that would be obtained by running the rules on the graph models of the input schema. We present two approaches: one based on computing a canonical (critical) instance of the schema, and a novel approach based on query rewriting. We provide theoretical, complexity and evaluation results that show the superior efficiency of our rewriting approach. |
Tasks | |
Published | 2019-07-02 |
URL | https://arxiv.org/abs/1907.01627v1 |
PDF | https://arxiv.org/pdf/1907.01627v1.pdf |
PWC | https://paperswithcode.com/paper/rule-applicability-on-rdf-triplestore-schemas |
Repo | https://github.com/paolo7/ap2 |
Framework | none |
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Title | ALBERT: A Lite BERT for Self-supervised Learning of Language Representations |
Authors | Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut |
Abstract | Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations and longer training times. To address these problems, we present two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT. Comprehensive empirical evidence shows that our proposed methods lead to models that scale much better compared to the original BERT. We also use a self-supervised loss that focuses on modeling inter-sentence coherence, and show it consistently helps downstream tasks with multi-sentence inputs. As a result, our best model establishes new state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks while having fewer parameters compared to BERT-large. The code and the pretrained models are available at https://github.com/google-research/ALBERT. |
Tasks | Linguistic Acceptability, Natural Language Inference, Question Answering, Semantic Textual Similarity |
Published | 2019-09-26 |
URL | https://arxiv.org/abs/1909.11942v6 |
PDF | https://arxiv.org/pdf/1909.11942v6.pdf |
PWC | https://paperswithcode.com/paper/albert-a-lite-bert-for-self-supervised |
Repo | https://github.com/tensorflow/models/tree/master/official/nlp/albert |
Framework | tf |
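The abstract does not name the two parameter-reduction techniques; in the paper they are a factorized embedding parameterization and cross-layer parameter sharing. The arithmetic behind the first one can be sketched as below. The vocabulary, embedding, and hidden sizes are illustrative assumptions chosen for round numbers, not figures quoted from this listing.

```python
def embedding_params(vocab_size, hidden_size):
    """Parameters in a single vocab-by-hidden embedding matrix."""
    return vocab_size * hidden_size


def factorized_embedding_params(vocab_size, embedding_size, hidden_size):
    """Parameters when the embedding is split into vocab-by-E and E-by-hidden matrices."""
    return vocab_size * embedding_size + embedding_size * hidden_size


# Assumed sizes for illustration only.
V, E, H = 30000, 128, 768
print(embedding_params(V, H))                # 23040000
print(factorized_embedding_params(V, E, H))  # 3938304
```

The saving grows with the hidden size, since only the small E-by-hidden projection depends on it rather than the full vocabulary-sized table.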
Practice on Long Sequential User Behavior Modeling for Click-Through Rate Prediction
Title | Practice on Long Sequential User Behavior Modeling for Click-Through Rate Prediction |
Authors | Qi Pi, Weijie Bian, Guorui Zhou, Xiaoqiang Zhu, Kun Gai |
Abstract | Click-through rate (CTR) prediction is critical for industrial applications such as recommender system and online advertising. Practically, it plays an important role for CTR modeling in these applications by mining user interest from rich historical behavior data. Driven by the development of deep learning, deep CTR models with ingeniously designed architecture for user interest modeling have been proposed, bringing remarkable improvement of model performance over offline metric. However, great efforts are needed to deploy these complex models to online serving system for realtime inference, facing massive traffic request. Things turn to be more difficult when it comes to long sequential user behavior data, as the system latency and storage cost increase approximately linearly with the length of user behavior sequence. In this paper, we face directly the challenge of long sequential user behavior modeling and introduce our hands-on practice with the co-design of machine learning algorithm and online serving system for CTR prediction task. Theoretically, the co-design solution of UIC and MIMN enables us to handle the user interest modeling with unlimited length of sequential behavior data. Comparison between model performance and system efficiency proves the effectiveness of proposed solution. To our knowledge, this is one of the first industrial solutions that are capable of handling long sequential user behavior data with length scaling up to thousands. It now has been deployed in the display advertising system in Alibaba. |
Tasks | Click-Through Rate Prediction, Recommendation Systems |
Published | 2019-05-22 |
URL | https://arxiv.org/abs/1905.09248v3 |
PDF | https://arxiv.org/pdf/1905.09248v3.pdf |
PWC | https://paperswithcode.com/paper/practice-on-long-sequential-user-behavior |
Repo | https://github.com/xiaominglalala/Session_based_Recommendation |
Framework | none |
CONAN – COunter NArratives through Nichesourcing: a Multilingual Dataset of Responses to Fight Online Hate Speech
Title | CONAN – COunter NArratives through Nichesourcing: a Multilingual Dataset of Responses to Fight Online Hate Speech |
Authors | Y. L. Chung, E. Kuzmenko, S. S. Tekiroglu, M. Guerini |
Abstract | Although there is an unprecedented effort to provide adequate responses in terms of laws and policies to hate content on social media platforms, dealing with hatred online is still a tough problem. Tackling hate speech in the standard way of content deletion or user suspension may be charged with censorship and overblocking. One alternate strategy, that has received little attention so far by the research community, is to actually oppose hate content with counter-narratives (i.e. informed textual responses). In this paper, we describe the creation of the first large-scale, multilingual, expert-based dataset of hate speech/counter-narrative pairs. This dataset has been built with the effort of more than 100 operators from three different NGOs that applied their training and expertise to the task. Together with the collected data we also provide additional annotations about expert demographics, hate and response type, and data augmentation through translation and paraphrasing. Finally, we provide initial experiments to assess the quality of our data. |
Tasks | Data Augmentation |
Published | 2019-10-08 |
URL | https://arxiv.org/abs/1910.03270v1 |
PDF | https://arxiv.org/pdf/1910.03270v1.pdf |
PWC | https://paperswithcode.com/paper/conan-counter-narratives-through-1 |
Repo | https://github.com/marcoguerini/CONAN |
Framework | none |
Towards Generating Ambisonics Using Audio-Visual Cue for Virtual Reality
Title | Towards Generating Ambisonics Using Audio-Visual Cue for Virtual Reality |
Authors | Aakanksha Rana, Cagri Ozcinar, Aljoscha Smolic |
Abstract | Ambisonics, i.e., full-sphere surround sound, is quintessential with 360-degree visual content to provide a realistic virtual reality (VR) experience. While 360-degree visual content capture gained a tremendous boost recently, the estimation of corresponding spatial sound is still challenging due to the required sound-field microphones or information about the sound-source locations. In this paper, we introduce a novel problem of generating Ambisonics in 360-degree videos using the audio-visual cue. With this aim, firstly, a novel 360-degree audio-visual video dataset of 265 videos is introduced with annotated sound-source locations. Secondly, a pipeline is designed for an automatic Ambisonic estimation problem. Benefiting from the deep learning-based audio-visual feature-embedding and prediction modules, our pipeline estimates the 3D sound-source locations and further uses such locations to encode to the B-format. To benchmark our dataset and pipeline, we additionally propose evaluation criteria to investigate the performance using different 360-degree input representations. Our results demonstrate the efficacy of the proposed pipeline and open up a new area of research in 360-degree audio-visual analysis for future investigations. |
Tasks | |
Published | 2019-08-16 |
URL | https://arxiv.org/abs/1908.06752v1 |
PDF | https://arxiv.org/pdf/1908.06752v1.pdf |
PWC | https://paperswithcode.com/paper/towards-generating-ambisonics-using-audio |
Repo | https://github.com/V-Sense/360AudioVisual |
Framework | none |
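The last stage of the pipeline, encoding a mono source at a known direction into B-format, can be illustrated with the classic first-order Ambisonics panning equations. This sketch assumes the traditional convention in which the omnidirectional W channel is scaled by 1/√2 and angles are given in radians; the paper may use a different normalization or channel ordering, and the function name is hypothetical.

```python
import math


def encode_b_format(sample, azimuth, elevation):
    """Encode one mono sample into first-order B-format channels (W, X, Y, Z)."""
    w = sample / math.sqrt(2.0)                           # omnidirectional component
    x = sample * math.cos(azimuth) * math.cos(elevation)  # front-back
    y = sample * math.sin(azimuth) * math.cos(elevation)  # left-right
    z = sample * math.sin(elevation)                      # up-down
    return w, x, y, z


# A source straight ahead, then one 90 degrees to the left at ear height.
print(encode_b_format(1.0, 0.0, 0.0))
print(encode_b_format(1.0, math.pi / 2, 0.0))
```

Applying the function to every sample of a mono track, with the estimated source direction for that moment, yields the four-channel signal a first-order decoder expects.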
GResNet: Graph Residual Network for Reviving Deep GNNs from Suspended Animation
Title | GResNet: Graph Residual Network for Reviving Deep GNNs from Suspended Animation |
Authors | Jiawei Zhang, Lin Meng |
Abstract | The existing graph neural networks (GNNs) based on the spectral graph convolutional operator have been criticized for their performance degradation, which is especially common for the models with deep architectures. In this paper, we further identify the suspended animation problem with the existing GNNs. Such a problem happens when the model depth reaches the suspended animation limit, and the model will not respond to the training data any more and become not learnable. Analysis about the causes of the suspended animation problem with existing GNNs will be provided in this paper, whereas several other peripheral factors that will impact the problem will be reported as well. To resolve the problem, we introduce the GResNet (Graph Residual Network) framework in this paper, which creates extensively connected highways to involve nodes’ raw features or intermediate representations throughout the graph for all the model layers. Different from the other learning settings, the extensive connections in the graph data will render the existing simple residual learning methods fail to work. We prove the effectiveness of the introduced new graph residual terms from the norm preservation perspective, which will help avoid dramatic changes to the node’s representations between sequential layers. Detailed studies about the GResNet framework for many existing GNNs, including GCN, GAT and LoopyNet, will be reported in the paper with extensive empirical experiments on real-world benchmark datasets. |
Tasks | Node Classification |
Published | 2019-09-12 |
URL | https://arxiv.org/abs/1909.05729v2 |
PDF | https://arxiv.org/pdf/1909.05729v2.pdf |
PWC | https://paperswithcode.com/paper/gresnet-graph-residuals-for-reviving-deep |
Repo | https://github.com/jwzhanggy/GResNet |
Framework | pytorch |
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Title | RoBERTa: A Robustly Optimized BERT Pretraining Approach |
Authors | Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov |
Abstract | Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes, and, as we will show, hyperparameter choices have significant impact on the final results. We present a replication study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and training data size. We find that BERT was significantly undertrained, and can match or exceed the performance of every model published after it. Our best model achieves state-of-the-art results on GLUE, RACE and SQuAD. These results highlight the importance of previously overlooked design choices, and raise questions about the source of recently reported improvements. We release our models and code. |
Tasks | Language Modelling, Lexical Simplification, Natural Language Inference, Question Answering, Reading Comprehension, Semantic Textual Similarity, Sentiment Analysis |
Published | 2019-07-26 |
URL | https://arxiv.org/abs/1907.11692v1 |
PDF | https://arxiv.org/pdf/1907.11692v1.pdf |
PWC | https://paperswithcode.com/paper/roberta-a-robustly-optimized-bert-pretraining |
Repo | https://github.com/shreydesai/calibration |
Framework | pytorch |