February 1, 2020

3265 words 16 mins read

Paper Group AWR 167

Conversion Prediction Using Multi-task Conditional Attention Networks to Support the Creation of Effective Ad Creative. Image Captioning with Clause-Focused Metrics in a Multi-Modal Setting for Marketing. BillSum: A Corpus for Automatic Summarization of US Legislation. Gradient Descent: The Ultimate Optimizer. Raw-to-End Name Entity Recognition in …

Conversion Prediction Using Multi-task Conditional Attention Networks to Support the Creation of Effective Ad Creative

Title Conversion Prediction Using Multi-task Conditional Attention Networks to Support the Creation of Effective Ad Creative
Authors Shunsuke Kitada, Hitoshi Iyatomi, Yoshifumi Seki
Abstract Accurately predicting conversions in advertisements is generally a challenging task, because such conversions occur infrequently. In this paper, we propose a new framework to support the creation of high-performing ad creatives, including the accurate prediction of ad creative text conversions before delivery to consumers. The proposed framework includes three key ideas: multi-task learning, conditional attention, and attention highlighting. Multi-task learning improves conversion prediction accuracy by predicting clicks and conversions simultaneously, mitigating the data-imbalance problem. Conditional attention focuses the attention for each ad creative according to its genre and target gender, further improving conversion prediction accuracy. Attention highlighting visualizes important words and/or phrases based on conditional attention. We evaluated the proposed framework on actual delivery history data (14,000 creatives displayed more than a certain number of times, from Gunosy Inc.) and confirmed that these ideas improve the prediction performance of conversions and visualize noteworthy words according to the creatives’ attributes.
Tasks Multi-Task Learning
Published 2019-05-17
URL https://arxiv.org/abs/1905.07289v1
PDF https://arxiv.org/pdf/1905.07289v1.pdf
PWC https://paperswithcode.com/paper/conversion-prediction-using-multi-task
Repo https://github.com/shunk031/Multi-task-Conditional-Attention-Networks
Framework none
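
As a rough illustration of the conditional-attention idea, here is a minimal PyTorch sketch in which the attention query is built from genre and target-gender embeddings and the attended representation feeds two task heads (clicks and conversions). The class name, additive query, and dimensions are assumptions for illustration, not the authors’ implementation (see the repo above for that).

```python
import torch
import torch.nn as nn

class ConditionalAttention(nn.Module):
    """Sketch: attend over word vectors with a query built from ad attributes."""
    def __init__(self, dim, n_genres, n_genders):
        super().__init__()
        self.genre_emb = nn.Embedding(n_genres, dim)
        self.gender_emb = nn.Embedding(n_genders, dim)
        self.click_head = nn.Linear(dim, 1)   # task 1: click prediction
        self.conv_head = nn.Linear(dim, 1)    # task 2: conversion prediction

    def forward(self, words, genre, gender):
        # words: (batch, seq, dim); genre/gender: (batch,) integer ids
        query = self.genre_emb(genre) + self.gender_emb(gender)    # (batch, dim)
        scores = torch.bmm(words, query.unsqueeze(2)).squeeze(2)   # (batch, seq)
        attn = torch.softmax(scores, dim=1)    # weights usable for highlighting
        context = torch.bmm(attn.unsqueeze(1), words).squeeze(1)   # (batch, dim)
        return self.click_head(context), self.conv_head(context), attn
```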

Image Captioning with Clause-Focused Metrics in a Multi-Modal Setting for Marketing

Title Image Captioning with Clause-Focused Metrics in a Multi-Modal Setting for Marketing
Authors Philipp Harzig, Dan Zecha, Rainer Lienhart, Carolin Kaiser, René Schallner
Abstract Automatically generating descriptive captions for images is a well-researched area in computer vision. However, existing evaluation approaches focus on measuring the similarity between two sentences disregarding fine-grained semantics of the captions. In our setting of images depicting persons interacting with branded products, the subject, predicate, object and the name of the branded product are important evaluation criteria of the generated captions. Generating image captions with these constraints is a new challenge, which we tackle in this work. By simultaneously predicting integer-valued ratings that describe attributes of the human-product interaction, we optimize a deep neural network architecture in a multi-task learning setting, which considerably improves the caption quality. Furthermore, we introduce a novel metric that allows us to assess whether the generated captions meet our requirements (i.e., subject, predicate, object, and product name) and describe a series of experiments on caption quality and how to address annotator disagreements for the image ratings with an approach called soft targets. We also show that our novel clause-focused metrics are also applicable to other image captioning datasets, such as the popular MSCOCO dataset.
Tasks Image Captioning, Multi-Task Learning
Published 2019-05-06
URL https://arxiv.org/abs/1905.01919v1
PDF https://arxiv.org/pdf/1905.01919v1.pdf
PWC https://paperswithcode.com/paper/image-captioning-with-clause-focused-metrics
Repo https://github.com/philm5/mscoco-spo-triples
Framework none
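
The paper’s clause-focused evaluation checks the subject, predicate, object, and product name individually. A toy stand-in, assuming simple token matching rather than the paper’s actual metric:

```python
def clause_score(caption: str, subject: str, predicate: str,
                 obj: str, product: str) -> float:
    """Fraction of the four required clauses present in the caption.

    Illustrative token matching only; the paper's metric is triple-based."""
    tokens = caption.lower().split()
    required = [subject, predicate, obj, product]
    hits = sum(1 for r in required if r.lower() in tokens)
    return hits / len(required)

# e.g. clause_score("a woman drinks BrandX cola", "woman", "drinks", "cola", "BrandX") -> 1.0
```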

BillSum: A Corpus for Automatic Summarization of US Legislation

Title BillSum: A Corpus for Automatic Summarization of US Legislation
Authors Anastassia Kornilova, Vlad Eidelman
Abstract Automatic summarization methods have been studied on a variety of domains, including news and scientific articles. Yet, legislation has not previously been considered for this task, despite US Congress and state governments releasing tens of thousands of bills every year. In this paper, we introduce BillSum, the first dataset for summarization of US Congressional and California state bills (https://github.com/FiscalNote/BillSum). We explain the properties of the dataset that make it more challenging to process than other domains. Then, we benchmark extractive methods that consider neural sentence representations and traditional contextual features. Finally, we demonstrate that models built on Congressional bills can be used to summarize California bills, thus, showing that methods developed on this dataset can transfer to states without human-written summaries.
Tasks
Published 2019-10-01
URL https://arxiv.org/abs/1910.00523v2
PDF https://arxiv.org/pdf/1910.00523v2.pdf
PWC https://paperswithcode.com/paper/billsum-a-corpus-for-automatic-summarization
Repo https://github.com/FiscalNote/BillSum
Framework tf
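
For a sense of the extractive baselines benchmarked here, a minimal TF-IDF sentence-scoring summarizer (an illustrative baseline of this general family, not the paper’s models):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np

def extract_summary(sentences, k=3):
    """Score each sentence by its mean TF-IDF weight and keep the top k,
    emitted in original document order."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(sentences)
    scores = np.asarray(tfidf.mean(axis=1)).ravel()
    top = sorted(np.argsort(scores)[-k:])
    return " ".join(sentences[i] for i in top)
```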

Gradient Descent: The Ultimate Optimizer

Title Gradient Descent: The Ultimate Optimizer
Authors Kartik Chandra, Erik Meijer, Samantha Andow, Emilio Arroyo-Fang, Irene Dea, Johann George, Melissa Grueter, Basil Hosmer, Steffi Stumpos, Alanna Tempest, Shannon Yang
Abstract Working with any gradient-based machine learning algorithm involves the tedious task of tuning the optimizer’s hyperparameters, such as the learning rate. There exist many techniques for automated hyperparameter optimization, but they typically introduce even more hyperparameters to control the hyperparameter optimization process. We propose to instead learn the hyperparameters themselves by gradient descent, and furthermore to learn the hyper-hyperparameters by gradient descent as well, and so on ad infinitum. As these towers of gradient-based optimizers grow, they become significantly less sensitive to the choice of top-level hyperparameters, hence decreasing the burden on the user to search for optimal values.
Tasks Hyperparameter Optimization
Published 2019-09-29
URL https://arxiv.org/abs/1909.13371v1
PDF https://arxiv.org/pdf/1909.13371v1.pdf
PWC https://paperswithcode.com/paper/gradient-descent-the-ultimate-optimizer
Repo https://github.com/Rainymood/Gradient-Descent-The-Ultimate-Optimizer
Framework pytorch
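
The paper differentiates through the optimizer update itself and stacks such optimizers into towers. A one-level sketch of the underlying hypergradient rule, where the learning rate is itself updated by gradient descent (function names and step sizes are illustrative):

```python
import torch

def hyper_sgd(loss_fn, w, alpha=0.01, beta=1e-4, steps=100):
    """SGD whose learning rate is also learned by gradient descent.

    Since w_t = w_{t-1} - alpha * g_{t-1}, we have dL/dalpha = -g_t . g_{t-1},
    so descending on alpha means alpha += beta * g_t . g_{t-1}."""
    prev_g = torch.zeros_like(w)
    for _ in range(steps):
        g = torch.autograd.grad(loss_fn(w), w)[0]
        alpha = alpha + beta * torch.dot(g.flatten(), prev_g.flatten()).item()
        w = (w - alpha * g).detach().requires_grad_(True)
        prev_g = g
    return w, alpha

# usage: w = torch.randn(5, requires_grad=True)
#        w, lr = hyper_sgd(lambda v: (v ** 2).sum(), w)
```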

Raw-to-End Name Entity Recognition in Social Media

Title Raw-to-End Name Entity Recognition in Social Media
Authors Liyuan Liu, Zihan Wang, Jingbo Shang, Dandong Yin, Heng Ji, Xiang Ren, Shaowen Wang, Jiawei Han
Abstract Taking word sequences as input, typical named entity recognition (NER) models neglect errors from pre-processing (e.g., tokenization). However, these errors can greatly influence model performance, especially for noisy texts like tweets. Here, we introduce Neural-Char-CRF, a raw-to-end framework that is more robust to pre-processing errors. It takes raw character sequences as inputs and makes end-to-end predictions. Word embedding and contextualized representation models are further tailored to capture textual signals for each character instead of each word. Our model neither requires the conversion from character sequences to word sequences, nor assumes a tokenizer can correctly detect all word boundaries. Moreover, we observe that our model performance remains unchanged after replacing tokenization with string matching, which demonstrates its potential to be tokenization-free. Extensive experimental results on two public datasets demonstrate the superiority of our proposed method over the state of the art. The implementations and datasets are made available at: https://github.com/LiyuanLucasLiu/Raw-to-End.
Tasks Named Entity Recognition, Tokenization
Published 2019-08-14
URL https://arxiv.org/abs/1908.05344v1
PDF https://arxiv.org/pdf/1908.05344v1.pdf
PWC https://paperswithcode.com/paper/raw-to-end-name-entity-recognition-in-social
Repo https://github.com/LiyuanLucasLiu/Raw-to-End
Framework pytorch
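
A stripped-down sketch of the raw-to-end idea: tag every character directly, so no tokenizer sits in the loop. Note this replaces the paper’s CRF output layer with independent per-character scores for brevity:

```python
import torch
import torch.nn as nn

class CharTagger(nn.Module):
    """Per-character entity tagging on raw text: no tokenization required.

    Sketch only: the paper adds a CRF layer and tailored representations."""
    def __init__(self, n_chars, n_tags, dim=64, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(n_chars, dim)
        self.lstm = nn.LSTM(dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_tags)

    def forward(self, char_ids):              # (batch, len) character ids
        h, _ = self.lstm(self.emb(char_ids))  # (batch, len, 2*hidden)
        return self.out(h)                    # per-character tag scores
```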

Multi-task Learning for Influence Estimation and Maximization

Title Multi-task Learning for Influence Estimation and Maximization
Authors George Panagopoulos, Fragkiskos D. Malliaros, Michalis Vazirgiannis
Abstract We address the problem of influence maximization when the social network is accompanied by diffusion cascades. In prior works, such information is used to compute influence probabilities, which are utilized by stochastic diffusion models in influence maximization. Motivated by the recent criticism of the effectiveness of diffusion models as well as the galloping advancements in influence learning, we propose IMINFECTOR (Influence Maximization with INFluencer vECTORs), a unified approach that uses representations learned from diffusion cascades to perform model-independent influence maximization that scales to real-world datasets. The first part of our methodology is a multi-task neural network that learns embeddings of nodes that initiate cascades (influencer vectors) and embeddings of nodes that participate in them (susceptible vectors). The norm of an influencer vector captures the ability of the node to create lengthy cascades and is used to estimate the expected influence spread and reduce the number of candidate seeds. In addition, the combination of influencer and susceptible vectors forms the diffusion probabilities between nodes. These are used to reformulate the network as a bipartite graph and to propose a greedy solution to influence maximization that retains the theoretical guarantees. We apply our method to three sizable networks with diffusion cascades and evaluate it using cascades from future time steps. IMINFECTOR scales to all of them and outperforms various competitive algorithms and metrics from the diverse landscape of influence maximization in terms of efficiency and seed set quality.
Tasks Multi-Task Learning, Representation Learning
Published 2019-04-18
URL https://arxiv.org/abs/1904.08804v2
PDF https://arxiv.org/pdf/1904.08804v2.pdf
PWC https://paperswithcode.com/paper/influence-maximization-via-representation
Repo https://github.com/GiorgosPanagopoulos/Influence-Maximization-via-Representation-Learning
Framework tf
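
A sketch of how influencer/susceptible vectors could drive seed selection: prune candidates by influencer-vector norm, derive pairwise diffusion probabilities from dot products (a sigmoid here, where the paper normalizes differently), and greedily maximize expected spread on the bipartite graph:

```python
import numpy as np

def select_seeds(influencer_vecs, susceptible_vecs, k):
    """Greedy seed selection on the bipartite influence graph (a sketch)."""
    norms = np.linalg.norm(influencer_vecs, axis=1)
    cand = np.argsort(norms)[::-1][:10 * k]        # prune candidates by norm
    p = 1.0 / (1.0 + np.exp(-influencer_vecs[cand] @ susceptible_vecs.T))
    not_reached = np.ones(p.shape[1])              # P(target not yet influenced)
    seeds = []
    for _ in range(k):
        gains = (not_reached * p).sum(axis=1)      # marginal expected spread
        best = int(np.argmax(gains))
        seeds.append(int(cand[best]))
        not_reached *= 1.0 - p[best]
        p[best] = 0.0                              # never re-pick this seed
    return seeds
```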

Adversarial Examples in Modern Machine Learning: A Review

Title Adversarial Examples in Modern Machine Learning: A Review
Authors Rey Reza Wiyatno, Anqi Xu, Ousmane Dia, Archy de Berker
Abstract Recent research has found that many families of machine learning models are vulnerable to adversarial examples: inputs that are specifically designed to cause the target model to produce erroneous outputs. In this survey, we focus on machine learning models in the visual domain, where methods for generating and detecting such examples have been most extensively studied. We explore a variety of adversarial attack methods that apply to image-space content, real world adversarial attacks, adversarial defenses, and the transferability property of adversarial examples. We also discuss strengths and weaknesses of various methods of adversarial attack and defense. Our aim is to provide an extensive coverage of the field, furnishing the reader with an intuitive understanding of the mechanics of adversarial attack and defense mechanisms and enlarging the community of researchers studying this fundamental set of problems.
Tasks Adversarial Attack
Published 2019-11-13
URL https://arxiv.org/abs/1911.05268v2
PDF https://arxiv.org/pdf/1911.05268v2.pdf
PWC https://paperswithcode.com/paper/adversarial-examples-in-modern-machine
Repo https://github.com/cs-giung/course-dl-TP
Framework pytorch
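
Among the attacks the survey covers, the Fast Gradient Sign Method is the simplest; a standard PyTorch implementation:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.03):
    """Fast Gradient Sign Method: one gradient step in the sign direction
    that increases the loss, clamped back to the valid image range."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()
```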

On the Pitfalls of Measuring Emergent Communication

Title On the Pitfalls of Measuring Emergent Communication
Authors Ryan Lowe, Jakob Foerster, Y-Lan Boureau, Joelle Pineau, Yann Dauphin
Abstract How do we know if communication is emerging in a multi-agent system? The vast majority of recent papers on emergent communication show that adding a communication channel leads to an increase in reward or task success. This is a useful indicator, but provides only a coarse measure of the agents’ learned communication abilities. As we move towards more complex environments, it becomes imperative to have a set of finer tools that allow qualitative and quantitative insights into the emergence of communication. This may be especially useful to allow humans to monitor agents’ behaviour, whether for fault detection, assessing performance, or even building trust. In this paper, we examine a few intuitive existing metrics for measuring communication, and show that they can be misleading. Specifically, by training deep reinforcement learning agents to play simple matrix games augmented with a communication channel, we find a scenario where agents appear to communicate (their messages provide information about their subsequent action), and yet the messages do not impact the environment or the other agent in any way. We explain this phenomenon using ablation studies and by visualizing the representations of the learned policies. We also survey some commonly used metrics for measuring emergent communication, and provide recommendations as to when these metrics should be used.
Tasks Fault Detection
Published 2019-03-12
URL http://arxiv.org/abs/1903.05168v1
PDF http://arxiv.org/pdf/1903.05168v1.pdf
PWC https://paperswithcode.com/paper/on-the-pitfalls-of-measuring-emergent
Repo https://github.com/facebookresearch/measuring-emergent-comm
Framework pytorch
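
The paper’s core warning: messages that predict the speaker’s action need not influence the listener. A simple metric of the first kind, the mutual information between messages and subsequent actions, can be computed as below; the point is that a high value alone proves little:

```python
import numpy as np
from collections import Counter

def mutual_info(messages, actions):
    """MI (in nats) between discrete messages and the sender's next actions.

    High MI means messages *predict* actions; per the paper, it does not show
    the listener is influenced, so also test whether ablating the channel
    changes the listener's behaviour."""
    n = len(messages)
    pm, pa = Counter(messages), Counter(actions)
    pma = Counter(zip(messages, actions))
    return sum((c / n) * np.log((c / n) / ((pm[m] / n) * (pa[a] / n)))
               for (m, a), c in pma.items())
```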

Active Subspace of Neural Networks: Structural Analysis and Universal Attacks

Title Active Subspace of Neural Networks: Structural Analysis and Universal Attacks
Authors Chunfeng Cui, Kaiqi Zhang, Talgat Daulbaev, Julia Gusak, Ivan Oseledets, Zheng Zhang
Abstract Active subspace is a model reduction method widely used in the uncertainty quantification community. In this paper, we propose analyzing the internal structure and vulnerability of deep neural networks using active subspace. Firstly, we employ the active subspace to measure the number of “active neurons” at each intermediate layer and reduce the number of neurons from several thousand to several dozen. This motivates us to change the network structure and to develop a new and more compact network, referred to as {ASNet}, that has significantly fewer model parameters. Secondly, we propose analyzing the vulnerability of a neural network using active subspace and finding an additive universal adversarial attack vector that can misclassify a dataset with a high probability. Our experiments on CIFAR-10 show that ASNet can achieve 23.98$\times$ parameter and 7.30$\times$ FLOPs reduction. The universal active subspace attack vector can achieve around 20% higher attack ratio compared with the existing approach in all of our numerical experiments. The PyTorch codes for this paper are available online.
Tasks Adversarial Attack
Published 2019-10-29
URL https://arxiv.org/abs/1910.13025v1
PDF https://arxiv.org/pdf/1910.13025v1.pdf
PWC https://paperswithcode.com/paper/active-subspace-of-neural-networks-structural
Repo https://github.com/chunfengc/ASNet
Framework pytorch
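
Active subspaces are found from the eigendecomposition of the gradient covariance. A generic NumPy sketch of that estimation step (the paper applies it layer-wise to networks; `grad_fn` here is any user-supplied gradient oracle):

```python
import numpy as np

def active_subspace(grad_fn, x_samples, k):
    """Estimate the dominant 'active' directions from gradient covariance.

    C = E[g g^T] is approximated by sampling; the top-k eigenvectors span
    the directions along which the function changes the most."""
    grads = np.stack([grad_fn(x) for x in x_samples])   # (n_samples, d)
    C = grads.T @ grads / len(grads)
    w, V = np.linalg.eigh(C)                            # ascending eigenvalues
    return V[:, ::-1][:, :k]                            # top-k eigenvectors
```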

Testing with Fewer Resources: An Adaptive Approach to Performance-Aware Test Case Generation

Title Testing with Fewer Resources: An Adaptive Approach to Performance-Aware Test Case Generation
Authors Giovanni Grano, Christoph Laaber, Annibale Panichella, Sebastiano Panichella
Abstract Automated test case generation is an effective technique to yield high-coverage test suites. While the majority of research effort has been devoted to satisfying coverage criteria, a recent trend has emerged towards optimizing other non-coverage aspects. In this regard, runtime and memory usage are two essential dimensions: less expensive tests reduce the resource demands for the generation process and later regression testing phases. This study shows that performance-aware test case generation requires solving two main challenges: providing a good approximation of resource usage with minimal overhead and avoiding detrimental effects on both final coverage and fault detection effectiveness. To tackle these challenges, we conceived a set of performance proxies – inspired by previous work on performance testing – that provide a reasonable estimation of the test execution costs (i.e., runtime and memory usage). Thus, we propose an adaptive strategy, called aDynaMOSA, which leverages these proxies by extending DynaMOSA, a state-of-the-art evolutionary algorithm in unit testing. Our empirical study – involving 110 non-trivial Java classes – reveals that our adaptive approach generates test suites with statistically significant improvements in runtime (-25%) and heap memory consumption (-15%) compared to DynaMOSA. Additionally, aDynaMOSA has comparable results to DynaMOSA over seven different coverage criteria and similar fault detection effectiveness. Our empirical investigation also highlights that the usage of performance proxies (i.e., without the adaptiveness) is not sufficient to generate more performant test cases without compromising the overall coverage.
Tasks Fault Detection
Published 2019-07-19
URL https://arxiv.org/abs/1907.08578v3
PDF https://arxiv.org/pdf/1907.08578v3.pdf
PWC https://paperswithcode.com/paper/testing-with-fewer-resources-an-adaptive
Repo https://github.com/sealuzh/dynamic-performance-replication
Framework none
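
aDynaMOSA itself extends DynaMOSA inside a Java test-generation framework; purely for illustration, and in Python like the other sketches on this page, the core preference it encodes is coverage first, performance proxies as a tiebreaker. Field names are invented and the adaptive weighting is omitted:

```python
def prefer(test_a, test_b):
    """Compare two candidate tests: coverage dominates; cheap performance
    proxies (estimated runtime + heap use) break ties. A sketch of the
    strategy's core preference, without the adaptiveness."""
    if test_a["coverage"] != test_b["coverage"]:
        return test_a if test_a["coverage"] > test_b["coverage"] else test_b
    cost = lambda t: t["runtime_proxy"] + t["memory_proxy"]
    return test_a if cost(test_a) <= cost(test_b) else test_b
```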

GRDN: Grouped Residual Dense Network for Real Image Denoising and GAN-based Real-world Noise Modeling

Title GRDN: Grouped Residual Dense Network for Real Image Denoising and GAN-based Real-world Noise Modeling
Authors Dong-Wook Kim, Jae Ryun Chung, Seung-Won Jung
Abstract Recent research on image denoising has progressed with the development of deep learning architectures, especially convolutional neural networks. However, real-world image denoising is still very challenging because it is not possible to obtain ideal pairs of ground-truth images and real-world noisy images. Owing to the recent release of benchmark datasets, the interest of the image denoising community is now moving toward the real-world denoising problem. In this paper, we propose a grouped residual dense network (GRDN), which is an extended and generalized architecture of the state-of-the-art residual dense network (RDN). The core part of GRDN is defined as the grouped residual dense block (GRDB) and used as a building module of GRDN. We experimentally show that the image denoising performance can be significantly improved by cascading GRDBs. In addition to the network architecture design, we also develop a new generative adversarial network-based real-world noise modeling method. We demonstrate the superiority of the proposed methods by achieving the highest score in terms of both the peak signal-to-noise ratio and the structural similarity in the NTIRE2019 Real Image Denoising Challenge - Track 2: sRGB.
Tasks Denoising, Image Denoising
Published 2019-05-27
URL https://arxiv.org/abs/1905.11172v1
PDF https://arxiv.org/pdf/1905.11172v1.pdf
PWC https://paperswithcode.com/paper/grdngrouped-residual-dense-network-for-real
Repo https://github.com/BusterChung/NTIRE_test_code
Framework pytorch
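
A hedged PyTorch sketch of the building blocks as described: a residual dense block (RDB), and a GRDB that concatenates cascaded RDB outputs and fuses them with a 1x1 convolution. Layer counts and growth rate are illustrative, not the submission’s exact configuration:

```python
import torch
import torch.nn as nn

class RDB(nn.Module):
    """Residual dense block: densely connected convs, 1x1 fusion, skip."""
    def __init__(self, ch, growth=32, layers=4):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(ch + i * growth, growth, 3, padding=1) for i in range(layers))
        self.fuse = nn.Conv2d(ch + layers * growth, ch, 1)

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(torch.relu(conv(torch.cat(feats, dim=1))))
        return x + self.fuse(torch.cat(feats, dim=1))

class GRDB(nn.Module):
    """Grouped RDB: concatenate cascaded RDB outputs, fuse, add a group skip."""
    def __init__(self, ch, n_rdb=3):
        super().__init__()
        self.rdbs = nn.ModuleList(RDB(ch) for _ in range(n_rdb))
        self.fuse = nn.Conv2d(n_rdb * ch, ch, 1)

    def forward(self, x):
        outs, h = [], x
        for rdb in self.rdbs:
            h = rdb(h)
            outs.append(h)
        return x + self.fuse(torch.cat(outs, dim=1))
```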

Convolutional Neural Networks with Layer Reuse

Title Convolutional Neural Networks with Layer Reuse
Authors Okan Köpüklü, Maryam Babaee, Stefan Hörmann, Gerhard Rigoll
Abstract A convolutional layer in a Convolutional Neural Network (CNN) consists of many filters that apply the convolution operation to the input, capture some special patterns, and pass the result to the next layer. If the same patterns also occur at the deeper layers of the network, why wouldn’t the same convolutional filters be used in those layers as well? In this paper, we propose a CNN architecture, Layer Reuse Network (LruNet), where the convolutional layers are used repeatedly without introducing new layers, to achieve better performance. This approach introduces several advantages: (i) a considerable number of parameters is saved, since we reuse layers instead of introducing new ones; (ii) the Memory Access Cost (MAC) can be reduced, since reused layer parameters need to be fetched only once; (iii) the number of nonlinearities increases with layer reuse; and (iv) reused layers receive gradient updates from multiple parts of the network. The proposed approach is evaluated on the CIFAR-10, CIFAR-100 and Fashion-MNIST datasets for the image classification task, where layer reuse improves performance by 5.14%, 5.85% and 2.29%, respectively. The source code and pretrained models are publicly available.
Tasks Image Classification
Published 2019-01-28
URL http://arxiv.org/abs/1901.09615v2
PDF http://arxiv.org/pdf/1901.09615v2.pdf
PWC https://paperswithcode.com/paper/convolutional-neural-networks-with-layer
Repo https://github.com/okankop/CNN-layer-reuse
Framework pytorch
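
The central trick reduces to a loop over one shared block. A minimal PyTorch sketch (whether normalization statistics are shared across passes is a design choice; they are shared here for brevity):

```python
import torch
import torch.nn as nn

class LayerReuseNet(nn.Module):
    """Apply the same conv block repeatedly instead of stacking new layers."""
    def __init__(self, ch=32, reuse=4, n_classes=10):
        super().__init__()
        self.stem = nn.Conv2d(3, ch, 3, padding=1)
        self.shared = nn.Sequential(              # one set of weights...
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.BatchNorm2d(ch),
            nn.ReLU(inplace=True))
        self.reuse = reuse
        self.head = nn.Linear(ch, n_classes)

    def forward(self, x):
        h = self.stem(x)
        for _ in range(self.reuse):               # ...fetched once, used N times
            h = self.shared(h)
        return self.head(h.mean(dim=(2, 3)))      # global average pooling
```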

Modulating Image Restoration with Continual Levels via Adaptive Feature Modification Layers

Title Modulating Image Restoration with Continual Levels via Adaptive Feature Modification Layers
Authors Jingwen He, Chao Dong, Yu Qiao
Abstract In image restoration tasks such as denoising and super-resolution, continual modulation of restoration levels is of great importance for real-world applications, yet it eludes most existing deep-learning-based image restoration methods. Having learned from discrete and fixed restoration levels, deep models cannot easily generalize to data of continuous and unseen levels. This topic is rarely touched in the literature, due to the difficulty of modulating well-trained models with certain hyper-parameters. We take a step forward by proposing a unified CNN framework that adds only a few parameters to a single-level model yet can handle arbitrary restoration levels between a start and an end level. The additional module, the AdaFM layer, performs channel-wise feature modification and can adapt a model to another restoration level with high accuracy. By simply tweaking an interpolation coefficient, the intermediate model, AdaFM-Net, generates smooth and continuous restoration effects without artifacts. Extensive experiments on three image restoration tasks demonstrate the effectiveness of both model training and modulation testing. Besides, we carefully investigate the properties of AdaFM layers, providing detailed guidance on the usage of the proposed method.
Tasks Denoising, Image Denoising, Image Restoration, Image Super-Resolution, Super-Resolution
Published 2019-04-17
URL https://arxiv.org/abs/1904.08118v3
PDF https://arxiv.org/pdf/1904.08118v3.pdf
PWC https://paperswithcode.com/paper/modulating-image-restoration-with-continual
Repo https://github.com/hejingwenhejingwen/AdaFM
Framework pytorch
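
A sketch of what a channel-wise feature-modification layer with an interpolation coefficient could look like: a depthwise filter whose effect is blended with the identity by a coefficient `lam`. This illustrates the mechanism, not the authors’ exact layer:

```python
import torch
import torch.nn as nn

class AdaFMSketch(nn.Module):
    """Channel-wise feature modification via a depthwise filter (a sketch).

    Blending toward the identity with `lam` modulates between the start
    (lam=0) and end (lam=1) restoration levels."""
    def __init__(self, ch, k=1):
        super().__init__()
        self.filter = nn.Conv2d(ch, ch, k, padding=k // 2, groups=ch)

    def forward(self, x, lam=1.0):
        return x + lam * (self.filter(x) - x)  # lam=0: identity; lam=1: full effect
```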

A Maximum Likelihood Approach to Extract Finite Planes from 3-D Laser Scans

Title A Maximum Likelihood Approach to Extract Finite Planes from 3-D Laser Scans
Authors Alexander Schaefer, Johan Vertens, Daniel Büscher, Wolfram Burgard
Abstract Whether it is object detection, model reconstruction, laser odometry, or point cloud registration: Plane extraction is a vital component of many robotic systems. In this paper, we propose a strictly probabilistic method to detect finite planes in organized 3-D laser range scans. Using an agglomerative hierarchical clustering technique, our algorithm builds planes from the bottom up, always extending a plane by the point that decreases the measurement likelihood of the scan the least. In contrast to most related methods, which rely on heuristics like orthogonal point-to-plane distance, we leverage the ray path information to compute the measurement likelihood. We evaluate our approach not only on the popular SegComp benchmark, but also provide a challenging synthetic dataset that overcomes SegComp’s deficiencies. Both our implementation and the suggested dataset are available at www.github.com/acschaefer/ppe.
Tasks Object Detection, Point Cloud Registration
Published 2019-10-23
URL https://arxiv.org/abs/1910.11146v1
PDF https://arxiv.org/pdf/1910.11146v1.pdf
PWC https://paperswithcode.com/paper/a-maximum-likelihood-approach-to-extract
Repo https://github.com/acschaefer/ppe
Framework none
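
For contrast with the paper’s likelihood-based criterion, here is the point-to-plane heuristic it improves upon, as a greedy bottom-up grower in NumPy (the paper replaces exactly the per-step cost used below with the ray-path measurement likelihood):

```python
import numpy as np

def fit_plane(pts):
    """Least-squares plane: centroid plus smallest-variance normal via SVD."""
    c = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - c)
    return c, vt[-1]                      # normal = last right-singular vector

def grow_plane(points, seed, steps=200, tol=0.02):
    """Greedy bottom-up growth from a seed of >= 3 point indices:
    repeatedly add the point with the smallest point-to-plane distance."""
    members = list(seed)
    for _ in range(steps):
        c, n = fit_plane(points[members])
        rest = np.setdiff1d(np.arange(len(points)), members)
        if rest.size == 0:
            break
        d = np.abs((points[rest] - c) @ n)
        j = int(np.argmin(d))
        if d[j] > tol:                    # no remaining point fits well enough
            break
        members.append(int(rest[j]))
    return members
```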

Emergence of Network Motifs in Deep Neural Networks

Title Emergence of Network Motifs in Deep Neural Networks
Authors Matteo Zambra, Alberto Testolin, Amos Maritan
Abstract Network science can offer fundamental insights into the structural and functional properties of complex systems. For example, it is widely known that neuronal circuits tend to organize into basic functional topological modules, called “network motifs”. In this article we show that network science tools can be successfully applied also to the study of artificial neural networks operating according to self-organizing (learning) principles. In particular, we study the emergence of network motifs in multi-layer perceptrons, whose initial connectivity is defined as a stack of fully-connected, bipartite graphs. Our simulations show that the final network topology is primarily shaped by learning dynamics, but can be strongly biased by choosing appropriate weight initialization schemes. Overall, our results suggest that non-trivial initialization strategies can make learning more effective by promoting the development of useful network motifs, which are often surprisingly consistent with those observed in general transduction networks.
Tasks
Published 2019-12-27
URL https://arxiv.org/abs/1912.12244v1
PDF https://arxiv.org/pdf/1912.12244v1.pdf
PWC https://paperswithcode.com/paper/emergence-of-network-motifs-in-deep-neural
Repo https://github.com/MatteoZambra/SM_ML__MScThesis
Framework none
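
Because consecutive layers of a multi-layer perceptron form a bipartite graph, three-node triangles cannot occur between them; a natural motif to count instead is the “bi-fan” (two units both strongly wired to the same two units in the next layer). A crude NumPy stand-in for the paper’s motif census, using a hard weight threshold:

```python
import itertools
import numpy as np

def bifan_count(w, thresh=0.5):
    """Count bi-fan motifs in one thresholded weight matrix (n_in, n_out).

    Illustrative only: the paper analyzes motifs over the whole network and
    tracks their emergence during learning."""
    a = np.abs(w) > thresh                      # boolean adjacency
    count = 0
    for i, j in itertools.combinations(range(a.shape[0]), 2):
        shared = int(np.logical_and(a[i], a[j]).sum())
        count += shared * (shared - 1) // 2     # choose 2 shared targets
    return count
```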