January 29, 2020

3308 words 16 mins read

Paper Group ANR 764

Optimal Unsupervised Domain Translation. On Stationary-Point Hitting Time and Ergodicity of Stochastic Gradient Langevin Dynamics. Multi-Grained Spatio-temporal Modeling for Lip-reading. Beyond Adaptive Submodularity: Approximation Guarantees of Greedy Policy with Adaptive Submodularity Ratio. Double Transfer Learning for Breast Cancer Histopatholo …

Optimal Unsupervised Domain Translation

Title Optimal Unsupervised Domain Translation
Authors Emmanuel de Bézenac, Ibrahim Ayed, Patrick Gallinari
Abstract Domain Translation is the problem of finding a meaningful correspondence between two domains. Since in a majority of settings paired supervision is not available, much work focuses on Unsupervised Domain Translation (UDT) where data samples from each domain are unpaired. Following the seminal work of CycleGAN for UDT, many variants and extensions of this model have been proposed. However, there is still little theoretical understanding behind their success. We observe that these methods yield solutions which are approximately minimal w.r.t. a given transportation cost, leading us to reformulate the problem in the Optimal Transport (OT) framework. This viewpoint gives us a new perspective on Unsupervised Domain Translation and allows us to prove the existence and uniqueness of the retrieved mapping, given a large family of transport costs. We then propose a novel framework to efficiently compute optimal mappings in a dynamical setting. We show that it generalizes previous methods and enables a more explicit control over the computed optimal mapping. It also provides smooth interpolations between the two domains. Experiments on toy and real world datasets illustrate the behavior of our method.
Tasks
Published 2019-06-04
URL https://arxiv.org/abs/1906.01292v1
PDF https://arxiv.org/pdf/1906.01292v1.pdf
PWC https://paperswithcode.com/paper/optimal-unsupervised-domain-translation
Repo
Framework
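
As a toy companion to the OT viewpoint described above, the sketch below computes an entropic-OT (Sinkhorn) barycentric mapping between two unpaired 2-D point clouds. This is only an illustration of retrieving a transport-cost-minimizing correspondence; the paper itself proposes a dynamical formulation, and the cost, `epsilon`, and iteration count here are arbitrary choices.

```python
import numpy as np

def sinkhorn_map(X, Y, epsilon=0.05, n_iter=200):
    """Toy entropic-OT mapping from domain X to domain Y via barycentric projection."""
    n, m = len(X), len(Y)
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)          # uniform marginals
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)       # squared-Euclidean cost
    C = C / C.max()                                          # normalize for stability
    K = np.exp(-C / epsilon)
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):                                  # Sinkhorn iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]                          # entropic optimal coupling
    return (P @ Y) / P.sum(1, keepdims=True)                 # map each x_i into domain Y

# Two unpaired 2-D "domains": a Gaussian blob and a shifted, scaled one.
rng = np.random.default_rng(0)
X = rng.normal(size=(128, 2))
Y = 2.0 * rng.normal(size=(128, 2)) + np.array([4.0, 0.0])
X_translated = sinkhorn_map(X, Y)
print(X_translated.mean(0))  # pushed toward the target domain's mean (roughly [4, 0])
```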

On Stationary-Point Hitting Time and Ergodicity of Stochastic Gradient Langevin Dynamics

Title On Stationary-Point Hitting Time and Ergodicity of Stochastic Gradient Langevin Dynamics
Authors Xi Chen, Simon S. Du, Xin T. Tong
Abstract Stochastic gradient Langevin dynamics (SGLD) is a fundamental algorithm in stochastic optimization. Recent work by Zhang et al. [2017] presents an analysis for the hitting time of SGLD for the first and second order stationary points. The proof in Zhang et al. [2017] is a two-stage procedure through bounding the Cheeger constant, which is rather complicated and leads to loose bounds. In this paper, using intuitions from stochastic differential equations, we provide a direct analysis for the hitting times of SGLD to the first and second order stationary points. Our analysis is straightforward and relies only on basic linear algebra and probability theory tools. Our direct analysis also leads to tighter bounds compared to Zhang et al. [2017] and shows the explicit dependence of the hitting time on different factors, including dimensionality, smoothness, noise strength, and step size effects. Under suitable conditions, we show that the hitting time of SGLD to first-order stationary points can be dimension-independent. Moreover, we apply our analysis to study several important online estimation problems in machine learning, including linear regression, matrix factorization, and online PCA.
Tasks Stochastic Optimization
Published 2019-04-30
URL https://arxiv.org/abs/1904.13016v4
PDF https://arxiv.org/pdf/1904.13016v4.pdf
PWC https://paperswithcode.com/paper/hitting-time-of-stochastic-gradient-langevin
Repo
Framework
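
A minimal sketch of the SGLD iteration analyzed above, run on a toy nonconvex objective; the step size, inverse temperature, and injected minibatch noise are placeholder values, not settings from the paper.

```python
import numpy as np

def sgld(grad_fn, theta0, eta=1e-3, beta=1e3, n_steps=5000, batch_noise=0.1, seed=0):
    """Stochastic Gradient Langevin Dynamics:
       theta <- theta - eta * g_hat(theta) + sqrt(2 * eta / beta) * N(0, I),
       where g_hat is a noisy (mini-batch style) gradient estimate."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    for _ in range(n_steps):
        g_hat = grad_fn(theta) + batch_noise * rng.standard_normal(theta.shape)
        theta = theta - eta * g_hat + np.sqrt(2 * eta / beta) * rng.standard_normal(theta.shape)
    return theta

# Toy nonconvex objective f(x) = (x^2 - 1)^2 with gradient 4x(x^2 - 1);
# SGLD should land near one of the first-order stationary points x = -1, 0, +1.
grad = lambda x: 4 * x * (x ** 2 - 1)
print(sgld(grad, theta0=np.array([2.0])))
```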

Multi-Grained Spatio-temporal Modeling for Lip-reading

Title Multi-Grained Spatio-temporal Modeling for Lip-reading
Authors Chenhao Wang
Abstract Lip-reading aims to recognize speech content from videos via visual analysis of speakers’ lip movements. This is a challenging task due to the existence of homophemes, i.e., words which involve identical or highly similar lip movements, as well as diverse lip appearances and motion patterns among the speakers. To address these challenges, we propose a novel lip-reading model which captures not only the nuance between words but also the styles of different speakers, by multi-grained spatio-temporal modeling of the speaking process. Specifically, we first extract both frame-level fine-grained features and short-term medium-grained features with the visual front-end, which are then combined to obtain discriminative representations for words with similar phonemes. Next, a bidirectional ConvLSTM augmented with temporal attention aggregates spatio-temporal information over the entire input sequence, which is expected to capture the coarse-grained patterns of each word and to be robust to variations in speaker identity, lighting conditions, and so on. By making full use of information from different levels in a unified framework, the model is not only able to distinguish words with similar pronunciations, but also becomes robust to appearance changes. We evaluate our method on two challenging word-level lip-reading benchmarks and show the effectiveness of the proposed method, which also supports the above claims.
Tasks
Published 2019-08-30
URL https://arxiv.org/abs/1908.11618v2
PDF https://arxiv.org/pdf/1908.11618v2.pdf
PWC https://paperswithcode.com/paper/multi-grained-spatio-temporal-modeling-for
Repo
Framework
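
A rough PyTorch sketch of the multi-grained idea: a 3-D convolutional front-end for frame-level features, a recurrent back-end, and temporal attention over the sequence. Note the assumption: the paper's bidirectional ConvLSTM is replaced here by a plain bidirectional LSTM over spatially pooled features, and all layer sizes are placeholders.

```python
import torch
import torch.nn as nn

class LipReaderSketch(nn.Module):
    """Rough stand-in for the multi-grained model: a 3-D conv front-end for
    fine/medium-grained features, a bidirectional LSTM (substituting for the
    paper's bidirectional ConvLSTM) and temporal attention over the sequence."""
    def __init__(self, n_classes=500, feat_dim=64, hidden=128):
        super().__init__()
        self.frontend = nn.Sequential(
            nn.Conv3d(1, feat_dim, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),        # keep time, pool space
        )
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)           # temporal attention scores
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, clips):                          # clips: (B, 1, T, H, W)
        f = self.frontend(clips)                       # (B, C, T, 1, 1)
        f = f.squeeze(-1).squeeze(-1).transpose(1, 2)  # (B, T, C)
        h, _ = self.rnn(f)                             # (B, T, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)         # (B, T, 1)
        pooled = (w * h).sum(dim=1)                    # attention-weighted summary
        return self.head(pooled)

logits = LipReaderSketch()(torch.randn(2, 1, 29, 88, 88))  # dummy 29-frame mouth crops
print(logits.shape)                                        # torch.Size([2, 500])
```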

Beyond Adaptive Submodularity: Approximation Guarantees of Greedy Policy with Adaptive Submodularity Ratio

Title Beyond Adaptive Submodularity: Approximation Guarantees of Greedy Policy with Adaptive Submodularity Ratio
Authors Kaito Fujii, Shinsaku Sakaue
Abstract We propose a new concept named the adaptive submodularity ratio to study the greedy policy for sequential decision making. While the greedy policy is known to perform well for a wide variety of adaptive stochastic optimization problems in practice, its theoretical properties have been analyzed only for a limited class of problems. We narrow the gap between theory and practice by using the adaptive submodularity ratio, which enables us to prove approximation guarantees of the greedy policy for a substantially wider class of problems. Examples of newly analyzed problems include important applications such as adaptive influence maximization and adaptive feature selection. Our adaptive submodularity ratio also provides bounds on adaptivity gaps. Experiments confirm that the greedy policy performs well on the applications considered, compared with standard heuristics.
Tasks Decision Making, Feature Selection, Stochastic Optimization
Published 2019-04-24
URL http://arxiv.org/abs/1904.10748v1
PDF http://arxiv.org/pdf/1904.10748v1.pdf
PWC https://paperswithcode.com/paper/beyond-adaptive-submodularity-approximation
Repo
Framework
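
A generic skeleton of the adaptive greedy policy the guarantees above apply to: at each step it selects the item with the largest expected marginal gain conditioned on the observations gathered so far. The toy coverage example and its gain function are illustrative stand-ins, not a problem from the paper.

```python
import random

def adaptive_greedy(items, expected_gain, observe, budget):
    """Generic adaptive greedy policy: repeatedly select the item whose expected
    marginal gain, conditioned on the observations gathered so far, is largest.
    `expected_gain(item, observations)` and `observe(item)` are problem-specific."""
    observations, chosen = {}, []
    for _ in range(budget):
        remaining = [i for i in items if i not in observations]
        if not remaining:
            break
        best = max(remaining, key=lambda i: expected_gain(i, observations))
        observations[best] = observe(best)            # reveal the item's realized state
        chosen.append(best)
    return chosen, observations

# Toy adaptive stochastic coverage: each sensor covers a random subset of targets.
random.seed(0)
targets = set(range(20))
p = {s: 0.1 + 0.1 * s for s in range(8)}                          # per-sensor detection prob.
coverage = {s: {t for t in targets if random.random() < p[s]} for s in range(8)}
covered = lambda obs: set().union(*obs.values()) if obs else set()
gain = lambda s, obs: p[s] * len(targets - covered(obs))          # expected newly covered targets
picked, obs = adaptive_greedy(range(8), gain, lambda s: coverage[s], budget=4)
print(picked, "->", len(covered(obs)), "targets covered")
```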

Double Transfer Learning for Breast Cancer Histopathologic Image Classification

Title Double Transfer Learning for Breast Cancer Histopathologic Image Classification
Authors Jonathan de Matos, Alceu de S. Britto Jr., Luiz E. S. Oliveira, Alessandro L. Koerich
Abstract This work proposes a classification approach for breast cancer histopathologic images (HI) that uses transfer learning to extract features from HI with an Inception-v3 CNN pre-trained on the ImageNet dataset. We also use transfer learning to train a support vector machine (SVM) classifier on a tissue-labeled colorectal cancer dataset, aiming to filter the patches of a breast cancer HI and remove the irrelevant ones. We show that removing irrelevant patches before training a second SVM classifier improves the accuracy for classifying malign and benign tumors on breast cancer images. We are able to improve the classification accuracy by 3.7% using the feature-extraction transfer learning and by an additional 0.7% using the irrelevant-patch elimination. The proposed approach outperforms the state of the art in three out of the four magnification factors of the breast cancer dataset.
Tasks Image Classification, Transfer Learning
Published 2019-04-16
URL http://arxiv.org/abs/1904.07834v1
PDF http://arxiv.org/pdf/1904.07834v1.pdf
PWC https://paperswithcode.com/paper/double-transfer-learning-for-breast-cancer
Repo
Framework
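
A hedged sketch of the two-stage pipeline as described: an ImageNet-pretrained Inception-v3 used as a fixed feature extractor, a first SVM trained on colorectal-tissue features to filter irrelevant patches, and a second SVM trained on the retained breast-cancer patches. All arrays below are random placeholders standing in for the actual datasets.

```python
import numpy as np
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.applications.inception_v3 import preprocess_input
from sklearn.svm import SVC

# Fixed ImageNet-pretrained feature extractor (global-average-pooled 2048-D features).
extractor = InceptionV3(weights="imagenet", include_top=False, pooling="avg")
embed = lambda patches: extractor.predict(preprocess_input(patches.astype("float32")), verbose=0)

# Placeholder data: 299x299 RGB patches; real inputs would come from the HI datasets.
colorectal_patches = np.random.rand(64, 299, 299, 3) * 255   # tissue-labeled patches
colorectal_labels = np.random.randint(0, 2, 64)              # 1 = relevant tissue (placeholder)
breast_patches = np.random.rand(32, 299, 299, 3) * 255
breast_labels = np.random.randint(0, 2, 32)                  # 1 = malignant (placeholder)

# Stage 1: patch-relevance SVM trained on the colorectal dataset's features.
relevance_svm = SVC(kernel="rbf").fit(embed(colorectal_patches), colorectal_labels)

# Stage 2: drop patches judged irrelevant, then train the tumor classifier.
feats = embed(breast_patches)
keep = relevance_svm.predict(feats) == 1
tumor_svm = SVC(kernel="rbf").fit(feats[keep], breast_labels[keep])
```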

Constrained Deep Networks: Lagrangian Optimization via Log-Barrier Extensions

Title Constrained Deep Networks: Lagrangian Optimization via Log-Barrier Extensions
Authors Hoel Kervadec, Jose Dolz, Jing Yuan, Christian Desrosiers, Eric Granger, Ismail Ben Ayed
Abstract This study investigates imposing inequality constraints on the outputs of CNNs, with application to weakly supervised segmentation. In the context of deep networks, constraints are commonly handled with penalties because of their simplicity, despite their well-known limitations. Lagrangian-dual optimization has been largely avoided, mainly due to the computational complexity and stability/convergence issues caused by alternating explicit dual updates/projections and stochastic optimization. Several studies showed that, for deep CNNs, the theoretical and practical advantages of Lagrangian optimization over penalties do not materialize in practice. We propose log-barrier extensions, which approximate Lagrangian optimization of constrained-CNN problems with a sequence of unconstrained losses. Unlike standard interior-point and log-barrier methods, our formulation does not need an initial feasible solution. Furthermore, we provide a new technical result, which shows that the proposed extensions yield an upper bound on the duality gap. This generalizes the duality-gap result of standard log-barriers, yielding sub-optimality certificates for feasible solutions. While sub-optimality is not guaranteed for non-convex problems, our result shows that log-barrier extensions are a principled way to approximate Lagrangian optimization for constrained CNNs via implicit dual variables. We report comprehensive constrained-CNN experiments, showing that our formulation outperforms several penalty-based methods, in terms of accuracy and training stability.
Tasks Stochastic Optimization
Published 2019-04-08
URL https://arxiv.org/abs/1904.04205v3
PDF https://arxiv.org/pdf/1904.04205v3.pdf
PWC https://paperswithcode.com/paper/log-barrier-constrained-cnns
Repo
Framework
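
A sketch of a log-barrier extension for a single inequality constraint z <= 0: the standard log-barrier inside the feasible region, extended linearly (matching value and slope at z = -1/t^2) so it remains finite and differentiable everywhere. The piecewise constants are a reconstruction from that description and should be checked against the paper.

```python
import math
import torch

def log_barrier_extension(z, t=5.0):
    """Penalty for a constraint z <= 0: the standard log-barrier -(1/t)*log(-z)
    for z <= -1/t^2, extended linearly (same value and slope at z = -1/t^2)
    so no initial feasible point is required."""
    threshold = -1.0 / t ** 2
    barrier = -(1.0 / t) * torch.log(torch.clamp(-z, min=1e-12))
    linear = t * z - (1.0 / t) * math.log(1.0 / t ** 2) + 1.0 / t
    return torch.where(z <= threshold, barrier, linear)

# Example: softly enforce that a predicted foreground size stays below 30% of pixels.
pred_size = torch.tensor(0.42, requires_grad=True)        # fraction of foreground pixels
constraint = pred_size - 0.30                             # want constraint <= 0
loss = log_barrier_extension(constraint)
loss.backward()
print(loss.item(), pred_size.grad.item())                 # positive loss, positive gradient
```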

Incentive Design for Efficient Federated Learning in Mobile Networks: A Contract Theory Approach

Title Incentive Design for Efficient Federated Learning in Mobile Networks: A Contract Theory Approach
Authors Jiawen Kang, Zehui Xiong, Dusit Niyato, Han Yu, Ying-Chang Liang, Dong In Kim
Abstract To strengthen data privacy and security, federated learning is an emerging machine learning technique that enables large-scale nodes, e.g., mobile devices, to train models in a distributed fashion and share them globally without revealing their local data. This technique can not only significantly improve privacy protection for mobile devices, but also ensure good collective performance of the trained models. Currently, most existing studies focus on optimizing federated learning algorithms to improve model training performance. However, incentive mechanisms to motivate mobile devices to join model training have been largely overlooked. Mobile devices suffer from considerable overhead in terms of computation and communication during the federated model training process. Without a well-designed incentive, self-interested mobile devices will be unwilling to join federated learning tasks, which hinders the adoption of federated learning. To bridge this gap, in this paper, we adopt contract theory to design an effective incentive mechanism for stimulating mobile devices with high-quality (i.e., high-accuracy) data to participate in federated learning. Numerical results demonstrate that the proposed mechanism is efficient for federated learning with improved learning accuracy.
Tasks
Published 2019-05-16
URL https://arxiv.org/abs/1905.07479v2
PDF https://arxiv.org/pdf/1905.07479v2.pdf
PWC https://paperswithcode.com/paper/incentive-design-for-efficient-federated
Repo
Framework

SNIDER: Single Noisy Image Denoising and Rectification for Improving License Plate Recognition

Title SNIDER: Single Noisy Image Denoising and Rectification for Improving License Plate Recognition
Authors Younkwan Lee, Juhyun Lee, Hoyeon Ahn, Moongu Jeon
Abstract In this paper, we present an algorithm for real-world license plate recognition (LPR) from a low-quality image. Our method is built upon a framework that includes denoising and rectification, with each task conducted by a convolutional neural network. In previous research, denoising and rectification have been treated separately, each by its own network. In contrast to previous work, we propose an end-to-end trainable network for image recovery, Single Noisy Image DEnoising and Rectification (SNIDER), which overcomes these obstacles through a novel network designed to address denoising and rectification jointly. Moreover, we propose a way to leverage optimization with auxiliary tasks for multi-task fitting, together with novel training losses. Extensive experiments on two challenging LPR datasets demonstrate the effectiveness of the proposed method in recovering a high-quality license plate image from a low-quality one and show that the proposed method outperforms other state-of-the-art methods.
Tasks Denoising, Image Denoising, License Plate Recognition
Published 2019-10-09
URL https://arxiv.org/abs/1910.03876v1
PDF https://arxiv.org/pdf/1910.03876v1.pdf
PWC https://paperswithcode.com/paper/snider-single-noisy-image-denoising-and
Repo
Framework
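
A generic shared-encoder, two-decoder multi-task layout in the spirit of SNIDER, jointly producing a denoised and a rectified output; this is not the paper's exact architecture, auxiliary-task setup, or losses.

```python
import torch
import torch.nn as nn

class JointDenoiseRectify(nn.Module):
    """Shared encoder with two decoders: one reconstructs a denoised plate image,
    the other outputs a rectified (geometry-corrected) image. A generic multi-task
    layout in the spirit of SNIDER, not the paper's exact design."""
    def __init__(self, ch=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
        )
        decoder = lambda: nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                                        nn.Conv2d(ch, 3, 3, padding=1))
        self.denoise_head, self.rectify_head = decoder(), decoder()

    def forward(self, x):
        z = self.encoder(x)
        return self.denoise_head(z), self.rectify_head(z)

model = JointDenoiseRectify()
noisy = torch.randn(2, 3, 64, 128)                        # placeholder plate crops
clean_target, rect_target = torch.randn_like(noisy), torch.randn_like(noisy)
denoised, rectified = model(noisy)
loss = nn.functional.l1_loss(denoised, clean_target) + nn.functional.l1_loss(rectified, rect_target)
loss.backward()                                           # joint training of both tasks
```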

Implicit Discourse Relation Identification for Open-domain Dialogues

Title Implicit Discourse Relation Identification for Open-domain Dialogues
Authors Mingyu Derek Ma, Kevin K. Bowden, Jiaqi Wu, Wen Cui, Marilyn Walker
Abstract Discourse relation identification has been an active area of research for many years, and the challenge of identifying implicit relations remains largely an unsolved task, especially in the context of an open-domain dialogue system. Previous work primarily relies on corpora of formal text, which is inherently non-dialogic, i.e., news and journals. This data, however, is not suitable for handling the nuances of informal dialogue, nor is it capable of navigating the plethora of valid topics present in open-domain dialogue. In this paper, we design a novel discourse relation identification pipeline specifically tuned for open-domain dialogue systems. We first propose a method to automatically extract the implicit discourse relation argument pairs and labels from a dataset of dialogic turns, resulting in a novel corpus of discourse relation pairs; the first of its kind to attempt to identify the discourse relations connecting the dialogic turns in open-domain discourse. Moreover, we take the first steps to leverage the dialogue features unique to our task to further improve the identification of such relations by performing feature ablation and incorporating dialogue features to enhance the state-of-the-art model.
Tasks
Published 2019-07-09
URL https://arxiv.org/abs/1907.03975v1
PDF https://arxiv.org/pdf/1907.03975v1.pdf
PWC https://paperswithcode.com/paper/implicit-discourse-relation-identification
Repo
Framework
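
One common distant-supervision recipe for building implicit-relation pairs from raw turns is sketched below: if a reply opens with an explicit connective, label the (previous turn, reply) pair with that connective's relation class and strip the connective. The connective-to-class map and the procedure are illustrative assumptions, not necessarily the extraction method used in the paper.

```python
# Illustrative distant-supervision recipe for mining implicit discourse relation
# pairs from dialogue turns. The connective-to-class map is a tiny illustrative subset.
CONNECTIVE_CLASS = {
    "because": "Contingency", "so": "Contingency",
    "but": "Comparison", "however": "Comparison",
    "then": "Temporal", "and": "Expansion",
}

def mine_pairs(turns):
    pairs = []
    for prev, reply in zip(turns, turns[1:]):
        first, _, rest = reply.partition(" ")
        label = CONNECTIVE_CLASS.get(first.lower().strip(","))
        if label and rest:
            pairs.append((prev, rest, label))      # connective removed -> "implicit" pair
    return pairs

turns = ["I missed the bus this morning.",
         "So you were late for the meeting?",
         "But my boss did not even notice."]
for arg1, arg2, label in mine_pairs(turns):
    print(label, "|", arg1, "->", arg2)
```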

Combining learning rate decay and weight decay with complexity gradient descent - Part I

Title Combining learning rate decay and weight decay with complexity gradient descent - Part I
Authors Pierre H. Richemond, Yike Guo
Abstract The role of $L^2$ regularization, in the specific case of deep neural networks rather than more traditional machine learning models, is still not fully elucidated. We hypothesize that this complex interplay is due to the combination of overparameterization and high-dimensional phenomena that take place during training and make the problem not amenable to standard convex optimization methods. Using insights from statistical physics and random field theory, we introduce a parameter factoring in both the level of the loss function and its remaining nonconvexity: the *complexity*. We show that it is desirable to instead perform *complexity gradient descent*. We then show how to use this intuition to derive novel and efficient annealing schemes for the strength of $L^2$ regularization when performing standard stochastic gradient descent in deep neural networks.
Tasks
Published 2019-02-07
URL http://arxiv.org/abs/1902.02881v1
PDF http://arxiv.org/pdf/1902.02881v1.pdf
PWC https://paperswithcode.com/paper/combining-learning-rate-decay-and-weight
Repo
Framework
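
An illustrative annealing of the $L^2$ strength during noisy gradient descent; the exponential-decay schedule and all constants are placeholders, since the paper derives its schedule from the proposed complexity measure rather than a fixed decay.

```python
import numpy as np

def annealed_l2_sgd(grad_loss, theta0, lr=0.05, lam0=1e-2, decay=1e-3, n_steps=2000, seed=0):
    """SGD on loss(theta) + 0.5 * lam_t * ||theta||^2 with an annealed L2 strength
    lam_t = lam0 * exp(-decay * t). The schedule is illustrative only; the paper's
    annealing is driven by a complexity measure of the loss landscape."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    for t in range(n_steps):
        lam_t = lam0 * np.exp(-decay * t)
        g = grad_loss(theta) + 0.05 * rng.standard_normal(theta.shape)   # noisy gradient
        theta -= lr * (g + lam_t * theta)          # L2 term acts as (coupled) weight decay
    return theta

grad = lambda th: 2 * (th - 3.0)                   # toy quadratic loss centered at 3
print(annealed_l2_sgd(grad, np.array([0.0])))      # approaches 3 as the L2 strength anneals away
```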

Mastering Complex Control in MOBA Games with Deep Reinforcement Learning

Title Mastering Complex Control in MOBA Games with Deep Reinforcement Learning
Authors Deheng Ye, Zhao Liu, Mingfei Sun, Bei Shi, Peilin Zhao, Hao Wu, Hongsheng Yu, Shaojie Yang, Xipeng Wu, Qingwei Guo, Qiaobo Chen, Yinyuting Yin, Hao Zhang, Tengfei Shi, Liang Wang, Qiang Fu, Wei Yang, Lanxiao Huang
Abstract We study the reinforcement learning problem of complex action control in Multi-player Online Battle Arena (MOBA) 1v1 games. This problem involves far more complicated state and action spaces than those of traditional 1v1 games, such as Go and Atari, which makes it very difficult to search for any policy with human-level performance. In this paper, we present a deep reinforcement learning framework to tackle this problem from the perspectives of both system and algorithm. Our system features low coupling and high scalability, which enables efficient explorations at large scale. Our algorithm includes several novel strategies, including control dependency decoupling, action mask, target attention, and dual-clip PPO, with which our proposed actor-critic network can be effectively trained in our system. Tested on the MOBA game Honor of Kings, the trained AI agents can defeat top professional human players in full 1v1 games.
Tasks
Published 2019-12-20
URL https://arxiv.org/abs/1912.09729v2
PDF https://arxiv.org/pdf/1912.09729v2.pdf
PWC https://paperswithcode.com/paper/mastering-complex-control-in-moba-games-with
Repo
Framework
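
A sketch of the dual-clip PPO surrogate mentioned in the abstract: the usual clipped objective, with an additional clip c * A bounding it from below when the advantage is negative. The epsilon and c values are placeholders.

```python
import torch

def dual_clip_ppo_loss(ratio, advantage, eps=0.2, c=3.0):
    """Dual-clip PPO surrogate: the standard clipped objective, plus an extra clip
    c * A that bounds the objective from below when the advantage is negative,
    preventing very large policy ratios from dominating the update."""
    standard = torch.min(ratio * advantage,
                         torch.clamp(ratio, 1 - eps, 1 + eps) * advantage)
    dual = torch.max(standard, c * advantage)             # only matters when A < 0
    objective = torch.where(advantage < 0, dual, standard)
    return -objective.mean()                              # negated for gradient descent

ratio = torch.tensor([0.5, 1.0, 5.0, 5.0])                # new_prob / old_prob
advantage = torch.tensor([1.0, -1.0, 1.0, -1.0])
print(dual_clip_ppo_loss(ratio, advantage))               # last element is clipped to c * A
```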

DebFace: De-biasing Face Recognition

Title DebFace: De-biasing Face Recognition
Authors Sixue Gong, Xiaoming Liu, Anil K. Jain
Abstract We address the problem of bias in automated face recognition and demographic attribute estimation algorithms, where errors are lower on certain cohorts belonging to specific demographic groups. We present a novel de-biasing adversarial network that learns to extract disentangled feature representations for both unbiased face recognition and demographics estimation. The proposed network consists of one identity classifier and three demographic classifiers (for gender, age, and race) that are trained to distinguish identity and demographic attributes, respectively. Adversarial learning is adopted to minimize correlation among feature factors so as to abate bias influence from other factors. We also design a new scheme to combine demographics with identity features to strengthen robustness of face representation in different demographic groups. The experimental results show that our approach is able to reduce bias in face recognition as well as demographics estimation while achieving state-of-the-art performance.
Tasks Face Recognition
Published 2019-11-19
URL https://arxiv.org/abs/1911.08080v2
PDF https://arxiv.org/pdf/1911.08080v2.pdf
PWC https://paperswithcode.com/paper/debface-de-biasing-face-recognition
Repo
Framework
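
A gradient-reversal layer is one standard way to implement this kind of adversarial removal of demographic information from identity features; the sketch below uses it as a stand-in, while the paper's actual scheme (identity plus three demographic classifiers with correlation minimization among disentangled factors) may differ. All sizes and heads are placeholders.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips (and scales) gradients in the backward
    pass, so the encoder is trained to fool the demographic classifier."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.clone()
    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

encoder = nn.Sequential(nn.Linear(512, 256), nn.ReLU())   # face embedding -> shared feature
id_head = nn.Linear(256, 1000)                            # identity classes (placeholder count)
gender_head = nn.Linear(256, 2)                           # adversarial demographic head

faces = torch.randn(8, 512)
id_labels = torch.randint(0, 1000, (8,))
gender_labels = torch.randint(0, 2, (8,))

feat = encoder(faces)
loss = nn.functional.cross_entropy(id_head(feat), id_labels) \
     + nn.functional.cross_entropy(gender_head(GradReverse.apply(feat, 1.0)), gender_labels)
loss.backward()   # the encoder receives reversed gradients from the gender head
```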

Neural Models of the Psychosemantics of ‘Most’

Title Neural Models of the Psychosemantics of ‘Most’
Authors Lewis O’Sullivan, Shane Steinert-Threlkeld
Abstract How are the meanings of linguistic expressions related to their use in concrete cognitive tasks? Visual identification tasks show human speakers can exhibit considerable variation in their understanding, representation and verification of certain quantifiers. This paper initiates an investigation into neural models of these psycho-semantic tasks. We trained two types of network – a convolutional neural network (CNN) model and a recurrent model of visual attention (RAM) – on the “most” verification task from Pietroski et al. (2009), manipulating the visual scene and novel notions of task duration. Our results qualitatively mirror certain features of human performance (such as sensitivity to the ratio of set sizes, indicating a reliance on approximate number) while differing in interesting ways (such as exhibiting a subtly different pattern for the effect of image type). We conclude by discussing the prospects for using neural models as cognitive models of this and other psychosemantic tasks.
Tasks
Published 2019-04-04
URL http://arxiv.org/abs/1904.02734v1
PDF http://arxiv.org/pdf/1904.02734v1.pdf
PWC https://paperswithcode.com/paper/neural-models-of-the-psychosemantics-of-most
Repo
Framework
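
A tiny behavioral baseline for the “most” verification task under an approximate-number account: each set size is estimated with Weber-scaled Gaussian noise, so accuracy depends on the ratio of the two sizes, the sensitivity the abstract refers to. This is an illustrative simulation, not the CNN or RAM models from the paper, and the Weber fraction is a placeholder.

```python
import numpy as np

def most_verification_accuracy(n_target, n_other, weber=0.2, trials=10000, seed=0):
    """Approximate-number account of verifying 'most of the dots are yellow':
    each set size is estimated with Gaussian noise whose SD scales with the size
    (Weber's law); accuracy therefore depends on the ratio of the two sizes."""
    rng = np.random.default_rng(seed)
    est_target = rng.normal(n_target, weber * n_target, trials)
    est_other = rng.normal(n_other, weber * n_other, trials)
    truth = n_target > n_other
    return np.mean((est_target > est_other) == truth)

for n_other in (10, 12, 15, 20):                 # from hard (close ratio) to easy
    print(f"9 yellow vs {n_other} blue ->", most_verification_accuracy(9, n_other))
```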

Subgraph Networks with Application to Structural Feature Space Expansion

Title Subgraph Networks with Application to Structural Feature Space Expansion
Authors Qi Xuan, Jinhuan Wang, Minghao Zhao, Junkun Yuan, Chenbo Fu, Zhongyuan Ruan, Guanrong Chen
Abstract Real-world networks exhibit prominent hierarchical and modular structures, with various subgraphs as building blocks. Most existing studies simply consider distinct subgraphs as motifs and use only their counts to characterize the underlying network. Although such statistics can be used to describe a network model, or even to design some network algorithms, the role of subgraphs in such applications can be further explored so as to improve the results. In this paper, the concept of the subgraph network (SGN) is introduced and then applied to network models, with algorithms designed for constructing the 1st-order and 2nd-order SGNs, which can be easily extended to build higher-order ones. Furthermore, these SGNs are used to expand the structural feature space of the underlying network, which is beneficial for network classification. Numerical experiments demonstrate that the network classification model based on the structural features of the original network together with the 1st-order and 2nd-order SGNs always performs best, compared with models based on only one or two of these networks. In other words, the structural features of SGNs can complement those of the original network for better network classification, regardless of the feature extraction method used, such as handcrafted, network-embedding, or kernel-based methods.
Tasks Graph Classification, Network Embedding
Published 2019-03-21
URL https://arxiv.org/abs/1903.09022v3
PDF https://arxiv.org/pdf/1903.09022v3.pdf
PWC https://paperswithcode.com/paper/subgraph-networks-with-application-to
Repo
Framework
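
The 1st-order SGN described above (edges of the original network become nodes, connected when they share an endpoint) corresponds to the line graph, so a minimal sketch can lean on networkx. Iterating the construction, as done for `sgn2` below, is only a crude stand-in for the paper's 2nd-order SGN, which is built from larger subgraphs.

```python
import networkx as nx

def first_order_sgn(G):
    """First-order subgraph network: each edge of G becomes a node, and two such
    nodes are connected when the corresponding edges share an endpoint in G,
    i.e., the line graph of G."""
    return nx.line_graph(G)

G = nx.karate_club_graph()
sgn1 = first_order_sgn(G)
sgn2 = first_order_sgn(sgn1)   # crude stand-in; the paper's 2nd-order SGN uses larger subgraphs

# Expand the structural feature space with simple statistics from each level.
features = [f(net) for net in (G, sgn1, sgn2)
            for f in (nx.number_of_nodes, nx.number_of_edges, nx.density)]
print(features)
```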

Induction and Reference of Entities in a Visual Story

Title Induction and Reference of Entities in a Visual Story
Authors Ruo-Ping Dong, Khyathi Raghavi Chandu, Alan W Black
Abstract We are enveloped by stories of visual interpretations in our everyday lives. The way we narrate a story often comprises two stages: forming a central mind map of entities and then weaving a story around them. A contributing factor to coherence is not just basing the story on these entities but also referring to them using appropriate terms to avoid repetition. In this paper, we address these two stages of introducing the right entities at seemingly reasonable junctures and referring to them coherently in the context of visual storytelling. The building blocks of the central mind map, also known as the entity skeleton, are entity chains including nominal and coreference expressions. This entity skeleton is also represented at different levels of abstraction to compose a generalized frame to weave the story. We build upon an encoder-decoder framework and penalize the model when the decoded story does not adhere to this entity skeleton. We establish a strong baseline for skeleton-informed generation and then extend it to multitask by predicting the skeleton in addition to generating the story. Finally, we build upon this model and propose a glocal hierarchical attention model that attends to the skeleton both at the sentence (local) and the story (global) levels. We observe that our proposed models outperform the baseline in terms of the automatic evaluation metric METEOR. We perform various analyses targeted at evaluating the performance of our task of enforcing the entity skeleton, such as the number and diversity of the entities generated. We also conduct a human evaluation, which concludes that the visual stories generated by our model are preferred 82% of the time. In addition, we show that our glocal hierarchical attention model improves coherence by introducing more pronouns, as required by the presence of nouns.
Tasks Visual Storytelling
Published 2019-09-15
URL https://arxiv.org/abs/1909.09699v1
PDF https://arxiv.org/pdf/1909.09699v1.pdf
PWC https://paperswithcode.com/paper/induction-and-reference-of-entities-in-a
Repo
Framework