Paper Group ANR 892
CNNs found to jump around more skillfully than RNNs: Compositional generalization in seq2seq convolutional networks
Title | CNNs found to jump around more skillfully than RNNs: Compositional generalization in seq2seq convolutional networks |
Authors | Roberto Dessì, Marco Baroni |
Abstract | Lake and Baroni (2018) introduced the SCAN dataset probing the ability of seq2seq models to capture compositional generalizations, such as inferring the meaning of “jump around” 0-shot from the component words. Recurrent networks (RNNs) were found to completely fail the most challenging generalization cases. We test here a convolutional network (CNN) on these tasks, reporting hugely improved performance with respect to RNNs. Despite the big improvement, the CNN has however not induced systematic rules, suggesting that the difference between compositional and non-compositional behaviour is not clear-cut. |
Tasks | |
Published | 2019-05-21 |
URL | https://arxiv.org/abs/1905.08527v1 |
https://arxiv.org/pdf/1905.08527v1.pdf | |
PWC | https://paperswithcode.com/paper/cnns-found-to-jump-around-more-skillfully |
Repo | |
Framework | |
Adaptive Exploration in Linear Contextual Bandit
Title | Adaptive Exploration in Linear Contextual Bandit |
Authors | Botao Hao, Tor Lattimore, Csaba Szepesvari |
Abstract | Contextual bandits serve as a fundamental model for many sequential decision making tasks. The most popular theoretically justified approaches are based on the optimism principle. While these algorithms can be practical, they are known to be suboptimal asymptotically. On the other hand, existing asymptotically optimal algorithms for this problem do not exploit the linear structure in an optimal way and suffer from lower-order terms that dominate the regret in all practically interesting regimes. We start to bridge the gap by designing an algorithm that is asymptotically optimal and has good finite-time empirical performance. At the same time, we make connections to the recent literature on when exploration-free methods are effective. Indeed, if the distribution of contexts is well behaved, then our algorithm acts mostly greedily and enjoys sub-logarithmic regret. Furthermore, our approach is adaptive in the sense that it automatically detects the nice case. Numerical results demonstrate significant regret reductions by our method relative to several baselines. |
Tasks | Decision Making, Multi-Armed Bandits |
Published | 2019-10-15 |
URL | https://arxiv.org/abs/1910.06996v2 |
https://arxiv.org/pdf/1910.06996v2.pdf | |
PWC | https://paperswithcode.com/paper/adaptive-exploration-in-linear-contextual |
Repo | |
Framework | |
Multi-Criterion Evolutionary Design of Deep Convolutional Neural Networks
Title | Multi-Criterion Evolutionary Design of Deep Convolutional Neural Networks |
Authors | Zhichao Lu, Ian Whalen, Yashesh Dhebar, Kalyanmoy Deb, Erik Goodman, Wolfgang Banzhaf, Vishnu Naresh Boddeti |
Abstract | Convolutional neural networks (CNNs) are the backbones of deep learning paradigms for numerous vision tasks. Early advancements in CNN architectures are primarily driven by human expertise and elaborate design. Recently, neural architecture search was proposed with the aim of automating the network design process and generating task-dependent architectures. While existing approaches have achieved competitive performance in image classification, they are not well suited under limited computational budget for two reasons: (1) the obtained architectures are either solely optimized for classification performance or only for one targeted resource requirement; (2) the search process requires vast computational resources in most approaches. To overcome this limitation, we propose an evolutionary algorithm for searching neural architectures under multiple objectives, such as classification performance and FLOPs. The proposed method addresses the first shortcoming by populating a set of architectures to approximate the entire Pareto frontier through genetic operations that recombine and modify architectural components progressively. Our approach improves the computation efficiency by carefully down-scaling the architectures during the search as well as reinforcing the patterns commonly shared among the past successful architectures through Bayesian Learning. The integration of these two main contributions allows an efficient design of architectures that are competitive and in many cases outperform both manually and automatically designed architectures on benchmark image classification datasets, CIFAR, ImageNet and human chest X-ray. The flexibility provided from simultaneously obtaining multiple architecture choices for different compute requirements further differentiates our approach from other methods in the literature. |
Tasks | Image Classification, Neural Architecture Search |
Published | 2019-12-03 |
URL | https://arxiv.org/abs/1912.01369v1 |
https://arxiv.org/pdf/1912.01369v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-criterion-evolutionary-design-of-deep |
Repo | |
Framework | |
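The abstract above describes populating a set of architectures that approximates the Pareto frontier over objectives such as classification performance and FLOPs. As a minimal sketch of the underlying idea (not the authors' evolutionary algorithm), the following filters a population down to its non-dominated members. The architecture names and numbers are hypothetical, and both objectives are assumed to be minimized (classification error and FLOPs).

```python
from typing import List, Tuple

# (name, classification error, FLOPs in millions); both objectives are minimized.
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Hypothetical architectures with made-up numbers, for illustration only.
population = [
    ("net_a", 0.05, 600.0),
    ("net_b", 0.07, 300.0),
    ("net_c", 0.08, 400.0),  # worse than net_b on both objectives
    ("net_d", 0.10, 150.0),
]

front = pareto_front(population)
print([name for name, _, _ in front])
```

Each surviving entry represents a different trade-off between accuracy and compute, which is the "multiple architecture choices for different compute requirements" the abstract refers to.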
Generalized Separable Nonnegative Matrix Factorization
Title | Generalized Separable Nonnegative Matrix Factorization |
Authors | Junjun Pan, Nicolas Gillis |
Abstract | Nonnegative matrix factorization (NMF) is a linear dimensionality reduction technique for nonnegative data with applications such as image analysis, text mining, audio source separation and hyperspectral unmixing. Given a data matrix $M$ and a factorization rank $r$, NMF looks for a nonnegative matrix $W$ with $r$ columns and a nonnegative matrix $H$ with $r$ rows such that $M \approx WH$. NMF is NP-hard to solve in general. However, it can be computed efficiently under the separability assumption which requires that the basis vectors appear as data points, that is, that there exists an index set $\mathcal{K}$ such that $W = M(:,\mathcal{K})$. In this paper, we generalize the separability assumption: We only require that for each rank-one factor $W(:,k)H(k,:)$ for $k=1,2,\dots,r$, either $W(:,k) = M(:,j)$ for some $j$ or $H(k,:) = M(i,:)$ for some $i$. We refer to the corresponding problem as generalized separable NMF (GS-NMF). We discuss some properties of GS-NMF and propose a convex optimization model which we solve using a fast gradient method. We also propose a heuristic algorithm inspired by the successive projection algorithm. To verify the effectiveness of our methods, we compare them with several state-of-the-art separable NMF algorithms on synthetic, document and image data sets. |
Tasks | Hyperspectral Unmixing |
Published | 2019-05-30 |
URL | https://arxiv.org/abs/1905.12995v2 |
https://arxiv.org/pdf/1905.12995v2.pdf | |
PWC | https://paperswithcode.com/paper/generalized-separable-nonnegative-matrix |
Repo | |
Framework | |
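The abstract above mentions the successive projection algorithm (SPA) as the inspiration for its heuristic. The sketch below shows plain SPA for the standard separable case, where $W = M(:,\mathcal{K})$; it is not the GS-NMF heuristic proposed in the paper. It assumes the matrix is passed as a list of columns, that the data has rank at least `r`, and that every column is a convex combination of the basis columns. The function name `spa` and the toy data are illustrative.

```python
import math
from typing import List


def spa(columns: List[List[float]], r: int) -> List[int]:
    """Successive projection algorithm on a matrix given as a list of columns.

    Returns r column indices. Under separability these index the columns of W.
    Assumes the matrix has rank at least r (otherwise a zero norm is hit).
    """
    residual = [col[:] for col in columns]  # work on a copy
    selected: List[int] = []
    for _ in range(r):
        # Pick the column with the largest residual norm.
        norms = [math.sqrt(sum(x * x for x in col)) for col in residual]
        j = max(range(len(residual)), key=lambda idx: norms[idx])
        selected.append(j)
        u = [x / norms[j] for x in residual[j]]
        # Project every column onto the orthogonal complement of u.
        for col in residual:
            dot = sum(ui * xi for ui, xi in zip(u, col))
            for i in range(len(col)):
                col[i] -= dot * u[i]
    return selected


# Toy data: columns 1 and 3 are the basis vectors, columns 0 and 2 are mixtures.
M_columns = [
    [0.5, 0.5],
    [1.0, 0.0],
    [0.3, 0.7],
    [0.0, 1.0],
]
print(sorted(spa(M_columns, 2)))
```

On this toy input the selected indices are the two unmixed columns, which is exactly the index set $\mathcal{K}$ in the separability assumption.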
A Dual Symmetric Gauss-Seidel Alternating Direction Method of Multipliers for Hyperspectral Sparse Unmixing
Title | A Dual Symmetric Gauss-Seidel Alternating Direction Method of Multipliers for Hyperspectral Sparse Unmixing |
Authors | Longfei Ren, Chengjing Wang, Peipei Tang, Zheng Ma |
Abstract | Since sparse unmixing has emerged as a promising approach to hyperspectral unmixing, spatial-contextual information in hyperspectral images has recently been exploited to improve unmixing performance. The total variation (TV) has been widely used to promote spatial homogeneity as well as smoothness between adjacent pixels. However, hyperspectral sparse unmixing with a TV regularization term is computationally heavy. Moreover, the convergence of the traditional sparse unmixing algorithms, which are special cases of the primal alternating direction method of multipliers (pADMM), has not been explained in detail. In this paper, we design an efficient and convergent dual symmetric Gauss-Seidel ADMM (sGS-ADMM) for hyperspectral sparse unmixing with a TV regularization term. We also present global convergence and local linear convergence rate analyses for the traditional sparse unmixing algorithm and our algorithm. As demonstrated in numerical experiments, our algorithm clearly improves the efficiency of unmixing compared with the state-of-the-art algorithm. More importantly, it yields images of higher quality. |
Tasks | Hyperspectral Unmixing |
Published | 2019-02-25 |
URL | http://arxiv.org/abs/1902.09135v1 |
http://arxiv.org/pdf/1902.09135v1.pdf | |
PWC | https://paperswithcode.com/paper/a-dual-symmetric-gauss-seidel-alternating |
Repo | |
Framework | |
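The abstract above relies on a total variation (TV) term to promote smoothness between adjacent pixels. As a rough illustration of what such a term measures, the sketch below computes the anisotropic TV of a small 2D map, that is, the sum of absolute differences between horizontal and vertical neighbours. The anisotropic form and the function name are assumptions made for illustration; the abstract does not say which TV variant the paper uses.

```python
from typing import List


def anisotropic_tv(image: List[List[float]]) -> float:
    """Sum of absolute differences between horizontally and vertically adjacent pixels."""
    rows, cols = len(image), len(image[0])
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            if j + 1 < cols:  # right neighbour
                total += abs(image[i][j + 1] - image[i][j])
            if i + 1 < rows:  # lower neighbour
                total += abs(image[i + 1][j] - image[i][j])
    return total


flat = [[1.0, 1.0], [1.0, 1.0]]  # spatially homogeneous map
edge = [[0.0, 0.0], [0.0, 1.0]]  # one pixel differs from its neighbours

print(anisotropic_tv(flat), anisotropic_tv(edge))
```

A homogeneous map has zero TV, while isolated pixel changes increase it, so penalizing this quantity favours piecewise-smooth solutions.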
Deep ensemble network with explicit complementary model for accuracy-balanced classification
Title | Deep ensemble network with explicit complementary model for accuracy-balanced classification |
Authors | Dohyun Kim, Kyeorye Lee, Jiyeon Kim, Junseok Kwon, Joongheon Kim |
Abstract | The average accuracy is one of the major evaluation metrics for classification systems, while the accuracy deviation is another important performance metric used to evaluate various deep neural networks. In this paper, we present a new ensemble-like fast deep neural network, Harmony, that can reduce the accuracy deviation among categories without degrading overall average accuracy. Harmony consists of three sub-models, namely, Target model, Complementary model, and Conductor model. In Harmony, an object is classified by using either Target model or Complementary model. Target model is a conventional classification network for general categories, while Complementary model is a classification network especially for weak categories that are inaccurately classified by Target model. Conductor model is used to select one of the two models. Experimental results demonstrate that Harmony accurately classifies categories while reducing the accuracy deviation among the categories. |
Tasks | |
Published | 2019-08-10 |
URL | https://arxiv.org/abs/1908.03671v1 |
https://arxiv.org/pdf/1908.03671v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-ensemble-network-with-explicit |
Repo | |
Framework | |
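The routing described in the Harmony abstract, where a Conductor model selects either the Target or the Complementary model for each input, can be sketched as follows. All three sub-models here are hypothetical toy functions standing in for neural networks, and the boolean conductor interface is an assumption made for illustration.

```python
from typing import Callable, Sequence

Classifier = Callable[[Sequence[float]], str]
Conductor = Callable[[Sequence[float]], bool]  # True routes to the complementary model


def harmony_predict(
    x: Sequence[float],
    target: Classifier,
    complementary: Classifier,
    conductor: Conductor,
) -> str:
    """Classify x with exactly one of the two sub-models, as chosen by the conductor."""
    return complementary(x) if conductor(x) else target(x)


# Toy stand-ins (hypothetical); real sub-models would be trained networks.
def toy_target(x: Sequence[float]) -> str:
    return "cat" if x[0] > 0 else "dog"


def toy_complementary(x: Sequence[float]) -> str:
    return "fox" if x[1] > 0 else "wolf"


def toy_conductor(x: Sequence[float]) -> bool:
    # Treat inputs near the target model's decision boundary as weak-category cases.
    return abs(x[0]) < 0.1


print(harmony_predict([1.0, 0.0], toy_target, toy_complementary, toy_conductor))
print(harmony_predict([0.05, 1.0], toy_target, toy_complementary, toy_conductor))
```

Only one classifier runs per input, which is consistent with the abstract's description of the network as "ensemble-like" yet fast.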
Translator2Vec: Understanding and Representing Human Post-Editors
Title | Translator2Vec: Understanding and Representing Human Post-Editors |
Authors | António Góis, André F. T. Martins |
Abstract | The combination of machines and humans for translation is effective, with many studies showing productivity gains when humans post-edit machine-translated output instead of translating from scratch. To take full advantage of this combination, we need a fine-grained understanding of how human translators work, and which post-editing styles are more effective than others. In this paper, we release and analyze a new dataset with document-level post-editing action sequences, including edit operations from keystrokes, mouse actions, and waiting times. Our dataset comprises 66,268 full document sessions post-edited by 332 humans, the largest of the kind released to date. We show that action sequences are informative enough to identify post-editors accurately, compared to baselines that only look at the initial and final text. We build on this to learn and visualize continuous representations of post-editors, and we show that these representations improve the downstream task of predicting post-editing time. |
Tasks | |
Published | 2019-07-24 |
URL | https://arxiv.org/abs/1907.10362v1 |
https://arxiv.org/pdf/1907.10362v1.pdf | |
PWC | https://paperswithcode.com/paper/translator2vec-understanding-and-representing |
Repo | |
Framework | |
Deep Learning for Inverse Problems: Bounds and Regularizers
Title | Deep Learning for Inverse Problems: Bounds and Regularizers |
Authors | Jaweria Amjad, Zhaoyan Lyu, Miguel R. D. Rodrigues |
Abstract | Inverse problems arise in a number of domains such as medical imaging, remote sensing, and many more, relying on the use of advanced signal and image processing approaches – such as sparsity-driven techniques – to determine their solution. This paper instead studies the use of deep learning approaches to approximate the solution of inverse problems. In particular, the paper provides a new generalization bound, depending on a key quantity associated with a deep neural network – its Jacobian matrix – that also leads to a number of computationally efficient regularization strategies applicable to inverse problems. The paper also tests the proposed regularization strategies on a number of inverse problems, including image super-resolution. Our numerical experiments, conducted on various datasets, show that both fully connected and convolutional neural networks regularized using the regularization or proxy regularization strategies originating from our theory exhibit much better performance than deep networks regularized with standard approaches such as weight decay. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2019-01-31 |
URL | http://arxiv.org/abs/1901.11352v1 |
http://arxiv.org/pdf/1901.11352v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-for-inverse-problems-bounds-and |
Repo | |
Framework | |
Stable Rank Normalization for Improved Generalization in Neural Networks and GANs
Title | Stable Rank Normalization for Improved Generalization in Neural Networks and GANs |
Authors | Amartya Sanyal, Philip H. S. Torr, Puneet K. Dokania |
Abstract | The generalization bounds for neural networks (NN) in exciting new work by Neyshabur et al. and Bartlett et al. closely depend on two parameter-dependent quantities: the Lipschitz constant upper bound and the stable rank (a softer version of the rank operator). This leads to the interesting question of whether controlling these quantities might improve the generalization behaviour of NNs. To this end, we propose stable rank normalization (SRN), a novel, optimal, and computationally efficient weight-normalization scheme which minimizes the stable rank of a linear operator. Surprisingly, we find that SRN, in spite of being a non-convex problem, can be shown to have a unique optimal solution. Moreover, we show that SRN allows control of the data-dependent empirical Lipschitz constant, which, in contrast to the Lipschitz upper bound, reflects the true behaviour of a model on a given dataset. We provide thorough analyses to show that SRN, when applied to the linear layers of a NN for classification, provides striking improvements (11.3% on the generalization gap compared to the standard NN) along with a significant reduction in memorization. When applied to the discriminator of GANs (called SRN-GAN), it improves Inception, FID, and Neural divergence scores on the CIFAR 10/100 and CelebA datasets, while learning mappings with low empirical Lipschitz constants. |
Tasks | |
Published | 2019-06-11 |
URL | https://arxiv.org/abs/1906.04659v3 |
https://arxiv.org/pdf/1906.04659v3.pdf | |
PWC | https://paperswithcode.com/paper/stable-rank-normalization-for-improved |
Repo | |
Framework | |
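The abstract above refers to the stable rank as a softer version of the rank operator. Its standard definition, which the abstract does not spell out, is $\|W\|_F^2 / \|W\|_2^2$. The sketch below only computes this quantity, estimating the spectral norm by power iteration; it does not implement the SRN normalization scheme itself. The function names, the all-ones starting vector and the iteration count are illustrative choices, and a nonzero matrix is assumed.

```python
import math
from typing import List

Matrix = List[List[float]]


def frobenius_sq(w: Matrix) -> float:
    """Squared Frobenius norm: the sum of all squared entries."""
    return sum(x * x for row in w for x in row)


def spectral_sq(w: Matrix, iters: int = 200) -> float:
    """Estimate the squared largest singular value by power iteration on W^T W.

    Assumes W is nonzero and the starting vector is not orthogonal to the
    top right singular vector.
    """
    n = len(w[0])
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        wv = [sum(a * b for a, b in zip(row, v)) for row in w]  # W v
        wtwv = [sum(w[i][j] * wv[i] for i in range(len(w))) for j in range(n)]  # W^T (W v)
        norm = math.sqrt(sum(x * x for x in wtwv))
        v = [x / norm for x in wtwv]
    wv = [sum(a * b for a, b in zip(row, v)) for row in w]
    return sum(x * x for x in wv)  # ||W v||^2 with ||v|| = 1


def stable_rank(w: Matrix) -> float:
    return frobenius_sq(w) / spectral_sq(w)


identity = [[1.0, 0.0], [0.0, 1.0]]
skewed = [[3.0, 0.0], [0.0, 4.0]]
print(stable_rank(identity), stable_rank(skewed))
```

Both matrices have rank 2, but the stable rank of the second is lower because one singular value dominates, which illustrates why the quantity is described as a softer notion of rank.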
Video synthesis of human upper body with realistic face
Title | Video synthesis of human upper body with realistic face |
Authors | Zhaoxiang Liu, Huan Hu, Zipeng Wang, Kai Wang, Jinqiang Bai, Shiguo Lian |
Abstract | This paper presents a generative adversarial learning-based approach to human upper body video synthesis, which generates an upper body video of a target person that is consistent with the body motion, facial expression, and pose of the person in a source video. We use upper body keypoints, facial action units and poses as intermediate representations between the source video and the target video. Instead of directly transferring the source video to the target video, we first map the source person’s facial action units and poses into the target person’s facial landmarks, then combine the normalized upper body keypoints and generated facial landmarks with spatio-temporal smoothing to generate the corresponding image of the target video. Experimental results demonstrate the effectiveness of our method. |
Tasks | |
Published | 2019-08-19 |
URL | https://arxiv.org/abs/1908.06607v3 |
https://arxiv.org/pdf/1908.06607v3.pdf | |
PWC | https://paperswithcode.com/paper/video-synthesis-of-human-upper-body-with |
Repo | |
Framework | |
Exploiting Cognitive Structure for Adaptive Learning
Title | Exploiting Cognitive Structure for Adaptive Learning |
Authors | Qi Liu, Shiwei Tong, Chuanren Liu, Hongke Zhao, Enhong Chen, Haiping Ma, Shijin Wang |
Abstract | Adaptive learning, also known as adaptive teaching, relies on learning path recommendation, which sequentially recommends personalized learning items (e.g., lectures, exercises) to satisfy the unique needs of each learner. Although it is well known that modeling the cognitive structure, including the knowledge level of learners and the knowledge structure (e.g., the prerequisite relations) of learning items, is important for learning path recommendation, existing methods for adaptive learning often focus separately on either the knowledge levels of learners or the knowledge structure of learning items. To fully exploit the multifaceted cognitive structure for learning path recommendation, we propose a Cognitive Structure Enhanced framework for Adaptive Learning, named CSEAL. By viewing path recommendation as a Markov Decision Process and applying an actor-critic algorithm, CSEAL can sequentially identify the right learning items for different learners. Specifically, we first utilize a recurrent neural network to trace the evolving knowledge levels of learners at each learning step. Then, we design a navigation algorithm on the knowledge structure to ensure the logicality of learning paths, which reduces the search space in the decision process. Finally, the actor-critic algorithm, whose parameters are dynamically updated along the learning path, is used to determine what to learn next. Extensive experiments on real-world data demonstrate the effectiveness and robustness of CSEAL. |
Tasks | |
Published | 2019-05-23 |
URL | https://arxiv.org/abs/1905.12470v1 |
https://arxiv.org/pdf/1905.12470v1.pdf | |
PWC | https://paperswithcode.com/paper/190512470 |
Repo | |
Framework | |
Medical Image Super-Resolution Using a Generative Adversarial Network
Title | Medical Image Super-Resolution Using a Generative Adversarial Network |
Authors | Yongpei Zhu, Xuesheng Zhang, Kehong Yuan |
Abstract | With the growing popularity of electronic medical records (EMR), the volume of EMR data has grown explosively. Retrieving high-quality EMRs from this mass of data is therefore very valuable. In this paper, an EMR value network with a retrieval function is constructed, taking stroke as the research object. The work comprises three parts: 1) An electronic medical record database and a corresponding stroke knowledge graph are established. 2) The similarity measurement strategy covers three parts (patients’ chief complaint, pathology results and medical images). Patients’ chief complaints are text data, mainly describing patients’ symptoms in words or phrases, and are entered by ticking each symptom independently. The pathology results are structured, digitized data, so the input method is the same as for the patient’s chief complaint. Image similarity uses content-based image retrieval (CBIR) technology. 3) The analytic hierarchy process (AHP) is used to establish the weights of the three types of data and then synthesize them into a single indicator. Using the leave-one-out method on an EMR database with more than 200 stroke records, the top-5 similarity accuracy was more than 85%. In the future, as good-quality records are collected into such databases, like Doctor Watson, it will become a good tool for assisted diagnosis and doctor training. |
Tasks | Brain Segmentation, Content-Based Image Retrieval, Image Generation, Image Retrieval, Image Super-Resolution, Super-Resolution |
Published | 2019-01-30 |
URL | https://arxiv.org/abs/1902.00369v3 |
https://arxiv.org/pdf/1902.00369v3.pdf | |
PWC | https://paperswithcode.com/paper/the-generation-and-application-of-medical |
Repo | |
Framework | |
Depth-Adaptive Transformer
Title | Depth-Adaptive Transformer |
Authors | Maha Elbayad, Jiatao Gu, Edouard Grave, Michael Auli |
Abstract | State of the art sequence-to-sequence models for large scale tasks perform a fixed number of computations for each input sequence regardless of whether it is easy or hard to process. In this paper, we train Transformer models which can make output predictions at different stages of the network and we investigate different ways to predict how much computation is required for a particular sequence. Unlike dynamic computation in Universal Transformers, which applies the same set of layers iteratively, we apply different layers at every step to adjust both the amount of computation as well as the model capacity. On IWSLT German-English translation our approach matches the accuracy of a well tuned baseline Transformer while using less than a quarter of the decoder layers. |
Tasks | Machine Translation |
Published | 2019-10-22 |
URL | https://arxiv.org/abs/1910.10073v4 |
https://arxiv.org/pdf/1910.10073v4.pdf | |
PWC | https://paperswithcode.com/paper/depth-adaptive-transformer |
Repo | |
Framework | |
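The Depth-Adaptive Transformer abstract describes making output predictions at different stages of the network so that easy inputs use less computation. One simple way to picture this is confidence-based early exit, sketched below. The thresholding rule, the toy layers and the function names are assumptions made for illustration; the abstract only says that the authors investigate different ways to predict how much computation is required.

```python
import math
from typing import Callable, List, Tuple

Layer = Callable[[List[float]], List[float]]
Head = Callable[[List[float]], List[float]]  # returns a probability distribution


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(
    state: List[float],
    layers: List[Layer],
    heads: List[Head],
    threshold: float = 0.9,
) -> Tuple[int, int]:
    """Run layers in order and stop at the first whose head is confident enough.

    Returns (predicted class index, number of layers used).
    Assumes at least one layer and one output head per layer.
    """
    probs: List[float] = []
    depth = 0
    for layer, head in zip(layers, heads):
        state = layer(state)
        probs = head(state)
        depth += 1
        if max(probs) >= threshold:
            break
    return probs.index(max(probs)), depth


def double(state: List[float]) -> List[float]:
    # Toy layer (hypothetical): sharpens the logits by doubling them.
    return [2.0 * x for x in state]


layers = [double] * 4
heads = [softmax] * 4
print(early_exit_predict([0.5, 0.0], layers, heads, threshold=0.9))
print(early_exit_predict([0.5, 0.0], layers, heads, threshold=0.7))
```

With the stricter threshold the toy input passes through three of the four layers before exiting, while the looser threshold exits after the first layer, which mirrors the idea of spending fewer decoder layers on easier inputs.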
GADMM: Fast and Communication Efficient Framework for Distributed Machine Learning
Title | GADMM: Fast and Communication Efficient Framework for Distributed Machine Learning |
Authors | Anis Elgabli, Jihong Park, Amrit S. Bedi, Mehdi Bennis, Vaneet Aggarwal |
Abstract | When the data is distributed across multiple servers, lowering the communication cost between the servers (or workers) while solving the distributed learning problem is an important challenge and is the focus of this paper. In particular, we propose a fast and communication-efficient decentralized framework to solve the distributed machine learning (DML) problem. The proposed algorithm, Group Alternating Direction Method of Multipliers (GADMM), is based on the Alternating Direction Method of Multipliers (ADMM) framework. The key novelty in GADMM is that it solves the problem in a decentralized topology where at most half of the workers are competing for the limited communication resources at any given time. Moreover, each worker exchanges the locally trained model only with two neighboring workers, thereby training a global model with a lower amount of communication overhead in each exchange. We prove that GADMM converges to the optimal solution for convex loss functions, and numerically show that it converges faster and is more communication-efficient than state-of-the-art communication-efficient algorithms such as the Lazily Aggregated Gradient (LAG) and dual averaging, in linear and logistic regression tasks on synthetic and real datasets. Furthermore, we propose Dynamic GADMM (D-GADMM), a variant of GADMM, and prove its convergence under the time-varying network topology of the workers. |
Tasks | |
Published | 2019-08-30 |
URL | https://arxiv.org/abs/1909.00047v3 |
https://arxiv.org/pdf/1909.00047v3.pdf | |
PWC | https://paperswithcode.com/paper/gadmm-fast-and-communication-efficient |
Repo | |
Framework | |
Colorectal Cancer Outcome Prediction from H&E Whole Slide Images using Machine Learning and Automatically Inferred Phenotype Profiles
Title | Colorectal Cancer Outcome Prediction from H&E Whole Slide Images using Machine Learning and Automatically Inferred Phenotype Profiles |
Authors | Xingzhi Yue, Neofytos Dimitriou, Ognjen Arandjelovic |
Abstract | Digital pathology (DP) is a new research area which falls under the broad umbrella of health informatics. Owing to its potential for major public health impact, in recent years DP has been attracting much research attention. Nevertheless, a wide breadth of significant conceptual and technical challenges remain, few of them greater than those encountered in the field of oncology. The automatic analysis of digital pathology slides of cancerous tissues is particularly problematic due to the inherent heterogeneity of the disease and extremely large images, among numerous other factors. In this paper we introduce a novel machine learning based framework for the prediction of colorectal cancer outcome from whole digitized haematoxylin & eosin (H&E) stained histopathology slides. Using a real-world data set we demonstrate the effectiveness of the method and present a detailed analysis of its different elements which corroborates its ability to extract and learn salient, discriminative, and clinically meaningful content. |
Tasks | |
Published | 2019-02-10 |
URL | http://arxiv.org/abs/1902.03582v2 |
http://arxiv.org/pdf/1902.03582v2.pdf | |
PWC | https://paperswithcode.com/paper/colorectal-cancer-outcome-prediction-from-he |
Repo | |
Framework | |