Paper Group ANR 1578
Learning with less data via Weakly Labeled Patch Classification in Digital Pathology
Title | Learning with less data via Weakly Labeled Patch Classification in Digital Pathology |
Authors | Eu Wern Teh, Graham W. Taylor |
Abstract | In Digital Pathology (DP), labeled data is generally very scarce due to the requirement that medical experts provide annotations. We address this issue by learning transferable features from weakly labeled data, which are collected from various parts of the body and are organized by non-medical experts. In this paper, we show that features learned from such weakly labeled datasets are indeed transferable and allow us to achieve highly competitive patch classification results on the colorectal cancer (CRC) dataset [1] and the PatchCamelyon (PCam) dataset [2] while using an order of magnitude less labeled data. |
Tasks | |
Published | 2019-11-27 |
URL | https://arxiv.org/abs/1911.12425v3 |
https://arxiv.org/pdf/1911.12425v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-with-less-data-via-weakly-labeled |
Repo | |
Framework | |
A detailed comparative study of open source deep learning frameworks
Title | A detailed comparative study of open source deep learning frameworks |
Authors | Ghadeer Al-Bdour, Raffi Al-Qurran, Mahmoud Al-Ayyoub, Ali Shatnawi |
Abstract | Deep Learning (DL) is one of the hottest trends in machine learning, as DL approaches have produced results superior to the state of the art in problematic areas such as image processing and natural language processing (NLP). To foster the growth of DL, several open source frameworks have appeared, providing implementations of the most common DL algorithms. These frameworks vary in the algorithms they support and in the quality of their implementations. The purpose of this work is to provide a qualitative and quantitative comparison among three of the most popular and most comprehensive DL frameworks (namely Google’s TensorFlow, University of Montreal’s Theano and Microsoft’s CNTK). The ultimate goal of this work is to help end users make an informed decision about the best DL framework that suits their needs and resources. To ensure that our study is as comprehensive as possible, we conduct several experiments using multiple benchmark datasets from different fields (image processing, NLP, etc.) and measure the performance of the frameworks’ implementations of different DL algorithms. For most of our experiments, we find that CNTK’s implementations are superior to the other ones under consideration. |
Tasks | |
Published | 2019-02-25 |
URL | http://arxiv.org/abs/1903.00102v1 |
http://arxiv.org/pdf/1903.00102v1.pdf | |
PWC | https://paperswithcode.com/paper/a-detailed-comparative-study-of-open-source |
Repo | |
Framework | |
Natural Adversarial Sentence Generation with Gradient-based Perturbation
Title | Natural Adversarial Sentence Generation with Gradient-based Perturbation |
Authors | Yu-Lun Hsieh, Minhao Cheng, Da-Cheng Juan, Wei Wei, Wen-Lian Hsu, Cho-Jui Hsieh |
Abstract | This work proposes a novel algorithm to generate natural language adversarial input for text classification models, in order to investigate the robustness of these models. It involves applying gradient-based perturbation on the sentence embeddings that are used as the features for the classifier, and learning a decoder for generation. We apply this method to a sentiment analysis model and verify its effectiveness in inducing incorrect predictions by the model. We also conduct quantitative and qualitative analysis on these examples and demonstrate that our approach can generate more natural adversaries. In addition, it can be used to successfully perform black-box attacks, which involve attacking other existing models whose parameters are not known. On a public sentiment analysis API, the proposed method introduces a 20% relative decrease in average accuracy and a 74% relative increase in absolute error. |
Tasks | Sentence Embeddings, Sentiment Analysis, Text Classification |
Published | 2019-09-06 |
URL | https://arxiv.org/abs/1909.04495v1 |
https://arxiv.org/pdf/1909.04495v1.pdf | |
PWC | https://paperswithcode.com/paper/natural-adversarial-sentence-generation-with |
Repo | |
Framework | |
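The gradient-based perturbation step described in the abstract can be illustrated with a minimal sketch. It assumes a plain logistic classifier on top of a fixed embedding; the function `perturb_embedding`, the toy weights, and the step size are hypothetical and not taken from the paper. The decoder that turns a perturbed embedding back into a sentence is omitted.

```python
import numpy as np

def perturb_embedding(embedding, label, weights, bias, step=0.5):
    """Take one normalized gradient-ascent step on the logistic loss.

    `label` is +1 or -1. The loss is log(1 + exp(-margin)) with
    margin = label * (weights @ embedding + bias); moving the embedding
    along the loss gradient pushes it towards the wrong class.
    """
    margin = label * (weights @ embedding + bias)
    # Gradient of log(1 + exp(-margin)) with respect to the embedding.
    grad = -label * weights / (1.0 + np.exp(margin))
    return embedding + step * grad / np.linalg.norm(grad)

# Toy classifier and embedding (illustrative values only).
weights = np.array([2.0, -1.0])
bias = 0.0
embedding = np.array([0.3, 0.1])
label = 1

score_before = weights @ embedding + bias
adversarial = perturb_embedding(embedding, label, weights, bias)
score_after = weights @ adversarial + bias
```

With these toy values the classifier score changes sign after one step, which is the kind of induced misprediction the abstract refers to. A real attack would also need to keep the decoded sentence natural.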
Analogies Explained: Towards Understanding Word Embeddings
Title | Analogies Explained: Towards Understanding Word Embeddings |
Authors | Carl Allen, Timothy Hospedales |
Abstract | Word embeddings generated by neural network methods such as word2vec (W2V) are well known to exhibit seemingly linear behaviour, e.g. the embeddings of the analogy “woman is to queen as man is to king” approximately describe a parallelogram. This property is particularly intriguing since the embeddings are not trained to achieve it. Several explanations have been proposed, but each introduces assumptions that do not hold in practice. We derive a probabilistically grounded definition of paraphrasing that we re-interpret as word transformation, a mathematical description of “$w_x$ is to $w_y$”. From these concepts we prove the existence of linear relationships between W2V-type embeddings that underlie the analogical phenomenon, identifying explicit error terms. |
Tasks | Word Embeddings |
Published | 2019-01-28 |
URL | https://arxiv.org/abs/1901.09813v2 |
https://arxiv.org/pdf/1901.09813v2.pdf | |
PWC | https://paperswithcode.com/paper/analogies-explained-towards-understanding |
Repo | |
Framework | |
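The parallelogram property mentioned in the abstract can be illustrated with a toy sketch. The 2-D vectors below are hand-picked for illustration and are not trained W2V embeddings; the helper `solve_analogy` is a hypothetical name.

```python
import numpy as np

# Hand-picked 2-D vectors for illustration only; NOT trained embeddings.
toy_embeddings = {
    "man": np.array([1.0, 0.0]),
    "woman": np.array([1.0, 1.0]),
    "king": np.array([3.0, 0.1]),
    "queen": np.array([3.0, 1.0]),
    "apple": np.array([-2.0, 0.5]),
}

def solve_analogy(a, b, c, embeddings):
    """Return the word d that best completes 'a is to b as c is to d'.

    Applies the parallelogram rule d ~ b - a + c and ranks the remaining
    words by cosine similarity to that target vector.
    """
    target = embeddings[b] - embeddings[a] + embeddings[c]
    best_word, best_score = None, -np.inf
    for word, vec in embeddings.items():
        if word in (a, b, c):
            continue  # the query words themselves are excluded
        score = vec @ target / (np.linalg.norm(vec) * np.linalg.norm(target))
        if score > best_score:
            best_word, best_score = word, score
    return best_word

answer = solve_analogy("man", "king", "woman", toy_embeddings)  # "queen"
```

The toy vectors only close the parallelogram approximately (the target is `[3.0, 1.1]`, not exactly the vector for "queen"), mirroring the abstract's point that the linear relationship holds up to error terms.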
Two-Stream Multi-Task Network for Fashion Recognition
Title | Two-Stream Multi-Task Network for Fashion Recognition |
Authors | Peizhao Li, Yanjing Li, Xiaolong Jiang, Xiantong Zhen |
Abstract | In this paper, we present a two-stream multi-task network for fashion recognition. This task is challenging as fashion clothing always contains multiple attributes, which need to be predicted simultaneously for real-time industrial systems. To handle these challenges, we formulate fashion recognition as a multi-task learning problem, including landmark detection, category and attribute classifications, and solve it with the proposed deep convolutional neural network. We design two knowledge sharing strategies which enable information transfer between tasks and improve the overall performance. The proposed model achieves state-of-the-art results on a large-scale fashion dataset compared to existing methods, which demonstrates its effectiveness and superiority for fashion recognition. |
Tasks | Multi-Task Learning |
Published | 2019-01-29 |
URL | https://arxiv.org/abs/1901.10172v3 |
https://arxiv.org/pdf/1901.10172v3.pdf | |
PWC | https://paperswithcode.com/paper/two-stream-multi-task-network-for-fashion |
Repo | |
Framework | |
A Methodology for Controlling the Emotional Expressiveness in Synthetic Speech – a Deep Learning approach
Title | A Methodology for Controlling the Emotional Expressiveness in Synthetic Speech – a Deep Learning approach |
Authors | Noé Tits |
Abstract | In this project, we aim to build a Text-to-Speech system able to produce speech with a controllable emotional expressiveness. We propose a methodology for solving this problem in three main steps. The first is the collection of emotional speech data. We discuss the various formats of existing datasets and their usability in speech generation. The second step is the development of a system to automatically annotate data with emotion/expressiveness features. We compare several techniques using transfer learning to extract such a representation through other tasks and propose a method to visualize and interpret the correlation between vocal and emotional features. The third step is the development of a deep learning-based system taking text and emotion/expressiveness as input and producing speech as output. We study the impact of fine tuning from a neutral TTS towards an emotional TTS in terms of intelligibility and perception of the emotion. |
Tasks | Transfer Learning |
Published | 2019-07-05 |
URL | https://arxiv.org/abs/1907.02784v1 |
https://arxiv.org/pdf/1907.02784v1.pdf | |
PWC | https://paperswithcode.com/paper/a-methodology-for-controlling-the-emotional |
Repo | |
Framework | |
Computing Equilibria in Binary Networked Public Goods Games
Title | Computing Equilibria in Binary Networked Public Goods Games |
Authors | Sixie Yu, Kai Zhou, P. Jeffrey Brantingham, Yevgeniy Vorobeychik |
Abstract | Public goods games study the incentives of individuals to contribute to a public good and their behaviors in equilibria. In this paper, we examine a specific type of public goods game where players are networked and each has binary actions, and focus on the algorithmic aspects of such games. First, we show that checking the existence of a pure-strategy Nash equilibrium is NP-Complete. We then identify tractable instances based on restrictions of either utility functions or of the underlying graphical structure. In certain cases, we also show that we can efficiently compute a socially optimal Nash equilibrium. Finally, we propose a heuristic approach for computing approximate equilibria in general binary networked public goods games, and experimentally demonstrate its effectiveness. |
Tasks | |
Published | 2019-11-13 |
URL | https://arxiv.org/abs/1911.05788v1 |
https://arxiv.org/pdf/1911.05788v1.pdf | |
PWC | https://paperswithcode.com/paper/computing-equilibria-in-binary-networked |
Repo | |
Framework | |
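To make the equilibrium notion concrete, the sketch below enumerates pure-strategy Nash equilibria by brute force. It assumes a utility of the form `gain(x_i + n_i) - cost_i * x_i`, where `n_i` is the number of investing neighbors; this form, the function names, and the example graph are assumptions for illustration and are not stated in the abstract. The enumeration takes exponential time, which is in line with the hardness result the abstract reports for the existence question.

```python
from itertools import product

def is_pure_nash(profile, neighbors, gain, cost):
    """Return True if no player gains by unilaterally flipping its action."""
    for i, x_i in enumerate(profile):
        n_i = sum(profile[j] for j in neighbors[i])  # investing neighbors

        def utility(action):
            # Assumed utility form: benefit of total local investment minus own cost.
            return gain(action + n_i) - cost[i] * action

        if utility(1 - x_i) > utility(x_i):
            return False
    return True

def pure_nash_equilibria(neighbors, gain, cost):
    """Check every binary action profile (2**n of them)."""
    n = len(neighbors)
    return [
        profile
        for profile in product((0, 1), repeat=n)
        if is_pure_nash(profile, neighbors, gain, cost)
    ]

# Hypothetical example: three players on a path 0 - 1 - 2.
neighbors = [[1], [0, 2], [1]]
gain = lambda k: min(k, 1)  # benefit saturates once anyone nearby invests
cost = [0.5, 0.5, 0.5]

equilibria = pure_nash_equilibria(neighbors, gain, cost)
# -> [(0, 1, 0), (1, 0, 1)]
```

In this saturating example a player invests exactly when no neighbor does, so the equilibria are the maximal independent sets of the path.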
Neural network identifiability for a family of sigmoidal nonlinearities
Title | Neural network identifiability for a family of sigmoidal nonlinearities |
Authors | Verner Vlačić, Helmut Bölcskei |
Abstract | This paper addresses the following question of neural network identifiability: Does the input-output map realized by a feed-forward neural network with respect to a given nonlinearity uniquely specify the network architecture, weights, and biases? Existing literature on the subject (Sussman 1992; Albertini, Sontag et al. 1993; Fefferman 1994) suggests that the answer should be yes, up to certain symmetries induced by the nonlinearity, and provided the networks under consideration satisfy certain “genericity conditions”. The results in Sussman 1992 and Albertini, Sontag et al. 1993 apply to networks with a single hidden layer, and in Fefferman 1994 the networks need to be fully connected. In an effort to answer the identifiability question in greater generality, we derive necessary genericity conditions for the identifiability of neural networks of arbitrary depth and connectivity with an arbitrary nonlinearity. Moreover, we construct a family of nonlinearities for which these genericity conditions are minimal, i.e., both necessary and sufficient. This family is large enough to approximate many commonly encountered nonlinearities to arbitrary precision in the uniform norm. |
Tasks | |
Published | 2019-06-11 |
URL | https://arxiv.org/abs/1906.06994v2 |
https://arxiv.org/pdf/1906.06994v2.pdf | |
PWC | https://paperswithcode.com/paper/neural-network-identifiability-for-a-family |
Repo | |
Framework | |
A neural network based post-filter for speech-driven head motion synthesis
Title | A neural network based post-filter for speech-driven head motion synthesis |
Authors | JinHong Lu, Hiroshi Shimodaira |
Abstract | Despite the fact that neural networks are widely used for speech-driven head motion synthesis, it is well-known that the output of neural networks is noisy or discontinuous due to the limited capability of deep neural networks in predicting human motion. Thus, post-processing is required to obtain smooth head motion trajectories for animation. It is common to apply a linear filter or consider keyframes as post-processing. However, neither approach is optimal as there is always a trade-off between smoothness and accuracy. To overcome this limitation, we propose to employ a neural network trained so that it is capable of reconstructing the head motions. In the objective evaluation, this filter is shown to be good at de-noising data corrupted by different types of noise (dropout or Gaussian noise). Objective metrics also demonstrate the improvement of the joined head motion’s smoothness after being processed by our proposed filter. A detailed analysis reveals that our proposed filter learns the characteristics of head motions. The subjective evaluation shows that participants were unable to distinguish the synthesised head motions processed by our proposed filter from ground truth, and that our filter was preferred over the Gaussian filter and moving average. |
Tasks | |
Published | 2019-07-24 |
URL | https://arxiv.org/abs/1907.10585v2 |
https://arxiv.org/pdf/1907.10585v2.pdf | |
PWC | https://paperswithcode.com/paper/a-neural-network-based-post-filter-for-speech |
Repo | |
Framework | |
Towards Neural Language Evaluators
Title | Towards Neural Language Evaluators |
Authors | Hassan Kané, Yusuf Kocyigit, Pelkins Ajanoh, Ali Abdalla, Mohamed Coulibali |
Abstract | We review three limitations of BLEU and ROUGE – the most popular metrics used to assess reference summaries against hypothesis summaries – propose criteria for how a good metric should behave, and propose concrete ways to use recent Transformers-based Language Models to assess reference summaries against hypothesis summaries. |
Tasks | |
Published | 2019-09-20 |
URL | https://arxiv.org/abs/1909.09268v2 |
https://arxiv.org/pdf/1909.09268v2.pdf | |
PWC | https://paperswithcode.com/paper/towards-neural-language-evaluators |
Repo | |
Framework | |
Deep Collaborative Discrete Hashing with Semantic-Invariant Structure
Title | Deep Collaborative Discrete Hashing with Semantic-Invariant Structure |
Authors | Zijian Wang, Zheng Zhang, Yadan Luo, Zi Huang |
Abstract | Existing deep hashing approaches fail to fully explore semantic correlations and neglect the effect of linguistic context on visual attention learning, leading to inferior performance. This paper proposes a dual-stream learning framework, dubbed Deep Collaborative Discrete Hashing (DCDH), which constructs a discriminative common discrete space by collaboratively incorporating the shared and individual semantics deduced from visual features and semantic labels. Specifically, the context-aware representations are generated by employing the outer product of visual embeddings and semantic encodings. Moreover, we reconstruct the labels and introduce the focal loss to take advantage of frequent and rare concepts. The common binary code space is built on the joint learning of the visual representations attended by language, the semantic-invariant structure construction and the label distribution correction. Extensive experiments demonstrate the superiority of our method. |
Tasks | |
Published | 2019-11-05 |
URL | https://arxiv.org/abs/1911.01565v1 |
https://arxiv.org/pdf/1911.01565v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-collaborative-discrete-hashing-with |
Repo | |
Framework | |
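For reference, the focal loss named in the abstract is commonly written as follows for a single binary label with predicted probability $p$. The abstract does not say which variant or hyperparameters DCDH uses, so the symbols $p_t$, $\alpha_t$ and $\gamma$ here are the usual notation, not necessarily the paper's.

$$
\mathrm{FL}(p_t) = -\alpha_t \,(1 - p_t)^{\gamma} \,\log(p_t),
\qquad
p_t =
\begin{cases}
p & \text{if } y = 1, \\
1 - p & \text{otherwise.}
\end{cases}
$$

With $\gamma = 0$ and $\alpha_t = 1$ this reduces to the ordinary cross-entropy. A larger $\gamma$ down-weights examples that are already classified well, so hard examples (which rare concepts often are) contribute relatively more to the loss.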
Improving Prognostic Performance in Resectable Pancreatic Ductal Adenocarcinoma using Radiomics and Deep Learning Features Fusion in CT Images
Title | Improving Prognostic Performance in Resectable Pancreatic Ductal Adenocarcinoma using Radiomics and Deep Learning Features Fusion in CT Images |
Authors | Yucheng Zhang, Edrise M. Lobo-Mueller, Paul Karanicolas, Steven Gallinger, Masoom A. Haider, Farzad Khalvati |
Abstract | As an analytic pipeline for quantitative imaging feature extraction and analysis, radiomics has grown rapidly in the past few years. Recent studies in radiomics aim to investigate the relationship between tumor imaging features and clinical outcomes. Open source radiomics feature banks enable the extraction and analysis of thousands of predefined features. On the other hand, recent advances in deep learning have shown significant potential in the quantitative medical imaging field, raising the research question of whether predefined radiomics features have predictive information in addition to deep learning features. In this study, we propose a feature fusion method and investigate whether a combined feature bank of deep learning and predefined radiomics features can improve the prognostic performance. CT images from resectable Pancreatic Ductal Adenocarcinoma (PDAC) patients were used to compare the prognostic performance of common feature reduction and fusion methods and the proposed risk-score based feature fusion method for overall survival. It was shown that the proposed feature fusion method significantly improves the prognostic performance for overall survival in resectable PDAC cohorts, elevating the area under the ROC curve by 51% compared to predefined radiomics features alone, by 16% compared to deep learning features alone, and by 32% compared to existing feature fusion and reduction methods for a combination of deep learning and predefined radiomics features. |
Tasks | |
Published | 2019-07-10 |
URL | https://arxiv.org/abs/1907.04822v1 |
https://arxiv.org/pdf/1907.04822v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-prognostic-performance-in |
Repo | |
Framework | |
Image based cellular contractile force evaluation with small-world network inspired CNN: SW-UNet
Title | Image based cellular contractile force evaluation with small-world network inspired CNN: SW-UNet |
Authors | Li Honghan, Daiki Matsunaga, Tsubasa S. Matsui, Hiroki Aosaki, Shinji Deguchi |
Abstract | We propose an image-based cellular contractile force evaluation method using a machine learning technique. We use a special substrate that exhibits wrinkles when cells grab the substrate and contract, and the wrinkles can be used to visualize the force magnitude and direction. In order to extract wrinkles from the microscope images, we develop a new CNN (convolutional neural network) architecture, SW-UNet (small-world U-Net), which is a CNN that reflects the concept of the small-world network. The SW-UNet shows better performance in the wrinkle segmentation task compared to other methods: the error (Euclidean distance) of SW-UNet is 4.9 times smaller than that of the 2D-FFT (fast Fourier transform) based segmentation approach, and 2.9 times smaller than that of U-Net. As a demonstration, we compare the contractile force of U2OS (human osteosarcoma) cells and show that cells with a mutation in the KRAS oncogene show larger force compared to the wild-type cells. Our new machine learning based algorithm provides an efficient, automated and accurate method to evaluate the cell contractile force. |
Tasks | |
Published | 2019-08-23 |
URL | https://arxiv.org/abs/1908.08631v1 |
https://arxiv.org/pdf/1908.08631v1.pdf | |
PWC | https://paperswithcode.com/paper/image-based-cellular-contractile-force |
Repo | |
Framework | |
HexaShrink, an exact scalable framework for hexahedral meshes with attributes and discontinuities: multiresolution rendering and storage of geoscience models
Title | HexaShrink, an exact scalable framework for hexahedral meshes with attributes and discontinuities: multiresolution rendering and storage of geoscience models |
Authors | Jean-Luc Peyrot, Laurent Duval, Frédéric Payan, Lauriane Bouard, Lénaïc Chizat, Sébastien Schneider, Marc Antonini |
Abstract | With the huge progress in data acquisition realized in the past decades, and acquisition systems now able to produce high resolution grids and point clouds, the digitization of physical terrains becomes increasingly precise. Such extreme quantities of generated and modeled data greatly impact computational performance on many levels of high-performance computing (HPC): storage media, memory requirements, transfer capability, and finally simulation interactivity, necessary to exploit this instance of big data. Efficient representations and storage are thus becoming “enabling technologies” in HPC experimental and simulation science. We propose HexaShrink, an original decomposition scheme for structured hexahedral volume meshes. The latter are used for instance in biomedical engineering, materials science, or geosciences. HexaShrink provides a comprehensive framework allowing efficient mesh visualization and storage. Its exactly reversible multiresolution decomposition yields a hierarchy of meshes of increasing levels of detail, in terms of either geometry, continuous or categorical properties of cells. Starting with an overview of volume mesh compression techniques, our contribution coherently blends different multiresolution wavelet schemes in different dimensions. It results in a global framework preserving discontinuities (faults) across scales, implemented as a fully reversible upscaling at different resolutions. Experimental results are provided on meshes of varying size and complexity. They emphasize the consistency of the proposed representation, in terms of visualization, attribute downsampling and distribution at different resolutions. Finally, HexaShrink yields gains in storage space when combined with lossless compression techniques. |
Tasks | |
Published | 2019-03-16 |
URL | https://arxiv.org/abs/1903.07614v2 |
https://arxiv.org/pdf/1903.07614v2.pdf | |
PWC | https://paperswithcode.com/paper/hexashrink-an-exact-scalable-framework-for |
Repo | |
Framework | |
Conv-codes: Audio Hashing For Bird Species Classification
Title | Conv-codes: Audio Hashing For Bird Species Classification |
Authors | Anshul Thakur, Pulkit Sharma, Vinayak Abrol, Padmanabhan Rajan |
Abstract | In this work, we propose a supervised, convex representation based audio hashing framework for bird species classification. The proposed framework utilizes archetypal analysis, a matrix factorization technique, to obtain convex-sparse representations of a bird vocalization. These convex representations are hashed using Bloom filters with non-cryptographic hash functions to obtain compact binary codes, designated as conv-codes. The conv-codes extracted from the training examples are clustered using class-specific k-medoids clustering with the Jaccard coefficient as the similarity metric. A hash table is populated using the cluster centers as keys while hash values/slots are pointers to the species identification information. During testing, the hash table is searched to find the species information corresponding to the cluster center that exhibits maximum similarity with the test conv-code. Hence, the proposed framework classifies a bird vocalization in the conv-code space and requires no explicit classifier or reconstruction error calculations. In addition, based on min-hash and direct addressing, we also propose a variant of the proposed framework that provides faster yet effective classification. The performance of both frameworks is compared with existing bird species classification frameworks on the audio recordings of 50 different bird species. |
Tasks | |
Published | 2019-02-07 |
URL | http://arxiv.org/abs/1902.02498v1 |
http://arxiv.org/pdf/1902.02498v1.pdf | |
PWC | https://paperswithcode.com/paper/conv-codes-audio-hashing-for-bird-species |
Repo | |
Framework | |
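The hashing and lookup steps in the abstract can be sketched roughly as follows. This is an illustration under several assumptions: each convex-sparse representation is reduced to the indices of its active atoms, CRC32 stands in for the unspecified non-cryptographic hash functions, and a code is stored as the set of its set-bit positions. The names `bloom_code`, `jaccard` and `classify`, and the parameter values, are hypothetical. Archetypal analysis and k-medoids clustering are not shown.

```python
import zlib

def bloom_code(active_indices, num_bits=64, num_hashes=2):
    """Hash each active index into `num_hashes` positions of a Bloom filter.

    The binary code is returned as a frozenset of the positions that are set,
    so it can be used directly as a hash-table key.
    """
    bits = set()
    for item in active_indices:
        for seed in range(num_hashes):
            # CRC32 is a non-cryptographic checksum; seeding via a prefix
            # gives several cheap hash functions (an illustrative choice).
            bits.add(zlib.crc32(f"{seed}:{item}".encode()) % num_bits)
    return frozenset(bits)

def jaccard(code_a, code_b):
    """Jaccard coefficient between two codes given as sets of set bits."""
    union = code_a | code_b
    return len(code_a & code_b) / len(union) if union else 1.0

def classify(test_code, table):
    """Return the label stored under the most similar cluster-center code."""
    best_center = max(table, key=lambda center: jaccard(center, test_code))
    return table[best_center]

# Hypothetical cluster centers, keyed by their codes.
table = {
    bloom_code([1, 2, 3]): "species_a",
    bloom_code([40, 41, 42]): "species_b",
}
predicted = classify(bloom_code([1, 2, 3]), table)
```

Because classification is a maximum-similarity lookup over cluster centers, no separate classifier is trained, which matches the abstract's description.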