Paper Group ANR 538
Quantum Speedup in Adaptive Boosting of Binary Classification
Title | Quantum Speedup in Adaptive Boosting of Binary Classification |
Authors | Ximing Wang, Yuechi Ma, Min-Hsiu Hsieh, Man-Hong Yung |
Abstract | In classical machine learning, a set of weak classifiers can be adaptively combined to form a strong classifier for improving the overall performance, a technique called adaptive boosting (or AdaBoost). However, constructing the strong classifier for a large data set is typically resource consuming. Here we propose a quantum extension of AdaBoost, demonstrating a quantum algorithm that can output the optimal strong classifier with a quadratic speedup in the number of queries of the weak classifiers. Our results also include a generalization of the standard AdaBoost to the cases where the output of each classifier may be probabilistic even for the same input. We prove that the update rules and the query complexity of the non-deterministic classifiers are the same as those of deterministic classifiers, which may be of independent interest to the classical machine-learning community. Furthermore, the AdaBoost algorithm can also be applied to data encoded in the form of quantum states; we show how the training set can be simplified by using the tools of t-design. Our approach describes a model of quantum machine learning where quantum speedup is achieved in finding the optimal classifier, which can then be applied for classical machine-learning applications. |
Tasks | Quantum Machine Learning |
Published | 2019-02-03 |
URL | http://arxiv.org/abs/1902.00869v1 |
PDF | http://arxiv.org/pdf/1902.00869v1.pdf |
PWC | https://paperswithcode.com/paper/quantum-speedup-in-adaptive-boosting-of |
Repo | |
Framework | |
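As context for the claimed quadratic speedup, here is a minimal sketch of the classical AdaBoost loop being accelerated, with its standard weight-update rule; the `weak_learners` interface (callables mapping `X` to ±1 labels) is an illustrative assumption, not the paper's API.

```python
# Minimal classical AdaBoost sketch: the baseline whose query complexity the
# paper improves quadratically. Illustrative interface, not the paper's code.
import numpy as np

def adaboost(X, y, weak_learners, rounds=10):
    n = len(y)
    w = np.full(n, 1.0 / n)                     # example weights
    alphas, chosen = [], []
    for _ in range(rounds):
        # query each weak classifier and pick the lowest weighted error
        errs = [np.sum(w * (h(X) != y)) for h in weak_learners]
        best = int(np.argmin(errs))
        eps = max(errs[best], 1e-12)
        if eps >= 0.5:                          # nothing beats random guessing
            break
        alpha = 0.5 * np.log((1 - eps) / eps)   # classifier weight
        pred = weak_learners[best](X)
        w *= np.exp(-alpha * y * pred)          # standard AdaBoost reweighting
        w /= w.sum()
        alphas.append(alpha)
        chosen.append(best)

    def strong(Xq):                             # weighted majority vote
        return np.sign(sum(a * weak_learners[i](Xq)
                           for a, i in zip(alphas, chosen)))
    return strong
```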
Dying ReLU and Initialization: Theory and Numerical Examples
Title | Dying ReLU and Initialization: Theory and Numerical Examples |
Authors | Lu Lu, Yeonjong Shin, Yanhui Su, George Em Karniadakis |
Abstract | The dying ReLU refers to the problem when ReLU neurons become inactive and only output 0 for any input. There are many empirical and heuristic explanations of why ReLU neurons die. However, little is known about its theoretical analysis. In this paper, we rigorously prove that a deep ReLU network will eventually die in probability as the depth goes to infinity. Several methods have been proposed to alleviate the dying ReLU. Perhaps one of the simplest treatments is to modify the initialization procedure. One common way of initializing weights and biases uses symmetric probability distributions, which suffers from the dying ReLU. We thus propose a new initialization procedure, namely, a randomized asymmetric initialization. We prove that the new initialization can effectively prevent the dying ReLU. All parameters required for the new initialization are theoretically designed. Numerical examples are provided to demonstrate the effectiveness of the new initialization procedure. |
Tasks | |
Published | 2019-03-15 |
URL | https://arxiv.org/abs/1903.06733v2 |
PDF | https://arxiv.org/pdf/1903.06733v2.pdf |
PWC | https://paperswithcode.com/paper/dying-relu-and-initialization-theory-and |
Repo | |
Framework | |
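To make the proposal concrete, here is a hedged illustration of what an asymmetric initialization can look like: some weights and the biases are drawn from one-sided distributions so each ReLU unit has positive probability of activating. The distributions and scalings below are illustrative choices, not the paper's theoretically designed parameters.

```python
# Illustrative asymmetric initialization sketch (NOT the paper's exact scheme):
# breaking the sign symmetry that allows entire ReLU units to die.
import numpy as np

def randomized_asymmetric_init(fan_in, fan_out, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    # symmetric He-style draw as the starting point
    W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_out, fan_in))
    # randomly chosen entries are redrawn from a one-sided distribution
    mask = rng.random((fan_out, fan_in)) < 0.5
    W[mask] = rng.uniform(0.0, np.sqrt(6.0 / fan_in), size=int(mask.sum()))
    b = rng.uniform(0.0, 0.1, size=fan_out)   # small positive biases
    return W, b
```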
Graph Embedding VAE: A Permutation Invariant Model of Graph Structure
Title | Graph Embedding VAE: A Permutation Invariant Model of Graph Structure |
Authors | Tony Duan, Juho Lee |
Abstract | Generative models of graph structure have applications in biology and social sciences. The state of the art is GraphRNN, which decomposes the graph generation process into a series of sequential steps. While effective for modest sizes, it loses its permutation invariance for larger graphs. Instead, we present a permutation invariant latent-variable generative model relying on graph embeddings to encode structure. Using tools from the random graph literature, our model is highly scalable to large graphs with likelihood evaluation and generation in $O(V + E)$. |
Tasks | Graph Embedding, Graph Generation |
Published | 2019-10-17 |
URL | https://arxiv.org/abs/1910.08057v1 |
PDF | https://arxiv.org/pdf/1910.08057v1.pdf |
PWC | https://paperswithcode.com/paper/graph-embedding-vae-a-permutation-invariant |
Repo | |
Framework | |
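A hedged sketch of the kind of permutation-invariant decoder the abstract alludes to: node embeddings define edge probabilities via inner products, so relabeling the nodes permutes rows of `Z` and entries of `edges` without changing the likelihood. The paper's exact encoder/decoder and its O(V + E) bookkeeping may differ.

```python
# Dot-product edge decoder sketch: p(edge i-j) = sigmoid(z_i . z_j).
# Permutation invariant because the sum below is over unordered edge terms.
import numpy as np

def edge_log_likelihood(Z, edges):
    # Z: (V, d) node embeddings; edges: (E, 2) integer array of endpoints
    logits = np.einsum("ed,ed->e", Z[edges[:, 0]], Z[edges[:, 1]])
    # sum of log sigmoid(logits), computed stably as -log(1 + exp(-x))
    return -np.logaddexp(0.0, -logits).sum()
```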
Learning event representations in image sequences by dynamic graph embedding
Title | Learning event representations in image sequences by dynamic graph embedding |
Authors | Mariella Dimiccoli, Herwig Wendt |
Abstract | Recently, self-supervised learning has proved effective for learning representations of events in image sequences, where events are understood as sets of temporally adjacent images that are semantically perceived as a whole. However, although this approach does not require expensive manual annotations, it is data-hungry and suffers from domain adaptation problems. As an alternative, in this work we propose a novel approach for learning event representations, named Dynamic Graph Embedding (DGE). The assumption underlying our model is that a sequence of images can be represented by a graph that encodes both semantic and temporal similarity. The key novelty of DGE is to learn the graph and its graph embedding jointly. At its core, DGE works by iterating over two steps: 1) updating the graph representing the semantic and temporal structure of the data based on the current data representation, and 2) updating the data representation to take into account the current graph structure. The main advantage of DGE over state-of-the-art self-supervised approaches is that it does not require any training set, but instead learns iteratively from the data itself a low-dimensional embedding that reflects its temporal and semantic structure. Experimental results on two benchmark datasets of real image sequences captured at regular intervals demonstrate that the proposed DGE leads to effective event representations. In particular, it achieves robust temporal segmentation on the EDUBSeg and EDUBSeg-Desc benchmark datasets, outperforming the state of the art. |
Tasks | Domain Adaptation, Graph Embedding |
Published | 2019-10-08 |
URL | https://arxiv.org/abs/1910.03483v1 |
PDF | https://arxiv.org/pdf/1910.03483v1.pdf |
PWC | https://paperswithcode.com/paper/learning-event-representations-in-image |
Repo | |
Framework | |
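The two-step iteration can be sketched as follows, using kNN graph construction and neighborhood averaging as illustrative stand-ins for the paper's actual update rules.

```python
# Alternating sketch of DGE's two steps (illustrative update rules only):
# (1) rebuild a semantic+temporal graph from the current embedding,
# (2) smooth the embedding over that graph.
import numpy as np

def dge_sketch(E, n_iters=10, k=5, alpha=0.5):
    n = len(E)
    for _ in range(n_iters):
        # step 1: graph update -- cosine-similarity kNN edges ...
        En = E / (np.linalg.norm(E, axis=1, keepdims=True) + 1e-9)
        S = En @ En.T
        A = np.zeros((n, n))
        nbrs = np.argsort(-S, axis=1)[:, 1:k + 1]   # skip self at index 0
        for i, js in enumerate(nbrs):
            A[i, js] = 1.0
        # ... plus temporal edges between consecutive frames
        idx = np.arange(n - 1)
        A[idx, idx + 1] = A[idx + 1, idx] = 1.0
        A = np.maximum(A, A.T)
        # step 2: representation update -- average each node with its neighbors
        D = A.sum(axis=1, keepdims=True)
        E = (1 - alpha) * E + alpha * (A @ E) / np.maximum(D, 1.0)
    return E
```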
Representation Learning of EHR Data via Graph-Based Medical Entity Embedding
Title | Representation Learning of EHR Data via Graph-Based Medical Entity Embedding |
Authors | Tong Wu, Yunlong Wang, Yue Wang, Emily Zhao, Yilian Yuan, Zhi Yang |
Abstract | Automatic representation learning of key entities in electronic health record (EHR) data is a critical step for healthcare informatics that turns heterogeneous medical records into structured and actionable information. Here we propose ME2Vec, an algorithmic framework for learning low-dimensional vectors of the most common entities in EHR: medical services, doctors, and patients. ME2Vec leverages diverse graph embedding techniques to cater to the unique characteristics of each medical entity. Using real-world clinical data, we demonstrate the efficacy of ME2Vec over competitive baselines on disease diagnosis prediction. |
Tasks | Graph Embedding, Representation Learning |
Published | 2019-10-07 |
URL | https://arxiv.org/abs/1910.02574v1 |
PDF | https://arxiv.org/pdf/1910.02574v1.pdf |
PWC | https://paperswithcode.com/paper/representation-learning-of-ehr-data-via-graph |
Repo | |
Framework | |
SFSegNet: Parse Freehand Sketches using Deep Fully Convolutional Networks
Title | SFSegNet: Parse Freehand Sketches using Deep Fully Convolutional Networks |
Authors | Junkun Jiang, Ruomei Wang, Shujin Lin, Fei Wang |
Abstract | Parsing sketches via semantic segmentation is attractive but challenging, because (i) free-hand drawings are abstract, with large variance in how objects are depicted due to different drawing styles and skills; (ii) distorted lines drawn on a touchpad make sketches more difficult to recognize; (iii) high-performance image segmentation via deep learning requires enormous annotated sketch datasets during the training stage. In this paper, we propose a Sketch-target deep FCN Segmentation Network (SFSegNet) for automatic free-hand sketch segmentation, labeling each sketch of a single object with multiple parts. SFSegNet forms an end-to-end network between the input sketches and the segmentation results, composed of two parts: (i) a modified deep Fully Convolutional Network (FCN) using a reweighting strategy to ignore background pixels and classify which part each pixel belongs to; (ii) affine transform encoders that attempt to canonicalize shaky strokes. We train our network on a dataset consisting of 10,000 annotated sketches to find a broadly applicable model that segments strokes semantically against a single ground truth. Extensive experiments are carried out, and the segmentation results show that our method outperforms other state-of-the-art networks. |
Tasks | Semantic Segmentation |
Published | 2019-08-15 |
URL | https://arxiv.org/abs/1908.05389v1 |
PDF | https://arxiv.org/pdf/1908.05389v1.pdf |
PWC | https://paperswithcode.com/paper/sfsegnet-parse-freehand-sketches-using-deep |
Repo | |
Framework | |
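Part (i)'s reweighting strategy can be illustrated with a per-pixel weighted cross-entropy that down-weights background pixels; the PyTorch sketch below assumes class 0 is background, which is an illustrative convention rather than the paper's specification.

```python
# Reweighted per-pixel cross-entropy sketch: background pixels get zero (or
# reduced) weight so the loss concentrates on stroke pixels.
import torch
import torch.nn.functional as F

def reweighted_ce(logits, target, background_class=0, bg_weight=0.0):
    # logits: (N, C, H, W) class scores; target: (N, H, W) integer labels
    per_pixel = F.cross_entropy(logits, target, reduction="none")
    weights = torch.where(target == background_class,
                          torch.full_like(per_pixel, bg_weight),
                          torch.ones_like(per_pixel))
    return (weights * per_pixel).sum() / weights.sum().clamp(min=1.0)
```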
Mean-field Analysis of Batch Normalization
Title | Mean-field Analysis of Batch Normalization |
Authors | Mingwei Wei, James Stokes, David J Schwab |
Abstract | Batch Normalization (BatchNorm) is an extremely useful component of modern neural network architectures, enabling optimization using higher learning rates and achieving faster convergence. In this paper, we use mean-field theory to analytically quantify the impact of BatchNorm on the geometry of the loss landscape for multi-layer networks consisting of fully-connected and convolutional layers. We show that it has a flattening effect on the loss landscape, as quantified by the maximum eigenvalue of the Fisher Information Matrix. These findings are then used to justify the use of larger learning rates for networks that use BatchNorm, and we provide a quantitative characterization of the maximal allowable learning rate to ensure convergence. Experiments support our theoretically predicted maximum learning rate, and furthermore suggest that networks with smaller values of the BatchNorm parameter achieve lower loss after the same number of epochs of training. |
Tasks | |
Published | 2019-03-06 |
URL | http://arxiv.org/abs/1903.02606v1 |
PDF | http://arxiv.org/pdf/1903.02606v1.pdf |
PWC | https://paperswithcode.com/paper/mean-field-analysis-of-batch-normalization |
Repo | |
Framework | |
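The link between a flatter loss landscape and larger admissible learning rates rests on the standard stability condition for gradient descent on a local quadratic model; the restatement below is a hedged gloss, not the paper's exact bound.

```latex
% Hedged gloss: gradient descent on a local quadratic model of the loss,
% with curvature measured by the Fisher Information Matrix F, contracts
% only if the learning rate satisfies
\[
  \eta \;<\; \frac{2}{\lambda_{\max}(F)} ,
\]
% so flattening the landscape (reducing \lambda_{\max}(F)) raises the
% maximal allowable learning rate.
```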
Hybrid Neural Models For Sequence Modelling: The Best Of Three Worlds
Title | Hybrid Neural Models For Sequence Modelling: The Best Of Three Worlds |
Authors | Marco Dinarelli, Loïc Grobol |
Abstract | We propose a neural architecture with the main characteristics of the most successful neural models of recent years: bidirectional RNNs, encoder-decoder architectures, and the Transformer model. Evaluation on three sequence labelling tasks yields results that are close to the state of the art for all tasks, and better for some of them, showing the pertinence of this hybrid architecture for this kind of task. |
Tasks | |
Published | 2019-09-16 |
URL | https://arxiv.org/abs/1909.07102v1 |
PDF | https://arxiv.org/pdf/1909.07102v1.pdf |
PWC | https://paperswithcode.com/paper/hybrid-neural-models-for-sequence-modelling |
Repo | |
Framework | |
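One plausible concretization of such a hybrid: a bidirectional LSTM feeding a Transformer encoder, with a simple label-conditioned head standing in for the decoder. Layer sizes and wiring are illustrative assumptions, not the paper's architecture.

```python
# Hybrid sequence-labelling sketch: BiLSTM features -> Transformer
# self-attention -> output head conditioned on the previous label.
import torch
import torch.nn as nn

class HybridTagger(nn.Module):
    def __init__(self, vocab, n_labels, d=256):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.birnn = nn.LSTM(d, d // 2, bidirectional=True, batch_first=True)
        enc_layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.label_emb = nn.Embedding(n_labels + 1, d)  # +1 for a start symbol
        self.out = nn.Linear(2 * d, n_labels)

    def forward(self, tokens, prev_labels):
        h, _ = self.birnn(self.emb(tokens))   # bidirectional RNN features
        h = self.transformer(h)               # self-attention on top
        # encoder-decoder flavour: each position also sees the previous label
        return self.out(torch.cat([h, self.label_emb(prev_labels)], dim=-1))
```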
RoPAD: Robust Presentation Attack Detection through Unsupervised Adversarial Invariance
Title | RoPAD: Robust Presentation Attack Detection through Unsupervised Adversarial Invariance |
Authors | Ayush Jaiswal, Shuai Xia, Iacopo Masi, Wael AbdAlmageed |
Abstract | For enterprise, personal and societal applications, there is now an increasing demand for automated authentication of identity from images using computer vision. However, current authentication technologies are still vulnerable to presentation attacks. We present RoPAD, an end-to-end deep learning model for presentation attack detection that employs unsupervised adversarial invariance to ignore visual distractors in images for increased robustness and reduced overfitting. Experiments show that the proposed framework exhibits state-of-the-art performance on presentation attack detection on several benchmark datasets. |
Tasks | |
Published | 2019-03-08 |
URL | http://arxiv.org/abs/1903.03691v2 |
PDF | http://arxiv.org/pdf/1903.03691v2.pdf |
PWC | https://paperswithcode.com/paper/ropad-robust-presentation-attack-detection |
Repo | |
Framework | |
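The unsupervised adversarial invariance idea can be sketched as a two-partition encoder; the losses listed in the comments are the generic recipe, not necessarily RoPAD's exact configuration.

```python
# Two-partition encoder sketch for unsupervised adversarial invariance:
# e1 carries task-relevant factors, e2 absorbs nuisance factors (distractors).
import torch.nn as nn

class SplitEncoder(nn.Module):
    def __init__(self, d_in=512, d1=64, d2=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(d_in, 256), nn.ReLU())
        self.head1 = nn.Linear(256, d1)   # e1: predictive partition
        self.head2 = nn.Linear(256, d2)   # e2: nuisance partition

    def forward(self, x):
        h = self.trunk(x)
        return self.head1(h), self.head2(h)

# Generic training signals (illustrative):
#   classifier(e1)        -> bona fide / attack          (task loss)
#   decoder(e1, e2)       -> reconstruct x               (e2 keeps the rest)
#   disentangler1(e1)->e2 and disentangler2(e2)->e1, trained adversarially,
#   push the two partitions toward independence.
```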
Informative Image Captioning with External Sources of Information
Title | Informative Image Captioning with External Sources of Information |
Authors | Sanqiang Zhao, Piyush Sharma, Tomer Levinboim, Radu Soricut |
Abstract | An image caption should fluently present the essential information in a given image, including informative, fine-grained entity mentions and the manner in which these entities interact. However, current captioning models are usually trained to generate captions that only contain common object names, thus falling short on an important “informativeness” dimension. We present a mechanism for integrating image information together with fine-grained labels (assumed to be generated by some upstream models) into a caption that describes the image in a fluent and informative manner. We introduce a multimodal, multi-encoder model based on Transformer that ingests both image features and multiple sources of entity labels. We demonstrate that we can learn to control the appearance of these entity labels in the output, resulting in captions that are both fluent and informative. |
Tasks | Image Captioning |
Published | 2019-06-20 |
URL | https://arxiv.org/abs/1906.08876v1 |
PDF | https://arxiv.org/pdf/1906.08876v1.pdf |
PWC | https://paperswithcode.com/paper/informative-image-captioning-with-external |
Repo | |
Framework | |
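A hedged sketch of the multi-encoder design: image features and entity labels are encoded separately and concatenated into one memory that the caption decoder attends to. Dimensions, layer counts, and the omitted causal mask are illustrative simplifications, not the paper's model.

```python
# Multi-encoder Transformer sketch: separate encoders for image features and
# entity labels, fused into one decoder memory. (Causal masking omitted.)
import torch
import torch.nn as nn

class MultiEncoderCaptioner(nn.Module):
    def __init__(self, vocab, d=256):
        super().__init__()
        self.img_proj = nn.Linear(2048, d)      # e.g. CNN region features
        self.ent_emb = nn.Embedding(vocab, d)   # fine-grained entity labels
        layer = lambda: nn.TransformerEncoderLayer(d, 4, batch_first=True)
        self.img_enc = nn.TransformerEncoder(layer(), 2)
        self.ent_enc = nn.TransformerEncoder(layer(), 2)
        dec_layer = nn.TransformerDecoderLayer(d, 4, batch_first=True)
        self.dec = nn.TransformerDecoder(dec_layer, 2)
        self.word_emb = nn.Embedding(vocab, d)
        self.out = nn.Linear(d, vocab)

    def forward(self, img_feats, entity_ids, caption_ids):
        memory = torch.cat([self.img_enc(self.img_proj(img_feats)),
                            self.ent_enc(self.ent_emb(entity_ids))], dim=1)
        h = self.dec(self.word_emb(caption_ids), memory)
        return self.out(h)
```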
A Neural Rendering Framework for Free-Viewpoint Relighting
Title | A Neural Rendering Framework for Free-Viewpoint Relighting |
Authors | Zhang Chen, Anpei Chen, Guli Zhang, Chengyuan Wang, Yu Ji, Kiriakos N. Kutulakos, Jingyi Yu |
Abstract | We present a novel Relightable Neural Renderer (RNR) for simultaneous view synthesis and relighting using multi-view image inputs. Existing neural rendering (NR) does not explicitly model the physical rendering process and hence has limited capabilities on relighting. RNR instead models image formation in terms of environment lighting, object intrinsic attributes, and the light transport function (LTF), each corresponding to a learnable component. In particular, the incorporation of a physically based rendering process not only enables relighting but also improves the quality of novel view synthesis. Comprehensive experiments on synthetic and real data show that RNR provides a practical and effective solution for conducting free-viewpoint relighting. |
Tasks | Novel View Synthesis |
Published | 2019-11-26 |
URL | https://arxiv.org/abs/1911.11530v1 |
PDF | https://arxiv.org/pdf/1911.11530v1.pdf |
PWC | https://paperswithcode.com/paper/a-neural-rendering-framework-for-free |
Repo | |
Framework | |
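The factorization the abstract describes can be written, in illustrative notation that may differ from the paper's, as an integral of environment lighting against a learnable light transport function.

```latex
% Illustrative notation: outgoing radiance I at surface point x toward the
% camera direction \omega_o, as environment lighting L integrated against a
% learnable light transport function T conditioned on intrinsic attributes
% \rho(x).
\[
  I(x, \omega_o) \;=\; \int_{\Omega} L(\omega_i)\,
    T\!\left(x, \omega_i, \omega_o;\, \rho(x)\right) \mathrm{d}\omega_i .
\]
```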
Enumerative Data Compression with Non-Uniquely Decodable Codes
Title | Enumerative Data Compression with Non-Uniquely Decodable Codes |
Authors | M. Oğuzhan Külekci, Yasin Öztürk, Elif Altunok, Can Altıniğne |
Abstract | Non-uniquely decodable codes can be defined as codes that cannot be uniquely decoded without additional disambiguation information. These are mainly the class of non-prefix-free codes, where a codeword can be a prefix of other(s), and thus the codeword boundary information is essential for correct decoding. Although the codeword bit stream consumes significantly less space than prefix-free codes, the additional disambiguation information makes it difficult to match the total performance of prefix-free codes. Previous studies considered compression with non-prefix-free codes by integrating rank/select dictionaries or wavelet trees to mark the codeword boundaries. In this study, we focus on another dimension with a block-wise enumeration scheme that significantly improves the compression ratios of the previous studies. Experiments conducted on a known corpus showed that the proposed scheme successfully represents a source within its entropy, even performing better than Huffman and arithmetic coding in some cases. Non-uniquely decodable codes also provide an intrinsic security feature due to the lack of unique decodability. We investigate this dimension as an opportunity to provide compressed-data security without (or with less) encryption, and discuss various possible practical advantages supported by such codes. |
Tasks | |
Published | 2019-11-13 |
URL | https://arxiv.org/abs/1911.05676v1 |
PDF | https://arxiv.org/pdf/1911.05676v1.pdf |
PWC | https://paperswithcode.com/paper/enumerative-data-compression-with-non |
Repo | |
Framework | |
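A toy example of why non-prefix-free codewords need boundary information, storing the boundaries as an explicit bitvector; the paper's block-wise enumeration compresses this side information far more cleverly than this sketch does.

```python
# Non-prefix-free code demo: the payload alone is ambiguous ("0" is a prefix
# of "01"), so a boundary bitvector marks the last bit of each codeword.
code = {"a": "0", "b": "01", "c": "1"}
inv = {v: k for k, v in code.items()}

def encode(text):
    words = [code[ch] for ch in text]
    payload = "".join(words)
    bounds = "".join("0" * (len(w) - 1) + "1" for w in words)
    return payload, bounds

def decode(payload, bounds):
    out, start = [], 0
    for i, flag in enumerate(bounds):
        if flag == "1":
            out.append(inv[payload[start:i + 1]])
            start = i + 1
    return "".join(out)

payload, bounds = encode("abc")   # payload "0011" alone parses many ways
assert decode(payload, bounds) == "abc"
```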
Predicting the Role of Political Trolls in Social Media
Title | Predicting the Role of Political Trolls in Social Media |
Authors | Atanas Atanasov, Gianmarco De Francisci Morales, Preslav Nakov |
Abstract | We investigate the political roles of “Internet trolls” in social media. Political trolls, such as the ones linked to the Russian Internet Research Agency (IRA), have recently gained enormous attention for their ability to sway public opinion and even influence elections. Analysis of the online traces of trolls has shown different behavioral patterns, which target different slices of the population. However, this analysis is manual and labor-intensive, thus making it impractical as a first-response tool for newly discovered troll farms. In this paper, we show how to automate this analysis by using machine learning in a realistic setting. In particular, we show how to classify trolls according to their political role (left, news feed, right) by using features extracted from social media, i.e., Twitter, in two scenarios: (i) in a traditional supervised learning scenario, where labels for trolls are available, and (ii) in a distant supervision scenario, where labels for trolls are not available, and we rely on more commonly available labels for news outlets mentioned by the trolls. Technically, we leverage the community structure and the text of the messages in the online social network of trolls represented as a graph, from which we extract several types of learned representations, i.e., embeddings, for the trolls. Experiments on the “IRA Russian Troll” dataset show that our methodology improves over the state of the art in the first scenario, while providing a compelling case for the second scenario, which has not been explored in the literature thus far. |
Tasks | |
Published | 2019-10-04 |
URL | https://arxiv.org/abs/1910.02001v1 |
PDF | https://arxiv.org/pdf/1910.02001v1.pdf |
PWC | https://paperswithcode.com/paper/predicting-the-role-of-political-trolls-in |
Repo | |
Framework | |
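Scenario (ii)'s distant supervision can be illustrated as labeling each troll account by the predominant leaning of the outlets it shares; the outlet labels below are hypothetical placeholders, not data from the paper.

```python
# Distant-supervision sketch: transfer outlet-level labels to troll accounts
# by majority vote over the domains each account links to.
from collections import Counter

outlet_leaning = {"leftnews.example": "left",
                  "rightnews.example": "right"}   # hypothetical outlet labels

def distant_label(shared_domains):
    votes = Counter(outlet_leaning[d] for d in shared_domains
                    if d in outlet_leaning)
    return votes.most_common(1)[0][0] if votes else None

assert distant_label(["leftnews.example", "leftnews.example",
                      "rightnews.example"]) == "left"
```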
Unpaired Cross-lingual Image Caption Generation with Self-Supervised Rewards
Title | Unpaired Cross-lingual Image Caption Generation with Self-Supervised Rewards |
Authors | Yuqing Song, Shizhe Chen, Yida Zhao, Qin Jin |
Abstract | Generating image descriptions in different languages is essential to satisfy users worldwide. However, it is prohibitively expensive to collect a large-scale paired image-caption dataset for every target language, which is critical for training decent image captioning models. Previous works tackle the unpaired cross-lingual image captioning problem through a pivot language, i.e., with the help of paired image-caption data in the pivot language and pivot-to-target machine translation models. However, such a language-pivoted approach suffers from inaccuracy brought by the pivot-to-target translation, including disfluency and visual irrelevancy errors. In this paper, we propose to generate cross-lingual image captions with self-supervised rewards in a reinforcement learning framework to alleviate these two types of errors. We employ self-supervision from a monolingual corpus in the target language to provide a fluency reward, and propose a multi-level visual semantic matching model to provide both sentence-level and concept-level visual relevancy rewards. We conduct extensive experiments for unpaired cross-lingual image captioning in both English and Chinese on two widely used image caption corpora. The proposed approach achieves significant performance improvement over state-of-the-art methods. |
Tasks | Image Captioning, Machine Translation |
Published | 2019-08-15 |
URL | https://arxiv.org/abs/1908.05407v1 |
PDF | https://arxiv.org/pdf/1908.05407v1.pdf |
PWC | https://paperswithcode.com/paper/unpaired-cross-lingual-image-caption |
Repo | |
Framework | |
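The reward shaping can be sketched as a weighted mix of a fluency score and two visual relevancy scores feeding a REINFORCE-style update; `lm_logprob`, `sent_match`, and `concept_match` are hypothetical stand-ins for the paper's learned scoring models, and the weights are illustrative.

```python
# Self-supervised reward sketch: fluency (target-language LM score) plus
# sentence- and concept-level visual matching, mixed into one scalar reward.
def caption_reward(caption, image,
                   lm_logprob, sent_match, concept_match,
                   w_fluency=0.5, w_sent=0.3, w_concept=0.2):
    fluency = lm_logprob(caption) / max(len(caption.split()), 1)  # per token
    return (w_fluency * fluency
            + w_sent * sent_match(caption, image)
            + w_concept * concept_match(caption, image))

# REINFORCE-style gradient weight for a sampled caption: advantage over a
# baseline (e.g. the reward of a greedily decoded caption).
def reinforce_weight(reward, baseline):
    return reward - baseline
```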
Face Image Reflection Removal
Title | Face Image Reflection Removal |
Authors | Renjie Wan, Boxin Shi, Haoliang Li, Ling-Yu Duan, Alex C. Kot |
Abstract | Face images captured through the glass are usually contaminated by reflections. The non-transmitted reflections make the reflection removal more challenging than for general scenes, because important facial features are completely occluded. In this paper, we propose and solve the face image reflection removal problem. We remove non-transmitted reflections by incorporating inpainting ideas into a guided reflection removal framework and recover facial features by considering various face-specific priors. We use a newly collected face reflection image dataset to train our model and compare with state-of-the-art methods. The proposed method shows advantages in estimating reflection-free face images for improving face recognition. |
Tasks | Face Recognition |
Published | 2019-03-03 |
URL | http://arxiv.org/abs/1903.00865v1 |
PDF | http://arxiv.org/pdf/1903.00865v1.pdf |
PWC | https://paperswithcode.com/paper/face-image-reflection-removal |
Repo | |
Framework | |