Paper Group ANR 1043
Information-Theoretic Perspective of Federated Learning. Operation-aware Neural Networks for User Response Prediction. Introduction to the 35th International Conference on Logic Programming Special Issue. OCR4all – An Open-Source Tool Providing a (Semi-)Automatic OCR Workflow for Historical Printings. Differentiable Visual Computing. Message-passi …
Information-Theoretic Perspective of Federated Learning
Title | Information-Theoretic Perspective of Federated Learning |
Authors | Linara Adilova, Julia Rosenzweig, Michael Kamp |
Abstract | An approach to distributed machine learning is to train models on local datasets and aggregate these models into a single, stronger model. A popular instance of this form of parallelization is federated learning, where the nodes periodically send their local models to a coordinator that aggregates them and redistributes the aggregation back to continue training with it. The most frequently used form of aggregation is averaging the model parameters, e.g., the weights of a neural network. However, due to the non-convexity of the loss surface of neural networks, averaging can lead to detrimental effects and it remains an open question under which conditions averaging is beneficial. In this paper, we study this problem from the perspective of information theory: We measure the mutual information between representation and inputs as well as representation and labels in local models and compare it to the respective information contained in the representation of the averaged model. Our empirical results confirm previous observations about the practical usefulness of averaging for neural networks, even if local dataset distributions vary strongly. Furthermore, we obtain more insights about the impact of the aggregation frequency on the information flow and thus on the success of distributed learning. These insights will be helpful both in improving the current synchronization process and in further understanding the effects of model aggregation. |
Tasks | |
Published | 2019-11-15 |
URL | https://arxiv.org/abs/1911.07652v1 |
https://arxiv.org/pdf/1911.07652v1.pdf | |
PWC | https://paperswithcode.com/paper/information-theoretic-perspective-of |
Repo | |
Framework | |
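The aggregation step the abstract describes, averaging the parameters of local models, can be sketched as follows. This is a minimal illustration only: the dictionary layout, the function name `average_parameters`, and the toy values are assumptions, not the authors' implementation, and the mutual-information measurements from the paper are not reproduced here.

```python
from typing import Dict, List


def average_parameters(local_models: List[Dict[str, List[float]]]) -> Dict[str, List[float]]:
    """Return the element-wise mean of each named parameter vector.

    Assumes every local model has the same parameter names and shapes
    (hypothetical layout chosen for this sketch).
    """
    if not local_models:
        raise ValueError("need at least one local model")
    n = len(local_models)
    averaged = {}
    for name in local_models[0]:
        # Group the i-th entry of this parameter across all nodes.
        columns = zip(*(model[name] for model in local_models))
        averaged[name] = [sum(values) / n for values in columns]
    return averaged


# Two toy "nodes" send their local parameters to the coordinator.
node_a = {"w": [1.0, 2.0], "b": [0.0]}
node_b = {"w": [3.0, 4.0], "b": [1.0]}
global_model = average_parameters([node_a, node_b])
print(global_model)  # {'w': [2.0, 3.0], 'b': [0.5]}
```

In a federated round, the coordinator would send `global_model` back to the nodes, which continue training from it.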
Operation-aware Neural Networks for User Response Prediction
Title | Operation-aware Neural Networks for User Response Prediction |
Authors | Yi Yang, Baile Xu, Furao Shen, Jian Zhao |
Abstract | User response prediction makes a crucial contribution to the rapid development of online advertising and recommendation systems. The importance of learning feature interactions has been emphasized by many works. Many deep models have been proposed to automatically learn high-order feature interactions. Since most features in advertising and recommendation systems are high-dimensional sparse features, deep models usually learn a low-dimensional distributed representation for each feature in the bottom layer. Besides traditional fully-connected architectures, some new operations, such as convolutional operations and product operations, have been proposed to learn feature interactions better. In these models, the representation is shared among different operations. However, the best representation for different operations may be different. In this paper, we propose a new neural model named Operation-aware Neural Networks (ONN) which learns different representations for different operations. Our experimental results on two large-scale real-world ad click/conversion datasets demonstrate that ONN consistently outperforms the state-of-the-art models in both offline-training and online-training environments. |
Tasks | |
Published | 2019-04-02 |
URL | http://arxiv.org/abs/1904.12579v1 |
http://arxiv.org/pdf/1904.12579v1.pdf | |
PWC | https://paperswithcode.com/paper/190412579 |
Repo | |
Framework | |
Introduction to the 35th International Conference on Logic Programming Special Issue
Title | Introduction to the 35th International Conference on Logic Programming Special Issue |
Authors | Esra Erdem, Andrea Formisano, German Vidal, Fangkai Yang |
Abstract | We are proud to introduce this special issue of Theory and Practice of Logic Programming (TPLP), dedicated to the regular papers accepted for the 35th International Conference on Logic Programming (ICLP). The ICLP meetings started in Marseille in 1982 and since then constitute the main venue for presenting and discussing work in the area of logic programming. Under consideration for acceptance in TPLP. |
Tasks | |
Published | 2019-08-10 |
URL | https://arxiv.org/abs/1908.03719v1 |
https://arxiv.org/pdf/1908.03719v1.pdf | |
PWC | https://paperswithcode.com/paper/introduction-to-the-35th-international |
Repo | |
Framework | |
OCR4all – An Open-Source Tool Providing a (Semi-)Automatic OCR Workflow for Historical Printings
Title | OCR4all – An Open-Source Tool Providing a (Semi-)Automatic OCR Workflow for Historical Printings |
Authors | Christian Reul, Dennis Christ, Alexander Hartelt, Nico Balbach, Maximilian Wehner, Uwe Springmann, Christoph Wick, Christine Grundig, Andreas Büttner, Frank Puppe |
Abstract | Optical Character Recognition (OCR) on historical printings is a challenging task mainly due to the complexity of the layout and the highly variant typography. Nevertheless, in the last few years great progress has been made in the area of historical OCR, resulting in several powerful open-source tools for preprocessing, layout recognition and segmentation, character recognition and post-processing. The drawback of these tools often is their limited applicability by non-technical users like humanist scholars and in particular the combined use of several tools in a workflow. In this paper we present an open-source OCR software called OCR4all, which combines state-of-the-art OCR components and continuous model training into a comprehensive workflow. A comfortable GUI allows error corrections not only in the final output, but already in early stages to minimize error propagations. Further on, extensive configuration capabilities are provided to set the degree of automation of the workflow and to make adaptations to the carefully selected default parameters for specific printings, if necessary. Experiments showed that users with minimal or no experience were able to capture the text of even the earliest printed books with manageable effort and great quality, achieving excellent character error rates (CERs) below 0.5%. The fully automated application on 19th century novels showed that OCR4all can considerably outperform the commercial state-of-the-art tool ABBYY Finereader on moderate layouts if suitably pretrained mixed OCR models are available. The architecture of OCR4all allows the easy integration (or substitution) of newly developed tools for its main components by standardized interfaces like PageXML, thus aiming at continual higher automation for historical printings. |
Tasks | Optical Character Recognition |
Published | 2019-09-09 |
URL | https://arxiv.org/abs/1909.04032v1 |
https://arxiv.org/pdf/1909.04032v1.pdf | |
PWC | https://paperswithcode.com/paper/ocr4all-an-open-source-tool-providing-a-semi |
Repo | |
Framework | |
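The character error rate (CER) quoted in the abstract is commonly computed as the edit distance between the OCR output and the ground-truth text, divided by the length of the ground truth. The sketch below assumes that common definition; the abstract does not specify the exact evaluation procedure, and the function names are chosen for illustration.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            current.append(min(previous[j] + 1,      # deletion
                               current[j - 1] + 1,   # insertion
                               substitution))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# One wrong character in a four-character reference.
print(character_error_rate("abcd", "abxd"))  # 0.25
```

Under this definition, a CER below 0.5% corresponds to fewer than one character error per 200 ground-truth characters.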
Differentiable Visual Computing
Title | Differentiable Visual Computing |
Authors | Tzu-Mao Li |
Abstract | Derivatives of computer graphics, image processing, and deep learning algorithms have tremendous use in guiding parameter space searches, or solving inverse problems. As the algorithms become more sophisticated, we no longer only need to differentiate simple mathematical functions, but have to deal with general programs which encode complex transformations of data. This dissertation introduces three tools for addressing the challenges that arise when obtaining and applying the derivatives for complex graphics algorithms. Traditionally, practitioners have been constrained to composing programs with a limited set of operators, or hand-deriving derivatives. We extend the image processing language Halide with reverse-mode automatic differentiation, and the ability to automatically optimize the gradient computations. This enables automatic generation of the gradients of arbitrary Halide programs, at high performance, with little programmer effort. In 3D rendering, the gradient is required with respect to variables such as camera parameters, geometry, and appearance. However, computing the gradient is challenging because the rendering integral includes visibility terms that are not differentiable. We introduce, to our knowledge, the first general-purpose differentiable ray tracer that solves the full rendering equation, while correctly taking the geometric discontinuities into account. Finally, we demonstrate that the derivatives of light path throughput can also be useful for guiding sampling in forward rendering. Simulating light transport in the presence of multi-bounce glossy effects and motion in 3D rendering is challenging due to the hard-to-sample high-contribution areas. We present a Markov Chain Monte Carlo rendering algorithm that extends Metropolis Light Transport by automatically and explicitly adapting to the local integrand, thereby increasing sampling efficiency. |
Tasks | |
Published | 2019-04-27 |
URL | https://arxiv.org/abs/1904.12228v2 |
https://arxiv.org/pdf/1904.12228v2.pdf | |
PWC | https://paperswithcode.com/paper/differentiable-visual-computing |
Repo | |
Framework | |
Message-passing algorithm of quantum annealing with nonstoquastic Hamiltonian
Title | Message-passing algorithm of quantum annealing with nonstoquastic Hamiltonian |
Authors | Masayuki Ohzeki |
Abstract | Quantum annealing (QA) is a generic method for solving optimization problems using fictitious quantum fluctuation. The current device performing QA involves controlling the transverse field; it is classically simulatable by using the standard technique for mapping the quantum spin systems to the classical ones. In this sense, the current system for QA is not powerful despite utilizing quantum fluctuation. Hence, we developed a system with a time-dependent Hamiltonian consisting of a combination of the formulated Ising model and the “driver” Hamiltonian with only quantum fluctuation. In the previous study, for a fully connected spin model, quantum fluctuation can be addressed in a relatively simple way. We proved that the fully connected antiferromagnetic interaction can be transformed into a fluctuating transverse field and is thus classically simulatable at sufficiently low temperatures. Using the fluctuating transverse field, we established several ways to simulate part of the nonstoquastic Hamiltonian on classical computers. We formulated a message-passing algorithm in the present study. This algorithm is capable of assessing the performance of QA with part of the nonstoquastic Hamiltonian having a large number of spins. In other words, we developed a different approach for simulating the nonstoquastic Hamiltonian without using the quantum Monte Carlo technique. Our results were validated by comparison to the results obtained by the replica method. |
Tasks | |
Published | 2019-01-21 |
URL | http://arxiv.org/abs/1901.06901v2 |
http://arxiv.org/pdf/1901.06901v2.pdf | |
PWC | https://paperswithcode.com/paper/message-passing-algorithm-of-quantum |
Repo | |
Framework | |
EarthquakeGen: Earthquake Simulation Using Generative Adversarial Networks
Title | EarthquakeGen: Earthquake Simulation Using Generative Adversarial Networks |
Authors | Tiantong Wang, Youzuo Lin |
Abstract | Detecting earthquake events from seismic time series has proved to be a challenging task. Manual detection can be expensive and tedious because of the intensive labor and the large scale of the data sets. In recent years, automatic detection methods based on machine learning have been developed to improve accuracy and efficiency. However, the accuracy of those methods relies on a sufficient amount of high-quality training data, which itself can be expensive to obtain because it requires domain knowledge and subject matter expertise. This paper aims to resolve this dilemma by answering two questions: (1) given a limited number of reliable labels, can we use them to generate more synthetic labels? (2) Can we use those synthetic labels to improve the detectability? Among the existing generative models, the generative adversarial network (GAN) has shown a strong capability for generating high-quality synthetic samples in multiple domains. We designed our model based on GAN. In particular, we studied several different network structures. Among the generated results we compared, our GAN-based generative model yields the highest quality. We further combine the dataset with synthetic samples generated by our generative model and show that the detectability of our earthquake classification model is significantly better than that of the one trained without augmenting the training set. |
Tasks | Time Series |
Published | 2019-11-10 |
URL | https://arxiv.org/abs/1911.03966v1 |
https://arxiv.org/pdf/1911.03966v1.pdf | |
PWC | https://paperswithcode.com/paper/earthquakegen-earthquake-simulation-using |
Repo | |
Framework | |
Learning icons appearance similarity
Title | Learning icons appearance similarity |
Authors | Manuel Lagunas, Elena Garces, Diego Gutierrez |
Abstract | Selecting an optimal set of icons is a crucial step in the pipeline of visual design to structure and navigate through content. However, designing icon sets is usually a difficult task for which expert knowledge is required. In this work, to ease the process of icon set selection for users, we propose a similarity metric which captures the properties of style and visual identity. We train a Siamese Neural Network with an online dataset of icons organized in visually coherent collections that are used to adaptively sample training data and optimize the training process. As the dataset contains noise, we further collect human-rated information on the perception of icon similarity, which is used for evaluating and testing the proposed model. We present several results and applications based on searches, kernel visualizations and optimized set proposals that can be helpful for designers and non-expert users while exploring large collections of icons. |
Tasks | |
Published | 2019-02-01 |
URL | http://arxiv.org/abs/1902.05378v1 |
http://arxiv.org/pdf/1902.05378v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-icons-appearance-similarity |
Repo | |
Framework | |
Clustering Discrete-Valued Time Series
Title | Clustering Discrete-Valued Time Series |
Authors | Tyler Roick, Dimitris Karlis, Paul D. McNicholas |
Abstract | There is a need for the development of models that are able to account for discreteness in data, along with its time series properties and correlation. Our focus falls on INteger-valued AutoRegressive (INAR) type models. The INAR type models can be used in conjunction with existing model-based clustering techniques to cluster discrete-valued time series data. With the use of a finite mixture model, several existing techniques such as the selection of the number of clusters, estimation using expectation-maximization and model selection are applicable. The proposed model is then demonstrated on real data to illustrate its clustering applications. |
Tasks | Model Selection, Time Series |
Published | 2019-01-26 |
URL | https://arxiv.org/abs/1901.09249v2 |
https://arxiv.org/pdf/1901.09249v2.pdf | |
PWC | https://paperswithcode.com/paper/clustering-discrete-valued-time-series |
Repo | |
Framework | |
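For readers unfamiliar with INAR-type models, the standard first-order case, INAR(1), sets each count to a binomially thinned copy of the previous count plus a non-negative integer innovation. The sketch below simulates that textbook process with Poisson innovations using only the standard library. The parameter names `alpha` and `lam` and the choice of Poisson innovations are assumptions for illustration; the mixture-based clustering from the paper is not shown.

```python
import math
import random


def binomial_thinning(count: int, alpha: float, rng: random.Random) -> int:
    """Each of the `count` units survives independently with probability alpha."""
    return sum(1 for _ in range(count) if rng.random() < alpha)


def poisson(lam: float, rng: random.Random) -> int:
    """Draw a Poisson variate with Knuth's multiplication method (small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_inar1(length: int, alpha: float, lam: float, x0: int = 0, seed: int = 0) -> list:
    """Simulate X_t = alpha o X_{t-1} + e_t with e_t ~ Poisson(lam)."""
    rng = random.Random(seed)
    series = [x0]
    for _ in range(length - 1):
        survivors = binomial_thinning(series[-1], alpha, rng)
        series.append(survivors + poisson(lam, rng))
    return series


counts = simulate_inar1(length=20, alpha=0.5, lam=2.0)
print(counts)  # a list of 20 non-negative integers
```

Because both the thinning and the innovation produce non-negative integers, the simulated series stays integer-valued, which is the property that distinguishes INAR models from a Gaussian autoregression.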
Computational analysis of laminar structure of the human cortex based on local neuron features
Title | Computational analysis of laminar structure of the human cortex based on local neuron features |
Authors | Andrija Štajduhar, Tomislav Lipić, Goran Sedmak, Sven Lončarić, Miloš Judaš |
Abstract | In this paper, we present a novel method for analysis and segmentation of the laminar structure of the cortex based on tissue characteristics whose change across the gray matter underlies the distinction between cortical layers. We develop and analyze features of individual neurons to investigate changes in cytoarchitectonic differentiation and present a novel high-performance, automated framework for neuron-level histological image analysis. Local tissue and cell descriptors such as density, neuron size and other measures are used to develop more complex neuron features, which are used in a machine learning model trained on data manually labeled by three human experts. Final neuron layer classifications were obtained by training a separate model for each expert and combining their probability outputs. The importance of the developed neuron features at both the global model level and the individual prediction level is presented and discussed. |
Tasks | |
Published | 2019-05-03 |
URL | https://arxiv.org/abs/1905.01173v2 |
https://arxiv.org/pdf/1905.01173v2.pdf | |
PWC | https://paperswithcode.com/paper/computational-analysis-of-laminar-structure |
Repo | |
Framework | |
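The abstract states that one model is trained per expert and that their probability outputs are combined, without saying how. A simple possibility is to average the per-class probabilities and pick the most probable layer, as sketched below. The unweighted mean, the function names, and the three-class toy values are assumptions for illustration, not the authors' procedure.

```python
from typing import List, Sequence


def combine_expert_probabilities(per_expert: Sequence[Sequence[float]]) -> List[float]:
    """Average class probabilities over the per-expert models (assumed rule)."""
    if not per_expert:
        raise ValueError("need at least one probability vector")
    n_classes = len(per_expert[0])
    if any(len(probs) != n_classes for probs in per_expert):
        raise ValueError("all probability vectors must have the same length")
    return [sum(probs[c] for probs in per_expert) / len(per_expert)
            for c in range(n_classes)]


def predict_layer(per_expert: Sequence[Sequence[float]]) -> int:
    """Index of the class with the highest combined probability."""
    combined = combine_expert_probabilities(per_expert)
    return max(range(len(combined)), key=combined.__getitem__)


# Three hypothetical per-expert models scoring one neuron over three layers.
outputs = [[0.6, 0.3, 0.1],
           [0.2, 0.5, 0.3],
           [0.1, 0.6, 0.3]]
print(predict_layer(outputs))  # 1
```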
Revisiting Simple Domain Adaptation Methods in Unsupervised Neural Machine Translation
Title | Revisiting Simple Domain Adaptation Methods in Unsupervised Neural Machine Translation |
Authors | Haipeng Sun, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Tiejun Zhao |
Abstract | Domain adaptation has been well-studied in supervised neural machine translation (SNMT). However, it has not been well-studied for unsupervised neural machine translation (UNMT), although UNMT has recently achieved remarkable results in several domain-specific language pairs. Besides the inconsistent domains between training data and test data for SNMT, there sometimes exists an inconsistent domain between two monolingual training data for UNMT. In this work, we empirically show different scenarios for unsupervised neural machine translation. Based on these scenarios, we revisit the effect of the existing domain adaptation methods including batch weighting and fine tuning methods in UNMT. Finally, we propose modified methods to improve the performances of domain-specific UNMT systems. |
Tasks | Domain Adaptation, Machine Translation |
Published | 2019-08-26 |
URL | https://arxiv.org/abs/1908.09605v2 |
https://arxiv.org/pdf/1908.09605v2.pdf | |
PWC | https://paperswithcode.com/paper/an-empirical-study-of-domain-adaptation-for |
Repo | |
Framework | |
Real to H-space Encoder for Speech Recognition
Title | Real to H-space Encoder for Speech Recognition |
Authors | Titouan Parcollet, Mohamed Morchid, Georges Linarès, Renato De Mori |
Abstract | Deep neural networks (DNNs) and more precisely recurrent neural networks (RNNs) are at the core of modern automatic speech recognition systems, due to their efficiency in processing input sequences. Recently, it has been shown that different input representations, based on multidimensional algebras, such as complex and quaternion numbers, are able to bring to neural networks a more natural, compressive and powerful representation of the input signal, outperforming common real-valued NNs. Indeed, quaternion-valued neural networks (QNNs) better learn both internal dependencies, such as the relation between the Mel-filter-bank value of a specific time frame and its time derivatives, and global dependencies, describing the relations that exist between time frames. Nonetheless, QNNs are limited to quaternion-valued input signals, and it is difficult to benefit from this powerful representation with real-valued input data. This paper proposes to tackle this weakness by introducing a real-to-quaternion encoder that allows QNNs to process any one-dimensional input features, such as traditional Mel-filter-banks for automatic speech recognition. |
Tasks | Speech Recognition |
Published | 2019-06-17 |
URL | https://arxiv.org/abs/1906.08043v1 |
https://arxiv.org/pdf/1906.08043v1.pdf | |
PWC | https://paperswithcode.com/paper/real-to-h-space-encoder-for-speech |
Repo | |
Framework | |
On Open-Universe Causal Reasoning
Title | On Open-Universe Causal Reasoning |
Authors | Duligur Ibeling, Thomas Icard |
Abstract | We extend two kinds of causal models, structural equation models and simulation models, to infinite variable spaces. This enables a semantics for conditionals founded on a calculus of intervention, and axiomatization of causal reasoning for rich, expressive generative models—including those in which a causal representation exists only implicitly—in an open-universe setting. Further, we show that under suitable restrictions the two kinds of models are equivalent, perhaps surprisingly as their axiomatizations differ substantially in the general case. We give a series of complete axiomatizations in which the open-universe nature of the setting is seen to be essential. |
Tasks | |
Published | 2019-07-04 |
URL | https://arxiv.org/abs/1907.02170v1 |
https://arxiv.org/pdf/1907.02170v1.pdf | |
PWC | https://paperswithcode.com/paper/on-open-universe-causal-reasoning |
Repo | |
Framework | |
HYPE: A Benchmark for Human eYe Perceptual Evaluation of Generative Models
Title | HYPE: A Benchmark for Human eYe Perceptual Evaluation of Generative Models |
Authors | Sharon Zhou, Mitchell L. Gordon, Ranjay Krishna, Austin Narcomey, Li Fei-Fei, Michael S. Bernstein |
Abstract | Generative models often use human evaluations to measure the perceived quality of their outputs. Automated metrics are noisy indirect proxies, because they rely on heuristics or pretrained embeddings. However, up until now, direct human evaluation strategies have been ad-hoc, neither standardized nor validated. Our work establishes a gold standard human benchmark for generative realism. We construct Human eYe Perceptual Evaluation (HYPE), a human benchmark that is (1) grounded in psychophysics research in perception, (2) reliable across different sets of randomly sampled outputs from a model, (3) able to produce separable model performances, and (4) efficient in cost and time. We introduce two variants: one that measures visual perception under adaptive time constraints to determine the threshold at which a model’s outputs appear real (e.g. 250ms), and the other a less expensive variant that measures human error rate on fake and real images without time constraints. We test HYPE across six state-of-the-art generative adversarial networks and two sampling techniques on conditional and unconditional image generation using four datasets: CelebA, FFHQ, CIFAR-10, and ImageNet. We find that HYPE can track model improvements across training epochs, and we confirm via bootstrap sampling that HYPE rankings are consistent and replicable. |
Tasks | Image Generation |
Published | 2019-04-01 |
URL | https://arxiv.org/abs/1904.01121v4 |
https://arxiv.org/pdf/1904.01121v4.pdf | |
PWC | https://paperswithcode.com/paper/hype-human-eye-perceptual-evaluation-of |
Repo | |
Framework | |
Exploiting Reuse in Pipeline-Aware Hyperparameter Tuning
Title | Exploiting Reuse in Pipeline-Aware Hyperparameter Tuning |
Authors | Liam Li, Evan Sparks, Kevin Jamieson, Ameet Talwalkar |
Abstract | Hyperparameter tuning of multi-stage pipelines introduces a significant computational burden. Motivated by the observation that work can be reused across pipelines if the intermediate computations are the same, we propose a pipeline-aware approach to hyperparameter tuning. Our approach optimizes both the design and execution of pipelines to maximize reuse. We design pipelines amenable to reuse by (i) introducing a novel hybrid hyperparameter tuning method called gridded random search, and (ii) reducing the average training time in pipelines by adapting early-stopping hyperparameter tuning approaches. We then realize the potential for reuse during execution by introducing a novel caching problem for ML workloads, which we pose as a mixed integer linear program (ILP), and subsequently evaluating various caching heuristics relative to the optimal solution of the ILP. We conduct experiments on simulated and real-world machine learning pipelines to show that a pipeline-aware approach to hyperparameter tuning can offer more than an order-of-magnitude speedup over independently evaluating pipeline configurations. |
Tasks | |
Published | 2019-03-12 |
URL | http://arxiv.org/abs/1903.05176v1 |
http://arxiv.org/pdf/1903.05176v1.pdf | |
PWC | https://paperswithcode.com/paper/exploiting-reuse-in-pipeline-aware |
Repo | |
Framework | |
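The core observation in the abstract, that configurations sharing the same early-stage hyperparameters can reuse intermediate results, can be illustrated with plain memoization. The sketch below is not gridded random search and not the paper's caching formulation; the two-stage pipeline, the stage functions `featurize` and `train`, and the hyperparameter values are all hypothetical.

```python
from functools import lru_cache

# Count how often each stage actually runs.
calls = {"featurize": 0, "train": 0}


@lru_cache(maxsize=None)
def featurize(n_features: int) -> tuple:
    """Hypothetical expensive first stage; cached on its hyperparameter."""
    calls["featurize"] += 1
    return tuple(range(n_features))


def train(features: tuple, learning_rate: float) -> float:
    """Hypothetical second stage; runs once per full configuration."""
    calls["train"] += 1
    return sum(features) * learning_rate


# Six configurations that share only two distinct first-stage settings.
configs = [(n, lr) for n in (2, 4) for lr in (0.1, 0.01, 0.001)]
scores = {cfg: train(featurize(cfg[0]), cfg[1]) for cfg in configs}
print(calls)  # {'featurize': 2, 'train': 6}
```

Evaluating the six configurations independently would run the first stage six times; with the shared prefix cached, it runs twice. An unbounded cache is assumed here, whereas the abstract treats limited cache space as an optimization problem.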