Paper Group ANR 1014
The Generalization-Stability Tradeoff in Neural Network Pruning
Title | The Generalization-Stability Tradeoff in Neural Network Pruning |
Authors | Brian R. Bartoldson, Ari S. Morcos, Adrian Barbu, Gordon Erlebacher |
Abstract | Pruning neural network parameters is often viewed as a means to compress models, but pruning has also been motivated by the desire to prevent overfitting. This motivation is particularly relevant given the perhaps surprising observation that a wide variety of pruning approaches increase test accuracy despite sometimes massive reductions in parameter counts. To better understand this phenomenon, we analyze the behavior of pruning over the course of training, finding that pruning’s effect on generalization relies more on the instability it generates (defined as the drops in test accuracy immediately following pruning) than on the final size of the pruned model. We demonstrate that even the pruning of unimportant parameters can lead to such instability, and show similarities between pruning and regularizing by injecting noise, suggesting a mechanism for pruning-based generalization improvements that is compatible with the strong generalization recently observed in over-parameterized networks. |
Tasks | Network Pruning |
Published | 2019-06-09 |
URL | https://arxiv.org/abs/1906.03728v3 |
https://arxiv.org/pdf/1906.03728v3.pdf | |
PWC | https://paperswithcode.com/paper/the-generalization-stability-tradeoff-in |
Repo | |
Framework | |
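To make the paper's central quantity concrete, here is a minimal sketch (not the authors' code) of measuring pruning "instability" as the immediate drop in test accuracy after a pruning event. The toy linear model, data, and 50% magnitude-pruning step are invented for illustration; the paper itself tracks this drop over many pruning events during training of deep networks.

```python
import numpy as np

def magnitude_prune(w, frac):
    """Zero out the smallest-magnitude fraction `frac` of the weights."""
    w = w.copy()
    k = int(frac * w.size)
    if k > 0:
        idx = np.argsort(np.abs(w), axis=None)[:k]
        w.flat[idx] = 0.0
    return w

def accuracy(w, X, y):
    """Accuracy of a linear threshold classifier."""
    return float(np.mean((X @ w > 0) == y))

# Toy linearly separable problem standing in for a trained network.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))
w_true = rng.normal(size=50)
y = X @ w_true > 0
w_trained = w_true + 0.1 * rng.normal(size=50)

acc_before = accuracy(w_trained, X, y)
acc_after = accuracy(magnitude_prune(w_trained, 0.5), X, y)
instability = acc_before - acc_after  # the drop immediately after pruning
print(f"instability = {instability:.4f}")
```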
Empirical Evaluations of Active Learning Strategies in Legal Document Review
Title | Empirical Evaluations of Active Learning Strategies in Legal Document Review |
Authors | Rishi Chhatwal, Nathaniel Huber-Fliflet, Robert Keeling, Jianping Zhang, Haozhen Zhao |
Abstract | One type of machine learning, text classification, is now regularly applied in legal matters involving voluminous document populations because it can reduce the time and expense associated with the review of those documents. One form of machine learning - Active Learning - has drawn attention from the legal community because it offers the potential to make the machine learning process even more effective. Active Learning, applied to legal documents, is considered a new technology in the legal domain and is continuously applied to all documents in a legal matter until only an insignificant number of relevant documents remains for review. This implementation differs slightly from traditional implementations of Active Learning, where the process stops once acceptable model performance is achieved. The purpose of this paper is twofold: (i) to question whether Active Learning actually is a superior learning methodology and (ii) to highlight the ways that Active Learning can be most effectively applied to real legal industry data. Unlike other studies, our experiments were performed against large data sets taken from recent, real-world legal matters covering a variety of areas. We conclude that, although these experiments show that the Active Learning strategy popularly used in legal document review can quickly identify informative training documents, it becomes less effective over time. In particular, our findings suggest that this most popular form of Active Learning in the legal arena, where the highest-scoring documents are selected as training examples, is in fact not the most efficient approach in most instances. Ultimately, a different Active Learning strategy may be best suited to initiate the predictive modeling process but not to continue through the entire document review. |
Tasks | Active Learning, Text Classification |
Published | 2019-04-03 |
URL | http://arxiv.org/abs/1904.01719v1 |
http://arxiv.org/pdf/1904.01719v1.pdf | |
PWC | https://paperswithcode.com/paper/empirical-evaluations-of-active-learning |
Repo | |
Framework | |
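The "highest-scoring" selection strategy the paper questions (often called relevance feedback in the legal arena) can be sketched as a review loop. The data below is a synthetic stand-in, whereas the paper's experiments use large real-world legal matters; the feature construction and batch size are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 100))                 # stand-in document features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # stand-in relevance labels

labeled = list(rng.choice(len(X), size=50, replace=False))  # seed review set
pool = [i for i in range(len(X)) if i not in set(labeled)]

for rnd in range(5):
    clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    scores = clf.predict_proba(X[pool])[:, 1]
    # Highest-scoring selection: review the documents the model is most
    # confident are relevant, rather than the most uncertain ones.
    top = np.argsort(scores)[::-1][:100]
    batch = [pool[i] for i in top]
    labeled += batch
    pool = [i for i in pool if i not in set(batch)]
    print(rnd, "reviewed:", len(labeled), "batch precision:", y[batch].mean())
```

In real collections, the precision of the selected batches typically declines as the relevant documents are exhausted, which is the loss of efficiency over time that the paper measures.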
Detecting and Diagnosing Adversarial Images with Class-Conditional Capsule Reconstructions
Title | Detecting and Diagnosing Adversarial Images with Class-Conditional Capsule Reconstructions |
Authors | Yao Qin, Nicholas Frosst, Sara Sabour, Colin Raffel, Garrison Cottrell, Geoffrey Hinton |
Abstract | Adversarial examples raise questions about whether neural network models are sensitive to the same visual features as humans. In this paper, we first detect adversarial examples or otherwise corrupted images based on a class-conditional reconstruction of the input. To specifically attack our detection mechanism, we propose the Reconstructive Attack, which seeks to cause both a misclassification and a low reconstruction error. This reconstructive attack produces undetected adversarial examples, but with a much lower success rate. Across all these attacks, we find that CapsNets consistently perform better than convolutional networks. We then diagnose the adversarial examples for CapsNets and find that the success of the reconstructive attack is highly related to the visual similarity between the source and target class. Additionally, the resulting perturbations can cause the input image to appear visually more like the target class and hence become non-adversarial. This suggests that CapsNets use features that are more aligned with human perception and have the potential to address the central issue raised by adversarial examples. |
Tasks | |
Published | 2019-07-05 |
URL | https://arxiv.org/abs/1907.02957v2 |
https://arxiv.org/pdf/1907.02957v2.pdf | |
PWC | https://paperswithcode.com/paper/detecting-and-diagnosing-adversarial-images |
Repo | |
Framework | |
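The detection rule in the abstract reduces to thresholding a class-conditional reconstruction error. A minimal, model-agnostic sketch follows; the paper's reconstruction network is a capsule decoder, and the `reconstruct` callable and threshold calibration here are assumptions:

```python
import numpy as np

def detect_adversarial(x, y_pred, reconstruct, threshold):
    """Flag an input whose class-conditional reconstruction is too far away.

    `reconstruct(x, y)` is any decoder that reconstructs x from features
    conditioned on the predicted class y (a capsule decoder in the paper).
    """
    x_hat = reconstruct(x, y_pred)
    err = float(np.mean((x - x_hat) ** 2))  # per-input reconstruction error
    return err > threshold

# The threshold would typically be calibrated on clean validation data, e.g.:
#   threshold = np.percentile(errors_on_clean_validation_inputs, 95)
```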
Smart, Deep Copy-Paste
Title | Smart, Deep Copy-Paste |
Authors | Tiziano Portenier, Qiyang Hu, Paolo Favaro, Matthias Zwicker |
Abstract | In this work, we propose a novel system for smart copy-paste, enabling the synthesis of high-quality results given masked source image content and a target image context as input. Our system naturally resolves both shading and geometric inconsistencies between the source and target image, producing a merged image in which the pasted source content blends seamlessly into the target context. Our framework is based on a novel training image transformation procedure that allows us to train a deep convolutional neural network end-to-end to automatically learn a representation suitable for copy-pasting. Our training procedure works with any image dataset without additional information such as labels, and we demonstrate the effectiveness of our system on two popular datasets: high-resolution face images and the more complex Cityscapes dataset. Our technique outperforms the current state of the art on face images, and we show promising results on the Cityscapes dataset, demonstrating that our system generalizes to much higher resolutions than the training data. |
Tasks | |
Published | 2019-03-15 |
URL | http://arxiv.org/abs/1903.06763v1 |
http://arxiv.org/pdf/1903.06763v1.pdf | |
PWC | https://paperswithcode.com/paper/smart-deep-copy-paste |
Repo | |
Framework | |
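The task setup can be summarized by the naive composite the network takes as input and learns to harmonize. A small sketch under assumed input conventions; the exact tensor layout and the paper's training-time image transformations are not reproduced here:

```python
import numpy as np

def naive_copy_paste(source, target, mask):
    """Baseline cut-and-paste composite that the network learns to improve.

    source, target: float arrays of shape (H, W, 3) with values in [0, 1]
    mask: binary array of shape (H, W, 1) selecting the source content
    """
    return mask * source + (1.0 - mask) * target

# The trained model would map (composite, mask) to a merged image whose
# shading and geometry in the pasted region match the target context.
```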
Multi-hop Federated Private Data Augmentation with Sample Compression
Title | Multi-hop Federated Private Data Augmentation with Sample Compression |
Authors | Eunjeong Jeong, Seungeun Oh, Jihong Park, Hyesung Kim, Mehdi Bennis, Seong-Lyun Kim |
Abstract | On-device machine learning (ML) has made a tremendous amount of user data accessible while keeping each user's local data private rather than storing it in a central entity. However, to guarantee privacy, each device inevitably sacrifices data quality or learning performance, especially when its training dataset is non-IID. In this paper, we propose a data augmentation framework using a generative model: multi-hop federated augmentation with sample compression (MultFAug). A multi-hop protocol speeds up the end-to-end over-the-air transmission of seed samples by enhancing the transport capacity. Relaying also strengthens privacy preservation, since the origin of each seed sample is hidden among the participating devices. For further privacy at the individual-sample level, the devices compress their data samples: each sample is sparsified prior to transmission, which reduces its size and hence the communication payload. This preprocessing also strengthens the privacy of each sample, acting as an input perturbation that preserves sample privacy. Numerical evaluations show that the proposed framework significantly improves the privacy guarantee, transmission delay, and local training performance when the number of hops and the compression rate are adjusted appropriately. |
Tasks | Data Augmentation |
Published | 2019-07-15 |
URL | https://arxiv.org/abs/1907.06426v1 |
https://arxiv.org/pdf/1907.06426v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-hop-federated-private-data-augmentation |
Repo | |
Framework | |
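The sample-compression step described in the abstract can be sketched as top-k magnitude sparsification of each seed sample before transmission. The keep fraction below is an illustrative assumption, not the paper's setting:

```python
import numpy as np

def sparsify(sample, keep_frac=0.1):
    """Keep only the largest-magnitude entries of a seed sample.

    Shrinks the per-hop communication payload and perturbs the input,
    which is the sample-level privacy mechanism the abstract describes.
    """
    flat = sample.ravel().copy()
    k = max(1, int(keep_frac * flat.size))
    cutoff = np.partition(np.abs(flat), -k)[-k]
    flat[np.abs(flat) < cutoff] = 0.0
    return flat.reshape(sample.shape)

seed = np.random.default_rng(0).normal(size=(28, 28))  # e.g. an image-like seed
compressed = sparsify(seed, keep_frac=0.1)
print("nonzeros:", np.count_nonzero(compressed), "of", seed.size)
```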
Fast and robust detection of solar modules in electroluminescence images
Title | Fast and robust detection of solar modules in electroluminescence images |
Authors | Mathis Hoffmann, Bernd Doll, Florian Talkenberg, Christoph J. Brabec, Andreas K. Maier, Vincent Christlein |
Abstract | Fast, non-destructive, and on-site quality control tools, mainly highly sensitive imaging techniques, are important for assessing the reliability of photovoltaic plants. To minimize the risk of further damage and electrical yield losses, electroluminescence (EL) imaging is used to detect, at an early stage, local defects that might cause future electrical losses. Automated defect recognition on EL measurements requires robust detection and rectification of modules, as well as an optional segmentation into cells. This paper introduces a method to detect solar modules and the crossing points between solar cells in EL images. We require only 1-D image statistics for the detection, resulting in a computationally efficient approach. In addition, the method detects modules under perspective distortion and in scenarios where multiple modules are visible in the image. We compare our method to the state of the art and show that it is superior in the presence of perspective distortion, while its performance on images where the module is roughly coplanar with the detector is similar to that of the reference method. Finally, we show that our method greatly improves on the reference method in terms of computation time. |
Tasks | |
Published | 2019-07-19 |
URL | https://arxiv.org/abs/1907.08451v1 |
https://arxiv.org/pdf/1907.08451v1.pdf | |
PWC | https://paperswithcode.com/paper/fast-and-robust-detection-of-solar-modules-in |
Repo | |
Framework | |
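A rough sketch of the 1-D-statistics idea: dark cell gaps and module borders show up as valleys in the row/column mean-intensity profiles of an EL image. The thresholds and synthetic test image below are illustrative assumptions; the paper's detector, including its handling of perspective distortion, is considerably more involved:

```python
import numpy as np

def boundary_candidates(el_image, axis=0, darkness=0.4):
    """Candidate cell/module boundaries from a 1-D intensity profile."""
    profile = el_image.mean(axis=axis)  # column (axis=0) or row (axis=1) profile
    profile = (profile - profile.min()) / (np.ptp(profile) + 1e-9)
    # dark local minima of the normalized profile
    is_min = (profile[1:-1] < profile[:-2]) & (profile[1:-1] < profile[2:])
    return np.where(is_min & (profile[1:-1] < darkness))[0] + 1

img = np.ones((50, 200))
img[:, 40::50] = 0.1                     # synthetic dark gaps every 50 pixels
print(boundary_candidates(img, axis=0))  # -> [ 40  90 140 190]
```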
Regularity as Regularization: Smooth and Strongly Convex Brenier Potentials in Optimal Transport
Title | Regularity as Regularization: Smooth and Strongly Convex Brenier Potentials in Optimal Transport |
Authors | François-Pierre Paty, Alexandre d’Aspremont, Marco Cuturi |
Abstract | The problem of estimating Wasserstein distances between two high-dimensional densities suffers from the curse of dimensionality: one needs a number of samples exponential in the dimension to ensure that the distance between the two empirical measures is comparable to the distance between the original densities. Therefore, optimal transport (OT) geometry can only be used in machine learning if the OT problem is substantially regularized. On the other hand, one of the greatest achievements of the OT literature in recent years lies in regularity theory: Caffarelli showed that the OT map between two well-behaved measures is Lipschitz, or, equivalently when considering 2-Wasserstein distances, that the Brenier convex potential (whose gradient yields an optimal map) is smooth. We propose in this work to draw inspiration from this theory and use regularity as a regularization tool. We give algorithms operating on two discrete measures that can recover nearly optimal transport maps with small distortion, or, equivalently, nearly optimal Brenier potentials that are strongly convex and smooth. For univariate measures, we show that computing these potentials is equivalent to solving an isotonic regression problem with Lipschitz and strong monotonicity constraints. For multivariate measures the problem boils down to alternately solving a convex QCQP and a discrete OT problem. In this way we recover the values and gradients of the Brenier potential on sampled points, but we also show, more generally, that values and gradients of the potential can be computed out of sample, at the cost of solving a simpler QCQP for each evaluation. Building on these two formulations, we propose algorithms to estimate and evaluate transport maps with desired regularity properties, benchmark their statistical performance, apply them to domain adaptation, and visualize their action on a color transfer task. |
Tasks | Domain Adaptation |
Published | 2019-05-26 |
URL | https://arxiv.org/abs/1905.10812v4 |
https://arxiv.org/pdf/1905.10812v4.pdf | |
PWC | https://paperswithcode.com/paper/regularity-as-regularization-smooth-and |
Repo | |
Framework | |
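For the univariate case mentioned in the abstract, the constrained isotonic regression can be sketched explicitly. The notation below is illustrative rather than the paper's: with sorted samples $x_1 \le \dots \le x_n$ of the source measure paired with the correspondingly sorted samples $y_1 \le \dots \le y_n$ of the target, one estimates map values $t_i = T(x_i) = \varphi'(x_i)$ under gap constraints that make the Brenier potential $\varphi$ $\ell$-strongly convex and $L$-smooth:

```latex
\min_{t \in \mathbb{R}^n} \; \sum_{i=1}^{n} (t_i - y_i)^2
\quad \text{s.t.} \quad
\ell\,(x_{i+1} - x_i) \;\le\; t_{i+1} - t_i \;\le\; L\,(x_{i+1} - x_i),
\qquad i = 1, \dots, n-1,
```

so the estimated map $T$ is $\ell$-strongly monotone and $L$-Lipschitz, which is exactly the regularity being used as regularization.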
TinyCNN: A Tiny Modular CNN Accelerator for Embedded FPGA
Title | TinyCNN: A Tiny Modular CNN Accelerator for Embedded FPGA |
Authors | Ali Jahanshahi |
Abstract | In recent years, Convolutional Neural Network (CNN) based methods have achieved great success in a large number of applications and have been among the most powerful and widely used techniques in computer vision. However, CNN-based methods are computationally intensive and resource-consuming, and thus are hard to integrate into embedded systems such as smart phones, smart glasses, and robots. FPGAs are among the most promising platforms for accelerating CNNs, but limited on-chip memory limits the performance of FPGA accelerators for CNNs. In this paper, we propose a framework for designing CNN accelerators on embedded FPGAs for image classification. The proposed framework provides a tool for FPGA resource-aware design space exploration of CNNs and automatically generates the hardware description of the CNN to be programmed onto a target FPGA. The framework consists of three main backends: software, hardware generation, and simulation/precision adjustment. The software backend serves as an API that lets the designer build the CNN and train it according to the hardware resources that are available. Using the CNN model, the hardware backend generates the necessary hardware components and integrates them to produce the hardware description of the CNN. Finally, the simulation/precision-adjustment backend tunes the inter-layer precision units to minimize the classification error. We used 16-bit fixed-point data in a CNN accelerator (FPGA) and compared it to an equivalent software version running on an ARM processor (32-bit floating-point data). The accelerated (FPGA) version loses about 3% classification accuracy but achieves up to a 15.75x speedup. |
Tasks | Image Classification |
Published | 2019-11-15 |
URL | https://arxiv.org/abs/1911.06777v1 |
https://arxiv.org/pdf/1911.06777v1.pdf | |
PWC | https://paperswithcode.com/paper/tinycnn-a-tiny-modular-cnn-accelerator-for |
Repo | |
Framework | |
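The 16-bit fixed-point arithmetic mentioned in the abstract can be sketched with a simple quantize/dequantize pair. The number of fractional bits here is a hypothetical choice; in the framework, this precision is tuned per layer by the simulation/precision-adjustment backend:

```python
import numpy as np

def to_fixed16(x, frac_bits=8):
    """Quantize to signed 16-bit fixed point with `frac_bits` fractional bits."""
    scale = 1 << frac_bits
    return np.clip(np.round(x * scale), -32768, 32767).astype(np.int16)

def from_fixed16(q, frac_bits=8):
    """Recover a float approximation from the fixed-point representation."""
    return q.astype(np.float32) / (1 << frac_bits)

w = np.random.default_rng(0).normal(scale=0.5, size=8).astype(np.float32)
w_q = from_fixed16(to_fixed16(w))
print("max quantization error:", float(np.abs(w - w_q).max()))  # <= 2**-9 here
```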
An Empirical Study of the Application of Machine Learning and Keyword Terms Methodologies to Privilege-Document Review Projects in Legal Matters
Title | An Empirical Study of the Application of Machine Learning and Keyword Terms Methodologies to Privilege-Document Review Projects in Legal Matters |
Authors | Peter Gronvall, Nathaniel Huber-Fliflet, Jianping Zhang, Robert Keeling, Robert Neary, Haozhen Zhao |
Abstract | Protecting privileged communications and data from disclosure is paramount for legal teams. Unrestricted legal advice, such as attorney-client communications or litigation strategy, is vital to the legal process and is exempt from disclosure in litigation or regulatory events. To protect this information from being disclosed, companies and outside counsel must review vast amounts of documents to determine those that contain privileged material. This process is extremely costly and time consuming. As data volumes increase, legal counsel employ methods to reduce the number of documents requiring review while balancing the need to ensure the protection of privileged information. Keyword searching is relied upon as a method to target privileged information and reduce document review populations. Keyword searches are effective at casting a wide net but return over-inclusive results – most of which do not contain privileged information – and without detailed knowledge of the data, keyword lists cannot be crafted to find all privileged material. Overly inclusive keyword searching can also be problematic: even as it drives up costs, it can cast too wide a net and thus produce unreliable results. To overcome these weaknesses of keyword searching, legal teams are using a new method to target privileged information called predictive modeling. Predictive modeling can successfully identify privileged material, but little research has been published to confirm its effectiveness compared to keyword searching. This paper summarizes a study of the effectiveness of keyword searching and predictive modeling when applied to real-world data. With this study, the collaborators wanted to examine and understand the benefits and weaknesses of both approaches for legal teams identifying privileged material in document populations. |
Tasks | |
Published | 2019-04-03 |
URL | http://arxiv.org/abs/1904.01722v1 |
http://arxiv.org/pdf/1904.01722v1.pdf | |
PWC | https://paperswithcode.com/paper/an-empirical-study-of-the-application-of |
Repo | |
Framework | |
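The contrast the study draws can be sketched in a few lines: a keyword search flags any document containing a term from a list, while a predictive model scores documents using learned weights. The corpus, keyword list, and labels below are invented toys, and (unlike the study) the model here is scored on its own training data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

docs = [
    "attorney client advice on the merger litigation strategy",
    "please see counsel's privileged legal opinion attached",
    "lunch order for the team meeting on friday",
    "quarterly sales figures and forecast spreadsheet",
    "legal advice regarding the pending regulatory inquiry",
    "office supplies reorder request",
]
labels = [1, 1, 0, 0, 1, 0]  # 1 = privileged, 0 = not privileged

# Keyword searching: casts a wide (often over-inclusive) net.
keywords = ("attorney", "counsel", "legal", "privileged")
kw_flags = [any(k in d for k in keywords) for d in docs]

# Predictive modeling: ranks documents by a learned privilege score.
X = TfidfVectorizer().fit_transform(docs)
scores = LogisticRegression().fit(X, labels).predict_proba(X)[:, 1]

for flag, score, label in zip(kw_flags, scores, labels):
    print(f"keyword={flag!s:5}  model={score:.2f}  truth={label}")
```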
What do Language Representations Really Represent?
Title | What do Language Representations Really Represent? |
Authors | Johannes Bjerva, Robert Östling, Maria Han Veiga, Jörg Tiedemann, Isabelle Augenstein |
Abstract | A neural language model trained on a text corpus can be used to induce distributed representations of words, such that similar words end up with similar representations. If the corpus is multilingual, the same model can be used to learn distributed representations of languages, such that similar languages end up with similar representations. We show that this holds even when the multilingual corpus has been translated into English, by picking up the faint signal left by the source languages. However, just as it is a thorny problem to separate semantic from syntactic similarity in word representations, it is not obvious what type of similarity is captured by language representations. We investigate correlations and causal relationships between language representations learned from translations on one hand, and genetic, geographical, and several levels of structural similarity between languages on the other. Of these, structural similarity is found to correlate most strongly with language representation similarity, while genetic relationships, a convenient benchmark used for evaluation in previous work, appear to be a confounding factor. Apart from implications about translation effects, we see this more generally as a case where NLP and linguistic typology can interact and benefit one another. |
Tasks | Language Modelling |
Published | 2019-01-09 |
URL | http://arxiv.org/abs/1901.02646v1 |
http://arxiv.org/pdf/1901.02646v1.pdf | |
PWC | https://paperswithcode.com/paper/what-do-language-representations-really |
Repo | |
Framework | |
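The correlation analysis described in the abstract amounts to comparing two pairwise language-similarity matrices. A sketch with random stand-ins follows; the real study uses representations learned from translations and similarities derived from typological, genetic, and geographical data:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_langs, dim = 10, 64
lang_vecs = rng.normal(size=(n_langs, dim))        # stand-in learned language vectors
struct_sim = rng.uniform(size=(n_langs, n_langs))  # stand-in structural similarity
struct_sim = (struct_sim + struct_sim.T) / 2       # symmetrize

# Cosine similarity between the learned language representations.
unit = lang_vecs / np.linalg.norm(lang_vecs, axis=1, keepdims=True)
rep_sim = unit @ unit.T

iu = np.triu_indices(n_langs, k=1)                 # compare upper triangles only
rho, p = spearmanr(rep_sim[iu], struct_sim[iu])
print(f"Spearman rho = {rho:.3f} (p = {p:.3f})")
```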
Toroidal AutoEncoder
Title | Toroidal AutoEncoder |
Authors | Maciej Mikulski, Jaroslaw Duda |
Abstract | Enforcing distributions of latent variables in neural networks is an active research topic. It is vital in all kinds of generative models, where we want to be able to interpolate between points in the latent space, or sample from it. Modern generative AutoEncoders (AE) like WAE, SWAE, and CWAE add a regularizer to the standard (deterministic) AE that makes it possible to enforce a Gaussian distribution in the latent space. Enforcing different distributions, especially topologically nontrivial ones, might open up interesting new possibilities, but this subject seems unexplored so far. This article proposes a new approach to enforce a uniform distribution on the d-dimensional torus. We introduce a circular spring loss, which forces minibatch points to be equally spaced and to satisfy cyclic boundary conditions. As an example application, we propose multiple-path morphing: the minimal-distance geodesic between two points under the uniform distribution on the latent space of angles becomes a line, but the torus topology lets us choose such lines in alternative ways, passing through different edges of $[-\pi,\pi]^d$. Further applications to explore include learning real-life, topologically nontrivial feature spaces: for example, recognizing the 2D rotation of an object in a picture by training on relative angles, or even 3D rotations by additionally using spherical features, so that morphing becomes close to object rotation. |
Tasks | |
Published | 2019-03-28 |
URL | http://arxiv.org/abs/1903.12286v1 |
http://arxiv.org/pdf/1903.12286v1.pdf | |
PWC | https://paperswithcode.com/paper/toroidal-autoencoder |
Repo | |
Framework | |
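One way to read the circular spring loss: sort the minibatch angles in each latent dimension and penalize deviations of the consecutive gaps, including the wrap-around gap, from the uniform spacing 2π/m. The sketch below implements that reading; it illustrates the idea and is not the paper's exact formulation:

```python
import numpy as np

def circular_spring_loss(theta):
    """Push a minibatch of latent angles toward uniform spacing on the circle.

    theta: array of shape (m, d), angles in [-pi, pi] for each latent dim.
    """
    theta = np.sort(theta, axis=0)
    m = theta.shape[0]
    gaps = np.diff(theta, axis=0)                               # (m-1, d)
    wrap = 2 * np.pi - (theta.max(axis=0) - theta.min(axis=0))  # cyclic gap
    target = 2 * np.pi / m                                      # uniform spacing
    return np.sum((gaps - target) ** 2) + np.sum((wrap - target) ** 2)

batch = np.random.default_rng(0).uniform(-np.pi, np.pi, size=(32, 2))
print(circular_spring_loss(batch))  # shrinks as points spread out evenly
```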
FRI – Feature Relevance Intervals for Interpretable and Interactive Data Exploration
Title | FRI – Feature Relevance Intervals for Interpretable and Interactive Data Exploration |
Authors | Lukas Pfannschmidt, Christina Göpfert, Ursula Neumann, Dominik Heider, Barbara Hammer |
Abstract | Most existing feature selection methods are insufficient for analytic purposes as soon as high-dimensional data or redundant sensor signals are involved, since features can be selected due to spurious effects or correlations rather than causal effects. To support the discovery of causal features in biomedical experiments, we present FRI, an open-source Python library that can be used to identify all-relevant variables in linear classification and (ordinal) regression problems. Using the recently proposed feature relevance method, FRI provides a basis for further general experimentation and, in particular, can facilitate the search for alternative biomarkers. It can be used in an interactive context, by providing model manipulation and visualization methods, or in a batch process as a filter method. |
Tasks | Feature Selection |
Published | 2019-03-02 |
URL | https://arxiv.org/abs/1903.00719v3 |
https://arxiv.org/pdf/1903.00719v3.pdf | |
PWC | https://paperswithcode.com/paper/fri-feature-relevance-intervals-for |
Repo | |
Framework | |
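The notion of a feature relevance interval can be formalized roughly as follows. This is an illustrative formalization consistent with the all-relevant feature determination literature, not the library's exact optimization program: given a reference solution $w^*$ of a regularized linear model with loss $L$, the interval for feature $j$ collects the weights that feature can take over all near-optimal models,

```latex
\mathrm{FRI}_j = \Bigl[\, \min_{w \in \mathcal{F}} |w_j| ,\; \max_{w \in \mathcal{F}} |w_j| \,\Bigr],
\qquad
\mathcal{F} = \bigl\{\, w : L(w) \le (1+\delta)\, L(w^*),\; \|w\|_1 \le (1+\delta)\, \|w^*\|_1 \,\bigr\}.
```

A feature whose minimum is positive is strongly relevant (every good model uses it); one whose minimum is zero but whose maximum is positive is weakly relevant, e.g. replaceable by a redundant sensor signal.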
Dense Classification and Implanting for Few-Shot Learning
Title | Dense Classification and Implanting for Few-Shot Learning |
Authors | Yann Lifchitz, Yannis Avrithis, Sylvaine Picard, Andrei Bursuc |
Abstract | Training deep neural networks from few examples is a highly challenging and key problem for many computer vision tasks. In this context, we are targeting knowledge transfer from a set with abundant data to other sets with few available examples. We propose two simple and effective solutions: (i) dense classification over feature maps, which for the first time studies local activations in the domain of few-shot learning, and (ii) implanting, that is, attaching new neurons to a previously trained network to learn new, task-specific features. On miniImageNet, we improve the prior state-of-the-art on few-shot classification, i.e., we achieve 62.5%, 79.8% and 83.8% on 5-way 1-shot, 5-shot and 10-shot settings respectively. |
Tasks | Few-Shot Learning, Transfer Learning |
Published | 2019-03-12 |
URL | http://arxiv.org/abs/1903.05050v1 |
http://arxiv.org/pdf/1903.05050v1.pdf | |
PWC | https://paperswithcode.com/paper/dense-classification-and-implanting-for-few |
Repo | |
Framework | |
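Dense classification, the first of the paper's two solutions, can be sketched as applying a classifier at every spatial position of the final feature map rather than after pooling, so each local activation must predict the image label on its own. One plausible PyTorch instantiation, using cosine similarity against per-class vectors and omitting the scale/temperature a cosine classifier would normally carry:

```python
import torch
import torch.nn.functional as F

def dense_classification_loss(feature_map, class_protos, labels):
    """Cross-entropy applied at every spatial location of the feature map.

    feature_map: (B, C, H, W) backbone embeddings
    class_protos: (K, C) one weight vector per class
    labels: (B,) ground-truth classes, repeated over all H*W positions
    """
    B, C, H, W = feature_map.shape
    feats = F.normalize(feature_map.flatten(2), dim=1)   # (B, C, H*W)
    protos = F.normalize(class_protos, dim=1)            # (K, C)
    logits = torch.einsum("bcp,kc->bkp", feats, protos)  # (B, K, H*W)
    target = labels[:, None].expand(B, H * W)            # label every position
    return F.cross_entropy(logits, target)

loss = dense_classification_loss(
    torch.randn(4, 64, 5, 5), torch.randn(10, 64), torch.randint(0, 10, (4,))
)
print(loss.item())
```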
A Conformance Checking-based Approach for Drift Detection in Business Processes
Title | A Conformance Checking-based Approach for Drift Detection in Business Processes |
Authors | Víctor Gallego-Fontenla, Juan C. Vidal, Manuel Lama |
Abstract | Real-life business processes change over time, in both planned and unexpected ways. Detecting these changes is crucial for organizations to ensure that the expected and the actual behavior are as similar as possible. Such changes over time are called concept drift, and their detection is a major challenge in process mining, since the inherent complexity of the data makes it difficult to distinguish between a genuine change and an anomalous execution. In this paper, we present C2D2 (Conformance Checking-based Drift Detection), a new approach to detect sudden control-flow changes in process models from event traces. C2D2 combines discovery techniques with conformance checking methods to perform offline detection. Our approach has been validated on a synthetic benchmark dataset of 68 logs, showing improved accuracy while maintaining minimal delay in drift detection. |
Tasks | |
Published | 2019-07-09 |
URL | https://arxiv.org/abs/1907.04276v1 |
https://arxiv.org/pdf/1907.04276v1.pdf | |
PWC | https://paperswithcode.com/paper/a-conformance-checking-based-approach-for |
Repo | |
Framework | |
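The overall shape of a conformance-based drift detector can be sketched as: discover a model from a window of traces, monitor the conformance (fitness) of subsequent traces against it, and flag a drift when fitness drops. This is schematic only; `discover` and `fitness` stand for any process discovery and conformance checking methods, and C2D2's actual statistics and window handling are more involved:

```python
def detect_drifts(traces, discover, fitness, window=100, threshold=0.95):
    """Flag candidate control-flow change points in a sequence of traces.

    discover(traces) -> model          any process discovery algorithm
    fitness(model, trace) -> float     conformance of one trace in [0, 1]
    """
    drifts, start = [], 0
    model = discover(traces[start:start + window])
    for i in range(start + window, len(traces)):
        if fitness(model, traces[i]) < threshold:
            drifts.append(i)          # candidate change point
            start = i                 # restart the reference window here
            model = discover(traces[start:start + window])
    return drifts
```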
Markov chains in random environment with applications in queueing theory and machine learning
Title | Markov chains in random environment with applications in queueing theory and machine learning |
Authors | Attila Lovas, Miklós Rásonyi |
Abstract | We prove the existence of limiting distributions for a large class of Markov chains on a general state space in a random environment. We assume suitable versions of the standard drift and minorization conditions. In particular, the system dynamics should be contractive on average with respect to the Lyapunov function, and sufficiently large small sets should exist with sufficiently large minorization constants. We also establish that a law of large numbers holds for bounded functionals of the process. Applications to queueing systems and to machine learning algorithms are presented. |
Tasks | |
Published | 2019-11-11 |
URL | https://arxiv.org/abs/1911.04377v2 |
https://arxiv.org/pdf/1911.04377v2.pdf | |
PWC | https://paperswithcode.com/paper/markov-chains-in-random-environment-with |
Repo | |
Framework | |
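For orientation, the textbook (non-random-environment) forms of the two assumptions read as follows; the paper works with random-environment analogues, so this display is context rather than the paper's exact conditions. For a Lyapunov function $V \ge 1$, constants $\gamma \in (0,1)$ and $K < \infty$, a small set $C = \{V \le R\}$, a constant $\alpha > 0$, and a probability measure $\nu$:

```latex
\text{(drift)} \quad
E\bigl[ V(X_{n+1}) \mid X_n = x \bigr] \;\le\; \gamma\, V(x) + K
\quad \text{for all } x,
\qquad
\text{(minorization)} \quad
P(x, \cdot) \;\ge\; \alpha\, \nu(\cdot)
\quad \text{for all } x \in C.
```

"Contractive on average" in the abstract corresponds to $\gamma < 1$ in the drift condition, and "large enough small sets with large enough minorization constants" to the sizes of $R$ and $\alpha$.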