Paper Group ANR 1159
Gradient Descent Quantizes ReLU Network Features
Title | Gradient Descent Quantizes ReLU Network Features |
Authors | Hartmut Maennel, Olivier Bousquet, Sylvain Gelly |
Abstract | Deep neural networks are often trained in the over-parametrized regime (i.e. with far more parameters than training examples), and understanding why the training converges to solutions that generalize remains an open problem. Several studies have highlighted the fact that the training procedure, i.e. mini-batch Stochastic Gradient Descent (SGD) leads to solutions that have specific properties in the loss landscape. However, even with plain Gradient Descent (GD) the solutions found in the over-parametrized regime are pretty good and this phenomenon is poorly understood. We propose an analysis of this behavior for feedforward networks with a ReLU activation function under the assumption of small initialization and learning rate and uncover a quantization effect: The weight vectors tend to concentrate at a small number of directions determined by the input data. As a consequence, we show that for given input data there are only finitely many, “simple” functions that can be obtained, independent of the network size. This puts these functions in analogy to linear interpolations (for given input data there are finitely many triangulations, which each determine a function by linear interpolation). We ask whether this analogy extends to the generalization properties - while the usual distribution-independent generalization property does not hold, it could be that for e.g. smooth functions with bounded second derivative an approximation property holds which could “explain” generalization of networks (of unbounded size) to unseen inputs. |
Tasks | Quantization |
Published | 2018-03-22 |
URL | http://arxiv.org/abs/1803.08367v1 |
http://arxiv.org/pdf/1803.08367v1.pdf | |
PWC | https://paperswithcode.com/paper/gradient-descent-quantizes-relu-network |
Repo | |
Framework | |
Reduced-Order Modeling through Machine Learning Approaches for Brittle Fracture Applications
Title | Reduced-Order Modeling through Machine Learning Approaches for Brittle Fracture Applications |
Authors | A. Hunter, B. A. Moore, M. K. Mudunuru, V. T. Chau, R. L. Miller, R. B. Tchoua, C. Nyshadham, S. Karra, D. O. Malley, E. Rougier, H. S. Viswanathan, G. Srinivasan |
Abstract | In this paper, five different approaches for reduced-order modeling of brittle fracture in geomaterials, specifically concrete, are presented and compared. Four of the five methods rely on machine learning (ML) algorithms to approximate important aspects of the brittle fracture problem. In addition to the ML algorithms, each method incorporates different physics-based assumptions in order to reduce the computational complexity while maintaining the physics as much as possible. This work specifically focuses on using the ML approaches to model a 2D concrete sample under low strain rate pure tensile loading conditions with 20 preexisting cracks present. A high-fidelity finite element-discrete element model is used to both produce a training dataset of 150 simulations and an additional 35 simulations for validation. Results from the ML approaches are directly compared against the results from the high-fidelity model. Strengths and weaknesses of each approach are discussed and the most important conclusion is that a combination of physics-informed and data-driven features are necessary for emulating the physics of crack propagation, interaction and coalescence. All of the models presented here have runtimes that are orders of magnitude faster than the original high-fidelity model and pave the path for developing accurate reduced order models that could be used to inform larger length-scale models with important sub-scale physics that often cannot be accounted for due to computational cost. |
Tasks | |
Published | 2018-06-05 |
URL | http://arxiv.org/abs/1806.01949v1 |
http://arxiv.org/pdf/1806.01949v1.pdf | |
PWC | https://paperswithcode.com/paper/reduced-order-modeling-through-machine |
Repo | |
Framework | |
On Euclidean $k$-Means Clustering with $α$-Center Proximity
Title | On Euclidean $k$-Means Clustering with $α$-Center Proximity |
Authors | Amit Deshpande, Anand Louis, Apoorv Vikram Singh |
Abstract | $k$-means clustering is NP-hard in the worst case but previous work has shown efficient algorithms assuming the optimal $k$-means clusters are stable under additive or multiplicative perturbation of data. This has two caveats. First, we do not know how to efficiently verify this property of optimal solutions that are NP-hard to compute in the first place. Second, the stability assumptions required for polynomial time $k$-means algorithms are often unreasonable when compared to the ground-truth clusters in real-world data. A consequence of multiplicative perturbation resilience is center proximity, that is, every point is closer to the center of its own cluster than to the center of any other cluster, by some multiplicative factor $\alpha > 1$. We study the problem of minimizing the Euclidean $k$-means objective only over clusterings that satisfy $\alpha$-center proximity. We give a simple algorithm to find the optimal $\alpha$-center-proximal $k$-means clustering in running time exponential in $k$ and $1/(\alpha - 1)$ but linear in the number of points and the dimension. We define an analogous $\alpha$-center proximity condition for outliers, and give similar algorithmic guarantees for $k$-means with outliers and $\alpha$-center proximity. On the hardness side we show that for any $\alpha' > 1$, there exists an $\alpha \leq \alpha'$, $(\alpha >1)$, and an $\varepsilon_0 > 0$ such that minimizing the $k$-means objective over clusterings that satisfy $\alpha$-center proximity is NP-hard to approximate within a multiplicative $(1+\varepsilon_0)$ factor. |
Tasks | |
Published | 2018-04-28 |
URL | http://arxiv.org/abs/1804.10827v3 |
http://arxiv.org/pdf/1804.10827v3.pdf | |
PWC | https://paperswithcode.com/paper/on-euclidean-k-means-clustering-with-center |
Repo | |
Framework | |
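The $\alpha$-center proximity condition described in the abstract above can be checked directly for a given clustering. The following is a minimal sketch, not the paper's algorithm: it assumes cluster centers are the coordinate-wise means and that the condition reads "distance to any other center is strictly greater than $\alpha$ times distance to the own center". The function names and the toy data are hypothetical.

```python
import math


def mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def satisfies_center_proximity(clusters, alpha):
    """Return True if every point x in cluster i satisfies
    dist(x, c_j) > alpha * dist(x, c_i) for every other center c_j.

    Assumption: centers are cluster means (the k-means setting).
    """
    centers = [mean(cluster) for cluster in clusters]
    for i, cluster in enumerate(clusters):
        for x in cluster:
            own = math.dist(x, centers[i])
            for j, center in enumerate(centers):
                if j != i and math.dist(x, center) <= alpha * own:
                    return False
    return True


# Toy data (hypothetical): two well-separated clusters on a line.
clusters = [[(0.0, 0.0), (1.0, 0.0)], [(10.0, 0.0), (11.0, 0.0)]]
print(satisfies_center_proximity(clusters, 2.0))
print(satisfies_center_proximity(clusters, 20.0))
```

In this toy example the tightest point is 0.5 from its own center and 9.5 from the other, so the condition holds for any $\alpha$ below 19 and fails above it.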
Concept-Based Embeddings for Natural Language Processing
Title | Concept-Based Embeddings for Natural Language Processing |
Authors | Yukun Ma, Erik Cambria |
Abstract | In this work, we focus on effectively leveraging and integrating information from the concept level as well as the word level by projecting concepts and words into a lower-dimensional space while retaining the most critical semantics. In the broad context of an opinion understanding system, we investigate the use of the fused embedding for several core NLP tasks: named entity detection and classification, automatic speech recognition reranking, and targeted sentiment analysis. |
Tasks | Sentiment Analysis, Speech Recognition |
Published | 2018-07-15 |
URL | http://arxiv.org/abs/1807.05519v1 |
http://arxiv.org/pdf/1807.05519v1.pdf | |
PWC | https://paperswithcode.com/paper/concept-based-embeddings-for-natural-language |
Repo | |
Framework | |
Domain Attentive Fusion for End-to-end Dialect Identification with Unknown Target Domain
Title | Domain Attentive Fusion for End-to-end Dialect Identification with Unknown Target Domain |
Authors | Suwon Shon, Ahmed Ali, James Glass |
Abstract | End-to-end deep learning language or dialect identification systems operate on the spectrogram or other acoustic features and directly generate identification scores for each class. An important issue for end-to-end systems is to have some knowledge of the application domain, because the system can be vulnerable to use cases that were not seen in the training phase; such a scenario is often referred to as a domain mismatched condition. In general, we assume that there is enough variation in the training dataset to expose the system to multiple domains. In this work, we study how to best make use of a training dataset in order to have maximum effectiveness on unknown target domains. Our goal is to process the input without any knowledge of the target domain while preserving robust performance on other domains as well. To accomplish this objective, we propose a domain attentive fusion approach for end-to-end dialect/language identification systems. To help with experimentation, we collect a dataset from three different domains, and create experimental protocols for a domain mismatched condition. The results of our proposed approach, which was tested on a variety of broadcast and YouTube data, show significant performance gain compared to traditional approaches, even without any prior target domain information. |
Tasks | Language Identification |
Published | 2018-12-04 |
URL | https://arxiv.org/abs/1812.01501v2 |
https://arxiv.org/pdf/1812.01501v2.pdf | |
PWC | https://paperswithcode.com/paper/domain-attentive-fusion-for-end-to-end |
Repo | |
Framework | |
Learning Two-layer Neural Networks with Symmetric Inputs
Title | Learning Two-layer Neural Networks with Symmetric Inputs |
Authors | Rong Ge, Rohith Kuditipudi, Zhize Li, Xiang Wang |
Abstract | We give a new algorithm for learning a two-layer neural network under a general class of input distributions. Assuming there is a ground-truth two-layer network $$ y = A \sigma(Wx) + \xi, $$ where $A,W$ are weight matrices, $\xi$ represents noise, and the number of neurons in the hidden layer is no larger than the input or output, our algorithm is guaranteed to recover the parameters $A,W$ of the ground-truth network. The only requirement on the input $x$ is that it is symmetric, which still allows highly complicated and structured input. Our algorithm is based on the method-of-moments framework and extends several results in tensor decompositions. We use spectral algorithms to avoid the complicated non-convex optimization in learning neural networks. Experiments show that our algorithm can robustly learn the ground-truth neural network with a small number of samples for many symmetric input distributions. |
Tasks | |
Published | 2018-10-16 |
URL | http://arxiv.org/abs/1810.06793v2 |
http://arxiv.org/pdf/1810.06793v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-two-layer-neural-networks-with |
Repo | |
Framework | |
Unsupervised Sentence Compression using Denoising Auto-Encoders
Title | Unsupervised Sentence Compression using Denoising Auto-Encoders |
Authors | Thibault Févry, Jason Phang |
Abstract | In sentence compression, the task of shortening sentences while retaining the original meaning, models tend to be trained on large corpora containing pairs of verbose and compressed sentences. To remove the need for paired corpora, we emulate a summarization task and add noise to extend sentences and train a denoising auto-encoder to recover the original, constructing an end-to-end training regime without the need for any examples of compressed sentences. We conduct a human evaluation of our model on a standard text summarization dataset and show that it performs comparably to a supervised baseline based on grammatical correctness and retention of meaning. Despite being exposed to no target data, our unsupervised models learn to generate imperfect but reasonably readable sentence summaries. Although we underperform supervised models based on ROUGE scores, our models are competitive with a supervised baseline based on human evaluation for grammatical correctness and retention of meaning. |
Tasks | Denoising, Sentence Compression, Text Summarization, Unsupervised Sentence Compression |
Published | 2018-09-07 |
URL | http://arxiv.org/abs/1809.02669v1 |
http://arxiv.org/pdf/1809.02669v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-sentence-compression-using |
Repo | |
Framework | |
Knowledge Based Machine Reading Comprehension
Title | Knowledge Based Machine Reading Comprehension |
Authors | Yibo Sun, Daya Guo, Duyu Tang, Nan Duan, Zhao Yan, Xiaocheng Feng, Bing Qin |
Abstract | Machine reading comprehension (MRC) requires reasoning about both the knowledge involved in a document and knowledge about the world. However, existing datasets are typically dominated by questions that can be well solved by context matching, which fail to test this capability. To encourage the progress on knowledge-based reasoning in MRC, we present knowledge-based MRC in this paper, and build a new dataset consisting of 40,047 question-answer pairs. The annotation of this dataset is designed so that successfully answering the questions requires understanding and the knowledge involved in a document. We implement a framework consisting of both a question answering model and a question generation model, both of which take the knowledge extracted from the document as well as relevant facts from an external knowledge base such as Freebase/ProBase/Reverb/NELL. Results show that incorporating side information from external KB improves the accuracy of the baseline question answer system. We compare it with a standard MRC model BiDAF, and also provide the difficulty of the dataset and lay out remaining challenges. |
Tasks | Machine Reading Comprehension, Question Answering, Question Generation, Reading Comprehension |
Published | 2018-09-12 |
URL | http://arxiv.org/abs/1809.04267v1 |
http://arxiv.org/pdf/1809.04267v1.pdf | |
PWC | https://paperswithcode.com/paper/knowledge-based-machine-reading-comprehension |
Repo | |
Framework | |
Comparing Attention-based Convolutional and Recurrent Neural Networks: Success and Limitations in Machine Reading Comprehension
Title | Comparing Attention-based Convolutional and Recurrent Neural Networks: Success and Limitations in Machine Reading Comprehension |
Authors | Matthias Blohm, Glorianna Jagfeld, Ekta Sood, Xiang Yu, Ngoc Thang Vu |
Abstract | We propose a machine reading comprehension model based on the compare-aggregate framework with two-staged attention that achieves state-of-the-art results on the MovieQA question answering dataset. To investigate the limitations of our model as well as the behavioral difference between convolutional and recurrent neural networks, we generate adversarial examples to confuse the model and compare to human performance. Furthermore, we assess the generalizability of our model by analyzing its differences to human inference, |
Tasks | Machine Reading Comprehension, Question Answering, Reading Comprehension |
Published | 2018-08-27 |
URL | http://arxiv.org/abs/1808.08744v1 |
http://arxiv.org/pdf/1808.08744v1.pdf | |
PWC | https://paperswithcode.com/paper/comparing-attention-based-convolutional-and |
Repo | |
Framework | |
PaloBoost: An Overfitting-robust TreeBoost with Out-of-Bag Sample Regularization Techniques
Title | PaloBoost: An Overfitting-robust TreeBoost with Out-of-Bag Sample Regularization Techniques |
Authors | Yubin Park, Joyce C. Ho |
Abstract | Stochastic Gradient TreeBoost is often found in many winning solutions in public data science challenges. Unfortunately, the best performance requires extensive parameter tuning and can be prone to overfitting. We propose PaloBoost, a Stochastic Gradient TreeBoost model that uses novel regularization techniques to guard against overfitting and is robust to parameter settings. PaloBoost uses the under-utilized out-of-bag samples to perform gradient-aware pruning and estimate adaptive learning rates. Unlike other Stochastic Gradient TreeBoost models that use the out-of-bag samples to estimate test errors, PaloBoost treats the samples as a second batch of training samples to prune the trees and adjust the learning rates. As a result, PaloBoost can dynamically adjust tree depths and learning rates to achieve faster learning at the start and slower learning as the algorithm converges. We illustrate how these regularization techniques can be efficiently implemented and propose a new formula for calculating feature importance to reflect the node coverages and learning rates. Extensive experimental results on seven datasets demonstrate that PaloBoost is robust to overfitting, is less sensitive to the parameters, and can also effectively identify meaningful features. |
Tasks | Feature Importance |
Published | 2018-07-22 |
URL | http://arxiv.org/abs/1807.08383v1 |
http://arxiv.org/pdf/1807.08383v1.pdf | |
PWC | https://paperswithcode.com/paper/paloboost-an-overfitting-robust-treeboost |
Repo | |
Framework | |
Robustifying Models Against Adversarial Attacks by Langevin Dynamics
Title | Robustifying Models Against Adversarial Attacks by Langevin Dynamics |
Authors | Vignesh Srinivasan, Arturo Marban, Klaus-Robert Müller, Wojciech Samek, Shinichi Nakajima |
Abstract | Adversarial attacks on deep learning models have compromised their performance considerably. As remedies, a lot of defense methods were proposed, which, however, have been circumvented by newer attacking strategies. In the midst of this ensuing arms race, the problem of robustness against adversarial attacks still remains unsolved. This paper proposes a novel, simple yet effective defense strategy where adversarial samples are relaxed onto the underlying manifold of the (unknown) target class distribution. Specifically, our algorithm drives off-manifold adversarial samples towards high density regions of the data generating distribution of the target class by the Metropolis-adjusted Langevin algorithm (MALA) with perceptual boundary taken into account. Although the motivation is similar to projection methods, e.g., Defense-GAN, our algorithm, called MALA for DEfense (MALADE), is equipped with significant dispersion - projection is distributed broadly, and therefore any whitebox attack cannot accurately align the input so that the MALADE moves it to a targeted untrained spot where the model predicts a wrong label. In our experiments, MALADE exhibited state-of-the-art performance against various elaborate attacking strategies. |
Tasks | Denoising |
Published | 2018-05-30 |
URL | https://arxiv.org/abs/1805.12017v2 |
https://arxiv.org/pdf/1805.12017v2.pdf | |
PWC | https://paperswithcode.com/paper/counterstrike-defending-deep-learning |
Repo | |
Framework | |
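For readers unfamiliar with the Metropolis-adjusted Langevin algorithm (MALA) named in the abstract above, the following is a minimal sketch of generic MALA on a one-dimensional standard Gaussian target. It is not MALADE: the target density, the step size `tau`, and all function names are assumptions chosen for illustration, and the perceptual boundary mentioned in the abstract is not modeled.

```python
import math
import random


def log_p(x):
    """Unnormalized log density of the assumed target, N(0, 1)."""
    return -0.5 * x * x


def grad_log_p(x):
    """Gradient of log_p."""
    return -x


def log_q(x_to, x_from, tau):
    """Log density (up to a constant) of the Langevin proposal
    N(x_from + tau * grad_log_p(x_from), 2 * tau) evaluated at x_to."""
    proposal_mean = x_from + tau * grad_log_p(x_from)
    return -((x_to - proposal_mean) ** 2) / (4.0 * tau)


def mala(x0, tau, n_steps, rng):
    """Run MALA and return (samples, acceptance_rate)."""
    x, samples, accepted = x0, [], 0
    for _ in range(n_steps):
        # Langevin proposal: gradient drift plus Gaussian noise.
        noise = rng.gauss(0.0, 1.0)
        proposal = x + tau * grad_log_p(x) + math.sqrt(2.0 * tau) * noise
        # Metropolis-Hastings correction for the asymmetric proposal.
        log_alpha = (log_p(proposal) + log_q(x, proposal, tau)
                     - log_p(x) - log_q(proposal, x, tau))
        if log_alpha >= 0.0 or rng.random() < math.exp(log_alpha):
            x = proposal
            accepted += 1
        samples.append(x)
    return samples, accepted / n_steps


samples, rate = mala(x0=3.0, tau=0.5, n_steps=5000, rng=random.Random(0))
print(f"mean={sum(samples) / len(samples):.3f}, acceptance={rate:.2f}")
```

The drift term pulls samples towards high-density regions, while the accept/reject step keeps the chain's stationary distribution equal to the target.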
Face Flashing: a Secure Liveness Detection Protocol based on Light Reflections
Title | Face Flashing: a Secure Liveness Detection Protocol based on Light Reflections |
Authors | Di Tang, Zhe Zhou, Yinqian Zhang, Kehuan Zhang |
Abstract | Face authentication systems are becoming increasingly prevalent, especially with the rapid development of Deep Learning technologies. However, human facial information is easy to capture and reproduce, which makes face authentication systems vulnerable to various attacks. Liveness detection is an important defense technique to prevent such attacks, but existing solutions did not provide clear and strong security guarantees, especially in terms of time. To overcome these limitations, we propose a new liveness detection protocol called Face Flashing that significantly raises the bar for launching successful attacks on face authentication systems. By randomly flashing well-designed pictures on a screen and analyzing the reflected light, our protocol leverages physical characteristics of human faces: reflection processing at the speed of light, unique textual features, and uneven 3D shapes. Cooperating with the working mechanism of the screen and digital cameras, our protocol is able to detect subtle traces left by an attacking process. To demonstrate the effectiveness of Face Flashing, we implemented a prototype and performed thorough evaluations with a large data set collected from real-world scenarios. The results show that our Timing Verification can effectively detect the time gap between legitimate authentications and malicious cases. Our Face Verification can also differentiate 2D planes from 3D objects accurately. The overall accuracy of our liveness detection system is 98.8%, and its robustness was evaluated in different scenarios. In the worst case, our system’s accuracy decreased to a still-high 97.3%. |
Tasks | Face Verification |
Published | 2018-01-06 |
URL | http://arxiv.org/abs/1801.01949v2 |
http://arxiv.org/pdf/1801.01949v2.pdf | |
PWC | https://paperswithcode.com/paper/face-flashing-a-secure-liveness-detection |
Repo | |
Framework | |
Numerical Integration on Graphs: where to sample and how to weigh
Title | Numerical Integration on Graphs: where to sample and how to weigh |
Authors | George C. Linderman, Stefan Steinerberger |
Abstract | Let $G=(V,E,w)$ be a finite, connected graph with weighted edges. We are interested in the problem of finding a subset $W \subset V$ of vertices and weights $a_w$ such that $$ \frac{1}{|V|}\sum_{v \in V}{f(v)} \sim \sum_{w \in W}{a_w f(w)}$$ for functions $f:V \rightarrow \mathbb{R}$ that are “smooth” with respect to the geometry of the graph. The main applications are problems where $f$ is known to somehow depend on the underlying graph but is expensive to evaluate on even a single vertex. We prove an inequality showing that the integration problem can be rewritten as a geometric problem (“the optimal packing of heat balls”). We discuss how one would construct approximate solutions of the heat ball packing problem; numerical examples demonstrate the efficiency of the method. |
Tasks | |
Published | 2018-03-19 |
URL | http://arxiv.org/abs/1803.06989v1 |
http://arxiv.org/pdf/1803.06989v1.pdf | |
PWC | https://paperswithcode.com/paper/numerical-integration-on-graphs-where-to |
Repo | |
Framework | |
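The quadrature problem stated in the abstract above compares the full vertex average with a weighted sum over a few sampled vertices. The sketch below only evaluates both sides of that formula on a toy path graph. The sample set `W`, the weights `a_w`, and the test function are hand-picked assumptions for illustration and are not produced by the heat ball packing method.

```python
def full_average(f, vertices):
    """Left-hand side: (1/|V|) * sum of f(v) over all vertices."""
    vertices = list(vertices)
    return sum(f(v) for v in vertices) / len(vertices)


def quadrature(f, weights):
    """Right-hand side: sum of a_w * f(w) over the sampled vertices.

    `weights` maps each sampled vertex w to its weight a_w.
    """
    return sum(a_w * f(w) for w, a_w in weights.items())


# Toy setup (hypothetical): path graph on vertices 0..5, two sample points.
vertices = range(6)
weights = {1: 0.5, 4: 0.5}


def f(v):
    """A function that varies slowly along the path (here, linear)."""
    return float(v)


print(full_average(f, vertices), quadrature(f, weights))
```

Here two evaluations of `f` reproduce the average over all six vertices because `f` is linear along the path and the sample points are placed symmetrically; for less regular functions the two sides would agree only approximately.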
Bias in Semantic and Discourse Interpretation
Title | Bias in Semantic and Discourse Interpretation |
Authors | Nicholas Asher, Soumya Paul |
Abstract | In this paper, we show how game-theoretic work on conversation combined with a theory of discourse structure provides a framework for studying interpretive bias. Interpretive bias is an essential feature of learning and understanding but also something that can be used to pervert or subvert the truth. The framework we develop here provides tools for understanding and analyzing the range of interpretive biases and the factors that contribute to them. |
Tasks | |
Published | 2018-06-29 |
URL | http://arxiv.org/abs/1806.11322v1 |
http://arxiv.org/pdf/1806.11322v1.pdf | |
PWC | https://paperswithcode.com/paper/bias-in-semantic-and-discourse-interpretation |
Repo | |
Framework | |
Hierarchical Text Generation using an Outline
Title | Hierarchical Text Generation using an Outline |
Authors | Mehdi Drissi, Olivia Watkins, Jugal Kalita |
Abstract | Many challenges in natural language processing require generating text, including language translation, dialogue generation, and speech recognition. For all of these problems, text generation becomes more difficult as the text becomes longer. Current language models often struggle to keep track of coherence for long pieces of text. Here, we attempt to have the model construct and use an outline of the text it generates to keep it focused. We find that the usage of an outline improves perplexity. We do not find that using the outline improves human evaluation over a simpler baseline, revealing a discrepancy in perplexity and human perception. Similarly, hierarchical generation is not found to improve human evaluation scores. |
Tasks | Dialogue Generation, Speech Recognition, Text Generation |
Published | 2018-10-20 |
URL | http://arxiv.org/abs/1810.08802v1 |
http://arxiv.org/pdf/1810.08802v1.pdf | |
PWC | https://paperswithcode.com/paper/hierarchical-text-generation-using-an-outline |
Repo | |
Framework | |