October 16, 2019

Paper Group ANR 1159

Gradient Descent Quantizes ReLU Network Features. Reduced-Order Modeling through Machine Learning Approaches for Brittle Fracture Applications. On Euclidean $k$-Means Clustering with $α$-Center Proximity. Concept-Based Embeddings for Natural Language Processing. Domain Attentive Fusion for End-to-end Dialect Identification with Unknown Target Domai …

Gradient Descent Quantizes ReLU Network Features


Title	Gradient Descent Quantizes ReLU Network Features
Authors	Hartmut Maennel, Olivier Bousquet, Sylvain Gelly
Abstract	Deep neural networks are often trained in the over-parametrized regime (i.e. with far more parameters than training examples), and understanding why the training converges to solutions that generalize remains an open problem. Several studies have highlighted the fact that the training procedure, i.e. mini-batch Stochastic Gradient Descent (SGD), leads to solutions that have specific properties in the loss landscape. However, even with plain Gradient Descent (GD) the solutions found in the over-parametrized regime are pretty good and this phenomenon is poorly understood. We propose an analysis of this behavior for feedforward networks with a ReLU activation function under the assumption of small initialization and learning rate and uncover a quantization effect: The weight vectors tend to concentrate at a small number of directions determined by the input data. As a consequence, we show that for given input data there are only finitely many, “simple” functions that can be obtained, independent of the network size. This puts these functions in analogy to linear interpolations (for given input data there are finitely many triangulations, which each determine a function by linear interpolation). We ask whether this analogy extends to the generalization properties - while the usual distribution-independent generalization property does not hold, it could be that for e.g. smooth functions with bounded second derivative an approximation property holds which could “explain” generalization of networks (of unbounded size) to unseen inputs.
Tasks	Quantization
Published	2018-03-22
URL	http://arxiv.org/abs/1803.08367v1
PDF	http://arxiv.org/pdf/1803.08367v1.pdf
PWC	https://paperswithcode.com/paper/gradient-descent-quantizes-relu-network
Repo
Framework

Reduced-Order Modeling through Machine Learning Approaches for Brittle Fracture Applications


Title	Reduced-Order Modeling through Machine Learning Approaches for Brittle Fracture Applications
Authors	A. Hunter, B. A. Moore, M. K. Mudunuru, V. T. Chau, R. L. Miller, R. B. Tchoua, C. Nyshadham, S. Karra, D. O. Malley, E. Rougier, H. S. Viswanathan, G. Srinivasan
Abstract	In this paper, five different approaches for reduced-order modeling of brittle fracture in geomaterials, specifically concrete, are presented and compared. Four of the five methods rely on machine learning (ML) algorithms to approximate important aspects of the brittle fracture problem. In addition to the ML algorithms, each method incorporates different physics-based assumptions in order to reduce the computational complexity while maintaining the physics as much as possible. This work specifically focuses on using the ML approaches to model a 2D concrete sample under low strain rate pure tensile loading conditions with 20 preexisting cracks present. A high-fidelity finite element-discrete element model is used to produce both a training dataset of 150 simulations and an additional 35 simulations for validation. Results from the ML approaches are directly compared against the results from the high-fidelity model. Strengths and weaknesses of each approach are discussed and the most important conclusion is that a combination of physics-informed and data-driven features is necessary for emulating the physics of crack propagation, interaction and coalescence. All of the models presented here have runtimes that are orders of magnitude faster than the original high-fidelity model and pave the path for developing accurate reduced order models that could be used to inform larger length-scale models with important sub-scale physics that often cannot be accounted for due to computational cost.
Tasks
Published	2018-06-05
URL	http://arxiv.org/abs/1806.01949v1
PDF	http://arxiv.org/pdf/1806.01949v1.pdf
PWC	https://paperswithcode.com/paper/reduced-order-modeling-through-machine
Repo
Framework

On Euclidean $k$-Means Clustering with $α$-Center Proximity


Title	On Euclidean $k$-Means Clustering with $α$-Center Proximity
Authors	Amit Deshpande, Anand Louis, Apoorv Vikram Singh
Abstract	$k$-means clustering is NP-hard in the worst case but previous work has shown efficient algorithms assuming the optimal $k$-means clusters are \emph{stable} under additive or multiplicative perturbation of data. This has two caveats. First, we do not know how to efficiently verify this property of optimal solutions that are NP-hard to compute in the first place. Second, the stability assumptions required for polynomial time $k$-means algorithms are often unreasonable when compared to the ground-truth clusters in real-world data. A consequence of multiplicative perturbation resilience is \emph{center proximity}, that is, every point is closer to the center of its own cluster than the center of any other cluster, by some multiplicative factor $\alpha > 1$. We study the problem of minimizing the Euclidean $k$-means objective only over clusterings that satisfy $\alpha$-center proximity. We give a simple algorithm to find the optimal $\alpha$-center-proximal $k$-means clustering in running time exponential in $k$ and $1/(\alpha - 1)$ but linear in the number of points and the dimension. We define an analogous $\alpha$-center proximity condition for outliers, and give similar algorithmic guarantees for $k$-means with outliers and $\alpha$-center proximity. On the hardness side we show that for any $\alpha' > 1$, there exists an $\alpha \leq \alpha'$, $(\alpha >1)$, and an $\varepsilon_0 > 0$ such that minimizing the $k$-means objective over clusterings that satisfy $\alpha$-center proximity is NP-hard to approximate within a multiplicative $(1+\varepsilon_0)$ factor.
Tasks
Published	2018-04-28
URL	http://arxiv.org/abs/1804.10827v3
PDF	http://arxiv.org/pdf/1804.10827v3.pdf
PWC	https://paperswithcode.com/paper/on-euclidean-k-means-clustering-with-center
Repo
Framework

Concept-Based Embeddings for Natural Language Processing


Title	Concept-Based Embeddings for Natural Language Processing
Authors	Yukun Ma, Erik Cambria
Abstract	In this work, we focus on effectively leveraging and integrating information from the concept level as well as the word level by projecting concepts and words into a lower-dimensional space while retaining the most critical semantics. In the broad context of an opinion understanding system, we investigate the use of the fused embedding for several core NLP tasks: named entity detection and classification, automatic speech recognition reranking, and targeted sentiment analysis.
Tasks	Sentiment Analysis, Speech Recognition
Published	2018-07-15
URL	http://arxiv.org/abs/1807.05519v1
PDF	http://arxiv.org/pdf/1807.05519v1.pdf
PWC	https://paperswithcode.com/paper/concept-based-embeddings-for-natural-language
Repo
Framework

Domain Attentive Fusion for End-to-end Dialect Identification with Unknown Target Domain


Title	Domain Attentive Fusion for End-to-end Dialect Identification with Unknown Target Domain
Authors	Suwon Shon, Ahmed Ali, James Glass
Abstract	End-to-end deep learning language or dialect identification systems operate on the spectrogram or other acoustic feature and directly generate identification scores for each class. An important issue for end-to-end systems is to have some knowledge of the application domain, because the system can be vulnerable to use cases that were not seen in the training phase; such a scenario is often referred to as a domain mismatched condition. In general, we assume that there is enough variation in the training dataset to expose the system to multiple domains. In this work, we study how to best make use of a training dataset in order to have maximum effectiveness on unknown target domains. Our goal is to process the input without any knowledge of the target domain while preserving robust performance on other domains as well. To accomplish this objective, we propose a domain attentive fusion approach for end-to-end dialect/language identification systems. To help with experimentation, we collect a dataset from three different domains, and create experimental protocols for a domain mismatched condition. The results of our proposed approach, which was tested on a variety of broadcast and YouTube data, show a significant performance gain compared to traditional approaches, even without any prior target domain information.
Tasks	Language Identification
Published	2018-12-04
URL	https://arxiv.org/abs/1812.01501v2
PDF	https://arxiv.org/pdf/1812.01501v2.pdf
PWC	https://paperswithcode.com/paper/domain-attentive-fusion-for-end-to-end
Repo
Framework

Learning Two-layer Neural Networks with Symmetric Inputs


Title	Learning Two-layer Neural Networks with Symmetric Inputs
Authors	Rong Ge, Rohith Kuditipudi, Zhize Li, Xiang Wang
Abstract	We give a new algorithm for learning a two-layer neural network under a general class of input distributions. Assuming there is a ground-truth two-layer network $$ y = A \sigma(Wx) + \xi, $$ where $A,W$ are weight matrices, $\xi$ represents noise, and the number of neurons in the hidden layer is no larger than the input or output, our algorithm is guaranteed to recover the parameters $A,W$ of the ground-truth network. The only requirement on the input $x$ is that it is symmetric, which still allows highly complicated and structured input. Our algorithm is based on the method-of-moments framework and extends several results in tensor decompositions. We use spectral algorithms to avoid the complicated non-convex optimization in learning neural networks. Experiments show that our algorithm can robustly learn the ground-truth neural network with a small number of samples for many symmetric input distributions.
Tasks
Published	2018-10-16
URL	http://arxiv.org/abs/1810.06793v2
PDF	http://arxiv.org/pdf/1810.06793v2.pdf
PWC	https://paperswithcode.com/paper/learning-two-layer-neural-networks-with
Repo
Framework

Unsupervised Sentence Compression using Denoising Auto-Encoders


Title	Unsupervised Sentence Compression using Denoising Auto-Encoders
Authors	Thibault Févry, Jason Phang
Abstract	In sentence compression, the task of shortening sentences while retaining the original meaning, models tend to be trained on large corpora containing pairs of verbose and compressed sentences. To remove the need for paired corpora, we emulate a summarization task and add noise to extend sentences and train a denoising auto-encoder to recover the original, constructing an end-to-end training regime without the need for any examples of compressed sentences. We conduct a human evaluation of our model on a standard text summarization dataset and show that it performs comparably to a supervised baseline based on grammatical correctness and retention of meaning. Despite being exposed to no target data, our unsupervised models learn to generate imperfect but reasonably readable sentence summaries. Although we underperform supervised models based on ROUGE scores, our models are competitive with a supervised baseline based on human evaluation for grammatical correctness and retention of meaning.
Tasks	Denoising, Sentence Compression, Text Summarization, Unsupervised Sentence Compression
Published	2018-09-07
URL	http://arxiv.org/abs/1809.02669v1
PDF	http://arxiv.org/pdf/1809.02669v1.pdf
PWC	https://paperswithcode.com/paper/unsupervised-sentence-compression-using
Repo
Framework
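
The abstract describes extending sentences with noise so that a denoising auto-encoder can be trained to recover the original. The sketch below shows one way such a noising step could look. It is an illustration under stated assumptions, not the authors' procedure: the extra words are drawn from a second, hypothetical sentence and inserted at random positions, and the names `add_noise` and `extra_fraction` are invented here.

```python
import random


def add_noise(sentence, other_sentence, extra_fraction, rng):
    """Return a longer, noisy token list built from `sentence`.

    Extra words are sampled from `other_sentence` (an assumption made for
    this sketch) and inserted at random positions, so the original words
    keep their relative order.
    """
    words = sentence.split()
    pool = other_sentence.split()
    n_extra = max(1, round(extra_fraction * len(words)))
    noisy = list(words)
    for _ in range(n_extra):
        position = rng.randint(0, len(noisy))  # inclusive; len(noisy) appends
        noisy.insert(position, rng.choice(pool))
    return noisy


rng = random.Random(0)
original = "the cat sat on the mat"
noisy = add_noise(original, "a quick brown fox jumps", 0.5, rng)
# A training pair for the auto-encoder would then be (noisy, original):
# the model reads the extended sentence and is asked to emit the original.
```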

Knowledge Based Machine Reading Comprehension


Title	Knowledge Based Machine Reading Comprehension
Authors	Yibo Sun, Daya Guo, Duyu Tang, Nan Duan, Zhao Yan, Xiaocheng Feng, Bing Qin
Abstract	Machine reading comprehension (MRC) requires reasoning about both the knowledge involved in a document and knowledge about the world. However, existing datasets are typically dominated by questions that can be well solved by context matching, which fail to test this capability. To encourage the progress on knowledge-based reasoning in MRC, we present knowledge-based MRC in this paper, and build a new dataset consisting of 40,047 question-answer pairs. The annotation of this dataset is designed so that successfully answering the questions requires understanding the knowledge involved in a document. We implement a framework consisting of both a question answering model and a question generation model, both of which take the knowledge extracted from the document as well as relevant facts from an external knowledge base such as Freebase/ProBase/Reverb/NELL. Results show that incorporating side information from external KB improves the accuracy of the baseline question answer system. We compare it with a standard MRC model BiDAF, and also provide the difficulty of the dataset and lay out remaining challenges.
Tasks	Machine Reading Comprehension, Question Answering, Question Generation, Reading Comprehension
Published	2018-09-12
URL	http://arxiv.org/abs/1809.04267v1
PDF	http://arxiv.org/pdf/1809.04267v1.pdf
PWC	https://paperswithcode.com/paper/knowledge-based-machine-reading-comprehension
Repo
Framework

Comparing Attention-based Convolutional and Recurrent Neural Networks: Success and Limitations in Machine Reading Comprehension


Title	Comparing Attention-based Convolutional and Recurrent Neural Networks: Success and Limitations in Machine Reading Comprehension
Authors	Matthias Blohm, Glorianna Jagfeld, Ekta Sood, Xiang Yu, Ngoc Thang Vu
Abstract	We propose a machine reading comprehension model based on the compare-aggregate framework with two-staged attention that achieves state-of-the-art results on the MovieQA question answering dataset. To investigate the limitations of our model as well as the behavioral difference between convolutional and recurrent neural networks, we generate adversarial examples to confuse the model and compare to human performance. Furthermore, we assess the generalizability of our model by analyzing its differences to human inference,
Tasks	Machine Reading Comprehension, Question Answering, Reading Comprehension
Published	2018-08-27
URL	http://arxiv.org/abs/1808.08744v1
PDF	http://arxiv.org/pdf/1808.08744v1.pdf
PWC	https://paperswithcode.com/paper/comparing-attention-based-convolutional-and
Repo
Framework

PaloBoost: An Overfitting-robust TreeBoost with Out-of-Bag Sample Regularization Techniques


Title	PaloBoost: An Overfitting-robust TreeBoost with Out-of-Bag Sample Regularization Techniques
Authors	Yubin Park, Joyce C. Ho
Abstract	Stochastic Gradient TreeBoost is often found in many winning solutions in public data science challenges. Unfortunately, the best performance requires extensive parameter tuning and can be prone to overfitting. We propose PaloBoost, a Stochastic Gradient TreeBoost model that uses novel regularization techniques to guard against overfitting and is robust to parameter settings. PaloBoost uses the under-utilized out-of-bag samples to perform gradient-aware pruning and estimate adaptive learning rates. Unlike other Stochastic Gradient TreeBoost models that use the out-of-bag samples to estimate test errors, PaloBoost treats the samples as a second batch of training samples to prune the trees and adjust the learning rates. As a result, PaloBoost can dynamically adjust tree depths and learning rates to achieve faster learning at the start and slower learning as the algorithm converges. We illustrate how these regularization techniques can be efficiently implemented and propose a new formula for calculating feature importance to reflect the node coverages and learning rates. Extensive experimental results on seven datasets demonstrate that PaloBoost is robust to overfitting, is less sensitive to the parameters, and can also effectively identify meaningful features.
Tasks	Feature Importance
Published	2018-07-22
URL	http://arxiv.org/abs/1807.08383v1
PDF	http://arxiv.org/pdf/1807.08383v1.pdf
PWC	https://paperswithcode.com/paper/paloboost-an-overfitting-robust-treeboost
Repo
Framework

Robustifying Models Against Adversarial Attacks by Langevin Dynamics


Title	Robustifying Models Against Adversarial Attacks by Langevin Dynamics
Authors	Vignesh Srinivasan, Arturo Marban, Klaus-Robert Müller, Wojciech Samek, Shinichi Nakajima
Abstract	Adversarial attacks on deep learning models have compromised their performance considerably. As remedies, a lot of defense methods were proposed, which, however, have been circumvented by newer attacking strategies. In the midst of this ensuing arms race, the problem of robustness against adversarial attacks still remains unsolved. This paper proposes a novel, simple yet effective defense strategy where adversarial samples are relaxed onto the underlying manifold of the (unknown) target class distribution. Specifically, our algorithm drives off-manifold adversarial samples towards high density regions of the data generating distribution of the target class by the Metropolis-adjusted Langevin algorithm (MALA) with perceptual boundary taken into account. Although the motivation is similar to projection methods, e.g., Defense-GAN, our algorithm, called MALA for DEfense (MALADE), is equipped with significant dispersion - projection is distributed broadly, and therefore any whitebox attack cannot accurately align the input so that the MALADE moves it to a targeted untrained spot where the model predicts a wrong label. In our experiments, MALADE exhibited state-of-the-art performance against various elaborate attacking strategies.
Tasks	Denoising
Published	2018-05-30
URL	https://arxiv.org/abs/1805.12017v2
PDF	https://arxiv.org/pdf/1805.12017v2.pdf
PWC	https://paperswithcode.com/paper/counterstrike-defending-deep-learning
Repo
Framework
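
For readers unfamiliar with the Metropolis-adjusted Langevin algorithm mentioned in the abstract, the following is a minimal sketch of generic MALA in one dimension. It is not MALADE: there is no classifier, no perceptual boundary, and the target is a standard normal density chosen purely for illustration. The function name `mala_sample` and all parameter values are assumptions.

```python
import math
import random


def mala_sample(log_density, grad_log_density, x0, step, n_steps, rng):
    """Draw a chain of 1D samples with the Metropolis-adjusted Langevin algorithm."""

    def log_q(x_to, x_from):
        # Log density (up to a constant) of proposing x_to from x_from.
        mean = x_from + 0.5 * step * grad_log_density(x_from)
        return -((x_to - mean) ** 2) / (2.0 * step)

    x = x0
    chain = []
    for _ in range(n_steps):
        # Langevin proposal: drift along the gradient plus Gaussian noise.
        noise = math.sqrt(step) * rng.gauss(0.0, 1.0)
        proposal = x + 0.5 * step * grad_log_density(x) + noise
        # Metropolis-Hastings correction for the asymmetric proposal.
        log_accept = (log_density(proposal) + log_q(x, proposal)
                      - log_density(x) - log_q(proposal, x))
        if rng.random() < math.exp(min(0.0, log_accept)):
            x = proposal
        chain.append(x)
    return chain


# Illustrative target: standard normal, log p(x) = -x^2 / 2 up to a constant.
rng = random.Random(0)
chain = mala_sample(
    log_density=lambda x: -0.5 * x * x,
    grad_log_density=lambda x: -x,
    x0=3.0,
    step=0.5,
    n_steps=20000,
    rng=rng,
)
samples = chain[1000:]  # discard an initial burn-in segment
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
```

The gradient drift moves samples toward high-density regions, while the accept/reject step keeps the chain's stationary distribution equal to the target; here `mean` and `variance` should come out close to 0 and 1.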

Face Flashing: a Secure Liveness Detection Protocol based on Light Reflections


Title	Face Flashing: a Secure Liveness Detection Protocol based on Light Reflections
Authors	Di Tang, Zhe Zhou, Yinqian Zhang, Kehuan Zhang
Abstract	Face authentication systems are becoming increasingly prevalent, especially with the rapid development of Deep Learning technologies. However, human facial information is easy to capture and reproduce, which makes face authentication systems vulnerable to various attacks. Liveness detection is an important defense technique to prevent such attacks, but existing solutions did not provide clear and strong security guarantees, especially in terms of time. To overcome these limitations, we propose a new liveness detection protocol called Face Flashing that significantly increases the bar for launching successful attacks on face authentication systems. By randomly flashing well-designed pictures on a screen and analyzing the reflected light, our protocol has leveraged physical characteristics of human faces: reflection processing at the speed of light, unique textual features, and uneven 3D shapes. Cooperating with the working mechanism of the screen and digital cameras, our protocol is able to detect subtle traces left by an attacking process. To demonstrate the effectiveness of Face Flashing, we implemented a prototype and performed thorough evaluations with a large data set collected from real-world scenarios. The results show that our Timing Verification can effectively detect the time gap between legitimate authentications and malicious cases. Our Face Verification can also differentiate 2D planes from 3D objects accurately. The overall accuracy of our liveness detection system is 98.8%, and its robustness was evaluated in different scenarios. In the worst case, our system’s accuracy decreased to a still-high 97.3%.
Tasks	Face Verification
Published	2018-01-06
URL	http://arxiv.org/abs/1801.01949v2
PDF	http://arxiv.org/pdf/1801.01949v2.pdf
PWC	https://paperswithcode.com/paper/face-flashing-a-secure-liveness-detection
Repo
Framework

Numerical Integration on Graphs: where to sample and how to weigh


Title	Numerical Integration on Graphs: where to sample and how to weigh
Authors	George C. Linderman, Stefan Steinerberger
Abstract	Let $G=(V,E,w)$ be a finite, connected graph with weighted edges. We are interested in the problem of finding a subset $W \subset V$ of vertices and weights $a_w$ such that $$ \frac{1}{|V|}\sum_{v \in V} f(v) \sim \sum_{w \in W} a_w f(w) $$ for functions $f:V \rightarrow \mathbb{R}$ that are 'smooth' with respect to the geometry of the graph. The main applications are problems where $f$ is known to somehow depend on the underlying graph but is expensive to evaluate on even a single vertex. We prove an inequality showing that the integration problem can be rewritten as a geometric problem ('the optimal packing of heat balls'). We discuss how one would construct approximate solutions of the heat ball packing problem; numerical examples demonstrate the efficiency of the method.
Tasks
Published	2018-03-19
URL	http://arxiv.org/abs/1803.06989v1
PDF	http://arxiv.org/pdf/1803.06989v1.pdf
PWC	https://paperswithcode.com/paper/numerical-integration-on-graphs-where-to
Repo
Framework
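
The quadrature problem in the abstract can be made concrete on a small example. The sketch below does not implement the heat ball packing construction; it only compares the full vertex average with a weighted sum over a subset $W$. The cycle graph, the test function `f`, the evenly spaced sample set, and the equal weights are all assumptions chosen for illustration.

```python
import math

n = 100  # number of vertices of a hypothetical cycle graph 0 - 1 - ... - 99 - 0


def f(v):
    # A function that varies slowly along the cycle, i.e. is "smooth"
    # with respect to the graph geometry.
    angle = 2.0 * math.pi * v / n
    return 1.0 + math.cos(angle) + 0.5 * math.sin(2.0 * angle)


# Left-hand side: the exact average over all vertices.
full_average = sum(f(v) for v in range(n)) / n

# Right-hand side: a weighted sum over a small subset W of the vertices.
W = list(range(0, n, 10))              # 10 evenly spaced vertices
weights = {w: 1.0 / len(W) for w in W}  # equal weights a_w
estimate = sum(weights[w] * f(w) for w in W)

error = abs(full_average - estimate)
```

On the cycle, evenly spaced vertices with equal weights reproduce the average of low-frequency functions up to rounding error while evaluating `f` at only a tenth of the vertices, which is the kind of saving the abstract targets when `f` is expensive to evaluate.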

Bias in Semantic and Discourse Interpretation


Title	Bias in Semantic and Discourse Interpretation
Authors	Nicholas Asher, Soumya Paul
Abstract	In this paper, we show how game-theoretic work on conversation combined with a theory of discourse structure provides a framework for studying interpretive bias. Interpretive bias is an essential feature of learning and understanding but also something that can be used to pervert or subvert the truth. The framework we develop here provides tools for understanding and analyzing the range of interpretive biases and the factors that contribute to them.
Tasks
Published	2018-06-29
URL	http://arxiv.org/abs/1806.11322v1
PDF	http://arxiv.org/pdf/1806.11322v1.pdf
PWC	https://paperswithcode.com/paper/bias-in-semantic-and-discourse-interpretation
Repo
Framework

Hierarchical Text Generation using an Outline


Title	Hierarchical Text Generation using an Outline
Authors	Mehdi Drissi, Olivia Watkins, Jugal Kalita
Abstract	Many challenges in natural language processing require generating text, including language translation, dialogue generation, and speech recognition. For all of these problems, text generation becomes more difficult as the text becomes longer. Current language models often struggle to keep track of coherence for long pieces of text. Here, we attempt to have the model construct and use an outline of the text it generates to keep it focused. We find that the usage of an outline improves perplexity. We do not find that using the outline improves human evaluation over a simpler baseline, revealing a discrepancy between perplexity and human perception. Similarly, hierarchical generation is not found to improve human evaluation scores.
Tasks	Dialogue Generation, Speech Recognition, Text Generation
Published	2018-10-20
URL	http://arxiv.org/abs/1810.08802v1
PDF	http://arxiv.org/pdf/1810.08802v1.pdf
PWC	https://paperswithcode.com/paper/hierarchical-text-generation-using-an-outline
Repo
Framework