April 2, 2020

# Paper Group ANR 318


#### Limits of Detecting Text Generated by Large-Scale Language Models

Title Limits of Detecting Text Generated by Large-Scale Language Models
Authors Lav R. Varshney, Nitish Shirish Keskar, Richard Socher
Abstract Some consider large-scale language models that can generate long and coherent pieces of text as dangerous, since they may be used in misinformation campaigns. Here we formulate large-scale language model output detection as a hypothesis testing problem to classify text as genuine or generated. We show that error exponents for particular language models are bounded in terms of their perplexity, a standard measure of language generation performance. Under the assumption that human language is stationary and ergodic, the formulation is extended from considering specific language models to considering maximum likelihood language models, among the class of k-order Markov approximations; error probabilities are characterized. Some discussion of incorporating semantic side information is also given.
Published 2020-02-09
URL https://arxiv.org/abs/2002.03438v1
PDF https://arxiv.org/pdf/2002.03438v1.pdf
PWC https://paperswithcode.com/paper/limits-of-detecting-text-generated-by-large
Repo
Framework
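
To make the hypothesis-testing view in the abstract above concrete, here is a minimal sketch of a log-likelihood-ratio test between a "genuine" and a "generated" model, together with a perplexity computation. The unigram models, the zero threshold, and all function names are assumptions chosen for illustration; none of them come from the paper.

```python
import math

def log_likelihood(tokens, model):
    """Sum of log-probabilities of the tokens under a unigram model (token -> prob)."""
    return sum(math.log(model[t]) for t in tokens)

def perplexity(tokens, model):
    """Perplexity is exp of the negative average log-likelihood per token."""
    return math.exp(-log_likelihood(tokens, model) / len(tokens))

def classify(tokens, genuine_model, generated_model, threshold=0.0):
    """Likelihood-ratio test: decide 'generated' when the log-ratio exceeds the threshold."""
    llr = log_likelihood(tokens, generated_model) - log_likelihood(tokens, genuine_model)
    return "generated" if llr > threshold else "genuine"

# Hypothetical toy distributions over a three-token vocabulary.
genuine_model = {"a": 0.5, "b": 0.3, "c": 0.2}
generated_model = {"a": 0.2, "b": 0.3, "c": 0.5}

sample = ["c", "c", "a", "c", "b"]
label = classify(sample, genuine_model, generated_model)
print(label, perplexity(sample, generated_model))
```

The closer the two models are to each other, the smaller the expected log-ratio per token, so longer samples are needed for reliable detection.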

#### Multimodal Matching Transformer for Live Commenting

Title Multimodal Matching Transformer for Live Commenting
Authors Chaoqun Duan, Lei Cui, Shuming Ma, Furu Wei, Conghui Zhu, Tiejun Zhao
Abstract Automatic live commenting aims to provide real-time comments on videos for viewers. It encourages user engagement on online video sites, and is also a good benchmark for video-to-text generation. Recent work on this task adopts encoder-decoder models to generate comments. However, these methods do not model the interaction between videos and comments explicitly, so they tend to generate popular comments that are often irrelevant to the videos. In this work, we aim to improve the relevance between live comments and videos by modeling the cross-modal interactions among different modalities. To this end, we propose a multimodal matching transformer to capture the relationships among comments, vision, and audio. The proposed model is based on the transformer framework and can iteratively learn the attention-aware representations for each modality. We evaluate the model on a publicly available live commenting dataset. Experiments show that the multimodal matching transformer model outperforms the state-of-the-art methods.
Published 2020-02-07
URL https://arxiv.org/abs/2002.02649v1
PDF https://arxiv.org/pdf/2002.02649v1.pdf
PWC https://paperswithcode.com/paper/multimodal-matching-transformer-for-live
Repo
Framework

#### Exploiting Database Management Systems and Treewidth for Counting

Title Exploiting Database Management Systems and Treewidth for Counting
Authors Johannes K. Fichte, Markus Hecher, Patrick Thier, Stefan Woltran
Abstract Bounded treewidth is one of the most cited combinatorial invariants and has been applied in the literature to solve several counting problems efficiently. A canonical counting problem is #SAT, which asks to count the satisfying assignments of a Boolean formula. Recent work shows that benchmarking instances for #SAT often have reasonably small treewidth. This paper deals with counting problems for instances of small treewidth. We introduce a general framework to solve counting questions based on state-of-the-art database management systems (DBMS). Our framework explicitly takes advantage of small treewidth by solving instances using dynamic programming (DP) on tree decompositions (TD). To this end, we implement the concept of DP in a DBMS (PostgreSQL), since DP algorithms are already often given in terms of table manipulations in theory. This allows for elegant specifications of DP algorithms and the use of SQL to manipulate records and tables, which gives us a natural approach to bring DP algorithms into practice. To the best of our knowledge, we present the first approach to employ a DBMS for algorithms on TDs. A key advantage of our approach is that a DBMS naturally handles huge tables with a limited amount of main memory (RAM), supports parallelization, and allows computation to be suspended.
Published 2020-01-13
URL https://arxiv.org/abs/2001.04191v1
PDF https://arxiv.org/pdf/2001.04191v1.pdf
PWC https://paperswithcode.com/paper/exploiting-database-management-systems-and
Repo
Framework
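
The idea of expressing DP tables as database tables can be sketched on a tiny #SAT instance. The example below counts the models of (a OR b) AND (NOT b OR c) using two bags, {a, b} and {b, c}, joined on the shared variable b. It uses Python's built-in `sqlite3` rather than PostgreSQL, and the table names, the hand-picked decomposition, and the formula are all assumptions for illustration; this is not the authors' implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Domain table holding the two Boolean values.
cur.execute("CREATE TABLE dom(v INTEGER)")
cur.executemany("INSERT INTO dom VALUES (?)", [(0,), (1,)])

# Bag {a, b}: assignments satisfying clause (a OR b), each counted once.
cur.execute("""
    CREATE TABLE bag1 AS
    SELECT x.v AS a, y.v AS b, 1 AS cnt
    FROM dom x, dom y
    WHERE x.v = 1 OR y.v = 1
""")

# Bag {b, c}: assignments satisfying clause (NOT b OR c).
cur.execute("""
    CREATE TABLE bag2 AS
    SELECT x.v AS b, y.v AS c, 1 AS cnt
    FROM dom x, dom y
    WHERE x.v = 0 OR y.v = 1
""")

# Forget variable a (sum counts per value of b), then join on the shared variable b
# and multiply the counts of the two sub-solutions.
cur.execute("""
    CREATE TABLE root AS
    SELECT bag2.b AS b, bag2.c AS c, bag2.cnt * p.cnt AS cnt
    FROM bag2
    JOIN (SELECT b, SUM(cnt) AS cnt FROM bag1 GROUP BY b) p ON p.b = bag2.b
""")

model_count = cur.execute("SELECT SUM(cnt) FROM root").fetchone()[0]
print(model_count)
conn.close()
```

Each table only ever holds assignments to the variables of one bag, which is why small treewidth keeps the tables manageable.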

#### Learning Contact-Rich Manipulation Tasks with Rigid Position-Controlled Robots: Learning to Force Control

Title Learning Contact-Rich Manipulation Tasks with Rigid Position-Controlled Robots: Learning to Force Control
Authors Cristian Camilo Beltran-Hernandez, Damien Petit, Ixchel G. Ramirez-Alpizar, Takayuki Nishi, Shinichi Kikuchi, Takamitsu Matsubara, Kensuke Harada
Abstract To fully realize industrial automation, it is indispensable to give robot manipulators the ability to adapt by themselves to their surroundings and to learn to handle novel manipulation tasks. Reinforcement Learning (RL) methods have been proven successful in solving manipulation tasks autonomously. However, RL is still not widely adopted on real robotic systems because working with real hardware entails additional challenges, especially when using rigid position-controlled manipulators. These challenges include the need for a robust controller to avoid undesired behavior that risks damaging the robot and its environment, and the need for constant supervision from a human operator. The main contributions of this work are as follows. First, we propose a framework for safely training an RL agent on manipulation tasks using a rigid robot. Second, to enable a position-controlled manipulator to perform contact-rich manipulation tasks, we implemented two different force control schemes based on standard force feedback controllers; one is a modified hybrid position-force control, and the other is an impedance control. Third, we empirically study both control schemes when used as the action representation of an RL agent. We evaluate the trade-off between control complexity and performance by comparing several versions of the control schemes, each with a different number of force control parameters. The proposed methods are validated both in simulation and on a real robot, a UR3 e-series robotic arm, when executing contact-rich manipulation tasks.
Published 2020-03-02
URL https://arxiv.org/abs/2003.00628v1
PDF https://arxiv.org/pdf/2003.00628v1.pdf
Repo
Framework

#### Citation Text Generation

Title Citation Text Generation
Authors Kelvin Luu, Rik Koncel-Kedziorski, Kyle Lo, Isabel Cachola, Noah A. Smith
Abstract We introduce the task of citation text generation: given a pair of scientific documents, explain their relationship in natural language text in the manner of a citation from one text to the other. This task encourages systems to learn rich relationships between scientific texts and to express them concretely in natural language. Models for citation text generation will require robust document understanding including the capacity to quickly adapt to new vocabulary and to reason about document content. We believe this challenging direction of research will benefit high-impact applications such as automatic literature review or scientific writing assistance systems. In this paper we establish the task of citation text generation with a standard evaluation corpus and explore several baseline models.
Published 2020-02-02
URL https://arxiv.org/abs/2002.00317v1
PDF https://arxiv.org/pdf/2002.00317v1.pdf
PWC https://paperswithcode.com/paper/citation-text-generation
Repo
Framework

#### Ten Research Challenge Areas in Data Science

Title Ten Research Challenge Areas in Data Science
Authors Jeannette M. Wing
Abstract Although data science builds on knowledge from computer science, mathematics, statistics, and other disciplines, data science is a unique field with many mysteries to unlock: challenging scientific questions and pressing questions of societal importance. This article starts with meta-questions about data science as a discipline and then elaborates on ten ideas for the basis of a research agenda for data science.
Published 2020-01-27
URL https://arxiv.org/abs/2002.05658v1
PDF https://arxiv.org/pdf/2002.05658v1.pdf
PWC https://paperswithcode.com/paper/ten-research-challenge-areas-in-data-science
Repo
Framework

#### GACEM: Generalized Autoregressive Cross Entropy Method for Multi-Modal Black Box Constraint Satisfaction

Title GACEM: Generalized Autoregressive Cross Entropy Method for Multi-Modal Black Box Constraint Satisfaction
Authors Kourosh Hakhamaneshi, Keertana Settaluri, Pieter Abbeel, Vladimir Stojanovic
Abstract In this work we present a new method of black-box optimization and constraint satisfaction. Existing algorithms that have attempted to solve this problem are unable to consider multiple modes, and are not able to adapt to changes in environment dynamics. To address these issues, we developed a modified Cross-Entropy Method (CEM) that uses a masked auto-regressive neural network for modeling uniform distributions over the solution space. We train the model using maximum entropy policy gradient methods from Reinforcement Learning. Our algorithm is able to express complicated solution spaces, thus allowing it to track a variety of different solution regions. We empirically compare our algorithm with variations of CEM, including one with a Gaussian prior with fixed variance, and demonstrate better performance in terms of: number of diverse solutions, better mode discovery in multi-modal problems, and better sample efficiency in certain cases.
Published 2020-02-17
URL https://arxiv.org/abs/2002.07236v1
PDF https://arxiv.org/pdf/2002.07236v1.pdf
PWC https://paperswithcode.com/paper/gacem-generalized-autoregressive-cross
Repo
Framework
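
For context, the Gaussian Cross-Entropy Method that the abstract above uses as a baseline can be sketched in a few lines. This is the plain single-mode CEM, not GACEM: it fits one Gaussian to the elite samples at each iteration. The objective, hyperparameters, and function names are assumptions for illustration.

```python
import random
import statistics

def cross_entropy_method(objective, mean=0.0, std=5.0, population=50,
                         elite=10, iterations=30, seed=0):
    """Minimize a 1-D objective by repeatedly refitting a Gaussian to the best samples."""
    rng = random.Random(seed)
    for _ in range(iterations):
        # Sample candidates from the current Gaussian.
        samples = [rng.gauss(mean, std) for _ in range(population)]
        # Keep the candidates with the lowest objective values.
        samples.sort(key=objective)
        best = samples[:elite]
        # Refit the Gaussian to the elite set.
        mean = statistics.mean(best)
        std = statistics.pstdev(best)
    return mean

# Hypothetical objective with a single optimum at x = 3.
solution = cross_entropy_method(lambda x: (x - 3.0) ** 2)
print(solution)
```

Because a single Gaussian collapses onto one region, this baseline cannot track several solution modes at once, which is the limitation the abstract sets out to address.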

#### Diversity, Density, and Homogeneity: Quantitative Characteristic Metrics for Text Collections

Title Diversity, Density, and Homogeneity: Quantitative Characteristic Metrics for Text Collections
Authors Yi-An Lai, Xuan Zhu, Yi Zhang, Mona Diab
Abstract Summarizing data samples by quantitative measures has a long history, with descriptive statistics being a case in point. However, as natural language processing methods flourish, there are still insufficient characteristic metrics to describe a collection of texts in terms of the words, sentences, or paragraphs they comprise. In this work, we propose metrics of diversity, density, and homogeneity that quantitatively measure the dispersion, sparsity, and uniformity of a text collection. We conduct a series of simulations to verify that each metric holds desired properties and resonates with human intuitions. Experiments on real-world datasets demonstrate that the proposed characteristic metrics are highly correlated with text classification performance of a renowned model, BERT, which could inspire future applications.
Published 2020-03-19
URL https://arxiv.org/abs/2003.08529v1
PDF https://arxiv.org/pdf/2003.08529v1.pdf
PWC https://paperswithcode.com/paper/diversity-density-and-homogeneity
Repo
Framework

#### Anchor & Transform: Learning Sparse Representations of Discrete Objects

Title Anchor & Transform: Learning Sparse Representations of Discrete Objects
Authors Paul Pu Liang, Manzil Zaheer, Yuan Wang, Amr Ahmed
Abstract Learning continuous representations of discrete objects such as text, users, and URLs lies at the heart of many applications including language and user modeling. When using discrete objects as input to neural networks, we often ignore the underlying structures (e.g. natural groupings and similarities) and embed the objects independently into individual vectors. As a result, existing methods do not scale to large vocabulary sizes. In this paper, we design a Bayesian nonparametric prior for embeddings that encourages sparsity and leverages natural groupings among objects. We derive an approximate inference algorithm based on Small Variance Asymptotics which yields a simple and natural algorithm for learning a small set of anchor embeddings and a sparse transformation matrix. We call our method Anchor & Transform (ANT) as the embeddings of discrete objects are a sparse linear combination of the anchors, weighted according to the transformation matrix. ANT is scalable, flexible, end-to-end trainable, and allows the user to incorporate domain knowledge about object relationships. On text classification and language modeling benchmarks, ANT demonstrates stronger performance with fewer parameters as compared to existing compression baselines.
Published 2020-03-18
URL https://arxiv.org/abs/2003.08197v1
PDF https://arxiv.org/pdf/2003.08197v1.pdf
PWC https://paperswithcode.com/paper/anchor-transform-learning-sparse-1
Repo
Framework
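
The core representation described above, where each object embedding is a sparse linear combination of a few anchor embeddings, can be sketched as follows. The anchor vectors, the sparse rows of the transformation matrix, and the function name are hypothetical values chosen for illustration; the inference procedure from the paper is not shown.

```python
def embed(index, transform, anchors):
    """Return the embedding of object `index` as sum_j T[index][j] * anchors[j].

    `transform` stores each row sparsely as a dict mapping anchor id -> weight,
    so only the nonzero entries are visited.
    """
    dim = len(anchors[0])
    out = [0.0] * dim
    for j, weight in transform[index].items():
        for d in range(dim):
            out[d] += weight * anchors[j][d]
    return out

# Two anchor embeddings of dimension 2 (hypothetical values).
anchors = [[1.0, 0.0], [0.0, 1.0]]
# Sparse transformation matrix for three objects.
transform = [{0: 1.0}, {0: 0.5, 1: 0.5}, {1: 2.0}]

print([embed(i, transform, anchors) for i in range(len(transform))])
```

With a large vocabulary, the stored parameters are the anchors plus the nonzero entries of the transformation matrix, rather than one dense vector per object.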

#### Gradient $\ell_1$ Regularization for Quantization Robustness

Title Gradient $\ell_1$ Regularization for Quantization Robustness
Authors Milad Alizadeh, Arash Behboodi, Mart van Baalen, Christos Louizos, Tijmen Blankevoort, Max Welling
Abstract We analyze the effect of quantizing weights and activations of neural networks on their loss and derive a simple regularization scheme that improves robustness against post-training quantization. By training quantization-ready networks, our approach enables storing a single set of weights that can be quantized on-demand to different bit-widths as energy and memory requirements of the application change. Unlike quantization-aware training using the straight-through estimator that only targets a specific bit-width and requires access to training data and pipeline, our regularization-based method paves the way for “on the fly” post-training quantization to various bit-widths. We show that by modeling quantization as an $\ell_\infty$-bounded perturbation, the first-order term in the loss expansion can be regularized using the $\ell_1$-norm of gradients. We experimentally validate the effectiveness of our regularization scheme on different architectures on CIFAR-10 and ImageNet datasets.
Published 2020-02-18
URL https://arxiv.org/abs/2002.07520v1
PDF https://arxiv.org/pdf/2002.07520v1.pdf
Repo
Framework
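
The first-order argument in the abstract above can be written out explicitly. The symbols $\delta$ (the perturbation caused by quantization), $\epsilon$ (its $\ell_\infty$ bound), and $\lambda$ (the regularization weight) are notation assumed here for illustration rather than taken from the paper.

$$
L(w + \delta) \approx L(w) + \delta^\top \nabla_w L(w), \qquad \|\delta\|_\infty \le \epsilon
$$

$$
\left| \delta^\top \nabla_w L(w) \right| \le \|\delta\|_\infty \, \|\nabla_w L(w)\|_1 \le \epsilon \, \|\nabla_w L(w)\|_1
$$

The second line is Hölder's inequality for the dual pair $(\ell_\infty, \ell_1)$. It suggests a training objective of the form $L(w) + \lambda \|\nabla_w L(w)\|_1$, which keeps the worst-case first-order change in the loss small for any perturbation within the bound.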

#### Sub-Gaussian Matrices on Sets: Optimal Tail Dependence and Applications

Title Sub-Gaussian Matrices on Sets: Optimal Tail Dependence and Applications
Authors Halyun Jeong, Xiaowei Li, Yaniv Plan, Özgür Yılmaz
Abstract Random linear mappings are widely used in modern signal processing, compressed sensing and machine learning. These mappings may be used to embed the data into a significantly lower dimension while at the same time preserving useful information. This is done by approximately preserving the distances between data points, which are assumed to belong to $\mathbb{R}^n$. Thus, the performance of these mappings is usually captured by how close they are to an isometry on the data. Random Gaussian linear mappings have been the object of much study, while the sub-Gaussian setting is not yet fully understood. In the latter case, the performance depends on the sub-Gaussian norm of the rows. In many applications, e.g., compressed sensing, this norm may be large, or even growing with dimension, and thus it is important to characterize this dependence. We study when a sub-Gaussian matrix can become a near isometry on a set, show that previous best known dependence on the sub-Gaussian norm was sub-optimal, and present the optimal dependence. Our result not only answers a remaining question posed by Liaw, Mehrabian, Plan and Vershynin in 2017, but also generalizes their work. We also develop a new Bernstein type inequality for sub-exponential random variables, and a new Hanson-Wright inequality for quadratic forms of sub-Gaussian random variables, in both cases improving the bounds in the sub-Gaussian regime under moment constraints. Finally, we illustrate popular applications such as Johnson-Lindenstrauss embeddings, randomized sketches and blind demodulation, whose theoretical guarantees can be improved by our results in the sub-Gaussian case.
Published 2020-01-28
URL https://arxiv.org/abs/2001.10631v1
PDF https://arxiv.org/pdf/2001.10631v1.pdf
PWC https://paperswithcode.com/paper/sub-gaussian-matrices-on-sets-optimal-tail
Repo
Framework
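
As a small illustration of the near-isometry property discussed above, the sketch below projects two points from $\mathbb{R}^n$ to $\mathbb{R}^m$ with a random sign (Rademacher) matrix, one simple example of a sub-Gaussian matrix, and compares the distance before and after. The dimensions, seeds, and function names are assumptions for illustration and say nothing about the bounds proved in the paper.

```python
import math
import random

def rademacher_projection(n, m, seed=0):
    """Return an m x n matrix with independent +-1/sqrt(m) entries."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(m)
    return [[rng.choice((-1.0, 1.0)) * scale for _ in range(n)] for _ in range(m)]

def project(matrix, x):
    """Apply the linear map to a vector given as a list of floats."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]

def norm(v):
    return math.sqrt(sum(a * a for a in v))

n, m = 500, 200
A = rademacher_projection(n, m)

rng = random.Random(1)
x = [rng.uniform(-1, 1) for _ in range(n)]
y = [rng.uniform(-1, 1) for _ in range(n)]
diff = [a - b for a, b in zip(x, y)]

# Ratio of the projected distance to the original distance; near 1 for a near isometry.
ratio = norm(project(A, diff)) / norm(diff)
print(ratio)
```

The scaling by $1/\sqrt{m}$ makes the squared projected norm equal to the original squared norm in expectation.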

#### Data-Free Adversarial Perturbations for Practical Black-Box Attack

Title Data-Free Adversarial Perturbations for Practical Black-Box Attack
Authors ZhaoXin Huan, Yulong Wang, Xiaolu Zhang, Lin Shang, Chilin Fu, Jun Zhou
Published 2020-03-03
URL https://arxiv.org/abs/2003.01295v1
PDF https://arxiv.org/pdf/2003.01295v1.pdf
Repo
Framework

#### Adaptation of a deep learning malignancy model from full-field digital mammography to digital breast tomosynthesis

Title Adaptation of a deep learning malignancy model from full-field digital mammography to digital breast tomosynthesis
Authors Sadanand Singh, Thomas Paul Matthews, Meet Shah, Brent Mombourquette, Trevor Tsue, Aaron Long, Ranya Almohsen, Stefano Pedemonte, Jason Su
Abstract Mammography-based screening has helped reduce the breast cancer mortality rate, but has also been associated with potential harms due to low specificity, leading to unnecessary exams or procedures, and low sensitivity. Digital breast tomosynthesis (DBT) improves on conventional mammography by increasing both sensitivity and specificity and is becoming common in clinical settings. However, deep learning (DL) models have been developed mainly on conventional 2D full-field digital mammography (FFDM) or scanned film images. Due to a lack of large annotated DBT datasets, it is difficult to train a model on DBT from scratch. In this work, we present methods to generalize a model trained on FFDM images to DBT images. In particular, we use average histogram matching (HM) and DL fine-tuning methods to generalize a FFDM model to the 2D maximum intensity projection (MIP) of DBT images. In the proposed approach, the differences between the FFDM and DBT domains are reduced via HM and then the base model, which was trained on abundant FFDM images, is fine-tuned. When evaluating on image patches extracted around identified findings, we are able to achieve similar areas under the receiver operating characteristic curve (ROC AUC) of $\sim 0.9$ for FFDM and $\sim 0.85$ for MIP images, as compared to a ROC AUC of $\sim 0.75$ when tested directly on MIP images.
Published 2020-01-23
URL https://arxiv.org/abs/2001.08381v1
PDF https://arxiv.org/pdf/2001.08381v1.pdf
Repo
Framework
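
Histogram matching, the first step in the approach described above, can be sketched as a quantile mapping: each source intensity is replaced by the reference intensity at the same empirical quantile. The version below works on flat lists of pixel values and matches against a single reference sample. The paper's "average" histogram matching and its image pipeline are not reproduced here, and the function name is hypothetical.

```python
from bisect import bisect_right

def match_histogram(source, reference):
    """Map each source value to the reference value at the same empirical quantile."""
    src_sorted = sorted(source)
    ref_sorted = sorted(reference)
    n, m = len(src_sorted), len(ref_sorted)
    matched = []
    for value in source:
        # Empirical CDF of the value within the source, in (0, 1].
        quantile = bisect_right(src_sorted, value) / n
        # Index of the reference value at that quantile, clamped to a valid range.
        idx = min(m - 1, max(0, int(round(quantile * m)) - 1))
        matched.append(ref_sorted[idx])
    return matched

# Hypothetical pixel intensities from the two domains.
source_pixels = [3, 0, 2, 1]
reference_pixels = [10, 20, 30, 40]
print(match_histogram(source_pixels, reference_pixels))
```

The mapping is monotone, so the relative ordering of intensities in the source image is preserved while its distribution moves toward the reference domain.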

#### Understanding and Improving Knowledge Distillation

Title Understanding and Improving Knowledge Distillation
Authors Jiaxi Tang, Rakesh Shivanna, Zhe Zhao, Dong Lin, Anima Singh, Ed H. Chi, Sagar Jain
Abstract Knowledge distillation is a model-agnostic technique to improve model quality while having a fixed capacity budget. It is a commonly used technique for model compression, where a higher capacity teacher model with better quality is used to train a more compact student model with better inference efficiency. Through distillation, one hopes to benefit from the student’s compactness, without sacrificing too much on model quality. Despite the large success of knowledge distillation, how it benefits the student model’s training dynamics remains under-explored. In this paper, we dissect the effects of knowledge distillation into three main factors: (1) benefits inherited from label smoothing, (2) example re-weighting based on the teacher’s confidence on ground-truth, and (3) prior knowledge of optimal output (logit) layer geometry. Using extensive systematic analyses and empirical studies on synthetic and real-world datasets, we confirm that the aforementioned three factors play a major role in knowledge distillation. Furthermore, based on our findings, we propose a simple, yet effective technique to improve knowledge distillation empirically.
Published 2020-02-10
URL https://arxiv.org/abs/2002.03532v1
PDF https://arxiv.org/pdf/2002.03532v1.pdf
PWC https://paperswithcode.com/paper/understanding-and-improving-knowledge
Repo
Framework
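
For readers unfamiliar with the setup analyzed above, here is a minimal sketch of a commonly used distillation objective: a weighted sum of the cross-entropy with the true label and the cross-entropy with the teacher's temperature-softened distribution. This is a generic formulation assumed for illustration, not the improved technique proposed in the paper. The weighting `alpha`, the `temperature`, and the function names are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """Combine the hard-label loss with the soft-label (teacher) loss."""
    # Cross-entropy between the one-hot label and the student prediction.
    hard = -math.log(softmax(student_logits)[label])
    # Cross-entropy between softened teacher and softened student distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))
    return (1 - alpha) * hard + alpha * soft

print(distillation_loss([1.0, 0.2, -0.5], [2.0, 0.1, -1.0], label=0))
```

Setting `alpha` to 0 recovers ordinary supervised training, while the soft term passes on the teacher's relative confidence across the incorrect classes.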

#### The Data Science Fire Next Time: Innovative strategies for mentoring in data science

Title The Data Science Fire Next Time: Innovative strategies for mentoring in data science
Authors Latifa Jackson, Heriberto Acosta Maestre
Abstract As data mining research and applications continue to expand into a variety of fields such as medicine, finance, security, etc., the need for talented and diverse individuals is clearly felt. This is particularly the case as Big Data initiatives have taken off in the federal, private and academic sectors, providing a wealth of opportunities, nationally and internationally. The Broadening Participation in Data Mining (BPDM) workshop was created more than 7 years ago with the goal of fostering mentorship, guidance, and connections for minority and underrepresented groups in the data science and machine learning community, while also enriching technical aptitude and exposure for a group of talented students. To date it has impacted the lives of more than 330 underrepresented trainees in data science. We provide a venue to connect talented students with innovative researchers in industry, academia, professional societies, and government. Our mission is to facilitate meaningful, lasting relationships between BPDM participants to ultimately increase diversity in data mining. This most recent workshop took place at Howard University in Washington, DC in February 2019. Here we report on the mentoring strategies that we undertook at the 2019 BPDM and how those were received.