April 2, 2020

Paper Group ANR 318

Limits of Detecting Text Generated by Large-Scale Language Models. Multimodal Matching Transformer for Live Commenting. Exploiting Database Management Systems and Treewidth for Counting. Learning Contact-Rich Manipulation Tasks with Rigid Position-Controlled Robots: Learning to Force Control. Citation Text Generation. Ten Research Challenge Areas i …

Limits of Detecting Text Generated by Large-Scale Language Models


Title	Limits of Detecting Text Generated by Large-Scale Language Models
Authors	Lav R. Varshney, Nitish Shirish Keskar, Richard Socher
Abstract	Some consider large-scale language models that can generate long and coherent pieces of text as dangerous, since they may be used in misinformation campaigns. Here we formulate large-scale language model output detection as a hypothesis testing problem to classify text as genuine or generated. We show that error exponents for particular language models are bounded in terms of their perplexity, a standard measure of language generation performance. Under the assumption that human language is stationary and ergodic, the formulation is extended from considering specific language models to considering maximum likelihood language models, among the class of k-order Markov approximations; error probabilities are characterized. Some discussion of incorporating semantic side information is also given.
Tasks	Language Modelling, Text Generation
Published	2020-02-09
URL	https://arxiv.org/abs/2002.03438v1
PDF	https://arxiv.org/pdf/2002.03438v1.pdf
PWC	https://paperswithcode.com/paper/limits-of-detecting-text-generated-by-large
Repo
Framework
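The abstract bounds detection error exponents in terms of perplexity. As a reminder of what that quantity is, here is a minimal sketch of the standard definition (the exponential of the average negative log-likelihood per token). The function name `perplexity` and the list-of-probabilities input are assumptions made for illustration and are not taken from the paper.

```python
import math


def perplexity(token_probs):
    """Perplexity of a sequence, given the model probability of each observed token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-likelihood per token (natural log).
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)


# A model that spreads its mass uniformly over 4 tokens at every step.
uniform_four = perplexity([0.25] * 8)
```

A model that assigns probability 1 to every observed token has perplexity 1, and lower perplexity means the model's output is statistically closer to the text it is scored on.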

Multimodal Matching Transformer for Live Commenting


Title	Multimodal Matching Transformer for Live Commenting
Authors	Chaoqun Duan, Lei Cui, Shuming Ma, Furu Wei, Conghui Zhu, Tiejun Zhao
Abstract	Automatic live commenting aims to provide real-time comments on videos for viewers. It encourages user engagement on online video sites, and is also a good benchmark for video-to-text generation. Recent work on this task adopts encoder-decoder models to generate comments. However, these methods do not model the interaction between videos and comments explicitly, so they tend to generate popular comments that are often irrelevant to the videos. In this work, we aim to improve the relevance between live comments and videos by modeling the cross-modal interactions among different modalities. To this end, we propose a multimodal matching transformer to capture the relationships among comments, vision, and audio. The proposed model is based on the transformer framework and can iteratively learn the attention-aware representations for each modality. We evaluate the model on a publicly available live commenting dataset. Experiments show that the multimodal matching transformer model outperforms the state-of-the-art methods.
Tasks	Text Generation
Published	2020-02-07
URL	https://arxiv.org/abs/2002.02649v1
PDF	https://arxiv.org/pdf/2002.02649v1.pdf
PWC	https://paperswithcode.com/paper/multimodal-matching-transformer-for-live
Repo
Framework

Exploiting Database Management Systems and Treewidth for Counting


Title	Exploiting Database Management Systems and Treewidth for Counting
Authors	Johannes K. Fichte, Markus Hecher, Patrick Thier, Stefan Woltran
Abstract	Bounded treewidth is one of the most cited combinatorial invariants and has been applied in the literature to solve several counting problems efficiently. A canonical counting problem is #SAT, which asks to count the satisfying assignments of a Boolean formula. Recent work shows that benchmarking instances for #SAT often have reasonably small treewidth. This paper deals with counting problems for instances of small treewidth. We introduce a general framework to solve counting questions based on state-of-the-art database management systems (DBMS). Our framework explicitly takes advantage of small treewidth by solving instances using dynamic programming (DP) on tree decompositions (TD). Therefore, we implement the concept of DP in a DBMS (PostgreSQL), since DP algorithms are already often given in terms of table manipulations in theory. This allows for elegant specifications of DP algorithms and the use of SQL to manipulate records and tables, which gives us a natural approach to bring DP algorithms into practice. To the best of our knowledge, we present the first approach to employ a DBMS for algorithms on TDs. A key advantage of our approach is that a DBMS naturally supports handling huge tables with a limited amount of main memory (RAM), parallelization, and suspending computation.
Tasks
Published	2020-01-13
URL	https://arxiv.org/abs/2001.04191v1
PDF	https://arxiv.org/pdf/2001.04191v1.pdf
PWC	https://paperswithcode.com/paper/exploiting-database-management-systems-and
Repo
Framework
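To make the #SAT problem statement concrete, the sketch below counts satisfying assignments by exhaustive enumeration. This is only the problem definition turned into code. It is not the paper's DBMS-based dynamic programming on tree decompositions, and it takes time exponential in the number of variables. The DIMACS-style clause encoding and the name `count_models` are assumptions chosen for illustration.

```python
from itertools import product


def count_models(clauses, num_vars):
    """Count satisfying assignments of a CNF formula by brute force.

    clauses: list of clauses, each a list of non-zero ints
             (k means variable k is true, -k means variable k is false).
    num_vars: number of Boolean variables, numbered 1..num_vars.
    """
    count = 0
    for assignment in product([False, True], repeat=num_vars):
        # A clause holds if at least one literal matches the assignment.
        if all(
            any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
            for clause in clauses
        ):
            count += 1
    return count


# (x1 OR x2) AND (NOT x1 OR x3)
example_formula = [[1, 2], [-1, 3]]
example_count = count_models(example_formula, 3)
```

For the example formula, x1 = false forces x2 = true and leaves x3 free, while x1 = true forces x3 = true and leaves x2 free, which gives four models in total.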

Learning Contact-Rich Manipulation Tasks with Rigid Position-Controlled Robots: Learning to Force Control


Title	Learning Contact-Rich Manipulation Tasks with Rigid Position-Controlled Robots: Learning to Force Control
Authors	Cristian Camilo Beltran-Hernandez, Damien Petit, Ixchel G. Ramirez-Alpizar, Takayuki Nishi, Shinichi Kikuchi, Takamitsu Matsubara, Kensuke Harada
Abstract	To fully realize industrial automation, it is indispensable to give robot manipulators the ability to adapt by themselves to their surroundings and to learn to handle novel manipulation tasks. Reinforcement Learning (RL) methods have been proven successful in solving manipulation tasks autonomously. However, RL is still not widely adopted on real robotic systems because working with real hardware entails additional challenges, especially when using rigid position-controlled manipulators. These challenges include the need for a robust controller to avoid undesired behavior that risks damaging the robot and its environment, and the need for constant supervision from a human operator. The main contributions of this work are as follows. First, we propose a framework for safely training an RL agent on manipulation tasks using a rigid robot. Second, to enable a position-controlled manipulator to perform contact-rich manipulation tasks, we implemented two different force control schemes based on standard force feedback controllers; one is a modified hybrid position-force control, and the other one is an impedance control. Third, we empirically study both control schemes when used as the action representation of an RL agent. We evaluate the trade-off between control complexity and performance by comparing several versions of the control schemes, each with a different number of force control parameters. The proposed methods are validated both in simulation and on a real robot, a UR3 e-series robotic arm, when executing contact-rich manipulation tasks.
Tasks
Published	2020-03-02
URL	https://arxiv.org/abs/2003.00628v1
PDF	https://arxiv.org/pdf/2003.00628v1.pdf
PWC	https://paperswithcode.com/paper/learning-contact-rich-manipulation-tasks-with
Repo
Framework

Citation Text Generation


Title	Citation Text Generation
Authors	Kelvin Luu, Rik Koncel-Kedziorski, Kyle Lo, Isabel Cachola, Noah A. Smith
Abstract	We introduce the task of citation text generation: given a pair of scientific documents, explain their relationship in natural language text in the manner of a citation from one text to the other. This task encourages systems to learn rich relationships between scientific texts and to express them concretely in natural language. Models for citation text generation will require robust document understanding including the capacity to quickly adapt to new vocabulary and to reason about document content. We believe this challenging direction of research will benefit high-impact applications such as automatic literature review or scientific writing assistance systems. In this paper we establish the task of citation text generation with a standard evaluation corpus and explore several baseline models.
Tasks	Text Generation
Published	2020-02-02
URL	https://arxiv.org/abs/2002.00317v1
PDF	https://arxiv.org/pdf/2002.00317v1.pdf
PWC	https://paperswithcode.com/paper/citation-text-generation
Repo
Framework

Ten Research Challenge Areas in Data Science


Title	Ten Research Challenge Areas in Data Science
Authors	Jeannette M. Wing
Abstract	Although data science builds on knowledge from computer science, mathematics, statistics, and other disciplines, data science is a unique field with many mysteries to unlock: challenging scientific questions and pressing questions of societal importance. This article starts with meta-questions about data science as a discipline and then elaborates on ten ideas for the basis of a research agenda for data science.
Tasks
Published	2020-01-27
URL	https://arxiv.org/abs/2002.05658v1
PDF	https://arxiv.org/pdf/2002.05658v1.pdf
PWC	https://paperswithcode.com/paper/ten-research-challenge-areas-in-data-science
Repo
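Framework

GACEM: Generalized Autoregressive Cross Entropy Method for Multi-Modal Black Box Constraint Satisfaction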


Title	GACEM: Generalized Autoregressive Cross Entropy Method for Multi-Modal Black Box Constraint Satisfaction
Authors	Kourosh Hakhamaneshi, Keertana Settaluri, Pieter Abbeel, Vladimir Stojanovic
Abstract	In this work we present a new method of black-box optimization and constraint satisfaction. Existing algorithms that have attempted to solve this problem are unable to consider multiple modes, and are not able to adapt to changes in environment dynamics. To address these issues, we developed a modified Cross-Entropy Method (CEM) that uses a masked auto-regressive neural network for modeling uniform distributions over the solution space. We train the model using maximum entropy policy gradient methods from Reinforcement Learning. Our algorithm is able to express complicated solution spaces, thus allowing it to track a variety of different solution regions. We empirically compare our algorithm with variations of CEM, including one with a Gaussian prior with fixed variance, and demonstrate better performance in terms of: number of diverse solutions, better mode discovery in multi-modal problems, and better sample efficiency in certain cases.
Tasks	Policy Gradient Methods
Published	2020-02-17
URL	https://arxiv.org/abs/2002.07236v1
PDF	https://arxiv.org/pdf/2002.07236v1.pdf
PWC	https://paperswithcode.com/paper/gacem-generalized-autoregressive-cross
Repo
Framework
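For context, the plain Cross-Entropy Method that the abstract modifies can be sketched in a few lines. The version below is a one-dimensional Gaussian CEM that refits its mean and standard deviation to the elite samples each round. It is a baseline-style illustration only, not GACEM: it uses no autoregressive network and, as the abstract notes for such methods, it settles on a single mode. All names and default parameters here are assumptions.

```python
import random
import statistics


def cross_entropy_method(objective, mean, std, iterations=30,
                         population=50, elite_frac=0.2, seed=0):
    """Minimize a 1-D objective with a Gaussian cross-entropy method."""
    rng = random.Random(seed)
    n_elite = max(2, int(population * elite_frac))
    for _ in range(iterations):
        # Sample candidates from the current Gaussian.
        samples = [rng.gauss(mean, std) for _ in range(population)]
        # Keep the candidates with the lowest objective values.
        elites = sorted(samples, key=objective)[:n_elite]
        # Refit the Gaussian to the elites (small floor keeps std positive).
        mean = statistics.mean(elites)
        std = statistics.stdev(elites) + 1e-6
    return mean


best = cross_entropy_method(lambda x: (x - 3.0) ** 2, mean=0.0, std=5.0)
```

On an objective with several separated optima, this unimodal sampler collapses onto one of them, which is the limitation the abstract's multi-modal formulation is meant to address.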

Diversity, Density, and Homogeneity: Quantitative Characteristic Metrics for Text Collections


Title	Diversity, Density, and Homogeneity: Quantitative Characteristic Metrics for Text Collections
Authors	Yi-An Lai, Xuan Zhu, Yi Zhang, Mona Diab
Abstract	Summarizing data samples by quantitative measures has a long history, with descriptive statistics being a case in point. However, as natural language processing methods flourish, there are still insufficient characteristic metrics to describe a collection of texts in terms of the words, sentences, or paragraphs they comprise. In this work, we propose metrics of diversity, density, and homogeneity that quantitatively measure the dispersion, sparsity, and uniformity of a text collection. We conduct a series of simulations to verify that each metric holds desired properties and resonates with human intuitions. Experiments on real-world datasets demonstrate that the proposed characteristic metrics are highly correlated with text classification performance of a renowned model, BERT, which could inspire future applications.
Tasks	Text Classification
Published	2020-03-19
URL	https://arxiv.org/abs/2003.08529v1
PDF	https://arxiv.org/pdf/2003.08529v1.pdf
PWC	https://paperswithcode.com/paper/diversity-density-and-homogeneity
Repo
Framework

Anchor & Transform: Learning Sparse Representations of Discrete Objects


Title	Anchor & Transform: Learning Sparse Representations of Discrete Objects
Authors	Paul Pu Liang, Manzil Zaheer, Yuan Wang, Amr Ahmed
Abstract	Learning continuous representations of discrete objects such as text, users, and URLs lies at the heart of many applications including language and user modeling. When using discrete objects as input to neural networks, we often ignore the underlying structures (e.g. natural groupings and similarities) and embed the objects independently into individual vectors. As a result, existing methods do not scale to large vocabulary sizes. In this paper, we design a Bayesian nonparametric prior for embeddings that encourages sparsity and leverages natural groupings among objects. We derive an approximate inference algorithm based on Small Variance Asymptotics which yields a simple and natural algorithm for learning a small set of anchor embeddings and a sparse transformation matrix. We call our method Anchor & Transform (ANT) as the embeddings of discrete objects are a sparse linear combination of the anchors, weighted according to the transformation matrix. ANT is scalable, flexible, end-to-end trainable, and allows the user to incorporate domain knowledge about object relationships. On text classification and language modeling benchmarks, ANT demonstrates stronger performance with fewer parameters as compared to existing compression baselines.
Tasks	Language Modelling, Text Classification
Published	2020-03-18
URL	https://arxiv.org/abs/2003.08197v1
PDF	https://arxiv.org/pdf/2003.08197v1.pdf
PWC	https://paperswithcode.com/paper/anchor-transform-learning-sparse-1
Repo
Framework
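The abstract describes each object's embedding as a sparse linear combination of anchor embeddings, weighted by a row of the transformation matrix. A minimal sketch of that lookup follows, assuming the sparse row is stored as a dictionary of non-zero weights. The function name `ant_embedding` and this storage format are illustrative assumptions, and the sketch does not cover how the anchors or the transformation are learned.

```python
def ant_embedding(transform_row, anchors):
    """Embedding of one object as a sparse combination of anchor vectors.

    transform_row: dict {anchor_index: weight} with only the non-zero
                   entries of this object's row of the transformation matrix.
    anchors: list of anchor vectors, all of the same dimension.
    """
    dim = len(anchors[0])
    embedding = [0.0] * dim
    for anchor_index, weight in transform_row.items():
        anchor = anchors[anchor_index]
        for d in range(dim):
            embedding[d] += weight * anchor[d]
    return embedding


anchors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# This object uses only anchors 0 and 2; anchor 1 has weight zero.
vector = ant_embedding({0: 0.5, 2: 2.0}, anchors)
```

Storage then scales with the number of anchors plus the number of non-zero weights, rather than with one dense vector per object.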

Gradient $\ell_1$ Regularization for Quantization Robustness


Title	Gradient $\ell_1$ Regularization for Quantization Robustness
Authors	Milad Alizadeh, Arash Behboodi, Mart van Baalen, Christos Louizos, Tijmen Blankevoort, Max Welling
Abstract	We analyze the effect of quantizing weights and activations of neural networks on their loss and derive a simple regularization scheme that improves robustness against post-training quantization. By training quantization-ready networks, our approach enables storing a single set of weights that can be quantized on-demand to different bit-widths as energy and memory requirements of the application change. Unlike quantization-aware training using the straight-through estimator that only targets a specific bit-width and requires access to training data and pipeline, our regularization-based method paves the way for “on the fly” post-training quantization to various bit-widths. We show that by modeling quantization as an $\ell_\infty$-bounded perturbation, the first-order term in the loss expansion can be regularized using the $\ell_1$-norm of gradients. We experimentally validate the effectiveness of our regularization scheme on different architectures on CIFAR-10 and ImageNet datasets.
Tasks	Quantization
Published	2020-02-18
URL	https://arxiv.org/abs/2002.07520v1
PDF	https://arxiv.org/pdf/2002.07520v1.pdf
PWC	https://paperswithcode.com/paper/gradient-ell_1-regularization-for-1
Repo
Framework
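The step from an $\ell_\infty$-bounded perturbation to an $\ell_1$ gradient penalty follows from standard norm duality. A short sketch, where the symbols $L$ (loss), $w$ (the quantized quantities), $\delta$ (quantization perturbation) and $\epsilon$ (its bound) are notation assumed here rather than taken from the paper:

$$
L(w + \delta) \approx L(w) + \delta^\top \nabla_w L(w),
\qquad
\max_{\|\delta\|_\infty \le \epsilon} \delta^\top \nabla_w L(w) = \epsilon \, \|\nabla_w L(w)\|_1 .
$$

The maximum is attained at $\delta = \epsilon \, \mathrm{sign}(\nabla_w L(w))$, so penalizing $\|\nabla_w L(w)\|_1$ during training bounds the worst-case first-order increase in loss over all perturbations of that size.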

Sub-Gaussian Matrices on Sets: Optimal Tail Dependence and Applications


Title	Sub-Gaussian Matrices on Sets: Optimal Tail Dependence and Applications
Authors	Halyun Jeong, Xiaowei Li, Yaniv Plan, Özgür Yılmaz
Abstract	Random linear mappings are widely used in modern signal processing, compressed sensing and machine learning. These mappings may be used to embed the data into a significantly lower dimension while at the same time preserving useful information. This is done by approximately preserving the distances between data points, which are assumed to belong to $\mathbb{R}^n$. Thus, the performance of these mappings is usually captured by how close they are to an isometry on the data. Random Gaussian linear mappings have been the object of much study, while the sub-Gaussian setting is not yet fully understood. In the latter case, the performance depends on the sub-Gaussian norm of the rows. In many applications, e.g., compressed sensing, this norm may be large, or even growing with dimension, and thus it is important to characterize this dependence. We study when a sub-Gaussian matrix can become a near isometry on a set, show that previous best known dependence on the sub-Gaussian norm was sub-optimal, and present the optimal dependence. Our result not only answers a remaining question posed by Liaw, Mehrabian, Plan and Vershynin in 2017, but also generalizes their work. We also develop a new Bernstein type inequality for sub-exponential random variables, and a new Hanson-Wright inequality for quadratic forms of sub-Gaussian random variables, in both cases improving the bounds in the sub-Gaussian regime under moment constraints. Finally, we illustrate popular applications such as Johnson-Lindenstrauss embeddings, randomized sketches and blind demodulation, whose theoretical guarantees can be improved by our results in the sub-Gaussian case.
Tasks
Published	2020-01-28
URL	https://arxiv.org/abs/2001.10631v1
PDF	https://arxiv.org/pdf/2001.10631v1.pdf
PWC	https://paperswithcode.com/paper/sub-gaussian-matrices-on-sets-optimal-tail
Repo
Framework
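As an illustration of the kind of mapping the abstract discusses, the sketch below builds a Johnson-Lindenstrauss style projection from a random sign (Rademacher) matrix, a simple example of a sub-Gaussian matrix, and scales it so that distances are preserved in expectation. The function names, the choice of Rademacher entries and the dimensions in the usage are assumptions for illustration. The sketch demonstrates the near-isometry behaviour only and says nothing about the paper's optimal tail dependence.

```python
import math
import random


def rademacher_projection(points, target_dim, seed=0):
    """Project points to target_dim using a scaled random +/-1 matrix."""
    rng = random.Random(seed)
    source_dim = len(points[0])
    # Scaling by 1/sqrt(target_dim) preserves squared norms in expectation.
    scale = 1.0 / math.sqrt(target_dim)
    matrix = [
        [rng.choice((-1.0, 1.0)) * scale for _ in range(source_dim)]
        for _ in range(target_dim)
    ]
    return [
        [sum(a * x for a, x in zip(row, point)) for row in matrix]
        for point in points
    ]


def distance(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

Comparing `distance` before and after projection for a handful of points shows ratios close to 1, with the spread shrinking as `target_dim` grows.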

Data-Free Adversarial Perturbations for Practical Black-Box Attack


Title	Data-Free Adversarial Perturbations for Practical Black-Box Attack
Authors	ZhaoXin Huan, Yulong Wang, Xiaolu Zhang, Lin Shang, Chilin Fu, Jun Zhou
Abstract	Neural networks are vulnerable to adversarial examples, which are malicious inputs crafted to fool pre-trained models. Adversarial examples often exhibit black-box attacking transferability, which allows adversarial examples crafted for one model to fool another model. However, existing black-box attack methods require samples from the training data distribution to improve the transferability of adversarial examples across different models. Because of the data dependence, the fooling ability of adversarial perturbations is only applicable when training data are accessible. In this paper, we present a data-free method for crafting adversarial perturbations that can fool a target model without any knowledge about the training data distribution. In the practical setting of a black-box attack scenario where attackers do not have access to target models and training data, our method achieves high fooling rates on target models and outperforms other universal adversarial perturbation methods. Our method empirically shows that current deep learning models are still at risk even when the attackers do not have access to training data.
Tasks
Published	2020-03-03
URL	https://arxiv.org/abs/2003.01295v1
PDF	https://arxiv.org/pdf/2003.01295v1.pdf
PWC	https://paperswithcode.com/paper/data-free-adversarial-perturbations-for
Repo
Framework

Adaptation of a deep learning malignancy model from full-field digital mammography to digital breast tomosynthesis


Title	Adaptation of a deep learning malignancy model from full-field digital mammography to digital breast tomosynthesis
Authors	Sadanand Singh, Thomas Paul Matthews, Meet Shah, Brent Mombourquette, Trevor Tsue, Aaron Long, Ranya Almohsen, Stefano Pedemonte, Jason Su
Abstract	Mammography-based screening has helped reduce the breast cancer mortality rate, but has also been associated with potential harms due to low specificity, leading to unnecessary exams or procedures, and low sensitivity. Digital breast tomosynthesis (DBT) improves on conventional mammography by increasing both sensitivity and specificity and is becoming common in clinical settings. However, deep learning (DL) models have been developed mainly on conventional 2D full-field digital mammography (FFDM) or scanned film images. Due to a lack of large annotated DBT datasets, it is difficult to train a model on DBT from scratch. In this work, we present methods to generalize a model trained on FFDM images to DBT images. In particular, we use average histogram matching (HM) and DL fine-tuning methods to generalize an FFDM model to the 2D maximum intensity projection (MIP) of DBT images. In the proposed approach, the differences between the FFDM and DBT domains are reduced via HM and then the base model, which was trained on abundant FFDM images, is fine-tuned. When evaluating on image patches extracted around identified findings, we are able to achieve similar areas under the receiver operating characteristic curve (ROC AUC) of $\sim 0.9$ for FFDM and $\sim 0.85$ for MIP images, as compared to a ROC AUC of $\sim 0.75$ when tested directly on MIP images.
Tasks
Published	2020-01-23
URL	https://arxiv.org/abs/2001.08381v1
PDF	https://arxiv.org/pdf/2001.08381v1.pdf
PWC	https://paperswithcode.com/paper/adaptation-of-a-deep-learning-malignancy
Repo
Framework
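Histogram matching, the domain-alignment step named in the abstract, remaps intensities so that their distribution follows a reference distribution. The sketch below shows the generic quantile-mapping form on flat lists of pixel values. It is an assumption-laden illustration: the paper's "average" histogram matching variant and its image pipeline are not specified in the abstract, and the function name `match_histogram` is invented here.

```python
import math
from bisect import bisect_right


def match_histogram(source, reference):
    """Map each source value to the reference value at the same quantile."""
    if not source or not reference:
        raise ValueError("source and reference must be non-empty")
    src_sorted = sorted(source)
    ref_sorted = sorted(reference)
    n_src, n_ref = len(src_sorted), len(ref_sorted)
    matched = []
    for value in source:
        # Empirical CDF of the source at this value (fraction <= value).
        quantile = bisect_right(src_sorted, value) / n_src
        # Index of the reference sample sitting at the same quantile.
        index = min(n_ref - 1, max(0, math.ceil(quantile * n_ref) - 1))
        matched.append(ref_sorted[index])
    return matched


remapped = match_histogram([0, 1, 2, 3], [10, 20, 30, 40])
```

The mapping is monotone, so the ordering of intensities within an image is kept while their overall distribution is pulled toward the reference domain.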

Understanding and Improving Knowledge Distillation


Title	Understanding and Improving Knowledge Distillation
Authors	Jiaxi Tang, Rakesh Shivanna, Zhe Zhao, Dong Lin, Anima Singh, Ed H. Chi, Sagar Jain
Abstract	Knowledge distillation is a model-agnostic technique to improve model quality while having a fixed capacity budget. It is a commonly used technique for model compression, where a higher capacity teacher model with better quality is used to train a more compact student model with better inference efficiency. Through distillation, one hopes to benefit from the student’s compactness without sacrificing too much model quality. Despite the large success of knowledge distillation, how it benefits the student model’s training dynamics remains under-explored. In this paper, we dissect the effects of knowledge distillation into three main factors: (1) benefits inherited from label smoothing, (2) example re-weighting based on the teacher’s confidence on ground-truth, and (3) prior knowledge of optimal output (logit) layer geometry. Using extensive systematic analyses and empirical studies on synthetic and real-world datasets, we confirm that the aforementioned three factors play a major role in knowledge distillation. Furthermore, based on our findings, we propose a simple, yet effective technique to improve knowledge distillation empirically.
Tasks	Model Compression
Published	2020-02-10
URL	https://arxiv.org/abs/2002.03532v1
PDF	https://arxiv.org/pdf/2002.03532v1.pdf
PWC	https://paperswithcode.com/paper/understanding-and-improving-knowledge
Repo
Framework
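The first factor in the abstract relates distillation to label smoothing. The sketch below puts the two target constructions side by side: a smoothed one-hot label, and a commonly used distillation loss that mixes hard-label cross-entropy with cross-entropy against temperature-softened teacher outputs. This is the textbook formulation, not the improved technique the paper proposes, and the function names, `weight` and `temperature` defaults are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def smooth_labels(true_index, num_classes, alpha):
    """Label smoothing: mix the one-hot target with a uniform distribution."""
    return [
        (1.0 - alpha) * (1.0 if i == true_index else 0.0) + alpha / num_classes
        for i in range(num_classes)
    ]


def cross_entropy(target, predicted):
    """Cross-entropy between a target distribution and a prediction."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


def distillation_loss(student_logits, teacher_logits, true_index,
                      weight=0.5, temperature=2.0):
    """Weighted sum of hard-label loss and soft teacher-target loss."""
    hard = [1.0 if i == true_index else 0.0 for i in range(len(student_logits))]
    hard_term = cross_entropy(hard, softmax(student_logits))
    soft_term = cross_entropy(
        softmax(teacher_logits, temperature),
        softmax(student_logits, temperature),
    )
    return (1.0 - weight) * hard_term + weight * soft_term
```

Both constructions replace the one-hot target with a softer distribution. Label smoothing spreads the extra mass uniformly, whereas the teacher's softened outputs spread it according to the teacher's own predictions.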

The Data Science Fire Next Time: Innovative strategies for mentoring in data science


Title	The Data Science Fire Next Time: Innovative strategies for mentoring in data science
Authors	Latifa Jackson, Heriberto Acosta Maestre
Abstract	As data mining research and applications continue to expand into a variety of fields such as medicine, finance, security, etc., the need for talented and diverse individuals is clearly felt. This is particularly the case as Big Data initiatives have taken off in the federal, private and academic sectors, providing a wealth of opportunities, nationally and internationally. The Broadening Participation in Data Mining (BPDM) workshop was created more than 7 years ago with the goal of fostering mentorship, guidance, and connections for minority and underrepresented groups in the data science and machine learning community, while also enriching technical aptitude and exposure for a group of talented students. To date it has impacted the lives of more than 330 underrepresented trainees in data science. We provide a venue to connect talented students with innovative researchers in industry, academia, professional societies, and government. Our mission is to facilitate meaningful, lasting relationships between BPDM participants to ultimately increase diversity in data mining. This most recent workshop took place at Howard University in Washington, DC in February 2019. Here we report on the mentoring strategies that we undertook at the 2019 BPDM and how those were received.
Tasks
Published	2020-03-01
URL	https://arxiv.org/abs/2003.07681v1
PDF	https://arxiv.org/pdf/2003.07681v1.pdf
PWC	https://paperswithcode.com/paper/the-data-science-fire-next-time-innovative
Repo
Framework