April 3, 2020

3137 words 15 mins read

Paper Group AWR 53

Selective Weak Supervision for Neural Information Retrieval

Title Selective Weak Supervision for Neural Information Retrieval
Authors Kaitao Zhang, Chenyan Xiong, Zhenghao Liu, Zhiyuan Liu
Abstract This paper democratizes neural information retrieval to scenarios where large scale relevance training signals are not available. We revisit the classic IR intuition that anchor-document relations approximate query-document relevance and propose a reinforcement weak supervision selection method, ReInfoSelect, which learns to select anchor-document pairs that best weakly supervise the neural ranker (action), using the ranking performance on a handful of relevance labels as the reward. Iteratively, for a batch of anchor-document pairs, ReInfoSelect back propagates the gradients through the neural ranker, gathers its NDCG reward, and optimizes the data selection network using policy gradients, until the neural ranker’s performance peaks on target relevance metrics (convergence). In our experiments on three TREC benchmarks, neural rankers trained by ReInfoSelect, with only publicly available anchor data, significantly outperform feature-based learning to rank methods and match the effectiveness of neural rankers trained with private commercial search logs. Our analyses show that ReInfoSelect effectively selects weak supervision signals based on the stage of the neural ranker training, and intuitively picks anchor-document pairs similar to query-document pairs.
Tasks Information Retrieval, Learning-To-Rank
Published 2020-01-28
URL https://arxiv.org/abs/2001.10382v1
PDF https://arxiv.org/pdf/2001.10382v1.pdf
PWC https://paperswithcode.com/paper/selective-weak-supervision-for-neural
Repo https://github.com/thunlp/ReInfoSelect
Framework pytorch
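Below is a minimal, hypothetical sketch of the policy-gradient selection loop the abstract describes: a toy linear selector samples which anchor-document pairs to use, a placeholder stands in for the ranker update, and the NDCG gain on held-out labels acts as the REINFORCE reward. None of the names here come from the authors' code; see the linked repo for the real implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: in the real system these are a BERT-style ranker,
# anchor-document pairs, and NDCG on a handful of labeled queries.
def train_ranker_on(selected_indices):      # placeholder weak-supervision step
    return 0.01 * len(selected_indices)     # pretend each pair nudges the ranker

def ndcg_on_dev(ranker_quality):            # placeholder reward signal
    return min(1.0, ranker_quality)

def select_and_update(features, w, ranker_quality, lr=0.1, baseline=0.0):
    """One ReInfoSelect-style step: sample pairs, reward with the NDCG gain,
    and update the selector by REINFORCE (policy gradient)."""
    probs = 1.0 / (1.0 + np.exp(-features @ w))              # selection policy
    actions = (rng.random(len(probs)) < probs).astype(float)  # sampled actions
    new_quality = ranker_quality + train_ranker_on(np.where(actions)[0])
    reward = ndcg_on_dev(new_quality) - ndcg_on_dev(ranker_quality)
    # REINFORCE: grad log pi(a|s) * (reward - baseline)
    grad = features.T @ ((actions - probs) * (reward - baseline))
    return w + lr * grad, new_quality

w = np.zeros(4)                      # selector weights over pair features
quality = 0.0
for step in range(50):               # iterate until dev NDCG plateaus
    batch_feats = rng.normal(size=(32, 4))   # toy anchor-doc pair features
    w, quality = select_and_update(batch_feats, w, quality)
print("selector weights:", w.round(3))
```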

Neural Networks in Evolutionary Dynamic Constrained Optimization: Computational Cost and Benefits

Title Neural Networks in Evolutionary Dynamic Constrained Optimization: Computational Cost and Benefits
Authors Maryam Hasani-Shoreh, Renato Hermoza Aragonés, Frank Neumann
Abstract Neural networks (NN) have recently been applied together with evolutionary algorithms (EAs) to solve dynamic optimization problems. The applied NN estimates the position of the next optimum based on the best solutions from previous time steps. After detecting a change, the predicted solution can be employed to move the EA’s population to a promising region of the solution space in order to accelerate convergence and improve accuracy in tracking the optimum. While previous works show improved results, they neglect the overhead created by the NN. In this work, we include the time spent on training the NN in the overall optimization time and compare the results with a baseline EA. We explore whether, once this overhead is accounted for, the NN is still able to improve the results, and under which conditions it is able to do so. The main difficulties in training the NN are: 1) getting enough samples to generalize predictions to new data, and 2) obtaining reliable samples. As the NN needs to collect data at each time step, if the time horizon is short, we will not be able to collect enough samples to train the NN. To alleviate this, we propose to consider more individuals at each change to speed up sample collection in shorter time steps. In environments with a high frequency of changes, the solutions produced by the EA are likely to be far from the real optimum. Using unreliable training data for the NN will, in consequence, produce unreliable predictions. Also, as the time spent on the NN stays fixed regardless of the frequency, a higher frequency of change means a higher overhead produced by the NN in proportion to the EA. In general, after considering the generated overhead, we conclude that the NN is not suitable in environments with a high frequency of changes and/or short time horizons. However, it can be promising for low frequencies of change, and especially for environments where the changes follow a pattern.
Tasks
Published 2020-01-22
URL https://arxiv.org/abs/2001.11588v1
PDF https://arxiv.org/pdf/2001.11588v1.pdf
PWC https://paperswithcode.com/paper/neural-networks-in-evolutionary-dynamic
Repo https://github.com/renato145/DENN
Framework none
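The following toy sketch illustrates the overhead-accounting question the abstract raises: a cheap predictor (plain linear extrapolation standing in for the paper's NN) forecasts the next optimum from the history of per-change best solutions, seeds the population around it, and the time spent fitting the predictor is charged to the optimization budget. All names and numbers here are illustrative.

```python
import time
import numpy as np

rng = np.random.default_rng(1)

def predict_next_best(history):
    """Linear extrapolation from the last two per-change best solutions
    (a cheap stand-in for the paper's neural-network predictor)."""
    return history[-1] + (history[-1] - history[-2])

history = [np.array([0.0, 0.0])]     # best solution found at each change
overhead = 0.0                       # time charged to the predictor

for change in range(1, 6):           # environment changes over time
    start = time.perf_counter()
    guess = predict_next_best(history) if len(history) >= 2 else history[-1]
    overhead += time.perf_counter() - start       # count the prediction cost

    # Seed part of the EA population around the predicted optimum; the rest
    # of the EA run is omitted here and replaced by a noisy "best found".
    seeded_population = guess + 0.1 * rng.normal(size=(20, 2))
    best_found = seeded_population.mean(axis=0) + 0.05 * rng.normal(size=2)
    history.append(best_found)

print("predictor overhead (s):", round(overhead, 6))
```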

Adaptive Offline Quintuplet Loss for Image-Text Matching

Title Adaptive Offline Quintuplet Loss for Image-Text Matching
Authors Tianlang Chen, Jiajun Deng, Jiebo Luo
Abstract Existing image-text matching approaches typically leverage triplet loss with online hard negatives to train the model. For each image or text anchor in a training mini-batch, the model is trained to distinguish between a positive and the most confusing negative of the anchor mined from the mini-batch (i.e. online hard negative). This strategy improves the model’s capacity to discover fine-grained correspondences and non-correspondences between image and text inputs. However, the above training approach has the following drawbacks: (1) the negative selection strategy still provides limited chances for the model to learn from very hard-to-distinguish cases. (2) The trained model has weak generalization capability from the training set to the testing set. (3) The penalty lacks hierarchy and adaptiveness for hard negatives with different "hardness" degrees. In this paper, we propose solutions by sampling negatives offline from the whole training set. It provides "harder" offline negatives than online hard negatives for the model to distinguish. Based on the offline hard negatives, a quintuplet loss is proposed to improve the model’s generalization capability to distinguish positives and negatives. In addition, a novel loss function that combines the knowledge of positives, offline hard negatives and online hard negatives is created. It leverages offline hard negatives as intermediary to adaptively penalize them based on their distance relations to the anchor. We evaluate the proposed training approach on three state-of-the-art image-text models on the MS-COCO and Flickr30K datasets. Significant performance improvements are observed for all the models, demonstrating the effectiveness and generality of the proposed approach.
Tasks Text Matching
Published 2020-03-07
URL https://arxiv.org/abs/2003.03669v2
PDF https://arxiv.org/pdf/2003.03669v2.pdf
PWC https://paperswithcode.com/paper/adaptive-offline-quintuplet-loss-for-image
Repo https://github.com/sunnychencool/AOQ
Framework pytorch
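One possible reading of the quintuplet idea is sketched below, assuming the offline hard negative acts as an intermediary ranked between the positive and the online hard negative. The margins and the weighting are illustrative, not the paper's exact adaptive loss.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def quintuplet_style_loss(anchor, positive, offline_neg, online_neg,
                          margin_pos=0.3, margin_mid=0.1):
    """Hedged sketch: the positive must beat the offline ("harder") negative
    by a margin, and the offline negative must in turn sit above the online
    negative, making it an intermediary. Not the paper's exact formulation."""
    s_pos = cosine(anchor, positive)
    s_off = cosine(anchor, offline_neg)
    s_on = cosine(anchor, online_neg)
    l_pos_vs_offline = max(0.0, margin_pos - s_pos + s_off)
    l_offline_vs_online = max(0.0, margin_mid - s_off + s_on)
    return l_pos_vs_offline + l_offline_vs_online

rng = np.random.default_rng(0)
a, p, n_off, n_on = (rng.normal(size=64) for _ in range(4))
print(round(quintuplet_style_loss(a, p, n_off, n_on), 3))
```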

Tree-SNE: Hierarchical Clustering and Visualization Using t-SNE

Title Tree-SNE: Hierarchical Clustering and Visualization Using t-SNE
Authors Isaac Robinson, Emma Pierce-Hoffman
Abstract t-SNE and hierarchical clustering are popular methods of exploratory data analysis, particularly in biology. Building on recent advances in speeding up t-SNE and obtaining finer-grained structure, we combine the two to create tree-SNE, a hierarchical clustering and visualization algorithm based on stacked one-dimensional t-SNE embeddings. We also introduce alpha-clustering, which recommends the optimal cluster assignment, without foreknowledge of the number of clusters, based on cluster stability across multiple scales. We demonstrate the effectiveness of tree-SNE and alpha-clustering on images of handwritten digits, mass cytometry (CyTOF) data from blood cells, and single-cell RNA-sequencing (scRNA-seq) data from retinal cells. Furthermore, to demonstrate the validity of the visualization, we use alpha-clustering to obtain unsupervised clustering results competitive with the state of the art on several image data sets. Software is available at https://github.com/isaacrob/treesne.
Tasks
Published 2020-02-13
URL https://arxiv.org/abs/2002.05687v1
PDF https://arxiv.org/pdf/2002.05687v1.pdf
PWC https://paperswithcode.com/paper/tree-sne-hierarchical-clustering-and
Repo https://github.com/isaacrob/treesne
Framework none
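A rough stand-in for the tree-SNE/alpha-clustering recipe using scikit-learn: stack one-dimensional t-SNE embeddings at several scales, cluster each level, and recommend the cluster count that is most stable across levels. The real method lowers the t-SNE degrees-of-freedom (alpha) level by level rather than varying perplexity, so treat this only as a sketch of the idea (eps and perplexity values are illustrative).

```python
import numpy as np
from collections import Counter
from sklearn.manifold import TSNE
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Toy data; the paper uses handwritten digits, CyTOF and scRNA-seq data.
X, _ = make_blobs(n_samples=60, centers=3, random_state=0)

levels, counts = [], []
for perplexity in (30, 20, 10, 5):       # one embedding per "level" of the tree
    emb = TSNE(n_components=1, perplexity=perplexity, method="exact",
               init="random", random_state=0).fit_transform(X)
    labels = DBSCAN(eps=3.0, min_samples=3).fit_predict(emb)
    levels.append(labels)
    counts.append(len(set(labels) - {-1}))   # clusters found at this level

# Alpha-clustering idea: recommend the cluster count that recurs most often
# across the levels of the hierarchy (i.e. is stable across scales).
stable_k, _ = Counter(counts).most_common(1)[0]
print("cluster counts per level:", counts, "-> recommended k:", stable_k)
```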

Exponential Step Sizes for Non-Convex Optimization

Title Exponential Step Sizes for Non-Convex Optimization
Authors Xiaoyu Li, Zhenxun Zhuang, Francesco Orabona
Abstract Stochastic Gradient Descent (SGD) is a popular tool in large scale optimization of machine learning objective functions. However, the performance is greatly variable, depending on the choice of the step sizes. In this paper, we introduce the exponential step sizes for stochastic optimization of smooth non-convex functions which satisfy the Polyak-\L{}ojasiewicz (PL) condition. We show that, without any information on the level of noise over the stochastic gradients, these step sizes guarantee a convergence rate for the last iterate that automatically interpolates between a linear rate (in the noise-free case) and a $O(\frac{1}{T})$ rate (in the noisy case), up to poly-logarithmic factors. Moreover, even without the PL condition, the exponential step sizes still guarantee optimal convergence to a critical point, up to logarithmic factors. We also validate our theoretical results with empirical experiments on real-world datasets with deep learning architectures.
Tasks Stochastic Optimization
Published 2020-02-12
URL https://arxiv.org/abs/2002.05273v1
PDF https://arxiv.org/pdf/2002.05273v1.pdf
PWC https://paperswithcode.com/paper/exponential-step-sizes-for-non-convex
Repo https://github.com/zhenxun-zhuang/SGD-Exponential-Stepsize
Framework pytorch
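A minimal sketch of SGD with an exponential step-size schedule, $\eta_t = \eta_0 \alpha^t$ with $\alpha = (\beta/T)^{1/T}$, so the step stays nearly constant early on and shrinks to roughly $\eta_0 \beta / T$ by the end. The constants and the toy objective are illustrative, not the paper's tuned values.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_exponential_stepsize(grad_fn, x0, T=1000, eta0=0.5, beta=1.0):
    """SGD with step size eta_t = eta0 * alpha**t, alpha = (beta/T)**(1/T)."""
    alpha = (beta / T) ** (1.0 / T)
    x = np.array(x0, dtype=float)
    for t in range(T):
        eta_t = eta0 * alpha ** t       # exponentially decaying step size
        x -= eta_t * grad_fn(x)
    return x

# Toy smooth objective with noisy gradients: f(x) = ||x||^2 / 2.
noisy_grad = lambda x: x + 0.1 * rng.normal(size=x.shape)
print(sgd_exponential_stepsize(noisy_grad, x0=[3.0, -2.0]).round(3))
```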

Tree++: Truncated Tree Based Graph Kernels

Title Tree++: Truncated Tree Based Graph Kernels
Authors Wei Ye, Zhen Wang, Rachel Redberg, Ambuj Singh
Abstract Graph-structured data arise ubiquitously in many application domains. A fundamental problem is to quantify their similarities. Graph kernels are often used for this purpose, which decompose graphs into substructures and compare these substructures. However, most of the existing graph kernels do not have the property of scale-adaptivity, i.e., they cannot compare graphs at multiple levels of granularities. Many real-world graphs such as molecules exhibit structure at varying levels of granularities. To tackle this problem, we propose a new graph kernel called Tree++ in this paper. At the heart of Tree++ is a graph kernel called the path-pattern graph kernel. The path-pattern graph kernel first builds a truncated BFS tree rooted at each vertex and then uses paths from the root to every vertex in the truncated BFS tree as features to represent graphs. The path-pattern graph kernel can only capture graph similarity at fine granularities. In order to capture graph similarity at coarse granularities, we incorporate a new concept called super path into it. The super path contains truncated BFS trees rooted at the vertices in a path. Our evaluation on a variety of real-world graphs demonstrates that Tree++ achieves the best classification accuracy compared with previous graph kernels.
Tasks Graph Similarity
Published 2020-02-23
URL https://arxiv.org/abs/2002.09846v1
PDF https://arxiv.org/pdf/2002.09846v1.pdf
PWC https://paperswithcode.com/paper/tree-truncated-tree-based-graph-kernels
Repo https://github.com/yeweiysh/TreePlusPlus
Framework none
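A hedged sketch of the fine-granularity part of Tree++ (the path-pattern kernel) using networkx: build a depth-truncated BFS tree at every vertex, count the label sequences along root-to-node paths, and take the dot product of the resulting count vectors. The super-path extension for coarse granularity is omitted, and the toy graphs and labels are illustrative.

```python
from collections import Counter
import networkx as nx

def path_pattern_features(G, depth=2):
    """Count label sequences along root-to-node paths in a truncated BFS
    tree rooted at every vertex (the fine-granularity part of Tree++)."""
    feats = Counter()
    for root in G.nodes:
        tree = nx.bfs_tree(G, root, depth_limit=depth)
        for node in tree.nodes:
            path = nx.shortest_path(tree, root, node)
            feats[tuple(G.nodes[v].get("label", "?") for v in path)] += 1
    return feats

def path_pattern_kernel(G1, G2, depth=2):
    f1, f2 = path_pattern_features(G1, depth), path_pattern_features(G2, depth)
    return sum(f1[p] * f2[p] for p in f1.keys() & f2.keys())

# Tiny labeled graphs as a usage example.
G1, G2 = nx.path_graph(4), nx.cycle_graph(4)
for G in (G1, G2):
    nx.set_node_attributes(G, {v: "C" for v in G}, "label")
print("k(G1, G2) =", path_pattern_kernel(G1, G2))
```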

Gradually Vanishing Bridge for Adversarial Domain Adaptation

Title Gradually Vanishing Bridge for Adversarial Domain Adaptation
Authors Shuhao Cui, Shuhui Wang, Junbao Zhuo, Chi Su, Qingming Huang, Qi Tian
Abstract In unsupervised domain adaptation, rich domain-specific characteristics pose a great challenge to learning domain-invariant representations. However, existing solutions assume that the domain discrepancy can be minimized directly, which is difficult to achieve in practice. Some methods alleviate the difficulty by explicitly modeling domain-invariant and domain-specific parts in the representations, but the adverse influence of the explicit construction lies in the residual domain-specific characteristics left in the constructed domain-invariant representations. In this paper, we equip adversarial domain adaptation with a Gradually Vanishing Bridge (GVB) mechanism on both the generator and the discriminator. On the generator, GVB not only reduces the overall transfer difficulty, but also reduces the influence of the residual domain-specific characteristics in the domain-invariant representations. On the discriminator, GVB helps enhance the discriminating ability and balance the adversarial training process. Experiments on three challenging datasets show that our GVB methods outperform strong competitors and cooperate well with other adversarial methods. The code is available at https://github.com/cuishuhao/GVB.
Tasks Domain Adaptation, Unsupervised Domain Adaptation
Published 2020-03-30
URL https://arxiv.org/abs/2003.13183v1
PDF https://arxiv.org/pdf/2003.13183v1.pdf
PWC https://paperswithcode.com/paper/gradually-vanishing-bridge-for-adversarial
Repo https://github.com/cuishuhao/GVB
Framework pytorch
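One plausible reading of the generator-side bridge, sketched below: the output is the main representation minus a down-weighted domain-specific residual, with the weight shrinking over training so the bridge gradually vanishes. The actual GVB implementation may drive the bridge to zero through a loss term rather than an explicit schedule; all names and the toy linear layers here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def bridge_weight(step, total_steps):
    """Scalar that shrinks from 1 to 0, making the bridge gradually vanish."""
    return max(0.0, 1.0 - step / total_steps)

def generator_with_bridge(x, W_main, W_bridge, step, total_steps):
    """Output = main representation minus a down-weighted bridge residual.
    A toy linear stand-in for the GVB generator, not the authors' network."""
    main = np.tanh(x @ W_main)        # intended domain-invariant part
    bridge = np.tanh(x @ W_bridge)    # residual domain-specific part
    gamma = bridge_weight(step, total_steps)
    return main - gamma * bridge, gamma * np.abs(bridge).mean()

x = rng.normal(size=(8, 16))
W_main, W_bridge = rng.normal(size=(16, 32)), rng.normal(size=(16, 32))
for step in (0, 500, 1000):
    _, bridge_norm = generator_with_bridge(x, W_main, W_bridge, step, 1000)
    print(f"step {step}: residual bridge magnitude = {bridge_norm:.3f}")
```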

Estimating Gradients for Discrete Random Variables by Sampling without Replacement

Title Estimating Gradients for Discrete Random Variables by Sampling without Replacement
Authors Wouter Kool, Herke van Hoof, Max Welling
Abstract We derive an unbiased estimator for expectations over discrete random variables based on sampling without replacement, which reduces variance as it avoids duplicate samples. We show that our estimator can be derived as the Rao-Blackwellization of three different estimators. Combining our estimator with REINFORCE, we obtain a policy gradient estimator and we reduce its variance using a built-in control variate which is obtained without additional model evaluations. The resulting estimator is closely related to other gradient estimators. Experiments with a toy problem, a categorical Variational Auto-Encoder and a structured prediction problem show that our estimator is the only estimator that is consistently among the best estimators in both high and low entropy settings.
Tasks Structured Prediction
Published 2020-02-14
URL https://arxiv.org/abs/2002.06043v1
PDF https://arxiv.org/pdf/2002.06043v1.pdf
PWC https://paperswithcode.com/paper/estimating-gradients-for-discrete-random-1
Repo https://github.com/wouterkool/estimating-gradients-without-replacement
Framework pytorch
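A simplified sketch of the sampling machinery: the Gumbel-top-k trick draws k distinct categories without replacement, and a normalized importance-weighted average estimates the expectation. The paper's actual estimator is the unbiased, Rao-Blackwellized version with a built-in control variate, which this simplification deliberately omits.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_without_replacement(log_p, k):
    """Gumbel-top-k trick: perturb log-probabilities with Gumbel noise and
    take the top k indices, which yields k distinct samples."""
    gumbels = -np.log(-np.log(rng.random(log_p.shape)))
    return np.argsort(log_p + gumbels)[::-1][:k]

def estimate_expectation(p, f_values, k):
    """Crude estimate of E_p[f(x)] from k samples drawn without replacement,
    using normalized importance weights. The paper's estimator is instead an
    unbiased, Rao-Blackwellized form with a built-in control variate."""
    idx = sample_without_replacement(np.log(p), k)
    w = p[idx] / p[idx].sum()          # renormalize over the drawn support
    return float(w @ f_values[idx])

p = np.array([0.5, 0.2, 0.15, 0.1, 0.05])   # toy categorical distribution
f = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print("estimate:", estimate_expectation(p, f, k=3), "exact:", float(p @ f))
```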

LP-SparseMAP: Differentiable Relaxed Optimization for Sparse Structured Prediction

Title LP-SparseMAP: Differentiable Relaxed Optimization for Sparse Structured Prediction
Authors Vlad Niculae, André F. T. Martins
Abstract Structured prediction requires manipulating a large number of combinatorial structures, e.g., dependency trees or alignments, either as latent or output variables. Recently, the SparseMAP method has been proposed as a differentiable, sparse alternative to maximum a posteriori (MAP) and marginal inference. SparseMAP returns a combination of a small number of structures, a desirable property in some downstream applications. However, SparseMAP requires a tractable MAP inference oracle. This excludes, e.g., loopy graphical models or factor graphs with logic constraints, which generally require approximate inference. In this paper, we introduce LP-SparseMAP, an extension of SparseMAP that addresses this limitation via a local polytope relaxation. LP-SparseMAP uses the flexible and powerful domain specific language of factor graphs for defining and backpropagating through arbitrary hidden structure, supporting coarse decompositions, hard logic constraints, and higher-order correlations. We derive the forward and backward algorithms needed for using LP-SparseMAP as a hidden or output layer. Experiments in three structured prediction tasks show benefits compared to SparseMAP and Structured SVM.
Tasks Structured Prediction
Published 2020-01-13
URL https://arxiv.org/abs/2001.04437v1
PDF https://arxiv.org/pdf/2001.04437v1.pdf
PWC https://paperswithcode.com/paper/lp-sparsemap-differentiable-relaxed
Repo https://github.com/deep-spin/lp-sparsemap
Framework none
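For intuition, the snippet below shows the degenerate single-variable case of SparseMAP, which reduces to sparsemax (Euclidean projection onto the probability simplex) and returns a sparse combination of outcomes. LP-SparseMAP generalizes this to arbitrary factor graphs via a local polytope relaxation; for that, the linked library should be used rather than anything sketched here.

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of scores z onto the probability simplex.
    For a single categorical variable this is the degenerate case of
    (LP-)SparseMAP: it returns a sparse mix of only a few outcomes."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]                   # scores in decreasing order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cumsum           # which entries stay nonzero
    k_star = k[support][-1]
    tau = (cumsum[support][-1] - 1) / k_star      # threshold
    return np.clip(z - tau, 0.0, None)

print(sparsemax([2.0, 1.2, -0.5, 0.1]))           # only a few entries are nonzero
```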

Overlap Local-SGD: An Algorithmic Approach to Hide Communication Delays in Distributed SGD

Title Overlap Local-SGD: An Algorithmic Approach to Hide Communication Delays in Distributed SGD
Authors Jianyu Wang, Hao Liang, Gauri Joshi
Abstract Distributed stochastic gradient descent (SGD) is essential for scaling machine learning algorithms to a large number of computing nodes. However, infrastructure variability, such as high communication delay or random node slowdown, greatly impedes the performance of distributed SGD algorithms, especially in wireless systems or sensor networks. In this paper, we propose an algorithmic approach named Overlap-Local-SGD (and its momentum variant) to overlap communication and computation so as to speed up the distributed training procedure. The approach can also help mitigate straggler effects. We achieve this by adding an anchor model on each node. After multiple local updates, locally trained models are pulled back towards the synchronized anchor model rather than communicating with others. Experimental results of training a deep neural network on the CIFAR-10 dataset demonstrate the effectiveness of Overlap-Local-SGD. We also provide a convergence guarantee for the proposed algorithm under non-convex objective functions.
Tasks
Published 2020-02-21
URL https://arxiv.org/abs/2002.09539v1
PDF https://arxiv.org/pdf/2002.09539v1.pdf
PWC https://paperswithcode.com/paper/overlap-local-sgd-an-algorithmic-approach-to
Repo https://github.com/JYWa/Overlap_Local_SGD
Framework pytorch
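A serial numpy sketch of the anchor-model idea, with hypothetical constants: each worker runs tau local SGD steps, is then pulled toward a shared anchor instead of blocking on an all-reduce, and the anchor is refreshed as if in the background. The real Overlap-Local-SGD overlaps this communication with computation across nodes, which a single-process simulation cannot show.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_grad(w):
    """Toy noisy gradient of f(w) = ||w||^2 / 2."""
    return w + 0.1 * rng.normal(size=w.shape)

n_workers, dim, tau, lr, pull = 4, 10, 5, 0.1, 0.5
anchor = np.zeros(dim)                         # shared anchor model
workers = [anchor.copy() for _ in range(n_workers)]

for round_ in range(20):
    for i in range(n_workers):
        for _ in range(tau):                   # tau local SGD steps
            workers[i] -= lr * local_grad(workers[i])
        # Pull back toward the anchor instead of synchronizing with peers;
        # in the real algorithm this overlaps with background communication.
        workers[i] -= pull * (workers[i] - anchor)
    # Anchor refresh (here: a plain average, done "in the background").
    anchor = np.mean(workers, axis=0)

print("final anchor norm:", round(float(np.linalg.norm(anchor)), 4))
```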

AutoMATES: Automated Model Assembly from Text, Equations, and Software

Title AutoMATES: Automated Model Assembly from Text, Equations, and Software
Authors Adarsh Pyarelal, Marco A. Valenzuela-Escarcega, Rebecca Sharp, Paul D. Hein, Jon Stephens, Pratik Bhandari, HeuiChan Lim, Saumya Debray, Clayton T. Morrison
Abstract Models of complicated systems can be represented in different ways - in scientific papers, they are represented using natural language text as well as equations. But to be of real use, they must also be implemented as software, thus making code a third form of representing models. We introduce the AutoMATES project, which aims to build semantically-rich unified representations of models from scientific code and publications to facilitate the integration of computational models from different domains and allow for modeling large, complicated systems that span multiple domains and levels of abstraction.
Tasks
Published 2020-01-21
URL https://arxiv.org/abs/2001.07295v1
PDF https://arxiv.org/pdf/2001.07295v1.pdf
PWC https://paperswithcode.com/paper/automates-automated-model-assembly-from-text
Repo https://github.com/ml4ai/automates
Framework none

Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation

Title Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation
Authors Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Liujuan Cao, Chenglin Wu, Cheng Deng, Rongrong Ji
Abstract Referring expression comprehension (REC) and segmentation (RES) are two highly related tasks, which both aim at identifying the referent according to a natural language expression. In this paper, we propose a novel Multi-task Collaborative Network (MCN) to achieve joint learning of REC and RES for the first time. In MCN, RES can help REC achieve better language-vision alignment, while REC can help RES better locate the referent. In addition, we address a key challenge in this multi-task setup, i.e., the prediction conflict, with two innovative designs, namely Consistency Energy Maximization (CEM) and Adaptive Soft Non-Located Suppression (ASNLS). Specifically, CEM enables REC and RES to focus on similar visual regions by maximizing the consistency energy between the two tasks. ASNLS suppresses the response of unrelated regions in RES based on the prediction of REC. To validate our model, we conduct extensive experiments on three benchmark datasets of REC and RES, i.e., RefCOCO, RefCOCO+ and RefCOCOg. The experimental results show significant performance gains of MCN over all existing methods, i.e., up to +7.13% for REC and +11.50% for RES over the SOTA, which confirms the validity of our model for joint REC and RES learning.
Tasks
Published 2020-03-19
URL https://arxiv.org/abs/2003.08813v1
PDF https://arxiv.org/pdf/2003.08813v1.pdf
PWC https://paperswithcode.com/paper/multi-task-collaborative-network-for-joint
Repo https://github.com/luogen1996/MCN
Framework tf
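The abstract does not define CEM and ASNLS precisely, so the sketch below is only an illustrative, hypothetical reading: a cosine-style consistency energy between normalized REC and RES response maps, and a soft suppression that damps (rather than removes) RES responses outside the REC-predicted box. All function names and weights are assumptions, not the authors' definitions.

```python
import numpy as np

def consistency_energy(rec_map, res_map, eps=1e-8):
    """Cosine-style agreement between normalized REC and RES response maps;
    maximizing it pushes the two tasks to attend to the same regions."""
    a = rec_map / (np.linalg.norm(rec_map) + eps)
    b = res_map / (np.linalg.norm(res_map) + eps)
    return float((a * b).sum())

def adaptive_soft_suppression(res_map, box, inside=1.0, outside=0.3):
    """Soft version of non-located suppression: responses outside the REC
    box are damped rather than zeroed out."""
    x0, y0, x1, y1 = box
    scale = np.full_like(res_map, outside)
    scale[y0:y1, x0:x1] = inside
    return res_map * scale

rec = np.random.rand(16, 16)
res = np.random.rand(16, 16)
print("E_consistency =", round(consistency_energy(rec, res), 3))
print("suppressed mean =", round(adaptive_soft_suppression(res, (4, 4, 12, 12)).mean(), 3))
```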

Stanza: A Python Natural Language Processing Toolkit for Many Human Languages

Title Stanza: A Python Natural Language Processing Toolkit for Many Human Languages
Authors Peng Qi, Yuhao Zhang, Yuhui Zhang, Jason Bolton, Christopher D. Manning
Abstract We introduce Stanza, an open-source Python natural language processing toolkit supporting 66 human languages. Compared to existing widely used toolkits, Stanza features a language-agnostic fully neural pipeline for text analysis, including tokenization, multi-word token expansion, lemmatization, part-of-speech and morphological feature tagging, dependency parsing, and named entity recognition. We have trained Stanza on a total of 112 datasets, including the Universal Dependencies treebanks and other multilingual corpora, and show that the same neural architecture generalizes well and achieves competitive performance on all languages tested. Additionally, Stanza includes a native Python interface to the widely used Java Stanford CoreNLP software, which further extends its functionalities to cover other tasks such as coreference resolution and relation extraction. Source code, documentation, and pretrained models for 66 languages are available at https://stanfordnlp.github.io/stanza.
Tasks Coreference Resolution, Dependency Parsing, Lemmatization, Named Entity Recognition, Relation Extraction, Tokenization
Published 2020-03-16
URL https://arxiv.org/abs/2003.07082v1
PDF https://arxiv.org/pdf/2003.07082v1.pdf
PWC https://paperswithcode.com/paper/stanza-a-python-natural-language-processing
Repo https://github.com/stanfordnlp/stanza
Framework pytorch
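A minimal usage example of the pipeline the abstract describes, following the public Stanza API: download a language model once, run the neural pipeline, and read off lemmas, part-of-speech tags, dependencies, and entities. The processor list shown is one common configuration; check the documentation for the exact set your version supports.

```python
# pip install stanza
import stanza

stanza.download("en")                           # fetch the English models once
nlp = stanza.Pipeline("en", processors="tokenize,pos,lemma,depparse,ner")

doc = nlp("Stanza was built at Stanford and supports 66 languages.")
for sent in doc.sentences:
    for word in sent.words:
        print(word.text, word.lemma, word.upos, word.deprel)
for ent in doc.ents:
    print(ent.text, ent.type)
```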

Teaching Software Engineering for AI-Enabled Systems

Title Teaching Software Engineering for AI-Enabled Systems
Authors Christian Kästner, Eunsuk Kang
Abstract Software engineers have significant expertise to offer when building intelligent systems, drawing on decades of experience and methods for building systems that are scalable, responsive and robust, even when built on unreliable components. Systems with artificial-intelligence or machine-learning (ML) components raise new challenges and require careful engineering. We designed a new course to teach software-engineering skills to students with a background in ML. We specifically go beyond traditional ML courses that teach modeling techniques under artificial conditions and focus, in lecture and assignments, on realism with large and changing datasets, robust and evolvable infrastructure, and purposeful requirements engineering that considers ethics and fairness as well. We describe the course and our infrastructure and share experience and all material from teaching the course for the first time.
Tasks
Published 2020-01-18
URL https://arxiv.org/abs/2001.06691v1
PDF https://arxiv.org/pdf/2001.06691v1.pdf
PWC https://paperswithcode.com/paper/teaching-software-engineering-for-ai-enabled
Repo https://github.com/ckaestne/seai
Framework none

Molecule Attention Transformer

Title Molecule Attention Transformer
Authors Łukasz Maziarka, Tomasz Danel, Sławomir Mucha, Krzysztof Rataj, Jacek Tabor, Stanisław Jastrzębski
Abstract Designing a single neural network architecture that performs competitively across a range of molecule property prediction tasks remains largely an open challenge, and its solution may unlock a widespread use of deep learning in the drug discovery industry. To move towards this goal, we propose Molecule Attention Transformer (MAT). Our key innovation is to augment the attention mechanism in Transformer using inter-atomic distances and the molecular graph structure. Experiments show that MAT performs competitively on a diverse set of molecular prediction tasks. Most importantly, with a simple self-supervised pretraining, MAT requires tuning of only a few hyperparameter values to achieve state-of-the-art performance on downstream tasks. Finally, we show that attention weights learned by MAT are interpretable from the chemical point of view.
Tasks Drug Discovery
Published 2020-02-19
URL https://arxiv.org/abs/2002.08264v1
PDF https://arxiv.org/pdf/2002.08264v1.pdf
PWC https://paperswithcode.com/paper/molecule-attention-transformer
Repo https://github.com/gmum/MAT
Framework pytorch
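A single-head numpy sketch of the molecule self-attention described in the abstract: the dot-product attention matrix is mixed with a row-normalized adjacency matrix and a distance-derived matrix before being applied to the values. The mixing weights and the distance transform are illustrative; the real MAT embeds this inside a multi-head Transformer (see the linked repo).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def molecule_attention(Q, K, V, adjacency, distances,
                       lam_attn=0.5, lam_adj=0.3, lam_dist=0.2):
    """Single-head sketch: mix dot-product attention with the molecular graph
    and inter-atomic distances, then apply the mixture to the values."""
    d = Q.shape[-1]
    attn = softmax(Q @ K.T / np.sqrt(d))                  # standard attention
    adj = adjacency / adjacency.sum(axis=-1, keepdims=True).clip(min=1e-8)
    dist = softmax(-distances)                            # closer atoms weigh more
    mix = lam_attn * attn + lam_adj * adj + lam_dist * dist
    return mix @ V

rng = np.random.default_rng(0)
n_atoms, d_model = 5, 8
X = rng.normal(size=(n_atoms, d_model))                   # toy atom embeddings
A = (rng.random((n_atoms, n_atoms)) > 0.5).astype(float)  # toy bond adjacency
np.fill_diagonal(A, 1.0)
D = np.abs(rng.normal(size=(n_atoms, n_atoms)))           # toy distance matrix
print(molecule_attention(X, X, X, A, D).shape)            # -> (5, 8)
```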