April 3, 2020

Paper Group AWR 5

Gryffin: An algorithm for Bayesian optimization for categorical variables informed by physical intuition with applications to chemistry


Title	Gryffin: An algorithm for Bayesian optimization for categorical variables informed by physical intuition with applications to chemistry
Authors	Florian Häse, Loïc M. Roch, Alán Aspuru-Guzik
Abstract	Designing functional molecules and advanced materials requires complex interdependent design choices: tuning continuous process parameters such as temperatures or flow rates, while simultaneously selecting categorical variables like catalysts or solvents. To date, the development of data-driven experiment planning strategies for autonomous experimentation has largely focused on continuous process parameters despite the urge to devise efficient strategies for the selection of categorical variables to substantially accelerate scientific discovery. We introduce Gryffin, a general-purpose optimization framework for the autonomous selection of categorical variables driven by expert knowledge. Gryffin augments Bayesian optimization with kernel density estimation using smooth approximations to categorical distributions. Leveraging domain knowledge from physicochemical descriptors to characterize categorical options, Gryffin can significantly accelerate the search for promising molecules and materials. Gryffin can further highlight relevant correlations between the provided descriptors to inspire physical insights and foster scientific intuition. In addition to comprehensive benchmarks, we demonstrate the capabilities and performance of Gryffin on three examples in materials science and chemistry: (i) the discovery of non-fullerene acceptors for organic solar cells, (ii) the design of hybrid organic-inorganic perovskites for light-harvesting, and (iii) the identification of ligands and process parameters for Suzuki-Miyaura reactions. Our observations suggest that Gryffin, in its simplest form without descriptors, constitutes a competitive categorical optimizer compared to state-of-the-art approaches. However, when leveraging domain knowledge provided via descriptors, Gryffin can optimize at considerably higher rates and refine this domain knowledge to spark scientific understanding.
Tasks	Density Estimation
Published	2020-03-26
URL	https://arxiv.org/abs/2003.12127v1
PDF	https://arxiv.org/pdf/2003.12127v1.pdf
PWC	https://paperswithcode.com/paper/gryffin-an-algorithm-for-bayesian
Repo	https://github.com/aspuru-guzik-group/gryffin
Framework	none

Training Binary Neural Networks with Real-to-Binary Convolutions


Title	Training Binary Neural Networks with Real-to-Binary Convolutions
Authors	Brais Martinez, Jing Yang, Adrian Bulat, Georgios Tzimiropoulos
Abstract	This paper shows how to train binary networks to within a few percentage points (~3-5%) of the full precision counterpart. We first show how to build a strong baseline, which already achieves state-of-the-art accuracy, by combining recently proposed advances and carefully adjusting the optimization procedure. Secondly, we show that by attempting to minimize the discrepancy between the output of the binary and the corresponding real-valued convolution, additional significant accuracy gains can be obtained. We materialize this idea in two complementary ways: (1) with a loss function, during training, by matching the spatial attention maps computed at the output of the binary and real-valued convolutions, and (2) in a data-driven manner, by using the real-valued activations, available during inference prior to the binarization process, for re-scaling the activations right after the binary convolution. Finally, we show that, when putting all of our improvements together, the proposed model beats the current state of the art by more than 5% top-1 accuracy on ImageNet and reduces the gap to its real-valued counterpart to less than 3% and 5% top-1 accuracy on CIFAR-100 and ImageNet respectively when using a ResNet-18 architecture. Code available at https://github.com/brais-martinez/real2binary.
Tasks
Published	2020-03-25
URL	https://arxiv.org/abs/2003.11535v1
PDF	https://arxiv.org/pdf/2003.11535v1.pdf
PWC	https://paperswithcode.com/paper/training-binary-neural-networks-with-real-to-1
Repo	https://github.com/brais-martinez/real2binary
Framework	none
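
The attention-matching loss mentioned in the abstract can be illustrated with a small sketch. The abstract does not define how an attention map is computed, so this example assumes the common attention-transfer recipe: sum the squared activations over channels, L2-normalise the result, and take the distance between the two maps. The function names and toy tensors are hypothetical and are not taken from the authors' repository.

```python
import math

def attention_map(features):
    """Collapse a C x H x W activation (nested lists) into a flat,
    L2-normalised spatial map of length H*W.

    Assumption: attention = sum over channels of squared activations.
    """
    channels = len(features)
    height = len(features[0])
    width = len(features[0][0])
    flat = [
        sum(features[c][h][w] ** 2 for c in range(channels))
        for h in range(height)
        for w in range(width)
    ]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0  # avoid dividing by zero
    return [v / norm for v in flat]

def attention_matching_loss(binary_out, real_out):
    """Euclidean distance between the two normalised attention maps."""
    a_bin = attention_map(binary_out)
    a_real = attention_map(real_out)
    return math.sqrt(sum((b - r) ** 2 for b, r in zip(a_bin, a_real)))

# Toy 1 x 2 x 2 activations, made up for illustration.
real_out = [[[1.0, 2.0], [3.0, 4.0]]]
scaled_out = [[[2.0, 4.0], [6.0, 8.0]]]    # same spatial pattern, larger magnitude
shifted_out = [[[4.0, 3.0], [2.0, 1.0]]]   # different spatial pattern

print(attention_matching_loss(scaled_out, real_out))
print(attention_matching_loss(shifted_out, real_out))
```

Because the maps are normalised, the loss in this sketch responds to where the activations are large and ignores their overall scale, which is one plausible reason to compare attention maps instead of raw outputs.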

ForecastNet: A Time-Variant Deep Feed-Forward Neural Network Architecture for Multi-Step-Ahead Time-Series Forecasting


Title	ForecastNet: A Time-Variant Deep Feed-Forward Neural Network Architecture for Multi-Step-Ahead Time-Series Forecasting
Authors	Joel Janek Dabrowski, YiFan Zhang, Ashfaqur Rahman
Abstract	Recurrent and convolutional neural networks are the most common architectures used for time series forecasting in deep learning literature. These networks use parameter sharing by repeating a set of fixed architectures with fixed parameters over time or space. The result is that the overall architecture is time-invariant (shift-invariant in the spatial domain) or stationary. We argue that time-invariance can reduce the capacity to perform multi-step-ahead forecasting, where modelling the dynamics at a range of scales and resolutions is required. We propose ForecastNet which uses a deep feed-forward architecture to provide a time-variant model. An additional novelty of ForecastNet is interleaved outputs, which we show assist in mitigating vanishing gradients. ForecastNet is demonstrated to outperform statistical and deep learning benchmark models on several datasets.
Tasks	Time Series, Time Series Forecasting
Published	2020-02-11
URL	https://arxiv.org/abs/2002.04155v1
PDF	https://arxiv.org/pdf/2002.04155v1.pdf
PWC	https://paperswithcode.com/paper/forecastnet-a-time-variant-deep-feed-forward
Repo	https://github.com/jjdabr/forecastNet
Framework	pytorch

TACO: Trash Annotations in Context for Litter Detection


Title	TACO: Trash Annotations in Context for Litter Detection
Authors	Pedro F Proença, Pedro Simões
Abstract	TACO is an open image dataset for litter detection and segmentation, which is growing through crowdsourcing. Firstly, this paper describes this dataset and the tools developed to support it. Secondly, we report instance segmentation performance using Mask R-CNN on the current version of TACO. Despite its small size (1500 images and 4784 annotations), our results are promising on this challenging problem. However, to achieve satisfactory trash detection in the wild for deployment, TACO still needs many more manual annotations. These can be contributed using: http://tacodataset.org/
Tasks	Instance Segmentation, Semantic Segmentation
Published	2020-03-16
URL	https://arxiv.org/abs/2003.06975v2
PDF	https://arxiv.org/pdf/2003.06975v2.pdf
PWC	https://paperswithcode.com/paper/taco-trash-annotations-in-context-for-litter
Repo	https://github.com/pedropro/TACO
Framework	none

Hybrid Deep Embedding for Recommendations with Dynamic Aspect-Level Explanations


Title	Hybrid Deep Embedding for Recommendations with Dynamic Aspect-Level Explanations
Authors	Huanrui Luo, Ning Yang, Philip S. Yu
Abstract	Explainable recommendation is far from being well solved partly due to three challenges. The first is the personalization of preference learning, which requires that different items/users have different contributions to the learning of user preference or item quality. The second one is dynamic explanation, which is crucial for the timeliness of recommendation explanations. The last one is the granularity of explanations. In practice, aspect-level explanations are more persuasive than item-level or user-level ones. In this paper, to address these challenges simultaneously, we propose a novel model called Hybrid Deep Embedding (HDE) for aspect-based explainable recommendations, which can make recommendations with dynamic aspect-level explanations. The main idea of HDE is to learn the dynamic embeddings of users and items for rating prediction and the dynamic latent aspect preference/quality vectors for the generation of aspect-level explanations, through fusion of the dynamic implicit feedback extracted from reviews and the attentive user-item interactions. Particularly, as the aspect preference/quality of users/items is learned automatically, HDE is able to capture the impact of aspects that are not mentioned in reviews of a user or an item. The extensive experiments conducted on real datasets verify the recommending performance and explainability of HDE. The source code of our work is available at https://github.com/lola63/HDE-Python.
Tasks
Published	2020-01-18
URL	https://arxiv.org/abs/2001.10341v1
PDF	https://arxiv.org/pdf/2001.10341v1.pdf
PWC	https://paperswithcode.com/paper/hybrid-deep-embedding-for-recommendations
Repo	https://github.com/lola63/HDE-Python
Framework	none

Deep Graph Matching via Blackbox Differentiation of Combinatorial Solvers


Title	Deep Graph Matching via Blackbox Differentiation of Combinatorial Solvers
Authors	Michal Rolínek, Paul Swoboda, Dominik Zietlow, Anselm Paulus, Vít Musil, Georg Martius
Abstract	Building on recent progress at the intersection of combinatorial optimization and deep learning, we propose an end-to-end trainable architecture for deep graph matching that contains unmodified combinatorial solvers. Using the presence of heavily optimized combinatorial solvers together with some improvements in architecture design, we advance state-of-the-art on deep graph matching benchmarks for keypoint correspondence. In addition, we highlight the conceptual advantages of incorporating solvers into deep learning architectures, such as the possibility of post-processing with a strong multi-graph matching solver or the indifference to changes in the training setting. Finally, we propose two new challenging experimental setups.
Tasks	Combinatorial Optimization, Graph Matching
Published	2020-03-25
URL	https://arxiv.org/abs/2003.11657v1
PDF	https://arxiv.org/pdf/2003.11657v1.pdf
PWC	https://paperswithcode.com/paper/deep-graph-matching-via-blackbox
Repo	https://github.com/martius-lab/blackbox-backprop
Framework	pytorch

A Unified Object Motion and Affinity Model for Online Multi-Object Tracking


Title	A Unified Object Motion and Affinity Model for Online Multi-Object Tracking
Authors	Junbo Yin, Wenguan Wang, Qinghao Meng, Ruigang Yang, Jianbing Shen
Abstract	Current popular online multi-object tracking (MOT) solutions apply single object trackers (SOTs) to capture object motions, while often requiring an extra affinity network to associate objects, especially for the occluded ones. This brings extra computational overhead due to repetitive feature extraction for SOT and affinity computation. Meanwhile, the model size of the sophisticated affinity network is usually non-trivial. In this paper, we propose a novel MOT framework that unifies object motion and affinity model into a single network, named UMA, in order to learn a compact feature that is discriminative for both object motion and affinity measure. In particular, UMA integrates single object tracking and metric learning into a unified triplet network by means of multi-task learning. Such design brings advantages of improved computation efficiency, low memory requirement and simplified training procedure. In addition, we equip our model with a task-specific attention module, which is used to boost task-aware feature learning. The proposed UMA can be easily trained end-to-end, and is elegant - requiring only one training stage. Experimental results show that it achieves promising performance on several MOT Challenge benchmarks.
Tasks	Metric Learning, Multi-Object Tracking, Multi-Task Learning, Object Tracking, Online Multi-Object Tracking
Published	2020-03-25
URL	https://arxiv.org/abs/2003.11291v1
PDF	https://arxiv.org/pdf/2003.11291v1.pdf
PWC	https://paperswithcode.com/paper/a-unified-object-motion-and-affinity-model
Repo	https://github.com/yinjunbo/UMA-MOT
Framework	none

AutoFIS: Automatic Feature Interaction Selection in Factorization Models for Click-Through Rate Prediction


Title	AutoFIS: Automatic Feature Interaction Selection in Factorization Models for Click-Through Rate Prediction
Authors	Bin Liu, Chenxu Zhu, Guilin Li, Weinan Zhang, Jincai Lai, Ruiming Tang, Xiuqiang He, Zhenguo Li, Yong Yu
Abstract	Learning effective feature interactions is crucial for click-through rate (CTR) prediction tasks in recommender systems. In most of the existing deep learning models, feature interactions are either manually designed or simply enumerated. However, enumerating all feature interactions brings large memory and computation cost. Even worse, useless interactions may introduce unnecessary noise and complicate the training process. In this work, we propose a two-stage algorithm called Automatic Feature Interaction Selection (AutoFIS). AutoFIS can automatically identify all the important feature interactions for factorization models with just the computational cost equivalent to training the target model to convergence. In the search stage, instead of searching over a discrete set of candidate feature interactions, we relax the choices to be continuous by introducing the architecture parameters. By implementing a regularized optimizer over the architecture parameters, the model can automatically identify and remove the redundant feature interactions during the training process of the model. In the re-train stage, we keep the architecture parameters serving as an attention unit to further boost the performance. Offline experiments on three large-scale datasets (two public benchmarks, one private) demonstrate that the proposed AutoFIS can significantly improve various FM based models. AutoFIS has been deployed onto the training platform of Huawei App Store recommendation service, where a 10-day online A/B test demonstrated that AutoFIS improved the DeepFM model by 20.3% and 20.1% in terms of CTR and CVR respectively.
Tasks	Click-Through Rate Prediction, Recommendation Systems
Published	2020-03-25
URL	https://arxiv.org/abs/2003.11235v2
PDF	https://arxiv.org/pdf/2003.11235v2.pdf
PWC	https://paperswithcode.com/paper/autofis-automatic-feature-interaction
Repo	https://github.com/zhuchenxv/AutoFIS
Framework	none
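
The idea of relaxing interaction selection into continuous architecture parameters can be sketched as a factorization-machine second-order term in which every pairwise interaction is multiplied by a gate. This is only a minimal illustration under stated assumptions: the gate values are fixed by hand here, whereas the abstract says they are learned with a regularized optimizer, and all names and numbers are hypothetical.

```python
from itertools import combinations

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def gated_interaction_sum(embeddings, gates):
    """Second-order term: sum over field pairs of gate * <e_i, e_j>.

    embeddings: one embedding vector per feature field.
    gates: dict mapping a field pair (i, j) with i < j to its gate value;
           a gate of 0.0 means the interaction has been removed.
    """
    total = 0.0
    for i, j in combinations(range(len(embeddings)), 2):
        gate = gates.get((i, j), 0.0)
        if gate != 0.0:  # pruned interactions cost nothing at inference
            total += gate * dot(embeddings[i], embeddings[j])
    return total

def kept_interactions(gates):
    """Field pairs that survive the search stage (non-zero gate)."""
    return sorted(pair for pair, gate in gates.items() if gate != 0.0)

# Hypothetical embeddings for three fields and hand-picked gate values.
embeddings = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
gates = {(0, 1): 1.0, (0, 2): 0.0, (1, 2): 0.5}

score = gated_interaction_sum(embeddings, gates)
print(score, kept_interactions(gates))
```

In the re-train stage described above, the surviving gates would stay in the model as per-interaction weights, which is what the non-zero entries of `gates` stand in for here.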

Spatio-Temporal Handwriting Imitation


Title	Spatio-Temporal Handwriting Imitation
Authors	Martin Mayr, Martin Stumpf, Anguelos Nikolaou, Mathias Seuret, Andreas Maier, Vincent Christlein
Abstract	Most people think that their handwriting is unique and cannot be imitated by machines, especially not using completely new content. Current cursive handwriting synthesis is visually limited or needs user interaction. We show that subdividing the process into smaller subtasks makes it possible to imitate someone’s handwriting with a high chance of being visually indistinguishable to humans. To this end, a given handwritten sample is used as the target style. This sample is transferred to an online sequence. Then, a method for online handwriting synthesis is used to produce a new realistic-looking text primed with the online input sequence. This new text is then rendered and style-adapted to the input pen. We show the effectiveness of the pipeline by generating in- and out-of-vocabulary handwritten samples that are validated in a comprehensive user study. Additionally, we show that a typical writer identification system can also be partially fooled by the generated fake handwriting.
Tasks
Published	2020-03-24
URL	https://arxiv.org/abs/2003.10593v1
PDF	https://arxiv.org/pdf/2003.10593v1.pdf
PWC	https://paperswithcode.com/paper/spatio-temporal-handwriting-imitation
Repo	https://github.com/M4rt1nM4yr/spatio-temporal_handwriting_imitation
Framework	pytorch

Motion-Attentive Transition for Zero-Shot Video Object Segmentation


Title	Motion-Attentive Transition for Zero-Shot Video Object Segmentation
Authors	Tianfei Zhou, Shunzhou Wang, Yi Zhou, Yazhou Yao, Jianwu Li, Ling Shao
Abstract	In this paper, we present a novel Motion-Attentive Transition Network (MATNet) for zero-shot video object segmentation, which provides a new way of leveraging motion information to reinforce spatio-temporal object representation. An asymmetric attention block, called Motion-Attentive Transition (MAT), is designed within a two-stream encoder, which transforms appearance features into motion-attentive representations at each convolutional stage. In this way, the encoder becomes deeply interleaved, allowing for closely hierarchical interactions between object motion and appearance. This is superior to the typical two-stream architecture, which treats motion and appearance separately in each stream and often suffers from overfitting to appearance information. Additionally, a bridge network is proposed to obtain a compact, discriminative and scale-sensitive representation for multi-level encoder features, which is further fed into a decoder to achieve segmentation results. Extensive experiments on three challenging public benchmarks (i.e. DAVIS-16, FBMS and Youtube-Objects) show that our model achieves compelling performance against the state of the art.
Tasks	Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation
Published	2020-03-09
URL	https://arxiv.org/abs/2003.04253v2
PDF	https://arxiv.org/pdf/2003.04253v2.pdf
PWC	https://paperswithcode.com/paper/motion-attentive-transition-for-zero-shot
Repo	https://github.com/tfzhou/MATNet
Framework	pytorch

A Probabilistic Formulation of Unsupervised Text Style Transfer


Title	A Probabilistic Formulation of Unsupervised Text Style Transfer
Authors	Junxian He, Xinyi Wang, Graham Neubig, Taylor Berg-Kirkpatrick
Abstract	We present a deep generative model for unsupervised text style transfer that unifies previously proposed non-generative techniques. Our probabilistic approach models non-parallel data from two domains as a partially observed parallel corpus. By hypothesizing a parallel latent sequence that generates each observed sequence, our model learns to transform sequences from one domain to another in a completely unsupervised fashion. In contrast with traditional generative sequence models (e.g. the HMM), our model makes few assumptions about the data it generates: it uses a recurrent language model as a prior and an encoder-decoder as a transduction distribution. While computation of marginal data likelihood is intractable in this model class, we show that amortized variational inference admits a practical surrogate. Further, by drawing connections between our variational objective and other recent unsupervised style transfer and machine translation techniques, we show how our probabilistic view can unify some known non-generative objectives such as backtranslation and adversarial loss. Finally, we demonstrate the effectiveness of our method on a wide range of unsupervised style transfer tasks, including sentiment transfer, formality transfer, word decipherment, author imitation, and related language translation. Across all style transfer tasks, our approach yields substantial gains over state-of-the-art non-generative baselines, including the state-of-the-art unsupervised machine translation techniques that our approach generalizes. Further, we conduct experiments on a standard unsupervised machine translation task and find that our unified approach matches the current state-of-the-art.
Tasks	Language Modelling, Machine Translation, Style Transfer, Text Style Transfer, Unsupervised Machine Translation
Published	2020-02-10
URL	https://arxiv.org/abs/2002.03912v2
PDF	https://arxiv.org/pdf/2002.03912v2.pdf
PWC	https://paperswithcode.com/paper/a-probabilistic-formulation-of-unsupervised-1
Repo	https://github.com/cindyxinyiwang/deep-latent-sequence-model
Framework	pytorch

Learning@home: Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts


Title	Learning@home: Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts
Authors	Maksim Riabinin, Anton Gusev
Abstract	Many recent breakthroughs in deep learning were achieved by training increasingly larger models on massive datasets. However, training such models can be prohibitively expensive. For instance, Megatron Language Model with 8.3B parameters was trained on a GPU cluster worth $25 million. As a result, most researchers cannot afford to train state of the art models and contribute to their development. Hypothetically, a researcher could crowdsource the training of large neural networks with thousands of regular PCs provided by volunteers. The raw computing power of ten thousand $2500 desktops dwarfs that of a $25M server pod, but one cannot utilize that power efficiently with conventional distributed training methods. In this work, we propose Learning@home: a neural network training paradigm designed to handle millions of poorly connected participants. We analyze the performance, reliability, and architectural constraints of this paradigm and compare it against existing distributed training techniques.
Tasks	Language Modelling
Published	2020-02-10
URL	https://arxiv.org/abs/2002.04013v1
PDF	https://arxiv.org/pdf/2002.04013v1.pdf
PWC	https://paperswithcode.com/paper/learninghome-crowdsourced-training-of-large
Repo	https://github.com/mryab/learning-at-home
Framework	pytorch

Adapted Center and Scale Prediction: More Stable and More Accurate


Title	Adapted Center and Scale Prediction: More Stable and More Accurate
Authors	Wenhao Wang
Abstract	Pedestrian detection has benefited from deep learning and has developed rapidly in recent years. Most detectors follow the general object detection framework, i.e. default boxes and a two-stage process. Recently, anchor-free and one-stage detectors have been introduced into this area. However, their accuracies are unsatisfactory. Therefore, in order to enjoy the simplicity of anchor-free detectors and the accuracy of two-stage ones simultaneously, we propose some adaptations based on a detector, Center and Scale Prediction (CSP). The main contributions of our paper are: (1) We improve the robustness of CSP and make it easier to train. (2) We propose a novel method to predict width, namely compressing width. (3) We achieve the second-best performance on the CityPersons benchmark, i.e. 9.3% log-average miss rate (MR) on the reasonable set, 8.7% MR on the partial set and 5.6% MR on the bare set, which shows that an anchor-free and one-stage detector can still have high accuracy. (4) We explore some capabilities of Switchable Normalization which are not mentioned in its original paper.
Tasks	Object Detection, Pedestrian Detection
Published	2020-02-20
URL	https://arxiv.org/abs/2002.09053v2
PDF	https://arxiv.org/pdf/2002.09053v2.pdf
PWC	https://paperswithcode.com/paper/adapted-center-and-scale-prediction-more
Repo	https://github.com/WangWenhao0716/Adapted-Center-and-Scale-Prediction
Framework	pytorch
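
For readers unfamiliar with the log-average miss rate quoted above, the following sketch shows how the metric is commonly computed in pedestrian-detection benchmarks: the miss rate is sampled at nine false-positives-per-image (FPPI) values spaced evenly in log space between 0.01 and 1, and the geometric mean of those samples is reported. The abstract does not spell this out, so treat the details as an assumption; the sample miss rates are invented, and the step that reads miss rates off a detector's curve is omitted.

```python
import math

def reference_fppi(num_points=9):
    """FPPI reference points, evenly spaced in log10 between 1e-2 and 1e0."""
    return [10 ** (-2 + 2 * i / (num_points - 1)) for i in range(num_points)]

def log_average_miss_rate(miss_rates):
    """Geometric mean of miss rates already sampled at the reference points."""
    eps = 1e-10  # guard against log(0) when a miss rate is exactly zero
    logs = [math.log(max(rate, eps)) for rate in miss_rates]
    return math.exp(sum(logs) / len(logs))

# Invented miss rates at the nine reference points (higher FPPI, fewer misses).
sampled = [0.30, 0.24, 0.19, 0.15, 0.12, 0.09, 0.07, 0.05, 0.04]

print([round(f, 4) for f in reference_fppi()])
print(round(log_average_miss_rate(sampled), 4))
```

Because the mean is taken over logarithms, the result is never larger than the arithmetic mean of the same samples, so low miss rates at high FPPI pull the score down noticeably.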

Efficient Sentence Embedding via Semantic Subspace Analysis


Title	Efficient Sentence Embedding via Semantic Subspace Analysis
Authors	Bin Wang, Fenxiao Chen, Yuncheng Wang, C. -C. Jay Kuo
Abstract	A novel sentence embedding method built upon semantic subspace analysis, called semantic subspace sentence embedding (S3E), is proposed in this work. Given that word embeddings can capture semantic relationships and that semantically similar words tend to form semantic groups in a high-dimensional embedding space, we develop a sentence representation scheme by analyzing semantic subspaces of its constituent words. Specifically, we construct a sentence model from two aspects. First, we represent words that lie in the same semantic group using the intra-group descriptor. Second, we characterize the interaction between multiple semantic groups with the inter-group descriptor. The proposed S3E method is evaluated on both textual similarity tasks and supervised tasks. Experimental results show that it offers comparable or better performance than the state-of-the-art. The complexity of our S3E method is also much lower than other parameterized models.
Tasks	Sentence Embedding, Word Embeddings
Published	2020-02-22
URL	https://arxiv.org/abs/2002.09620v2
PDF	https://arxiv.org/pdf/2002.09620v2.pdf
PWC	https://paperswithcode.com/paper/efficient-sentence-embedding-via-semantic
Repo	https://github.com/BinWang28/Sentence-Embedding-S3E
Framework	none

R2DE: a NLP approach to estimating IRT parameters of newly generated questions


Title	R2DE: a NLP approach to estimating IRT parameters of newly generated questions
Authors	Luca Benedetto, Andrea Cappelli, Roberto Turrin, Paolo Cremonesi
Abstract	The main objective of exams consists in performing an assessment of students’ expertise on a specific subject. Such expertise, also referred to as skill or knowledge level, can then be leveraged in different ways (e.g., to assign a grade to the students, to understand whether a student might need some support, etc.). Similarly, the questions appearing in the exams have to be assessed in some way before being used to evaluate students. Standard approaches to questions’ assessment are either subjective (e.g., assessment by human experts) or introduce a long delay in the process of question generation (e.g., pretesting with real students). In this work we introduce R2DE (which is a Regressor for Difficulty and Discrimination Estimation), a model capable of assessing newly generated multiple-choice questions by looking at the text of the question and the text of the possible choices. In particular, it can estimate the difficulty and the discrimination of each question, as they are defined in Item Response Theory. We also present the results of extensive experiments we carried out on a real world large scale dataset coming from an e-learning platform, showing that our model can be used to perform an initial assessment of newly created questions and ease some of the problems that arise in question generation.
Tasks	Question Generation
Published	2020-01-21
URL	https://arxiv.org/abs/2001.07569v1
PDF	https://arxiv.org/pdf/2001.07569v1.pdf
PWC	https://paperswithcode.com/paper/r2de-a-nlp-approach-to-estimating-irt
Repo	https://github.com/lucabenedetto/r2de-nlp-to-estimating-irt-parameters
Framework	none
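
The two quantities that R2DE estimates, difficulty and discrimination, can be made concrete with a short sketch of an Item Response Theory response curve. The abstract does not say which IRT model is used, so this example assumes the common two-parameter logistic (2PL) form, in which the probability of a correct answer is 1 / (1 + exp(-a * (theta - b))) for student skill theta, discrimination a, and difficulty b. The question parameters below are hypothetical.

```python
import math

def p_correct(theta, difficulty, discrimination):
    """2PL IRT: probability that a student with skill `theta` answers correctly."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

# Hypothetical questions, chosen only to show how the two parameters act.
easy_question = {"difficulty": -1.0, "discrimination": 1.0}
hard_question = {"difficulty": 1.5, "discrimination": 2.0}

for theta in (-2.0, 0.0, 2.0):
    print(
        theta,
        round(p_correct(theta, **easy_question), 3),
        round(p_correct(theta, **hard_question), 3),
    )
```

Under this model, difficulty is the skill level at which a student has a 50% chance of answering correctly, and discrimination controls how sharply that chance rises around the difficulty, so a model that predicts both from question text gives a usable first estimate before any student has seen the question.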