Paper Group AWR 199
Combining Machine Learning Models using combo Library
Title | Combining Machine Learning Models using combo Library |
Authors | Yue Zhao, Xuejian Wang, Cheng Cheng, Xueying Ding |
Abstract | Model combination, often regarded as a key sub-field of ensemble learning, has been widely used in both academic research and industry applications. To facilitate this process, we propose and implement an easy-to-use Python toolkit, combo, to aggregate models and scores under various scenarios, including classification, clustering, and anomaly detection. In a nutshell, combo provides a unified and consistent way to combine both raw and pretrained models from popular machine learning libraries, e.g., scikit-learn, XGBoost, and LightGBM. With accessibility and robustness in mind, combo is designed with detailed documentation, interactive examples, continuous integration, code coverage, and maintainability checks; it can be installed easily through the Python Package Index (PyPI) or https://github.com/yzhao062/combo. |
Tasks | Anomaly Detection |
Published | 2019-09-21 |
URL | https://arxiv.org/abs/1910.07988v2 |
https://arxiv.org/pdf/1910.07988v2.pdf | |
PWC | https://paperswithcode.com/paper/combining-machine-learning-models-using-combo |
Repo | https://github.com/yzhao062/combo |
Framework | none |
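
The combo abstract above is about aggregating predictions from heterogeneous classifiers. A minimal sketch of that idea, written directly against scikit-learn rather than combo's own classes (which the abstract does not quote), assuming a simple unweighted average of predicted probabilities:

```python
# Minimal sketch of probability averaging across classifiers -- the kind of
# aggregation combo automates. Uses plain scikit-learn; combo's own API is
# not reproduced here.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = [LogisticRegression(max_iter=1000),
          DecisionTreeClassifier(max_depth=5),
          RandomForestClassifier(n_estimators=50)]

probas = []
for m in models:
    m.fit(X_tr, y_tr)
    probas.append(m.predict_proba(X_te))   # shape (n_samples, n_classes)

avg = np.mean(probas, axis=0)              # simple, unweighted average of scores
y_pred = avg.argmax(axis=1)
print("averaged-ensemble accuracy:", accuracy_score(y_te, y_pred))
```

combo wraps this pattern (plus weighted, maximization, and median variants) behind a consistent interface for classification, clustering, and anomaly detection.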
Diagnosing and Enhancing VAE Models
Title | Diagnosing and Enhancing VAE Models |
Authors | Bin Dai, David Wipf |
Abstract | Although variational autoencoders (VAEs) represent a widely influential deep generative model, many aspects of the underlying energy function remain poorly understood. In particular, it is commonly believed that Gaussian encoder/decoder assumptions reduce the effectiveness of VAEs in generating realistic samples. In this regard, we rigorously analyze the VAE objective, differentiating situations where this belief is and is not actually true. We then leverage the corresponding insights to develop a simple VAE enhancement that requires no additional hyperparameters or sensitive tuning. Quantitatively, this proposal produces crisp samples and stable FID scores that are actually competitive with a variety of GAN models, all while retaining desirable attributes of the original VAE architecture. A shorter version of this work will appear in the ICLR 2019 conference proceedings (Dai and Wipf, 2019). The code for our model is available at https://github.com/daib13/TwoStageVAE. |
Tasks | |
Published | 2019-03-14 |
URL | https://arxiv.org/abs/1903.05789v2 |
https://arxiv.org/pdf/1903.05789v2.pdf | |
PWC | https://paperswithcode.com/paper/diagnosing-and-enhancing-vae-models-1 |
Repo | https://github.com/vitskvara/GenModels.jl |
Framework | none |
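
The enhancement released as TwoStageVAE is, going by the repository name, a two-stage construction. A minimal sketch of the general idea — train one VAE on the data, train a second VAE on the first stage's latent codes, and sample by chaining the two decoders — is below; the toy MLP sizes and training loop are assumptions for illustration, not the paper's architecture.

```python
# Minimal two-stage VAE sketch (illustrative sizes, not the paper's exact nets).
import torch, torch.nn as nn

class TinyVAE(nn.Module):
    def __init__(self, d_in, d_lat, d_hid=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, d_hid), nn.ReLU(),
                                 nn.Linear(d_hid, 2 * d_lat))   # mu and logvar
        self.dec = nn.Sequential(nn.Linear(d_lat, d_hid), nn.ReLU(),
                                 nn.Linear(d_hid, d_in))

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()    # reparameterization
        recon = self.dec(z)
        kl = 0.5 * (mu.pow(2) + logvar.exp() - 1 - logvar).sum(-1).mean()
        rec = (recon - x).pow(2).sum(-1).mean()                 # fixed-variance Gaussian decoder
        return rec + kl, mu

def train(vae, data, steps=200, lr=1e-3):
    opt = torch.optim.Adam(vae.parameters(), lr=lr)
    for _ in range(steps):
        loss, _ = vae(data)
        opt.zero_grad(); loss.backward(); opt.step()

x = torch.randn(512, 32)                 # toy data standing in for images
stage1 = TinyVAE(d_in=32, d_lat=8)
train(stage1, x)                         # stage 1: fit the data
with torch.no_grad():
    _, z_codes = stage1(x)               # latent means of the training data
stage2 = TinyVAE(d_in=8, d_lat=8)
train(stage2, z_codes)                   # stage 2: fit the latent distribution

with torch.no_grad():                    # sampling: chain the two decoders
    u = torch.randn(16, 8)
    x_samples = stage1.dec(stage2.dec(u))
```

The second stage corrects the mismatch between the aggregate posterior and the prior, which is what produces the crisper samples the abstract reports.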
GNNExplainer: Generating Explanations for Graph Neural Networks
Title | GNNExplainer: Generating Explanations for Graph Neural Networks |
Authors | Rex Ying, Dylan Bourgeois, Jiaxuan You, Marinka Zitnik, Jure Leskovec |
Abstract | Graph Neural Networks (GNNs) are a powerful tool for machine learning on graphs. GNNs combine node feature information with the graph structure by recursively passing neural messages along edges of the input graph. However, incorporating both graph structure and feature information leads to complex models, and explaining predictions made by GNNs remains unsolved. Here we propose GNNExplainer, the first general, model-agnostic approach for providing interpretable explanations for predictions of any GNN-based model on any graph-based machine learning task. Given an instance, GNNExplainer identifies a compact subgraph structure and a small subset of node features that play a crucial role in the GNN’s prediction. Further, GNNExplainer can generate consistent and concise explanations for an entire class of instances. We formulate GNNExplainer as an optimization task that maximizes the mutual information between a GNN’s prediction and the distribution of possible subgraph structures. Experiments on synthetic and real-world graphs show that our approach can identify important graph structures as well as node features, and outperforms baselines by 17.1% on average. GNNExplainer provides a variety of benefits, from the ability to visualize semantically relevant structures to interpretability, to giving insights into errors of faulty GNNs. |
Tasks | Graph Classification, Link Prediction |
Published | 2019-03-10 |
URL | https://arxiv.org/abs/1903.03894v4 |
https://arxiv.org/pdf/1903.03894v4.pdf | |
PWC | https://paperswithcode.com/paper/gnn-explainer-a-tool-for-post-hoc-explanation |
Repo | https://github.com/RexYing/gnn-model-explainer |
Framework | pytorch |
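
The optimization the abstract describes — maximizing mutual information between the GNN's prediction and a distribution of subgraphs — is commonly realized by learning a soft mask over the edges of the computation graph. A minimal sketch of that edge-mask optimization on a toy one-layer GCN is below; the tiny GCN, graph, and loss weights are assumptions, not the released implementation.

```python
# Edge-mask optimization sketch for explaining a (toy) GNN prediction.
import torch, torch.nn as nn, torch.nn.functional as F

class ToyGCN(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out)
    def forward(self, x, adj):
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        return self.lin(adj @ x / deg)              # mean aggregation over neighbors

n, d, c = 6, 4, 3
x = torch.randn(n, d)
adj = (torch.rand(n, n) < 0.4).float()
adj = ((adj + adj.t()) > 0).float()                 # symmetric toy graph
adj.fill_diagonal_(1.0)

gnn = ToyGCN(d, c)                                  # stands in for a trained GNN
for p in gnn.parameters():
    p.requires_grad_(False)                         # the model itself stays fixed
with torch.no_grad():
    target = gnn(x, adj).argmax(-1)                 # prediction to be explained

mask_logits = nn.Parameter(torch.zeros_like(adj))   # one learnable logit per edge
opt = torch.optim.Adam([mask_logits], lr=0.1)
for _ in range(200):
    mask = torch.sigmoid(mask_logits) * adj         # soft subgraph, only real edges
    logits = gnn(x, mask)
    loss = F.cross_entropy(logits, target)          # keep the original prediction...
    loss = loss + 0.05 * mask.sum()                 # ...with as few edges as possible
    opt.zero_grad(); loss.backward(); opt.step()

important_edges = (torch.sigmoid(mask_logits) * adj) > 0.5   # the explanation subgraph
```

The released implementation adds a feature mask and an entropy term on the mask; the core trade-off (prediction fidelity versus subgraph size) is the same.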
Analyzing Structures in the Semantic Vector Space: A Framework for Decomposing Word Embeddings
Title | Analyzing Structures in the Semantic Vector Space: A Framework for Decomposing Word Embeddings |
Authors | Andreas Hanselowski, Iryna Gurevych |
Abstract | Word embeddings are rich word representations which, in combination with deep neural networks, lead to large performance gains for many NLP tasks. However, word embeddings are represented by dense, real-valued vectors and they are therefore not directly interpretable. Thus, computational operations based on them are also not well understood. In this paper, we present an approach for analyzing structures in the semantic vector space to get a better understanding of the underlying semantic encoding principles. We present a framework for decomposing word embeddings into smaller meaningful units which we call sub-vectors. The framework opens up a wide range of possibilities for analyzing phenomena in vector space semantics, as well as for solving concrete NLP problems: We introduce the category completion task and show that a sub-vector based approach is superior to supervised techniques; We present a sub-vector based method for solving the word analogy task, which substantially outperforms different variants of the traditional vector-offset method. |
Tasks | Word Embeddings |
Published | 2019-12-17 |
URL | https://arxiv.org/abs/1912.10434v1 |
https://arxiv.org/pdf/1912.10434v1.pdf | |
PWC | https://paperswithcode.com/paper/analyzing-structures-in-the-semantic-vector |
Repo | https://github.com/hanselowski/embedding_decomp |
Framework | none |
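
The word analogy task that the sub-vector method is compared against is traditionally solved with the vector-offset method (king − man + woman ≈ queen). A minimal numpy sketch of that baseline, with toy vectors; the paper's sub-vector decomposition itself is not reproduced here.

```python
# Vector-offset analogy baseline (the traditional method the paper improves on).
import numpy as np

rng = np.random.default_rng(0)
vocab = ["king", "queen", "man", "woman", "apple"]
emb = {w: rng.normal(size=50) for w in vocab}        # toy embeddings

def analogy(a, b, c, emb):
    """Return the word d such that a : b :: c : d, by nearest cosine neighbor."""
    query = emb[b] - emb[a] + emb[c]
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cos(emb[w], query))

print(analogy("man", "king", "woman", emb))          # ideally "queen" with real vectors
```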
Massive vs. Curated Word Embeddings for Low-Resourced Languages. The Case of Yorùbá and Twi
Title | Massive vs. Curated Word Embeddings for Low-Resourced Languages. The Case of Yorùbá and Twi |
Authors | Jesujoba O. Alabi, Kwabena Amponsah-Kaakyire, David I. Adelani, Cristina España-Bonet |
Abstract | The success of several architectures to learn semantic representations from unannotated text and the availability of these kinds of texts in online multilingual resources such as Wikipedia has facilitated the massive and automatic creation of resources for multiple languages. The evaluation of such resources is usually done for the high-resourced languages, where one has a smorgasbord of tasks and test sets to evaluate on. For low-resourced languages, the evaluation is more difficult and normally ignored, with the hope that the impressive capability of deep learning architectures to learn (multilingual) representations in the high-resourced setting holds in the low-resourced setting too. In this paper we focus on two African languages, Yorùbá and Twi, and compare the word embeddings obtained in this way with word embeddings obtained from curated corpora and a language-dependent processing. We analyse the noise in the publicly available corpora, collect high quality and noisy data for the two languages and quantify the improvements that depend not only on the amount of data but on the quality too. We also use different architectures that learn word representations both from surface forms and characters to further exploit all the available information, which proved to be important for these languages. For the evaluation, we manually translate the wordsim-353 word pairs dataset from English into Yorùbá and Twi. As output of the work, we provide corpora, embeddings and the test suites for both languages. |
Tasks | Word Embeddings |
Published | 2019-12-05 |
URL | https://arxiv.org/abs/1912.02481v2 |
https://arxiv.org/pdf/1912.02481v2.pdf | |
PWC | https://paperswithcode.com/paper/massive-vs-curated-word-embeddings-for-low |
Repo | https://github.com/ajesujoba/YorubaTwi-Embedding |
Framework | none |
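
The evaluation the abstract describes (a translated wordsim-353) follows the standard word-similarity protocol: Spearman correlation between human judgements and embedding cosine similarities. A minimal sketch with toy vectors and toy pairs; the real test sets and embeddings are in the linked repository.

```python
# Word-similarity evaluation sketch: Spearman correlation between human scores
# and cosine similarities, as in wordsim-353-style test sets.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=100) for w in ["ile", "oko", "omi", "ina", "owo", "obi"]}

# (word1, word2, human similarity score) -- toy pairs standing in for the
# translated wordsim-353 data shipped with the paper's repository.
pairs = [("ile", "oko", 6.2), ("omi", "ina", 1.5), ("owo", "obi", 3.8)]

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

model_scores = [cos(emb[a], emb[b]) for a, b, _ in pairs]
human_scores = [s for _, _, s in pairs]
rho, _ = spearmanr(model_scores, human_scores)
print("Spearman rho:", rho)
```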
Smart Home Appliances: Chat with Your Fridge
Title | Smart Home Appliances: Chat with Your Fridge |
Authors | Denis Gudovskiy, Gyuri Han, Takuya Yamaguchi, Sotaro Tsukizawa |
Abstract | Current home appliances are capable of executing a limited number of voice commands such as turning devices on or off, adjusting music volume or light conditions. Recent progress in machine reasoning gives an opportunity to develop new types of conversational user interfaces for home appliances. In this paper, we apply a state-of-the-art visual reasoning model and demonstrate that it is feasible to ask a smart fridge about its contents and various properties of the food with a close-to-natural conversation experience. Our visual reasoning model answers user questions about the existence, count, category and freshness of each product by analyzing photos taken by the image sensor inside the smart fridge. Users may chat with their fridge using an off-the-shelf phone messenger while being away from home, for example, when shopping in the supermarket. We generate a visually realistic synthetic dataset to train a machine learning reasoning model that achieves 95% answer accuracy on test data. We present the results of initial user tests and discuss how we modify the distribution of generated questions for model training based on human-in-the-loop guidance. We open-source the code for the whole system, including dataset generation, the reasoning model and demonstration scripts. |
Tasks | Visual Reasoning |
Published | 2019-12-19 |
URL | https://arxiv.org/abs/1912.09589v1 |
https://arxiv.org/pdf/1912.09589v1.pdf | |
PWC | https://paperswithcode.com/paper/smart-home-appliances-chat-with-your-fridge |
Repo | https://github.com/gudovskiy/fridge-demo |
Framework | pytorch |
Component Attention Guided Face Super-Resolution Network: CAGFace
Title | Component Attention Guided Face Super-Resolution Network: CAGFace |
Authors | Ratheesh Kalarot, Tao Li, Fatih Porikli |
Abstract | To make the best use of the underlying structure of faces, the collective information across face datasets and the intermediate estimates during the upsampling process, here we introduce a fully convolutional multi-stage neural network for 4× super-resolution of face images. We implicitly impose facial component-wise attention maps using a segmentation network to allow our network to focus on face-inherent patterns. Each stage of our network is composed of a stem layer, a residual backbone, and spatial upsampling layers. We recurrently apply stages to reconstruct an intermediate image, and then reuse its space-to-depth converted versions to bootstrap and enhance image quality progressively. Our experiments show that our face super-resolution method achieves quantitatively superior and perceptually pleasing results in comparison to the state of the art. |
Tasks | Super-Resolution |
Published | 2019-10-19 |
URL | https://arxiv.org/abs/1910.08761v1 |
https://arxiv.org/pdf/1910.08761v1.pdf | |
PWC | https://paperswithcode.com/paper/component-attention-guided-face-super |
Repo | https://github.com/SeungyounShin/CAGFace |
Framework | pytorch |
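
The stage structure the abstract names (stem layer, residual backbone, spatial upsampling, space-to-depth reuse of the intermediate image) can be sketched compactly in PyTorch; the channel counts and depths below are illustrative assumptions, not the paper's configuration.

```python
# One super-resolution stage sketch: stem -> residual backbone -> pixel-shuffle
# upsampling, with a space-to-depth step feeding the next stage.
import torch, torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)

class Stage(nn.Module):
    def __init__(self, c_in, ch=64):
        super().__init__()
        self.stem = nn.Conv2d(c_in, ch, 3, padding=1)
        self.backbone = nn.Sequential(*[ResBlock(ch) for _ in range(4)])
        self.up = nn.Sequential(nn.Conv2d(ch, 3 * 4, 3, padding=1),
                                nn.PixelShuffle(2))          # 2x spatial upsampling
    def forward(self, x):
        return self.up(self.backbone(self.stem(x)))

lr_face = torch.randn(1, 3, 32, 32)
stage1 = Stage(c_in=3)
mid = stage1(lr_face)                        # (1, 3, 64, 64) intermediate image
packed = nn.PixelUnshuffle(2)(mid)           # space-to-depth: (1, 12, 32, 32)
stage2 = Stage(c_in=12)
refined = stage2(packed)                     # refined estimate; the paper's full 4x
                                             # schedule repeats/extends this recurrence
```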
Deep-IRT: Make Deep Learning Based Knowledge Tracing Explainable Using Item Response Theory
Title | Deep-IRT: Make Deep Learning Based Knowledge Tracing Explainable Using Item Response Theory |
Authors | Chun-Kit Yeung |
Abstract | Deep learning based knowledge tracing models have been shown to outperform traditional knowledge tracing models without the need for human-engineered features, yet their parameters and representations have long been criticized for not being explainable. In this paper, we propose Deep-IRT, a synthesis of the item response theory (IRT) model and a knowledge tracing model based on the deep neural network architecture called the dynamic key-value memory network (DKVMN), to make deep learning based knowledge tracing explainable. Specifically, we use the DKVMN model to process the student’s learning trajectory and estimate the student ability level and the item difficulty level over time. Then, we use the IRT model to estimate the probability that a student will answer an item correctly using the estimated student ability and item difficulty. Experiments show that the Deep-IRT model retains the performance of the DKVMN model, while providing a direct psychological interpretation of both students and items. |
Tasks | Knowledge Tracing |
Published | 2019-04-26 |
URL | http://arxiv.org/abs/1904.11738v1 |
http://arxiv.org/pdf/1904.11738v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-irt-make-deep-learning-based-knowledge |
Repo | https://github.com/ckyeungac/DeepIRT |
Framework | tf |
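
The IRT output layer described in the abstract maps an estimated student ability θ and item difficulty β to a correctness probability via a sigmoid of their (scaled) difference. A minimal sketch of that final step, assuming the DKVMN stage has already produced θ and β; the 3.0 scaling follows the common Deep-IRT formulation but is an assumption here.

```python
# IRT output-layer sketch: correctness probability from estimated ability and
# difficulty. The DKVMN networks that produce theta and beta are not shown.
import math

def p_correct(theta: float, beta: float, scale: float = 3.0) -> float:
    """Probability the student answers the item correctly (1PL-style IRT)."""
    return 1.0 / (1.0 + math.exp(-(scale * theta - beta)))

print(p_correct(theta=0.4, beta=0.1))   # able student, easy item -> high probability
print(p_correct(theta=-0.5, beta=0.8))  # struggling student, hard item -> low probability
```

Because θ and β are scalar and interpretable, they can be plotted over time per student and per item, which is the "direct psychological interpretation" the abstract refers to.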
Probabilistic Forecasting with Temporal Convolutional Neural Network
Title | Probabilistic Forecasting with Temporal Convolutional Neural Network |
Authors | Yitian Chen, Yanfei Kang, Yixiong Chen, Zizhuo Wang |
Abstract | We present a probabilistic forecasting framework based on convolutional neural network for multiple related time series forecasting. The framework can be applied to estimate probability density under both parametric and non-parametric settings. More specifically, stacked residual blocks based on dilated causal convolutional nets are constructed to capture the temporal dependencies of the series. Combined with representation learning, our approach is able to learn complex patterns such as seasonality, holiday effects within and across series, and to leverage those patterns for more accurate forecasts, especially when historical data is sparse or unavailable. Extensive empirical studies are performed on several real-world datasets, including datasets from JD.com, China’s largest online retailer. The results show that our framework outperforms other state-of-the-art methods in both accuracy and efficiency. |
Tasks | Representation Learning, Time Series, Time Series Forecasting |
Published | 2019-06-11 |
URL | https://arxiv.org/abs/1906.04397v3 |
https://arxiv.org/pdf/1906.04397v3.pdf | |
PWC | https://paperswithcode.com/paper/probabilistic-forecasting-with-temporal |
Repo | https://github.com/oneday88/kdd2019deepTCN |
Framework | mxnet |
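
The building block the abstract describes — stacked residual blocks of dilated causal convolutions — can be sketched in a few lines; the channel counts, dilation, and the simple Gaussian output head below are illustrative assumptions, not the paper's exact DeepTCN configuration.

```python
# Dilated causal convolution residual block sketch for sequence forecasting.
import torch, torch.nn as nn, torch.nn.functional as F

class CausalResBlock(nn.Module):
    def __init__(self, channels, kernel_size=3, dilation=2):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation            # left-pad so no future leaks in
        self.conv1 = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
    def forward(self, x):                                   # x: (batch, channels, time)
        h = F.relu(self.conv1(F.pad(x, (self.pad, 0))))
        h = self.conv2(F.pad(h, (self.pad, 0)))
        return F.relu(x + h)                                # residual connection

series = torch.randn(8, 32, 168)          # e.g. 8 series, 32 channels, one week of hours
block = CausalResBlock(32, dilation=4)
out = block(series)                       # same length, strictly causal receptive field

head = nn.Conv1d(32, 2, 1)                # per-step mean and log-scale head
mu, log_sigma = head(out).chunk(2, dim=1) # parametric probabilistic forecast sketch
```

Stacking blocks with growing dilations (1, 2, 4, ...) widens the receptive field exponentially, which is how the model captures long seasonal patterns without recurrence.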
Something-Else: Compositional Action Recognition with Spatial-Temporal Interaction Networks
Title | Something-Else: Compositional Action Recognition with Spatial-Temporal Interaction Networks |
Authors | Joanna Materzynska, Tete Xiao, Roei Herzig, Huijuan Xu, Xiaolong Wang, Trevor Darrell |
Abstract | Human action is naturally compositional: humans can easily recognize and perform actions with objects that are different from those used in training demonstrations. In this paper, we study the compositionality of action by looking into the dynamics of subject-object interactions. We propose a novel model which can explicitly reason about the geometric relations between constituent objects and an agent performing an action. To train our model, we collect dense object box annotations on the Something-Something dataset. We propose a novel compositional action recognition task where the training combinations of verbs and nouns do not overlap with the test set. The novel aspects of our model are applicable to activities with prominent object interaction dynamics and to objects which can be tracked using state-of-the-art approaches; for activities without clearly defined spatial object-agent interactions, we rely on baseline scene-level spatio-temporal representations. We show the effectiveness of our approach not only on the proposed compositional action recognition task, but also in a few-shot compositional setting which requires the model to generalize across both object appearance and action category. |
Tasks | |
Published | 2019-12-20 |
URL | https://arxiv.org/abs/1912.09930v1 |
https://arxiv.org/pdf/1912.09930v1.pdf | |
PWC | https://paperswithcode.com/paper/something-else-compositional-action |
Repo | https://github.com/joaanna/something_else |
Framework | none |
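
The explicit reasoning over geometric relations between boxes that the abstract mentions is typically built on simple pairwise box features (center offsets, size ratios) that a relation module then consumes. The small numpy sketch below illustrates such features; the feature design is an assumption, not the released model.

```python
# Pairwise geometric features between tracked boxes (x1, y1, x2, y2) -- the kind
# of per-frame input an object-interaction reasoning module consumes.
import numpy as np

def box_geometry(b1, b2):
    (x1, y1, x2, y2), (u1, v1, u2, v2) = b1, b2
    cx1, cy1, w1, h1 = (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1
    cx2, cy2, w2, h2 = (u1 + u2) / 2, (v1 + v2) / 2, u2 - u1, v2 - v1
    return np.array([(cx2 - cx1) / w1,          # normalized center offset (x)
                     (cy2 - cy1) / h1,          # normalized center offset (y)
                     np.log(w2 / w1),           # relative width
                     np.log(h2 / h1)])          # relative height

hand = (10, 20, 50, 80)                         # agent box from the dense annotations
cup = (40, 30, 90, 100)                         # object box
print(box_geometry(hand, cup))                  # fed, per frame, into the relation model
```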
Towards Building a Multilingual Sememe Knowledge Base: Predicting Sememes for BabelNet Synsets
Title | Towards Building a Multilingual Sememe Knowledge Base: Predicting Sememes for BabelNet Synsets |
Authors | Fanchao Qi, Liang Chang, Maosong Sun, Sicong Ouyang, Zhiyuan Liu |
Abstract | A sememe is defined as the minimum semantic unit of human languages. Sememe knowledge bases (KBs), which contain words annotated with sememes, have been successfully applied to many NLP tasks. However, existing sememe KBs are built on only a few languages, which hinders their widespread utilization. To address the issue, we propose to build a unified sememe KB for multiple languages based on BabelNet, a multilingual encyclopedic dictionary. We first build a dataset serving as the seed of the multilingual sememe KB; it provides manual sememe annotations for over 15 thousand synsets (the entries of BabelNet). Then, we present a novel task of automatic sememe prediction for synsets, aiming to expand the seed dataset into a usable KB. We also propose two simple and effective models, which exploit different information of synsets. Finally, we conduct quantitative and qualitative analyses to explore important factors and difficulties in the task. All the source code and data of this work can be obtained on https://github.com/thunlp/BabelNet-Sememe-Prediction. |
Tasks | |
Published | 2019-12-04 |
URL | https://arxiv.org/abs/1912.01795v1 |
https://arxiv.org/pdf/1912.01795v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-building-a-multilingual-sememe |
Repo | https://github.com/thunlp/BabelNet-Sememe-Prediction |
Framework | tf |
Deep Co-Training for Semi-Supervised Image Segmentation
Title | Deep Co-Training for Semi-Supervised Image Segmentation |
Authors | Jizong Peng, Guillermo Estrada, Marco Pedersoli, Christian Desrosiers |
Abstract | In this paper, we aim to improve the performance of semantic image segmentation in a semi-supervised setting in which training uses a reduced set of annotated images together with additional non-annotated images. We present a method based on an ensemble of deep segmentation models. Each model is trained on a subset of the annotated data, and uses the non-annotated images to exchange information with the other models, similar to co-training. Even though each model learns on the same non-annotated images, diversity is preserved with the use of adversarial samples. Our results show that this ability to simultaneously train models, which exchange knowledge while preserving diversity, leads to state-of-the-art results on two challenging medical image datasets. |
Tasks | Semantic Segmentation |
Published | 2019-03-27 |
URL | https://arxiv.org/abs/1903.11233v3 |
https://arxiv.org/pdf/1903.11233v3.pdf | |
PWC | https://paperswithcode.com/paper/deep-co-training-for-semi-supervised-image-2 |
Repo | https://github.com/jizongFox/deep-clustering-toolbox |
Framework | pytorch |
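
The co-training exchange the abstract describes — each model supervised on its own annotated subset while learning from the other model's predictions on the shared unlabeled images — can be sketched compactly. The tiny stand-in networks and loss weight below are assumptions, and the adversarial-sample diversity term from the paper is omitted.

```python
# Co-training exchange sketch: each model learns from the other's prediction on
# unlabeled images (tiny stand-in nets; adversarial diversity term omitted).
import torch, torch.nn as nn, torch.nn.functional as F

def tiny_segmenter(n_classes=2):
    return nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(8, n_classes, 1))

model_a, model_b = tiny_segmenter(), tiny_segmenter()
opt = torch.optim.Adam(list(model_a.parameters()) + list(model_b.parameters()), 1e-3)

x_lab = torch.randn(4, 1, 32, 32)                    # annotated subset (only A's shown)
y_lab = torch.randint(0, 2, (4, 32, 32))
x_unl = torch.randn(4, 1, 32, 32)                    # shared non-annotated images

for _ in range(10):
    sup = F.cross_entropy(model_a(x_lab), y_lab)     # supervised term on A's subset
    pa, pb = model_a(x_unl), model_b(x_unl)
    cot = F.cross_entropy(pb, pa.argmax(1).detach()) \
        + F.cross_entropy(pa, pb.argmax(1).detach()) # exchange pseudo-labels both ways
    loss = sup + 0.1 * cot
    opt.zero_grad(); loss.backward(); opt.step()
```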
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Title | VL-BERT: Pre-training of Generic Visual-Linguistic Representations |
Authors | Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, Jifeng Dai |
Abstract | We introduce a new pre-trainable generic representation for visual-linguistic tasks, called Visual-Linguistic BERT (VL-BERT for short). VL-BERT adopts the simple yet powerful Transformer model as the backbone, and extends it to take both visual and linguistic embedded features as input. Each element of the input is either a word from the input sentence or a region-of-interest (RoI) from the input image. The model is designed to fit most visual-linguistic downstream tasks. To better exploit the generic representation, we pre-train VL-BERT on the massive-scale Conceptual Captions dataset, together with a text-only corpus. Extensive empirical analysis demonstrates that the pre-training procedure can better align the visual-linguistic clues and benefit downstream tasks such as visual commonsense reasoning, visual question answering and referring expression comprehension. It is worth noting that VL-BERT achieved first place among single models on the leaderboard of the VCR benchmark. Code is released at https://github.com/jackroos/VL-BERT. |
Tasks | Language Modelling, Question Answering, Visual Commonsense Reasoning, Visual Question Answering |
Published | 2019-08-22 |
URL | https://arxiv.org/abs/1908.08530v4 |
https://arxiv.org/pdf/1908.08530v4.pdf | |
PWC | https://paperswithcode.com/paper/vl-bert-pre-training-of-generic-visual |
Repo | https://github.com/jackroos/VL-BERT |
Framework | pytorch |
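
The core input layout described in the abstract — words and RoIs flattened into one sequence for a single Transformer — can be sketched generically; the dimensions, the missing position/segment embeddings, and the use of a vanilla `nn.TransformerEncoder` are simplifying assumptions, not VL-BERT's actual stack.

```python
# Joint visual-linguistic input sketch: word tokens and RoI features projected
# into one sequence for a single Transformer (dims and layout are illustrative).
import torch, torch.nn as nn

d_model, vocab, n_tokens, n_rois = 256, 10000, 12, 5
word_emb = nn.Embedding(vocab, d_model)
roi_proj = nn.Linear(2048, d_model)                  # RoI appearance feature -> d_model
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True), num_layers=2)

tokens = torch.randint(0, vocab, (1, n_tokens))      # the sentence side of the input
roi_feats = torch.randn(1, n_rois, 2048)             # pooled features of detected regions

seq = torch.cat([word_emb(tokens), roi_proj(roi_feats)], dim=1)  # words then RoIs
out = encoder(seq)                                   # joint contextualized representation
```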
Vision-based inspection system employing computer vision & neural networks for detection of fractures in manufactured components
Title | Vision-based inspection system employing computer vision & neural networks for detection of fractures in manufactured components |
Authors | Sarthak J Shetty |
Abstract | We are proceeding towards the age of automation and robotic integration of our production lines [5]. Effective quality-control systems have to be put in place to maintain the quality of manufactured components. Among different quality-control systems, vision-based inspection systems have gained a considerable amount of popularity [8] due to developments in computing power and image processing techniques. In this paper, we present a vision-based inspection system (VBI) as a quality-control system which not only detects the presence of defects, as in conventional VBIs, but also leverages developments in machine learning to predict the presence of surface fractures and wear. We use OpenCV, an open source computer-vision framework, and TensorFlow, an open source machine-learning framework developed by Google Inc., to accomplish the tasks of detecting and predicting the presence of surface defects such as fractures in manufactured gears. |
Tasks | |
Published | 2019-01-25 |
URL | http://arxiv.org/abs/1901.08864v1 |
http://arxiv.org/pdf/1901.08864v1.pdf | |
PWC | https://paperswithcode.com/paper/vision-based-inspection-system-employing |
Repo | https://github.com/SarthakJShetty/Fracture |
Framework | tf |
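
The pipeline the abstract outlines (OpenCV preprocessing feeding a TensorFlow classifier) reduces to a few lines at inference time. A minimal sketch follows; the model file name, input size, and threshold are assumptions for illustration, not the repository's actual artifacts.

```python
# Inspection sketch: OpenCV preprocessing feeding a Keras classifier.
import cv2
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("gear_defect_classifier.h5")  # hypothetical model file

img = cv2.imread("gear_0001.png")                    # frame from the inspection camera
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (224, 224)).astype(np.float32) / 255.0

prob_defect = float(model.predict(img[None, ...])[0][0])
print("fracture probability:", prob_defect)
if prob_defect > 0.5:                                # assumed decision threshold
    print("flag component for manual inspection")
```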
Viewpoint-Aware Loss with Angular Regularization for Person Re-Identification
Title | Viewpoint-Aware Loss with Angular Regularization for Person Re-Identification |
Authors | Zhihui Zhu, Xinyang Jiang, Feng Zheng, Xiaowei Guo, Feiyue Huang, Weishi Zheng, Xing Sun |
Abstract | Although great progress in supervised person re-identification (Re-ID) has been made recently, due to the viewpoint variation of a person, Re-ID remains a massive visual challenge. Most existing viewpoint-based person Re-ID methods project images from each viewpoint into separated and unrelated sub-feature spaces. They only model the identity-level distribution inside an individual viewpoint but ignore the underlying relationship between different viewpoints. To address this problem, we propose a novel approach, called Viewpoint-Aware Loss with Angular Regularization (VA-reID). Instead of one subspace for each viewpoint, our method projects the features from different viewpoints into a unified hypersphere and effectively models the feature distribution on both the identity level and the viewpoint level. In addition, rather than modeling different viewpoints as hard labels used for conventional viewpoint classification, we introduce viewpoint-aware adaptive label smoothing regularization (VALSR), which assigns adaptive soft labels to the feature representation. VALSR can effectively solve the ambiguity of the viewpoint cluster label assignment. Extensive experiments on the Market1501 and DukeMTMC-reID datasets demonstrate that our method outperforms the state-of-the-art supervised Re-ID methods. |
Tasks | Person Re-Identification |
Published | 2019-12-03 |
URL | https://arxiv.org/abs/1912.01300v1 |
https://arxiv.org/pdf/1912.01300v1.pdf | |
PWC | https://paperswithcode.com/paper/viewpoint-aware-loss-with-angular |
Repo | https://github.com/zzhsysu/VA-ReID |
Framework | none |
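
One way to picture the soft viewpoint labels behind VALSR is a simple construction: keep most probability mass on the observed (identity, viewpoint) class and smooth the remainder over the same identity's other viewpoints. The numpy sketch below illustrates that reading; the fixed smoothing weight is an assumption, and the paper's adaptive weighting is not reproduced.

```python
# Viewpoint-aware soft-label sketch: most mass on the observed (identity, viewpoint)
# class, the remainder smoothed over the same identity's other viewpoints.
import numpy as np

n_ids, n_views = 4, 3                      # classes are (identity, viewpoint) pairs
n_classes = n_ids * n_views

def soft_label(identity, viewpoint, eps=0.2):
    y = np.zeros(n_classes)
    y[identity * n_views + viewpoint] = 1.0 - eps
    for v in range(n_views):               # share eps across the identity's other views
        if v != viewpoint:
            y[identity * n_views + v] = eps / (n_views - 1)
    return y

print(soft_label(identity=2, viewpoint=0).round(2))
```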