Paper Group AWR 199
Combining Machine Learning Models using combo Library
Title | Combining Machine Learning Models using combo Library |
Authors | Yue Zhao, Xuejian Wang, Cheng Cheng, Xueying Ding |
Abstract | Model combination, often regarded as a key sub-field of ensemble learning, has been widely used in both academic research and industry applications. To facilitate this process, we propose and implement an easy-to-use Python toolkit, combo, to aggregate models and scores under various scenarios, including classification, clustering, and anomaly detection. In a nutshell, combo provides a unified and consistent way to combine both raw and pretrained models from popular machine learning libraries, e.g., scikit-learn, XGBoost, and LightGBM. With accessibility and robustness in mind, combo is designed with detailed documentation, interactive examples, continuous integration, code coverage, and maintainability checks; it can be installed easily through the Python Package Index (PyPI) or https://github.com/yzhao062/combo. |
Tasks | Anomaly Detection |
Published | 2019-09-21 |
URL | https://arxiv.org/abs/1910.07988v2 |
https://arxiv.org/pdf/1910.07988v2.pdf | |
PWC | https://paperswithcode.com/paper/combining-machine-learning-models-using-combo |
Repo | https://github.com/yzhao062/combo |
Framework | none |
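
The combo abstract above is about aggregating predictions from heterogeneous classifiers. A minimal sketch of that idea, written directly against scikit-learn rather than combo's own classes (which the abstract does not quote), assuming a simple unweighted average of predicted probabilities:

```python
# Minimal sketch of probability averaging across classifiers -- the kind of
# aggregation combo automates. Uses plain scikit-learn; combo's own API is
# not reproduced here.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = [LogisticRegression(max_iter=1000),
          DecisionTreeClassifier(max_depth=5),
          RandomForestClassifier(n_estimators=50)]

probas = []
for m in models:
    m.fit(X_tr, y_tr)
    probas.append(m.predict_proba(X_te))   # shape (n_samples, n_classes)

avg = np.mean(probas, axis=0)              # simple, unweighted average of scores
y_pred = avg.argmax(axis=1)
print("averaged-ensemble accuracy:", accuracy_score(y_te, y_pred))
```

combo wraps this pattern (plus weighted, maximization, and median variants) behind a consistent interface for classification, clustering, and anomaly detection.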
Diagnosing and Enhancing VAE Models
Title | Diagnosing and Enhancing VAE Models |
Authors | Bin Dai, David Wipf |
Abstract | Although variational autoencoders (VAEs) represent a widely influential deep generative model, many aspects of the underlying energy function remain poorly understood. In particular, it is commonly believed that Gaussian encoder/decoder assumptions reduce the effectiveness of VAEs in generating realistic samples. In this regard, we rigorously analyze the VAE objective, differentiating situations where this belief is and is not actually true. We then leverage the corresponding insights to develop a simple VAE enhancement that requires no additional hyperparameters or sensitive tuning. Quantitatively, this proposal produces crisp samples and stable FID scores that are actually competitive with a variety of GAN models, all while retaining desirable attributes of the original VAE architecture. A shorter version of this work will appear in the ICLR 2019 conference proceedings (Dai and Wipf, 2019). The code for our model is available at https://github.com/daib13/TwoStageVAE. |
Tasks | |
Published | 2019-03-14 |
URL | https://arxiv.org/abs/1903.05789v2 |
https://arxiv.org/pdf/1903.05789v2.pdf | |
PWC | https://paperswithcode.com/paper/diagnosing-and-enhancing-vae-models-1 |
Repo | https://github.com/vitskvara/GenModels.jl |
Framework | none |
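
The enhancement released as TwoStageVAE is, going by the repository name, a two-stage construction. A minimal sketch of the general idea — train one VAE on the data, train a second VAE on the first stage's latent codes, and sample by chaining the two decoders — is below; the toy MLP sizes and training loop are assumptions for illustration, not the paper's architecture.

```python
# Minimal two-stage VAE sketch (illustrative sizes, not the paper's exact nets).
import torch, torch.nn as nn

class TinyVAE(nn.Module):
    def __init__(self, d_in, d_lat, d_hid=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, d_hid), nn.ReLU(),
                                 nn.Linear(d_hid, 2 * d_lat))   # mu and logvar
        self.dec = nn.Sequential(nn.Linear(d_lat, d_hid), nn.ReLU(),
                                 nn.Linear(d_hid, d_in))

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()    # reparameterization
        recon = self.dec(z)
        kl = 0.5 * (mu.pow(2) + logvar.exp() - 1 - logvar).sum(-1).mean()
        rec = (recon - x).pow(2).sum(-1).mean()                 # fixed-variance Gaussian decoder
        return rec + kl, mu

def train(vae, data, steps=200, lr=1e-3):
    opt = torch.optim.Adam(vae.parameters(), lr=lr)
    for _ in range(steps):
        loss, _ = vae(data)
        opt.zero_grad(); loss.backward(); opt.step()

x = torch.randn(512, 32)                 # toy data standing in for images
stage1 = TinyVAE(d_in=32, d_lat=8)
train(stage1, x)                         # stage 1: fit the data
with torch.no_grad():
    _, z_codes = stage1(x)               # latent means of the training data
stage2 = TinyVAE(d_in=8, d_lat=8)
train(stage2, z_codes)                   # stage 2: fit the latent distribution

with torch.no_grad():                    # sampling: chain the two decoders
    u = torch.randn(16, 8)
    x_samples = stage1.dec(stage2.dec(u))
```

The second stage corrects the mismatch between the aggregate posterior and the prior, which is what produces the crisper samples the abstract reports.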
GNNExplainer: Generating Explanations for Graph Neural Networks
Title | GNNExplainer: Generating Explanations for Graph Neural Networks |
Authors | Rex Ying, Dylan Bourgeois, Jiaxuan You, Marinka Zitnik, Jure Leskovec |
Abstract | Graph Neural Networks (GNNs) are a powerful tool for machine learning on graphs. GNNs combine node feature information with the graph structure by recursively passing neural messages along edges of the input graph. However, incorporating both graph structure and feature information leads to complex models, and explaining predictions made by GNNs remains unsolved. Here we propose GNNExplainer, the first general, model-agnostic approach for providing interpretable explanations for predictions of any GNN-based model on any graph-based machine learning task. Given an instance, GNNExplainer identifies a compact subgraph structure and a small subset of node features that play a crucial role in the GNN’s prediction. Further, GNNExplainer can generate consistent and concise explanations for an entire class of instances. We formulate GNNExplainer as an optimization task that maximizes the mutual information between a GNN’s prediction and the distribution of possible subgraph structures. Experiments on synthetic and real-world graphs show that our approach can identify important graph structures as well as node features, and outperforms baselines by 17.1% on average. GNNExplainer provides a variety of benefits, from the ability to visualize semantically relevant structures to interpretability, to giving insights into errors of faulty GNNs. |
Tasks | Graph Classification, Link Prediction |
Published | 2019-03-10 |
URL | https://arxiv.org/abs/1903.03894v4 |
https://arxiv.org/pdf/1903.03894v4.pdf | |
PWC | https://paperswithcode.com/paper/gnn-explainer-a-tool-for-post-hoc-explanation |
Repo | https://github.com/RexYing/gnn-model-explainer |
Framework | pytorch |
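
The optimization the abstract describes — maximizing mutual information between the GNN's prediction and a distribution of subgraphs — is commonly realized by learning a soft mask over the edges of the computation graph. A minimal sketch of that edge-mask optimization on a toy one-layer GCN is below; the tiny GCN, graph, and loss weights are assumptions, not the released implementation.

```python
# Edge-mask optimization sketch for explaining a (toy) GNN prediction.
import torch, torch.nn as nn, torch.nn.functional as F

class ToyGCN(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out)
    def forward(self, x, adj):
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        return self.lin(adj @ x / deg)              # mean aggregation over neighbors

n, d, c = 6, 4, 3
x = torch.randn(n, d)
adj = (torch.rand(n, n) < 0.4).float()
adj = ((adj + adj.t()) > 0).float()                 # symmetric toy graph
adj.fill_diagonal_(1.0)

gnn = ToyGCN(d, c)                                  # stands in for a trained GNN
for p in gnn.parameters():
    p.requires_grad_(False)                         # the model itself stays fixed
with torch.no_grad():
    target = gnn(x, adj).argmax(-1)                 # prediction to be explained

mask_logits = nn.Parameter(torch.zeros_like(adj))   # one learnable logit per edge
opt = torch.optim.Adam([mask_logits], lr=0.1)
for _ in range(200):
    mask = torch.sigmoid(mask_logits) * adj         # soft subgraph, only real edges
    logits = gnn(x, mask)
    loss = F.cross_entropy(logits, target)          # keep the original prediction...
    loss = loss + 0.05 * mask.sum()                 # ...with as few edges as possible
    opt.zero_grad(); loss.backward(); opt.step()

important_edges = (torch.sigmoid(mask_logits) * adj) > 0.5   # the explanation subgraph
```

The released implementation adds a feature mask and an entropy term on the mask; the core trade-off (prediction fidelity versus subgraph size) is the same.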
Analyzing Structures in the Semantic Vector Space: A Framework for Decomposing Word Embeddings
Title | Analyzing Structures in the Semantic Vector Space: A Framework for Decomposing Word Embeddings |
Authors | Andreas Hanselowski, Iryna Gurevych |
Abstract | Word embeddings are rich word representations which, in combination with deep neural networks, lead to large performance gains for many NLP tasks. However, word embeddings are represented by dense, real-valued vectors and they are therefore not directly interpretable. Thus, computational operations based on them are also not well understood. In this paper, we present an approach for analyzing structures in the semantic vector space to get a better understanding of the underlying semantic encoding principles. We present a framework for decomposing word embeddings into smaller meaningful units which we call sub-vectors. The framework opens up a wide range of possibilities for analyzing phenomena in vector space semantics, as well as for solving concrete NLP problems: We introduce the category completion task and show that a sub-vector based approach is superior to supervised techniques; We present a sub-vector based method for solving the word analogy task, which substantially outperforms different variants of the traditional vector-offset method. |
Tasks | Word Embeddings |
Published | 2019-12-17 |
URL | https://arxiv.org/abs/1912.10434v1 |
https://arxiv.org/pdf/1912.10434v1.pdf | |
PWC | https://paperswithcode.com/paper/analyzing-structures-in-the-semantic-vector |
Repo | https://github.com/hanselowski/embedding_decomp |
Framework | none |
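
The word analogy task that the sub-vector method is compared against is traditionally solved with the vector-offset method (king − man + woman ≈ queen). A minimal numpy sketch of that baseline, with toy vectors; the paper's sub-vector decomposition itself is not reproduced here.

```python
# Vector-offset analogy baseline (the traditional method the paper improves on).
import numpy as np

rng = np.random.default_rng(0)
vocab = ["king", "queen", "man", "woman", "apple"]
emb = {w: rng.normal(size=50) for w in vocab}        # toy embeddings

def analogy(a, b, c, emb):
    """Return the word d such that a : b :: c : d, by nearest cosine neighbor."""
    query = emb[b] - emb[a] + emb[c]
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cos(emb[w], query))

print(analogy("man", "king", "woman", emb))          # ideally "queen" with real vectors
```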
Massive vs. Curated Word Embeddings for Low-Resourced Languages. The Case of Yorùbá and Twi
Title | Massive vs. Curated Word Embeddings for Low-Resourced Languages. The Case of Yorùbá and Twi |
Authors | Jesujoba O. Alabi, Kwabena Amponsah-Kaakyire, David I. Adelani, Cristina España-Bonet |
Abstract | The success of several architectures to learn semantic representations from unannotated text and the availability of these kinds of texts in online multilingual resources such as Wikipedia has facilitated the massive and automatic creation of resources for multiple languages. The evaluation of such resources is usually done for the high-resourced languages, where one has a smorgasbord of tasks and test sets to evaluate on. For low-resourced languages, the evaluation is more difficult and normally ignored, with the hope that the impressive capability of deep learning architectures to learn (multilingual) representations in the high-resourced setting holds in the low-resourced setting too. In this paper we focus on two African languages, Yorùbá and Twi, and compare the word embeddings obtained in this way with word embeddings obtained from curated corpora and a language-dependent processing. We analyse the noise in the publicly available corpora, collect high quality and noisy data for the two languages and quantify the improvements that depend not only on the amount of data but on the quality too. We also use different architectures that learn word representations both from surface forms and characters to further exploit all the available information, which proved to be important for these languages. For the evaluation, we manually translate the wordsim-353 word pairs dataset from English into Yorùbá and Twi. As output of the work, we provide corpora, embeddings and the test suites for both languages. |
Tasks | Word Embeddings |
Published | 2019-12-05 |
URL | https://arxiv.org/abs/1912.02481v2 |
https://arxiv.org/pdf/1912.02481v2.pdf | |
PWC | https://paperswithcode.com/paper/massive-vs-curated-word-embeddings-for-low |
Repo | https://github.com/ajesujoba/YorubaTwi-Embedding |
Framework | none |
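
The evaluation the abstract describes (a translated wordsim-353) follows the standard word-similarity protocol: Spearman correlation between human judgements and embedding cosine similarities. A minimal sketch with toy vectors and toy pairs; the real test sets and embeddings are in the linked repository.

```python
# Word-similarity evaluation sketch: Spearman correlation between human scores
# and cosine similarities, as in wordsim-353-style test sets.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=100) for w in ["ile", "oko", "omi", "ina", "owo", "obi"]}

# (word1, word2, human similarity score) -- toy pairs standing in for the
# translated wordsim-353 data shipped with the paper's repository.
pairs = [("ile", "oko", 6.2), ("omi", "ina", 1.5), ("owo", "obi", 3.8)]

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

model_scores = [cos(emb[a], emb[b]) for a, b, _ in pairs]
human_scores = [s for _, _, s in pairs]
rho, _ = spearmanr(model_scores, human_scores)
print("Spearman rho:", rho)
```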
Smart Home Appliances: Chat with Your Fridge
Title | Smart Home Appliances: Chat with Your Fridge |
Authors | Denis Gudovskiy, Gyuri Han, Takuya Yamaguchi, Sotaro Tsukizawa |
Abstract | Current home appliances are capable of executing a limited number of voice commands such as turning devices on or off, adjusting music volume or light conditions. Recent progress in machine reasoning gives an opportunity to develop new types of conversational user interfaces for home appliances. In this paper, we apply a state-of-the-art visual reasoning model and demonstrate that it is feasible to ask a smart fridge about its contents and various properties of the food with a close-to-natural conversation experience. Our visual reasoning model answers user questions about the existence, count, category and freshness of each product by analyzing photos taken by the image sensor inside the smart fridge. Users may chat with their fridge using an off-the-shelf phone messenger while being away from home, for example, when shopping in the supermarket. We generate a visually realistic synthetic dataset to train a machine learning reasoning model that achieves 95% answer accuracy on test data. We present the results of initial user tests and discuss how we modify the distribution of generated questions for model training based on human-in-the-loop guidance. We open-source the code for the whole system, including dataset generation, the reasoning model and demonstration scripts. |
Tasks | Visual Reasoning |
Published | 2019-12-19 |
URL | https://arxiv.org/abs/1912.09589v1 |
https://arxiv.org/pdf/1912.09589v1.pdf | |
PWC | https://paperswithcode.com/paper/smart-home-appliances-chat-with-your-fridge |
Repo | https://github.com/gudovskiy/fridge-demo |
Framework | pytorch |
Component Attention Guided Face Super-Resolution Network: CAGFace
Title | Component Attention Guided Face Super-Resolution Network: CAGFace |
Authors | Ratheesh Kalarot, Tao Li, Fatih Porikli |
Abstract | To make the best use of the underlying structure of faces, the collective information across face datasets and the intermediate estimates during the upsampling process, here we introduce a fully convolutional multi-stage neural network for 4× super-resolution of face images. We implicitly impose facial component-wise attention maps using a segmentation network to allow our network to focus on face-inherent patterns. Each stage of our network is composed of a stem layer, a residual backbone, and spatial upsampling layers. We recurrently apply stages to reconstruct an intermediate image, and then reuse its space-to-depth converted versions to bootstrap and enhance image quality progressively. Our experiments show that our face super-resolution method achieves quantitatively superior and perceptually pleasing results in comparison to the state of the art. |
Tasks | Super-Resolution |
Published | 2019-10-19 |
URL | https://arxiv.org/abs/1910.08761v1 |
https://arxiv.org/pdf/1910.08761v1.pdf | |
PWC | https://paperswithcode.com/paper/component-attention-guided-face-super |
Repo | https://github.com/SeungyounShin/CAGFace |
Framework | pytorch |
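
The stage structure the abstract names (stem layer, residual backbone, spatial upsampling, space-to-depth reuse of the intermediate image) can be sketched compactly in PyTorch; the channel counts and depths below are illustrative assumptions, not the paper's configuration.

```python
# One super-resolution stage sketch: stem -> residual backbone -> pixel-shuffle
# upsampling, with a space-to-depth step feeding the next stage.
import torch, torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)

class Stage(nn.Module):
    def __init__(self, c_in, ch=64):
        super().__init__()
        self.stem = nn.Conv2d(c_in, ch, 3, padding=1)
        self.backbone = nn.Sequential(*[ResBlock(ch) for _ in range(4)])
        self.up = nn.Sequential(nn.Conv2d(ch, 3 * 4, 3, padding=1),
                                nn.PixelShuffle(2))          # 2x spatial upsampling
    def forward(self, x):
        return self.up(self.backbone(self.stem(x)))

lr_face = torch.randn(1, 3, 32, 32)
stage1 = Stage(c_in=3)
mid = stage1(lr_face)                        # (1, 3, 64, 64) intermediate image
packed = nn.PixelUnshuffle(2)(mid)           # space-to-depth: (1, 12, 32, 32)
stage2 = Stage(c_in=12)
refined = stage2(packed)                     # refined estimate; the paper's full 4x
                                             # schedule repeats/extends this recurrence
```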
Deep-IRT: Make Deep Learning Based Knowledge Tracing Explainable Using Item Response Theory
Title | Deep-IRT: Make Deep Learning Based Knowledge Tracing Explainable Using Item Response Theory |
Authors | Chun-Kit Yeung |
Abstract | Deep learning based knowledge tracing models have been shown to outperform traditional knowledge tracing models without the need for human-engineered features, yet their parameters and representations have long been criticized for not being explainable. In this paper, we propose Deep-IRT, a synthesis of the item response theory (IRT) model and a knowledge tracing model based on the deep neural network architecture called the dynamic key-value memory network (DKVMN), to make deep learning based knowledge tracing explainable. Specifically, we use the DKVMN model to process the student’s learning trajectory and estimate the student ability level and the item difficulty level over time. Then, we use the IRT model to estimate the probability that a student will answer an item correctly using the estimated student ability and item difficulty. Experiments show that the Deep-IRT model retains the performance of the DKVMN model, while providing a direct psychological interpretation of both students and items. |
Tasks | Knowledge Tracing |
Published | 2019-04-26 |
URL | http://arxiv.org/abs/1904.11738v1 |
http://arxiv.org/pdf/1904.11738v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-irt-make-deep-learning-based-knowledge |
Repo | https://github.com/ckyeungac/DeepIRT |
Framework | tf |
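
The IRT output layer described in the abstract maps an estimated student ability θ and item difficulty β to a correctness probability via a sigmoid of their (scaled) difference. A minimal sketch of that final step, assuming the DKVMN stage has already produced θ and β; the 3.0 scaling follows the common Deep-IRT formulation but is an assumption here.

```python
# IRT output-layer sketch: correctness probability from estimated ability and
# difficulty. The DKVMN networks that produce theta and beta are not shown.
import math

def p_correct(theta: float, beta: float, scale: float = 3.0) -> float:
    """Probability the student answers the item correctly (1PL-style IRT)."""
    return 1.0 / (1.0 + math.exp(-(scale * theta - beta)))

print(p_correct(theta=0.4, beta=0.1))   # able student, easy item -> high probability
print(p_correct(theta=-0.5, beta=0.8))  # struggling student, hard item -> low probability
```

Because θ and β are scalar and interpretable, they can be plotted over time per student and per item, which is the "direct psychological interpretation" the abstract refers to.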
Probabilistic Forecasting with Temporal Convolutional Neural Network
Title | Probabilistic Forecasting with Temporal Convolutional Neural Network |
Authors | Yitian Chen, Yanfei Kang, Yixiong Chen, Zizhuo Wang |
Abstract | We present a probabilistic forecasting framework based on convolutional neural network for multiple related time series forecasting. The framework can be applied to estimate probability density under both parametric and non-parametric settings. More specifically, stacked residual blocks based on dilated causal convolutional nets are constructed to capture the temporal dependencies of the series. Combined with representation learning, our approach is able to learn complex patterns such as seasonality, holiday effects within and across series, and to leverage those patterns for more accurate forecasts, especially when historical data is sparse or unavailable. Extensive empirical studies are performed on several real-world datasets, including datasets from JD.com, China’s largest online retailer. The results show that our framework outperforms other state-of-the-art methods in both accuracy and efficiency. |
Tasks | Representation Learning, Time Series, Time Series Forecasting |
Published | 2019-06-11 |
URL | https://arxiv.org/abs/1906.04397v3 |
https://arxiv.org/pdf/1906.04397v3.pdf | |
PWC | https://paperswithcode.com/paper/probabilistic-forecasting-with-temporal |
Repo | https://github.com/oneday88/kdd2019deepTCN |
Framework | mxnet |
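
The building block the abstract describes — stacked residual blocks of dilated causal convolutions — can be sketched in a few lines; the channel counts, dilation, and the simple Gaussian output head below are illustrative assumptions, not the paper's exact DeepTCN configuration.

```python
# Dilated causal convolution residual block sketch for sequence forecasting.
import torch, torch.nn as nn, torch.nn.functional as F

class CausalResBlock(nn.Module):
    def __init__(self, channels, kernel_size=3, dilation=2):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation            # left-pad so no future leaks in
        self.conv1 = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
    def forward(self, x):                                   # x: (batch, channels, time)
        h = F.relu(self.conv1(F.pad(x, (self.pad, 0))))
        h = self.conv2(F.pad(h, (self.pad, 0)))
        return F.relu(x + h)                                # residual connection

series = torch.randn(8, 32, 168)          # e.g. 8 series, 32 channels, one week of hours
block = CausalResBlock(32, dilation=4)
out = block(series)                       # same length, strictly causal receptive field

head = nn.Conv1d(32, 2, 1)                # per-step mean and log-scale head
mu, log_sigma = head(out).chunk(2, dim=1) # parametric probabilistic forecast sketch
```

Stacking blocks with growing dilations (1, 2, 4, ...) widens the receptive field exponentially, which is how the model captures long seasonal patterns without recurrence.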
Something-Else: Compositional Action Recognition with Spatial-Temporal Interaction Networks
Title | Something-Else: Compositional Action Recognition with Spatial-Temporal Interaction Networks |
Authors | Joanna Materzynska, Tete Xiao, Roei Herzig, Huijuan Xu, Xiaolong Wang, Trevor Darrell |
Abstract | Human action is naturally compositional: humans can easily recognize and perform actions with objects that are different from those used in training demonstrations. In this paper, we study the compositionality of action by looking into the dynamics of subject-object interactions. We propose a novel model which can explicitly reason about the geometric relations between constituent objects and an agent performing an action. To train our model, we collect dense object box annotations on the Something-Something dataset. We propose a novel compositional action recognition task where the training combinations of verbs and nouns do not overlap with the test set. The novel aspects of our model are applicable to activities with prominent object interaction dynamics and to objects which can be tracked using state-of-the-art approaches; for activities without clearly defined spatial object-agent interactions, we rely on baseline scene-level spatio-temporal representations. We show the effectiveness of our approach not only on the proposed compositional action recognition task, but also in a few-shot compositional setting which requires the model to generalize across both object appearance and action category. |
Tasks | |
Published | 2019-12-20 |
URL | https://arxiv.org/abs/1912.09930v1 |
https://arxiv.org/pdf/1912.09930v1.pdf | |
PWC | https://paperswithcode.com/paper/something-else-compositional-action |
Repo | https://github.com/joaanna/something_else |
Framework | none |
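
The explicit reasoning over geometric relations between boxes that the abstract mentions is typically built on simple pairwise box features (center offsets, size ratios) that a relation module then consumes. The small numpy sketch below illustrates such features; the feature design is an assumption, not the released model.

```python
# Pairwise geometric features between tracked boxes (x1, y1, x2, y2) -- the kind
# of per-frame input an object-interaction reasoning module consumes.
import numpy as np

def box_geometry(b1, b2):
    (x1, y1, x2, y2), (u1, v1, u2, v2) = b1, b2
    cx1, cy1, w1, h1 = (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1
    cx2, cy2, w2, h2 = (u1 + u2) / 2, (v1 + v2) / 2, u2 - u1, v2 - v1
    return np.array([(cx2 - cx1) / w1,          # normalized center offset (x)
                     (cy2 - cy1) / h1,          # normalized center offset (y)
                     np.log(w2 / w1),           # relative width
                     np.log(h2 / h1)])          # relative height

hand = (10, 20, 50, 80)                         # agent box from the dense annotations
cup = (40, 30, 90, 100)                         # object box
print(box_geometry(hand, cup))                  # fed, per frame, into the relation model
```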
Towards Building a Multilingual Sememe Knowledge Base: Predicting Sememes for BabelNet Synsets
Title | Towards Building a Multilingual Sememe Knowledge Base: Predicting Sememes for BabelNet Synsets |
Authors | Fanchao Qi, Liang Chang, Maosong Sun, Sicong Ouyang, Zhiyuan Liu |
Abstract | A sememe is defined as the minimum semantic unit of human languages. Sememe knowledge bases (KBs), which contain words annotated with sememes, have been successfully applied to many NLP tasks. However, existing sememe KBs are built on only a few languages, which hinders their widespread utilization. To address the issue, we propose to build a unified sememe KB for multiple languages based on BabelNet, a multilingual encyclopedic dictionary. We first build a dataset serving as the seed of the multilingual sememe KB; it provides manual sememe annotations for over 15 thousand synsets (the entries of BabelNet). Then, we present a novel task of automatic sememe prediction for synsets, aiming to expand the seed dataset into a usable KB. We also propose two simple and effective models, which exploit different information of synsets. Finally, we conduct quantitative and qualitative analyses to explore important factors and difficulties in the task. All the source code and data of this work can be obtained on https://github.com/thunlp/BabelNet-Sememe-Prediction. |
Tasks | |
Published | 2019-12-04 |
URL | https://arxiv.org/abs/1912.01795v1 |
https://arxiv.org/pdf/1912.01795v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-building-a-multilingual-sememe |
Repo | https://github.com/thunlp/BabelNet-Sememe-Prediction |
Framework | tf |
Deep Co-Training for Semi-Supervised Image Segmentation
Title | Deep Co-Training for Semi-Supervised Image Segmentation |
Authors | Jizong Peng, Guillermo Estrada, Marco Pedersoli, Christian Desrosiers |
Abstract | In this paper, we aim to improve the performance of semantic image segmentation in a semi-supervised setting in which training uses a reduced set of annotated images together with additional non-annotated images. We present a method based on an ensemble of deep segmentation models. Each model is trained on a subset of the annotated data, and uses the non-annotated images to exchange information with the other models, similar to co-training. Even though each model learns on the same non-annotated images, diversity is preserved with the use of adversarial samples. Our results show that this ability to simultaneously train models, which exchange knowledge while preserving diversity, leads to state-of-the-art results on two challenging medical image datasets. |
Tasks | Semantic Segmentation |
Published | 2019-03-27 |
URL | https://arxiv.org/abs/1903.11233v3 |
https://arxiv.org/pdf/1903.11233v3.pdf | |
PWC | https://paperswithcode.com/paper/deep-co-training-for-semi-supervised-image-2 |
Repo | https://github.com/jizongFox/deep-clustering-toolbox |
Framework | pytorch |
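
The co-training exchange the abstract describes — each model supervised on its own annotated subset while learning from the other model's predictions on the shared unlabeled images — can be sketched compactly. The tiny stand-in networks and loss weight below are assumptions, and the adversarial-sample diversity term from the paper is omitted.

```python
# Co-training exchange sketch: each model learns from the other's prediction on
# unlabeled images (tiny stand-in nets; adversarial diversity term omitted).
import torch, torch.nn as nn, torch.nn.functional as F

def tiny_segmenter(n_classes=2):
    return nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(8, n_classes, 1))

model_a, model_b = tiny_segmenter(), tiny_segmenter()
opt = torch.optim.Adam(list(model_a.parameters()) + list(model_b.parameters()), 1e-3)

x_lab = torch.randn(4, 1, 32, 32)                    # annotated subset (only A's shown)
y_lab = torch.randint(0, 2, (4, 32, 32))
x_unl = torch.randn(4, 1, 32, 32)                    # shared non-annotated images

for _ in range(10):
    sup = F.cross_entropy(model_a(x_lab), y_lab)     # supervised term on A's subset
    pa, pb = model_a(x_unl), model_b(x_unl)
    cot = F.cross_entropy(pb, pa.argmax(1).detach()) \
        + F.cross_entropy(pa, pb.argmax(1).detach()) # exchange pseudo-labels both ways
    loss = sup + 0.1 * cot
    opt.zero_grad(); loss.backward(); opt.step()
```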
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Title | VL-BERT: Pre-training of Generic Visual-Linguistic Representations |
Authors | Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, Jifeng Dai |
Abstract | We introduce a new pre-trainable generic representation for visual-linguistic tasks, called Visual-Linguistic BERT (VL-BERT for short). VL-BERT adopts the simple yet powerful Transformer model as the backbone, and extends it to take both visual and linguistic embedded features as input. Each element of the input is either a word from the input sentence or a region-of-interest (RoI) from the input image. The model is designed to fit most visual-linguistic downstream tasks. To better exploit the generic representation, we pre-train VL-BERT on the massive-scale Conceptual Captions dataset, together with a text-only corpus. Extensive empirical analysis demonstrates that the pre-training procedure can better align the visual-linguistic clues and benefit downstream tasks such as visual commonsense reasoning, visual question answering and referring expression comprehension. It is worth noting that VL-BERT achieved first place among single models on the leaderboard of the VCR benchmark. Code is released at https://github.com/jackroos/VL-BERT. |
Tasks | Language Modelling, Question Answering, Visual Commonsense Reasoning, Visual Question Answering |
Published | 2019-08-22 |
URL | https://arxiv.org/abs/1908.08530v4 |
https://arxiv.org/pdf/1908.08530v4.pdf | |
PWC | https://paperswithcode.com/paper/vl-bert-pre-training-of-generic-visual |
Repo | https://github.com/jackroos/VL-BERT |
Framework | pytorch |
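
The core input layout described in the abstract — words and RoIs flattened into one sequence for a single Transformer — can be sketched generically; the dimensions, the missing position/segment embeddings, and the use of a vanilla `nn.TransformerEncoder` are simplifying assumptions, not VL-BERT's actual stack.

```python
# Joint visual-linguistic input sketch: word tokens and RoI features projected
# into one sequence for a single Transformer (dims and layout are illustrative).
import torch, torch.nn as nn

d_model, vocab, n_tokens, n_rois = 256, 10000, 12, 5
word_emb = nn.Embedding(vocab, d_model)
roi_proj = nn.Linear(2048, d_model)                  # RoI appearance feature -> d_model
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True), num_layers=2)

tokens = torch.randint(0, vocab, (1, n_tokens))      # the sentence side of the input
roi_feats = torch.randn(1, n_rois, 2048)             # pooled features of detected regions

seq = torch.cat([word_emb(tokens), roi_proj(roi_feats)], dim=1)  # words then RoIs
out = encoder(seq)                                   # joint contextualized representation
```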
Vision-based inspection system employing computer vision & neural networks for detection of fractures in manufactured components
Title | Vision-based inspection system employing computer vision & neural networks for detection of fractures in manufactured components |
Authors | Sarthak J Shetty |
Abstract | We are proceeding towards the age of automation and robotic integration of our production lines [5]. Effective quality-control systems have to be put in place to maintain the quality of manufactured components. Among different quality-control systems, vision-based inspection systems have gained a considerable amount of popularity [8] due to developments in computing power and image processing techniques. In this paper, we present a vision-based inspection system (VBI) as a quality-control system which not only detects the presence of defects, as in conventional VBIs, but also leverages developments in machine learning to predict the presence of surface fractures and wear. We use OpenCV, an open source computer-vision framework, and TensorFlow, an open source machine-learning framework developed by Google Inc., to accomplish the tasks of detecting and predicting the presence of surface defects such as fractures in manufactured gears. |
Tasks | |
Published | 2019-01-25 |
URL | http://arxiv.org/abs/1901.08864v1 |
http://arxiv.org/pdf/1901.08864v1.pdf | |
PWC | https://paperswithcode.com/paper/vision-based-inspection-system-employing |
Repo | https://github.com/SarthakJShetty/Fracture |
Framework | tf |
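
The pipeline the abstract outlines (OpenCV preprocessing feeding a TensorFlow classifier) reduces to a few lines at inference time. A minimal sketch follows; the model file name, input size, and threshold are assumptions for illustration, not the repository's actual artifacts.

```python
# Inspection sketch: OpenCV preprocessing feeding a Keras classifier.
import cv2
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("gear_defect_classifier.h5")  # hypothetical model file

img = cv2.imread("gear_0001.png")                    # frame from the inspection camera
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (224, 224)).astype(np.float32) / 255.0

prob_defect = float(model.predict(img[None, ...])[0][0])
print("fracture probability:", prob_defect)
if prob_defect > 0.5:                                # assumed decision threshold
    print("flag component for manual inspection")
```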
Viewpoint-Aware Loss with Angular Regularization for Person Re-Identification
Title | Viewpoint-Aware Loss with Angular Regularization for Person Re-Identification |
Authors | Zhihui Zhu, Xinyang Jiang, Feng Zheng, Xiaowei Guo, Feiyue Huang, Weishi Zheng, Xing Sun |
Abstract | Although great progress in supervised person re-identification (Re-ID) has been made recently, due to the viewpoint variation of a person, Re-ID remains a massive visual challenge. Most existing viewpoint-based person Re-ID methods project images from each viewpoint into separated and unrelated sub-feature spaces. They only model the identity-level distribution inside an individual viewpoint but ignore the underlying relationship between different viewpoints. To address this problem, we propose a novel approach, called Viewpoint-Aware Loss with Angular Regularization (VA-reID). Instead of one subspace for each viewpoint, our method projects the features from different viewpoints into a unified hypersphere and effectively models the feature distribution on both the identity level and the viewpoint level. In addition, rather than modeling different viewpoints as hard labels used for conventional viewpoint classification, we introduce viewpoint-aware adaptive label smoothing regularization (VALSR), which assigns adaptive soft labels to the feature representation. VALSR can effectively solve the ambiguity of the viewpoint cluster label assignment. Extensive experiments on the Market1501 and DukeMTMC-reID datasets demonstrate that our method outperforms the state-of-the-art supervised Re-ID methods. |
Tasks | Person Re-Identification |
Published | 2019-12-03 |
URL | https://arxiv.org/abs/1912.01300v1 |
https://arxiv.org/pdf/1912.01300v1.pdf | |
PWC | https://paperswithcode.com/paper/viewpoint-aware-loss-with-angular |
Repo | https://github.com/zzhsysu/VA-ReID |
Framework | none |
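
One way to picture the soft viewpoint labels behind VALSR is a simple construction: keep most probability mass on the observed (identity, viewpoint) class and smooth the remainder over the same identity's other viewpoints. The numpy sketch below illustrates that reading; the fixed smoothing weight is an assumption, and the paper's adaptive weighting is not reproduced.

```python
# Viewpoint-aware soft-label sketch: most mass on the observed (identity, viewpoint)
# class, the remainder smoothed over the same identity's other viewpoints.
import numpy as np

n_ids, n_views = 4, 3                      # classes are (identity, viewpoint) pairs
n_classes = n_ids * n_views

def soft_label(identity, viewpoint, eps=0.2):
    y = np.zeros(n_classes)
    y[identity * n_views + viewpoint] = 1.0 - eps
    for v in range(n_views):               # share eps across the identity's other views
        if v != viewpoint:
            y[identity * n_views + v] = eps / (n_views - 1)
    return y

print(soft_label(identity=2, viewpoint=0).round(2))
```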