Paper Group AWR 348
$d$-SNE: Domain Adaptation using Stochastic Neighborhood Embedding. How to Build User Simulators to Train RL-based Dialog Systems. Neural Ideal Point Estimation Network. On Tree-based Methods for Similarity Learning. Frustum ConvNet: Sliding Frustums to Aggregate Local Point-Wise Features for Amodal 3D Object Detection. Contrastive Variational Auto …
$d$-SNE: Domain Adaptation using Stochastic Neighborhood Embedding
Title | $d$-SNE: Domain Adaptation using Stochastic Neighborhood Embedding |
Authors | Xiang Xu, Xiong Zhou, Ragav Venkatesan, Gurumurthy Swaminathan, Orchid Majumder |
Abstract | Deep neural networks often require copious amounts of labeled data to train their scads of parameters. Training larger and deeper networks is hard without appropriate regularization, particularly while using a small dataset. At the same time, collecting well-annotated data is expensive, time-consuming and often infeasible. A popular way to regularize these networks is to simply train the network with more data from an alternate representative dataset. This can lead to adverse effects if the statistics of the representative dataset are dissimilar to our target. This predicament is due to the problem of domain shift. Data from a shifted domain might not produce bespoke features when a feature extractor from the representative domain is used. In this paper, we propose a new technique ($d$-SNE) of domain adaptation that cleverly uses stochastic neighborhood embedding techniques and a novel modified-Hausdorff distance. The proposed technique is learnable end-to-end and is therefore ideally suited to train neural networks. Extensive experiments demonstrate that $d$-SNE outperforms the current state-of-the-art and is robust to the variances in different datasets, even in the one-shot and semi-supervised learning settings. $d$-SNE also demonstrates the ability to generalize to multiple domains concurrently. |
Tasks | Domain Adaptation |
Published | 2019-05-29 |
URL | https://arxiv.org/abs/1905.12775v1 |
https://arxiv.org/pdf/1905.12775v1.pdf | |
PWC | https://paperswithcode.com/paper/d-sne-domain-adaptation-using-stochastic-1 |
Repo | https://github.com/sheikhomar/d-SNE |
Framework | mxnet |
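As a rough illustration of the pairwise objective described in the abstract, the sketch below computes a per-target-sample loss that pulls a target embedding toward same-class source embeddings and pushes it away from the nearest other-class one. The margin form, function names, and hyper-parameters are assumptions for illustration, not the authors' exact formulation.

```python
import numpy as np

def dsne_loss(src_feats, src_labels, tgt_feat, tgt_label, margin=1.0):
    """Per-target-sample d-SNE-style loss (hedged sketch, not the authors' code).

    Encourages the largest same-class distance to stay below the smallest
    different-class distance, a modified-Hausdorff-style comparison.
    """
    d2 = np.sum((src_feats - tgt_feat) ** 2, axis=1)   # squared distances to every source sample
    same = d2[src_labels == tgt_label]                 # distances to same-class source samples
    diff = d2[src_labels != tgt_label]                 # distances to other-class source samples
    if same.size == 0 or diff.size == 0:
        return 0.0
    return max(0.0, np.max(same) - np.min(diff) + margin)

# toy usage with random embeddings
rng = np.random.default_rng(0)
feats = rng.normal(size=(10, 8))
labels = rng.integers(0, 3, size=10)
print(dsne_loss(feats, labels, rng.normal(size=8), tgt_label=1))
```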
How to Build User Simulators to Train RL-based Dialog Systems
Title | How to Build User Simulators to Train RL-based Dialog Systems |
Authors | Weiyan Shi, Kun Qian, Xuewei Wang, Zhou Yu |
Abstract | User simulators are essential for training reinforcement learning (RL) based dialog models. The performance of the simulator directly impacts the RL policy. However, building a good user simulator that models real user behaviors is challenging. We propose a method of standardizing user simulator building that can be used by the community to compare dialog system quality using the same set of user simulators fairly. We present implementations of six user simulators trained with different dialog planning and generation methods. We then calculate a set of automatic metrics to evaluate the quality of these simulators both directly and indirectly. We also ask human users to assess the simulators directly and indirectly by rating the simulated dialogs and interacting with the trained systems. This paper presents a comprehensive evaluation framework for user simulator study and provides a better understanding of the pros and cons of different user simulators, as well as their impacts on the trained systems. |
Tasks | |
Published | 2019-09-03 |
URL | https://arxiv.org/abs/1909.01388v1 |
https://arxiv.org/pdf/1909.01388v1.pdf | |
PWC | https://paperswithcode.com/paper/how-to-build-user-simulators-to-train-rl |
Repo | https://github.com/wyshi/user-simulator |
Framework | pytorch |
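The abstract compares several simulator designs; a minimal agenda-style user simulator, of the kind such studies often start from, could look like the hedged sketch below. The dialog-act format and slot names are assumptions for illustration, not the paper's implementations.

```python
import random

class AgendaUserSimulator:
    """Minimal agenda-style user simulator (illustrative only; the paper builds and
    compares six simulators with different dialog planning and generation methods)."""

    def __init__(self, goal):
        # goal: dict of slot -> value the simulated user wants to communicate
        self.agenda = list(goal.items())
        random.shuffle(self.agenda)

    def respond(self, system_act):
        # answer a system request if the asked-for slot is in the goal; otherwise
        # volunteer the next slot on the agenda; say goodbye when the agenda is empty
        if system_act.get("type") == "request":
            slot = system_act.get("slot")
            for s, v in self.agenda:
                if s == slot:
                    self.agenda.remove((s, v))
                    return {"type": "inform", "slot": s, "value": v}
        if self.agenda:
            s, v = self.agenda.pop()
            return {"type": "inform", "slot": s, "value": v}
        return {"type": "bye"}

# toy interaction
user = AgendaUserSimulator({"cuisine": "thai", "area": "centre"})
print(user.respond({"type": "request", "slot": "cuisine"}))
```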
Neural Ideal Point Estimation Network
Title | Neural Ideal Point Estimation Network |
Authors | Kyungwoo Song, Wonsung Lee, Il-Chul Moon |
Abstract | Understanding politics is challenging because politics is influenced by everything. Even if we limit ourselves to the political context of legislative processes, we need a better understanding of latent factors such as legislators, bills, their ideal points, and their relations. From the modeling perspective, this is difficult 1) because these observations lie in a high dimension that requires learning low-dimensional representations, and 2) because these observations require complex probabilistic modeling with latent variables to reflect the causalities. This paper presents NIPEN, a new model to reflect and understand this political setting, including the factors mentioned above in the legislative process. We propose two versions of NIPEN: one is a hybrid of a deep learning model and a probabilistic graphical model, and the other is a neural tensor model. Our results indicate that NIPEN successfully learns the manifold of legislative bill texts and uses the learned low-dimensional latent variables to improve the prediction of legislators' votes. Additionally, by virtue of being a domain-rich probabilistic model, NIPEN reveals the hidden strength of legislators' trust networks and their various vote-casting characteristics. |
Tasks | |
Published | 2019-04-26 |
URL | http://arxiv.org/abs/1904.11727v1 |
http://arxiv.org/pdf/1904.11727v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-ideal-point-estimation-network |
Repo | https://github.com/gtshs2/NIPEN |
Framework | tf |
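For readers unfamiliar with ideal-point modeling, the snippet below shows the classic one-dimensional vote model that this line of work builds on; it is a simplified stand-in only, since NIPEN additionally learns neural representations of bill texts and models relations among legislators.

```python
import numpy as np

def vote_probability(ideal_point, bill_discrimination, bill_popularity):
    """Probability that a legislator votes 'yea' in a standard one-dimensional
    ideal-point model: sigmoid(a_j * x_i + b_j). A simplified stand-in for the
    latent-variable core that NIPEN augments with learned bill-text features."""
    logit = bill_discrimination * ideal_point + bill_popularity
    return 1.0 / (1.0 + np.exp(-logit))

# a legislator at +1.5 on a bill whose support grows with that dimension
print(vote_probability(ideal_point=1.5, bill_discrimination=2.0, bill_popularity=-0.5))
```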
On Tree-based Methods for Similarity Learning
Title | On Tree-based Methods for Similarity Learning |
Authors | Stéphan Clémençon, Robin Vogel |
Abstract | In many situations, the choice of an adequate similarity measure or metric on the feature space dramatically determines the performance of machine learning methods. Building automatically such measures is the specific purpose of metric/similarity learning. In Vogel et al. (2018), similarity learning is formulated as a pairwise bipartite ranking problem: ideally, the larger the probability that two observations in the feature space belong to the same class (or share the same label), the higher the similarity measure between them. From this perspective, the ROC curve is an appropriate performance criterion and it is the goal of this article to extend recursive tree-based ROC optimization techniques in order to propose efficient similarity learning algorithms. The validity of such iterative partitioning procedures in the pairwise setting is established by means of results pertaining to the theory of U-processes and from a practical angle, it is discussed at length how to implement them by means of splitting rules specifically tailored to the similarity learning task. Beyond these theoretical/methodological contributions, numerical experiments are displayed and provide strong empirical evidence of the performance of the algorithmic approaches we propose. |
Tasks | |
Published | 2019-06-21 |
URL | https://arxiv.org/abs/1906.09243v1 |
https://arxiv.org/pdf/1906.09243v1.pdf | |
PWC | https://paperswithcode.com/paper/on-tree-based-methods-for-similarity-learning |
Repo | https://github.com/RobinVogel/On-Tree-based-methods-for-Similarity-Learning |
Framework | none |
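A hedged sketch of the pairwise reduction described in the abstract: turn pairs of observations into labelled examples (same class or not) and fit a tree whose leaf probabilities act as a similarity score. An off-the-shelf CART tree is used here as a stand-in; the paper derives splitting rules tailored to pairwise ROC optimization rather than standard impurity criteria.

```python
import numpy as np
from itertools import combinations
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

# Build a pairwise dataset: each example is a pair of observations,
# labelled 1 if the two observations share the same class.
X, y = load_iris(return_X_y=True)
pairs, same = [], []
for i, j in combinations(range(len(X)), 2):
    pairs.append(np.abs(X[i] - X[j]))        # simple symmetric pair representation
    same.append(int(y[i] == y[j]))

tree = DecisionTreeClassifier(max_depth=4).fit(pairs, same)
similarity = tree.predict_proba([np.abs(X[0] - X[1])])[0, 1]   # similarity of a new pair
print(similarity)
```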
Frustum ConvNet: Sliding Frustums to Aggregate Local Point-Wise Features for Amodal 3D Object Detection
Title | Frustum ConvNet: Sliding Frustums to Aggregate Local Point-Wise Features for Amodal 3D Object Detection |
Authors | Zhixin Wang, Kui Jia |
Abstract | In this work, we propose a novel method termed \emph{Frustum ConvNet (F-ConvNet)} for amodal 3D object detection from point clouds. Given 2D region proposals in an RGB image, our method first generates a sequence of frustums for each region proposal, and uses the obtained frustums to group local points. F-ConvNet aggregates point-wise features as frustum-level feature vectors, and arrays these feature vectors as a feature map for use by its subsequent fully convolutional network (FCN) component, which spatially fuses frustum-level features and supports end-to-end, continuous estimation of oriented boxes in 3D space. We also propose component variants of F-ConvNet, including an FCN variant that extracts multi-resolution frustum features, and a refined use of F-ConvNet over a reduced 3D space. Careful ablation studies verify the efficacy of these component variants. F-ConvNet assumes no prior knowledge of the working 3D environment and is thus dataset-agnostic. We present experiments on both the indoor SUN-RGBD and outdoor KITTI datasets. F-ConvNet outperforms all existing methods on SUN-RGBD, and at the time of submission it outperforms all published works on the KITTI benchmark. Code has been made available at: {\url{https://github.com/zhixinwang/frustum-convnet}.} |
Tasks | 3D Object Detection, Object Detection |
Published | 2019-03-05 |
URL | https://arxiv.org/abs/1903.01864v2 |
https://arxiv.org/pdf/1903.01864v2.pdf | |
PWC | https://paperswithcode.com/paper/frustum-convnet-sliding-frustums-to-aggregate |
Repo | https://github.com/zhixinwang/frustum-convnet |
Framework | pytorch |
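The sketch below illustrates the grouping-and-aggregation step described in the abstract: points inside one proposal's frustum are binned into depth sections and their point-wise features are max-pooled into a frustum-level feature map. It uses non-overlapping sections and illustrative shapes, not the released F-ConvNet code (which slides overlapping frustums at multiple resolutions).

```python
import numpy as np

def frustum_feature_map(points, point_feats, num_sections=8, max_depth=40.0):
    """Group frustum points into depth sections and max-pool their features.

    points: (N, 3) camera-frame coordinates inside one 2D proposal's frustum,
            with depth along the z axis.
    point_feats: (N, C) point-wise features.
    Returns a (num_sections, C) frustum-level feature map that a subsequent FCN
    could fuse along the depth axis.
    """
    edges = np.linspace(0.0, max_depth, num_sections + 1)
    section = np.clip(np.digitize(points[:, 2], edges) - 1, 0, num_sections - 1)
    feat_map = np.zeros((num_sections, point_feats.shape[1]))
    for s in range(num_sections):
        mask = section == s
        if mask.any():
            feat_map[s] = point_feats[mask].max(axis=0)   # max-pool within the section
    return feat_map

# toy usage: 500 points with 32-dim features inside one frustum
pts = np.random.rand(500, 3) * [2, 2, 40]
feats = np.random.rand(500, 32)
print(frustum_feature_map(pts, feats).shape)   # (8, 32)
```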
Contrastive Variational Autoencoder Enhances Salient Features
Title | Contrastive Variational Autoencoder Enhances Salient Features |
Authors | Abubakar Abid, James Zou |
Abstract | Variational autoencoders are powerful algorithms for identifying dominant latent structure in a single dataset. In many applications, however, we are interested in modeling latent structure and variation that are enriched in a target dataset compared to some background—e.g. enriched in patients compared to the general population. Contrastive learning is a principled framework to capture such enriched variation between the target and background, but state-of-the-art contrastive methods are limited to linear models. In this paper, we introduce the contrastive variational autoencoder (cVAE), which combines the benefits of contrastive learning with the power of deep generative models. The cVAE is designed to identify and enhance salient latent features. The cVAE is trained on two related but unpaired datasets, one of which has minimal contribution from the salient latent features. The cVAE explicitly models latent features that are shared between the datasets, as well as those that are enriched in one dataset relative to the other, which allows the algorithm to isolate and enhance the salient latent features. The algorithm is straightforward to implement, has a similar run-time to the standard VAE, and is robust to noise and dataset purity. We conduct experiments across diverse types of data, including gene expression and facial images, showing that the cVAE effectively uncovers latent structure that is salient in a particular analysis. |
Tasks | |
Published | 2019-02-12 |
URL | http://arxiv.org/abs/1902.04601v1 |
http://arxiv.org/pdf/1902.04601v1.pdf | |
PWC | https://paperswithcode.com/paper/contrastive-variational-autoencoder-enhances |
Repo | https://github.com/abidlabs/contrastive_vae |
Framework | tf |
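A schematic of the two-branch encoding described in the abstract, assuming simple linear encoders and a linear decoder: target samples get shared latents z plus salient latents s, while background samples reuse the shared encoder with s fixed to zero. KL terms and the authors' exact architecture are omitted; this only sketches the structure.

```python
import torch
import torch.nn as nn

class ContrastiveVAE(nn.Module):
    """Schematic cVAE (hedged sketch): salient latents s are zeroed for background
    samples, pushing s to capture variation enriched in the target dataset."""

    def __init__(self, x_dim, z_dim=8, s_dim=2):
        super().__init__()
        self.enc_z = nn.Linear(x_dim, 2 * z_dim)   # mean and log-variance of shared latents
        self.enc_s = nn.Linear(x_dim, 2 * s_dim)   # mean and log-variance of salient latents
        self.dec = nn.Linear(z_dim + s_dim, x_dim)

    @staticmethod
    def reparam(stats):
        mu, logvar = stats.chunk(2, dim=-1)
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def forward(self, x_target, x_background):
        z_t = self.reparam(self.enc_z(x_target))
        s_t = self.reparam(self.enc_s(x_target))
        z_b = self.reparam(self.enc_z(x_background))
        s_b = torch.zeros(x_background.size(0), s_t.size(1))   # salient latents zeroed
        recon_t = self.dec(torch.cat([z_t, s_t], dim=-1))
        recon_b = self.dec(torch.cat([z_b, s_b], dim=-1))
        # reconstruction losses (KL regularizers omitted for brevity)
        return recon_t, recon_b

model = ContrastiveVAE(x_dim=20)
recon_t, recon_b = model(torch.randn(4, 20), torch.randn(4, 20))
print(recon_t.shape, recon_b.shape)
```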
BAGS: An automatic homework grading system using the pictures taken by smart phones
Title | BAGS: An automatic homework grading system using the pictures taken by smart phones |
Authors | Xiaoshuo Li, Tiezhu Yue, Xuanping Huang, Zhe Yang, Gang Xu |
Abstract | Homework grading is critical to evaluate teaching quality and effect. However, it is usually time-consuming to grade homework manually. In the automatic homework grading scenario, many optical mark reader (OMR)-based solutions, which require specific equipment, have been proposed. Although many of them can achieve relatively high accuracy, they are less convenient for users. In contrast, with the popularity of smart phones, automatic grading systems that rely on images photographed by phones have become more practical. In practice, due to different photographing angles or uneven papers, images may be distorted. Moreover, most images are photographed under complex backgrounds, making answer area detection more difficult. To solve these problems, we propose BAGS, an automatic homework grading system which can effectively locate and recognize handwritten answers. In BAGS, all the answers are written above the answer area underlines (AAU), and we use two segmentation networks based on DeepLabv3+ to locate the answer areas. Then, we use the character recognition part to recognize students' answers. Finally, the grading part is designed for the comparison between the recognized answers and the standard ones. In our test, BAGS correctly locates and recognizes the handwritten answers in 91% of total answer areas. |
Tasks | |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.03767v1 |
https://arxiv.org/pdf/1906.03767v1.pdf | |
PWC | https://paperswithcode.com/paper/bags-an-automatic-homework-grading-system |
Repo | https://github.com/boxfish-ai/BAGS |
Framework | none |
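The detection and recognition stages rely on standard segmentation and OCR components; the final grading comparison can be as simple as the toy sketch below, which normalizes and compares recognized strings against the answer key. The system's actual comparison rules are not specified in the abstract, so this is purely illustrative.

```python
def grade(recognized_answers, standard_answers):
    """Toy grading step: mark each answer correct if its normalized text
    matches the answer key (illustrative, not the BAGS comparison logic)."""
    norm = lambda s: "".join(s.split()).lower()
    return [norm(r) == norm(k) for r, k in zip(recognized_answers, standard_answers)]

print(grade(["3.14", " 42"], ["3.14", "42"]))   # [True, True]
```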
Generating Patient-like Phantoms Using Fully Unsupervised Deformable Image Registration with Convolutional Neural Networks
Title | Generating Patient-like Phantoms Using Fully Unsupervised Deformable Image Registration with Convolutional Neural Networks |
Authors | Junyu Chen, Ye Li, Eric C. Frey |
Abstract | The use of convolutional neural networks (ConvNets) in medical imaging research has become widespread in recent years. However, a major drawback of these methods is that they require a large number of annotated training images. Data augmentation has been proposed to alleviate this. One data augmentation strategy is to apply random deformation to existing image data, but the deformed images often do not exhibit realistic shapes or intensity patterns. In this paper, we present a novel, ConvNet-based image registration method for creating patient-like digital phantoms from existing computerized phantoms. Unlike existing learning-based registration techniques, for which the performance predominantly depends on the domain-specific training images, the proposed method is fully unsupervised, meaning that it optimizes an objective function independently of training data for a given image pair. While classical registration methods also do not require training data, they work in a lower-dimensional parameter space; the proposed approach operates directly in the high-dimensional parameter space without any training beforehand. In this paper, we show that the resulting deformed phantom competently matches the anatomy model of a real human while providing the "gold-standard" for the anatomies. Combined with simulation programs, the generated phantoms could potentially serve as a data augmentation tool in today's deep learning studies. |
Tasks | Image Registration, Medical Image Registration |
Published | 2019-12-06 |
URL | https://arxiv.org/abs/1912.02942v2 |
https://arxiv.org/pdf/1912.02942v2.pdf | |
PWC | https://paperswithcode.com/paper/generating-patient-like-phantoms-using-fully |
Repo | https://github.com/junyuchen245/Fully_Unsupervised_CNN_Registration_Keras |
Framework | tf |
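A minimal sketch of the training-free, per-pair optimization the abstract describes: directly optimize a dense displacement field so the warped phantom matches the patient image, regularized for smoothness. The paper parameterizes the deformation with a ConvNet and works in 3D with richer similarity and regularization terms; this 2-D version only illustrates the idea.

```python
import torch
import torch.nn.functional as F

def register_pair(moving, fixed, iters=200, lam=0.01, lr=0.1):
    """Per-pair deformable registration sketch (no training data involved).

    moving, fixed: (1, 1, H, W) tensors; returns the optimized displacement field.
    """
    H, W = moving.shape[-2:]
    disp = torch.zeros(1, H, W, 2, requires_grad=True)           # displacement field (grid units)
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
    base_grid = torch.stack([xs, ys], dim=-1).unsqueeze(0)        # identity sampling grid
    opt = torch.optim.Adam([disp], lr=lr)
    for _ in range(iters):
        warped = F.grid_sample(moving, base_grid + disp, align_corners=True)
        sim = F.mse_loss(warped, fixed)                           # image similarity term
        smooth = disp.diff(dim=1).pow(2).mean() + disp.diff(dim=2).pow(2).mean()
        loss = sim + lam * smooth                                 # similarity + smoothness penalty
        opt.zero_grad(); loss.backward(); opt.step()
    return disp.detach()

field = register_pair(torch.rand(1, 1, 32, 32), torch.rand(1, 1, 32, 32), iters=20)
print(field.shape)   # torch.Size([1, 32, 32, 2])
```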
DiffQue: Estimating Relative Difficulty of Questions in Community Question Answering Services
Title | DiffQue: Estimating Relative Difficulty of Questions in Community Question Answering Services |
Authors | Deepak Thukral, Adesh Pandey, Rishabh Gupta, Vikram Goyal, Tanmoy Chakraborty |
Abstract | Automatic estimation of the relative difficulty of a pair of questions is an important and challenging problem in community question answering (CQA) services. Few studies have addressed this problem. Past studies mostly leveraged the expertise of users answering the questions and barely considered other properties of CQA services such as metadata of users and posts, temporal information and textual content. In this paper, we propose DiffQue, a novel system that maps this problem to a network-aided edge directionality prediction problem. DiffQue starts by constructing a novel network structure that captures different notions of difficulty among a pair of questions. It then measures the relative difficulty of two questions by predicting the direction of a (virtual) edge connecting these two questions in the network. It leverages features extracted from the network structure, metadata of users/posts and textual descriptions of questions and answers. Experiments on datasets obtained from two CQA sites (further divided into four datasets) with human-annotated ground truth show that DiffQue outperforms four state-of-the-art methods by a significant margin (28.77% higher F1 score and 28.72% higher AUC than the best baseline). As opposed to the other baselines, (i) DiffQue responds appropriately to training noise, (ii) DiffQue is capable of adapting to multiple domains (CQA datasets), and (iii) DiffQue can efficiently handle the 'cold start' problem which may arise due to the lack of information for newly posted questions or newly arrived users. |
Tasks | Community Question Answering, Question Answering |
Published | 2019-06-01 |
URL | https://arxiv.org/abs/1906.00145v1 |
https://arxiv.org/pdf/1906.00145v1.pdf | |
PWC | https://paperswithcode.com/paper/190600145 |
Repo | https://github.com/LCS2-IIITD/DiffQue-TIST |
Framework | none |
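A hedged sketch of the edge-directionality formulation: represent each ordered question pair by feature differences and train a classifier to predict which question is harder. The three features and the toy data below are illustrative stand-ins for DiffQue's network-structure, metadata, and text features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pair_features(q_a, q_b):
    """Feature vector for an ordered question pair (q_a, q_b); illustrative only."""
    return [
        q_a["asker_reputation"] - q_b["asker_reputation"],
        q_a["num_answers"] - q_b["num_answers"],
        q_a["body_length"] - q_b["body_length"],
    ]

# toy questions and labelled pairs: label 1 means "first question is harder"
questions = [
    {"asker_reputation": 50,   "num_answers": 1, "body_length": 900},
    {"asker_reputation": 5000, "num_answers": 6, "body_length": 200},
    {"asker_reputation": 300,  "num_answers": 2, "body_length": 600},
]
pairs = [(0, 1, 1), (1, 0, 0), (0, 2, 1), (2, 0, 0)]
X = np.array([pair_features(questions[i], questions[j]) for i, j, _ in pairs])
y = np.array([label for *_, label in pairs])

clf = LogisticRegression().fit(X, y)                  # edge-direction classifier
print(clf.predict([pair_features(questions[2], questions[1])]))
```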
Asynchronous Batch Bayesian Optimisation with Improved Local Penalisation
Title | Asynchronous Batch Bayesian Optimisation with Improved Local Penalisation |
Authors | Ahsan S. Alvi, Binxin Ru, Jan Calliess, Stephen J. Roberts, Michael A. Osborne |
Abstract | Batch Bayesian optimisation (BO) has been successfully applied to hyperparameter tuning using parallel computing, but it is wasteful of resources: workers that complete jobs ahead of others are left idle. We address this problem by developing an approach, Penalising Locally for Asynchronous Bayesian Optimisation on $k$ workers (PLAyBOOK), for asynchronous parallel BO. We demonstrate empirically the efficacy of PLAyBOOK and its variants on synthetic tasks and a real-world problem. We undertake a comparison between synchronous and asynchronous BO, and show that asynchronous BO often outperforms synchronous batch BO in both wall-clock time and number of function evaluations. |
Tasks | Bayesian Optimisation |
Published | 2019-01-29 |
URL | https://arxiv.org/abs/1901.10452v3 |
https://arxiv.org/pdf/1901.10452v3.pdf | |
PWC | https://paperswithcode.com/paper/asynchronous-batch-bayesian-optimisation-with |
Repo | https://github.com/a5a/asynchronous-BO |
Framework | none |
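The sketch below illustrates the local-penalisation idea behind asynchronous batch BO: before choosing the next point for a free worker, the acquisition function is damped near locations that busy workers are still evaluating, so the new pick explores elsewhere. The Gaussian-shaped penaliser and its lengthscale are illustrative; PLAyBOOK's penalisers are more refined.

```python
import numpy as np

def penalised_acquisition(acq_values, candidates, busy_points, lengthscale=0.2):
    """Downweight acquisition values near points that busy workers are evaluating."""
    penalised = np.array(acq_values, dtype=float)
    for b in busy_points:
        d = np.linalg.norm(candidates - b, axis=1)
        penalty = 1.0 - np.exp(-(d / lengthscale) ** 2)   # -> 0 at the busy point, -> 1 far away
        penalised *= penalty
    return penalised

# toy usage: two busy workers, pick the next point for a free worker
cands = np.random.rand(100, 2)
acq = np.random.rand(100)
busy = np.array([[0.2, 0.3], [0.7, 0.8]])
next_idx = int(np.argmax(penalised_acquisition(acq, cands, busy)))
print(cands[next_idx])
```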
Opportunistic Learning: Budgeted Cost-Sensitive Learning from Data Streams
Title | Opportunistic Learning: Budgeted Cost-Sensitive Learning from Data Streams |
Authors | Mohammad Kachuee, Orpaz Goldstein, Kimmo Karkkainen, Sajad Darabi, Majid Sarrafzadeh |
Abstract | In many real-world learning scenarios, features can only be acquired at a cost, under a budget constraint. In this paper, we propose a novel approach for cost-sensitive feature acquisition at prediction time. The suggested method acquires features incrementally based on a context-aware feature-value function. We formulate the problem in the reinforcement learning paradigm, and introduce a reward function based on the utility of each feature. Specifically, MC dropout sampling is used to measure expected variations of the model uncertainty, which is used as a feature-value function. Furthermore, we suggest sharing representations between the class predictor and value function estimator networks. The suggested approach is completely online and is readily applicable to stream learning setups. The solution is evaluated on three different datasets including the well-known MNIST dataset as a benchmark as well as two cost-sensitive datasets: Yahoo Learning to Rank and a dataset in the medical domain for diabetes classification. According to the results, the proposed method is able to efficiently acquire features and make accurate predictions. |
Tasks | Learning-To-Rank |
Published | 2019-01-02 |
URL | http://arxiv.org/abs/1901.00243v2 |
http://arxiv.org/pdf/1901.00243v2.pdf | |
PWC | https://paperswithcode.com/paper/opportunistic-learning-budgeted-cost |
Repo | https://github.com/mkachuee/Opportunistic |
Framework | pytorch |
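The feature-value signal in the abstract is derived from MC dropout; a minimal version of that uncertainty estimate is sketched below. The paper uses the expected change in this quantity after acquiring a feature as the feature's reward, together with shared predictor and value-estimator networks, none of which is shown here.

```python
import torch

def mc_dropout_uncertainty(model, x, n_samples=20):
    """Model uncertainty via MC dropout: keep dropout active at prediction time
    and measure the spread of the sampled class probabilities per example."""
    model.train()   # keep dropout layers stochastic
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(n_samples)])
    return probs.var(dim=0).sum(dim=-1)   # per-example uncertainty score

# toy usage: any network with dropout layers works
net = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.ReLU(),
                          torch.nn.Dropout(0.5), torch.nn.Linear(32, 3))
print(mc_dropout_uncertainty(net, torch.randn(4, 10)))
```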
GMAN: A Graph Multi-Attention Network for Traffic Prediction
Title | GMAN: A Graph Multi-Attention Network for Traffic Prediction |
Authors | Chuanpan Zheng, Xiaoliang Fan, Cheng Wang, Jianzhong Qi |
Abstract | Long-term traffic prediction is highly challenging due to the complexity of traffic systems and the constantly changing nature of many impacting factors. In this paper, we focus on the spatio-temporal factors, and propose a graph multi-attention network (GMAN) to predict traffic conditions for time steps ahead at different locations on a road network graph. GMAN adopts an encoder-decoder architecture, where both the encoder and the decoder consist of multiple spatio-temporal attention blocks to model the impact of the spatio-temporal factors on traffic conditions. The encoder encodes the input traffic features and the decoder predicts the output sequence. Between the encoder and the decoder, a transform attention layer is applied to convert the encoded traffic features into the sequence representations of future time steps that serve as the input of the decoder. The transform attention mechanism models the direct relationships between historical and future time steps, which helps to alleviate the error propagation problem among prediction time steps. Experimental results on two real-world traffic prediction tasks (i.e., traffic volume prediction and traffic speed prediction) demonstrate the superiority of GMAN. In particular, for 1 hour ahead prediction, GMAN outperforms state-of-the-art methods by up to 4% in MAE. The source code is available at https://github.com/zhengchuanpan/GMAN. |
Tasks | Traffic Prediction |
Published | 2019-11-11 |
URL | https://arxiv.org/abs/1911.08415v2 |
https://arxiv.org/pdf/1911.08415v2.pdf | |
PWC | https://paperswithcode.com/paper/gman-a-graph-multi-attention-network-for |
Repo | https://github.com/zhengchuanpan/GMAN |
Framework | none |
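A simplified, single-head sketch of the transform attention idea described in the abstract: queries built from future time-step embeddings attend over the encoded historical steps, converting P historical representations into Q future-step representations for the decoder. Shapes and projections are illustrative and do not mirror the released implementation.

```python
import torch
import torch.nn as nn

class TransformAttention(nn.Module):
    """Single-head sketch of transform attention between history and future steps."""

    def __init__(self, d_model):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)

    def forward(self, hist_enc, future_emb):
        # hist_enc: (batch, num_nodes, P, d) encoded history
        # future_emb: (batch, num_nodes, Q, d) embeddings of future time steps
        q = self.q_proj(future_emb)
        k = self.k_proj(hist_enc)
        v = self.v_proj(hist_enc)
        attn = torch.softmax(q @ k.transpose(-1, -2) / k.size(-1) ** 0.5, dim=-1)  # (.., Q, P)
        return attn @ v                                                            # (.., Q, d)

ta = TransformAttention(d_model=64)
out = ta(torch.randn(2, 5, 12, 64), torch.randn(2, 5, 6, 64))
print(out.shape)   # torch.Size([2, 5, 6, 64])
```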
Sparse Transfer Learning via Winning Lottery Tickets
Title | Sparse Transfer Learning via Winning Lottery Tickets |
Authors | Rahul Mehta |
Abstract | The recently proposed Lottery Ticket Hypothesis of Frankle and Carbin (2019) suggests that the performance of over-parameterized deep networks is due to the random initialization seeding the network with a small fraction of favorable weights. These weights retain their dominant status throughout training – in a very real sense, this sub-network “won the lottery” during initialization. The authors find sub-networks via unstructured magnitude pruning with 85-95% of parameters removed that train to the same accuracy as the original network at a similar speed, which they call winning tickets. In this paper, we extend the Lottery Ticket Hypothesis to a variety of transfer learning tasks. We show that sparse sub-networks with approximately 90-95% of weights removed achieve (and often exceed) the accuracy of the original dense network in several realistic settings. We experimentally validate this by transferring the sparse representation found via pruning on CIFAR-10 to SmallNORB and FashionMNIST for object recognition tasks. |
Tasks | Object Recognition, Transfer Learning |
Published | 2019-05-19 |
URL | https://arxiv.org/abs/1905.07785v2 |
https://arxiv.org/pdf/1905.07785v2.pdf | |
PWC | https://paperswithcode.com/paper/sparse-transfer-learning-via-winning-lottery |
Repo | https://github.com/rahulsmehta/sparsity-experiments |
Framework | pytorch |
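A hedged sketch of the pruning step underlying the transfer experiments: one-shot global magnitude pruning produces a sparsity mask, and in a lottery-ticket transfer setup that mask (found on the source task, e.g. CIFAR-10) is kept while the masked network is retrained on the target task. The paper uses iterative pruning schedules; this one-shot version only illustrates the mechanics.

```python
import torch

def magnitude_prune_masks(model, sparsity=0.9):
    """One-shot global magnitude pruning: keep the largest-magnitude weights,
    zero the rest, and return a per-layer binary mask."""
    all_w = torch.cat([p.detach().abs().flatten() for p in model.parameters() if p.dim() > 1])
    threshold = torch.quantile(all_w, sparsity)
    return {name: (p.detach().abs() > threshold).float()
            for name, p in model.named_parameters() if p.dim() > 1}

def apply_masks(model, masks):
    """Zero out pruned weights in place."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])

net = torch.nn.Sequential(torch.nn.Linear(784, 256), torch.nn.ReLU(), torch.nn.Linear(256, 10))
masks = magnitude_prune_masks(net, sparsity=0.9)
apply_masks(net, masks)   # in a transfer setup, retrain the masked network on the target task
```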
Hyper-Parameter Sweep on AlphaZero General
Title | Hyper-Parameter Sweep on AlphaZero General |
Authors | Hui Wang, Michael Emmerich, Mike Preuss, Aske Plaat |
Abstract | Since AlphaGo and AlphaGo Zero have achieved groundbreaking successes in the game of Go, the programs have been generalized to solve other tasks. Subsequently, AlphaZero was developed to play Go, Chess and Shogi. In the literature, the algorithms are explained well. However, AlphaZero contains many parameters, and for neither AlphaGo, AlphaGo Zero, nor AlphaZero is there sufficient discussion of how to set parameter values in these algorithms. Therefore, in this paper, we choose 12 parameters in AlphaZero and evaluate how these parameters contribute to training. We focus on three objectives (training loss, time cost and playing strength). For each parameter, we train 3 models using 3 different values (minimum value, default value, maximum value). We use the game of 6$\times$6 Othello on AlphaZeroGeneral, an open-source re-implementation of AlphaZero. Overall, experimental results show that different values can lead to different training results, proving the importance of such a parameter sweep. We categorize these 12 parameters into time-sensitive parameters and time-friendly parameters. Moreover, through multi-objective analysis, this paper provides an insightful basis for further hyper-parameter optimization. |
Tasks | Game of Go |
Published | 2019-03-19 |
URL | http://arxiv.org/abs/1903.08129v1 |
http://arxiv.org/pdf/1903.08129v1.pdf | |
PWC | https://paperswithcode.com/paper/hyper-parameter-sweep-on-alphazero-general |
Repo | https://github.com/QueensGambit/CrazyAra |
Framework | mxnet |
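The sweep protocol itself is straightforward to script, as in the sketch below: each chosen parameter is varied across its minimum/default/maximum value while the others stay at their defaults, and the three objectives are recorded per run. Parameter names and values here are illustrative, and `train_and_evaluate` is a hypothetical stand-in for an AlphaZeroGeneral training run.

```python
def train_and_evaluate(config):
    """Hypothetical stand-in for an AlphaZeroGeneral training run on 6x6 Othello;
    it would return the final training loss, wall-clock time, and playing strength."""
    return 0.0, 0.0, 0.0

defaults = {"num_mcts_sims": 25, "cpuct": 1.0, "epochs": 10, "lr": 1e-3}   # illustrative values
sweep = {"num_mcts_sims": [10, 25, 100], "cpuct": [0.5, 1.0, 4.0], "lr": [1e-4, 1e-3, 1e-2]}

results = []
for param, values in sweep.items():
    for value in values:                              # minimum / default / maximum value
        config = dict(defaults, **{param: value})     # vary one parameter at a time
        loss, elapsed, strength = train_and_evaluate(config)
        results.append((param, value, loss, elapsed, strength))
print(len(results))
```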
OpenNRE: An Open and Extensible Toolkit for Neural Relation Extraction
Title | OpenNRE: An Open and Extensible Toolkit for Neural Relation Extraction |
Authors | Xu Han, Tianyu Gao, Yuan Yao, Demin Ye, Zhiyuan Liu, Maosong Sun |
Abstract | OpenNRE is an open-source and extensible toolkit that provides a unified framework to implement neural models for relation extraction (RE). Specifically, by implementing typical RE methods, OpenNRE not only allows developers to train custom models to extract structured relational facts from plain text but also supports quick model validation for researchers. Besides, OpenNRE provides various functional RE modules based on both TensorFlow and PyTorch to maintain sufficient modularity and extensibility, making it easy to incorporate new models into the framework. Beyond the toolkit, we also release an online system that supports real-time extraction without any training or deployment. Meanwhile, the online system can extract facts in various scenarios as well as align the extracted facts to Wikidata, which may benefit various downstream knowledge-driven applications (e.g., information retrieval and question answering). More details of the toolkit and online system can be obtained from http://github.com/thunlp/OpenNRE. |
Tasks | Information Retrieval, Question Answering, Relation Extraction |
Published | 2019-09-28 |
URL | https://arxiv.org/abs/1909.13078v1 |
https://arxiv.org/pdf/1909.13078v1.pdf | |
PWC | https://paperswithcode.com/paper/opennre-an-open-and-extensible-toolkit-for |
Repo | https://github.com/thunlp/OpenNRE |
Framework | tf |
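A brief usage sketch adapted from the repository's documented inference API; the pretrained model name, entity-span format, and model availability may differ across OpenNRE releases, so treat this as indicative rather than authoritative.

```python
import opennre

# load a pretrained relation-extraction model (downloads weights on first use)
model = opennre.get_model('wiki80_cnn_softmax')
result = model.infer({
    'text': 'Bill Gates founded Microsoft in 1975.',
    'h': {'pos': (0, 10)},    # character span of the head entity "Bill Gates"
    't': {'pos': (19, 28)},   # character span of the tail entity "Microsoft"
})
print(result)                 # (predicted relation name, confidence score)
```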