Paper Group AWR 302
Empirical Risk Minimization and Stochastic Gradient Descent for Relational Data
Title | Empirical Risk Minimization and Stochastic Gradient Descent for Relational Data |
Authors | Victor Veitch, Morgane Austern, Wenda Zhou, David M. Blei, Peter Orbanz |
Abstract | Empirical risk minimization is the main tool for prediction problems, but its extension to relational data remains unsolved. We solve this problem using recent ideas from graph sampling theory to (i) define an empirical risk for relational data and (ii) obtain stochastic gradients for this empirical risk that are automatically unbiased. This is achieved by considering the method by which data is sampled from a graph as an explicit component of model design. By integrating fast implementations of graph sampling schemes with standard automatic differentiation tools, we provide an efficient turnkey solver for the risk minimization problem. We establish basic theoretical properties of the procedure. Finally, we demonstrate relational ERM with application to two non-standard problems: one-stage training for semi-supervised node classification, and learning embedding vectors for vertex attributes. Experiments confirm that the turnkey inference procedure is effective in practice, and that the sampling scheme used for model specification has a strong effect on model performance. Code is available at https://github.com/wooden-spoon/relational-ERM. |
Tasks | Node Classification |
Published | 2018-06-27 |
URL | http://arxiv.org/abs/1806.10701v2 |
http://arxiv.org/pdf/1806.10701v2.pdf | |
PWC | https://paperswithcode.com/paper/empirical-risk-minimization-and-stochastic |
Repo | https://github.com/wooden-spoon/relational-ERM |
Framework | tf |
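The core recipe — treat the graph-sampling scheme as an explicit part of the model, then run SGD on the induced empirical risk — can be illustrated with a minimal sketch. This is not the paper's implementation (which pairs fast samplers with automatic differentiation); here we use p-sampling of vertices and a toy edge-logistic loss over embedding dot products, with hand-derived gradients, and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_sample(adj, p=0.5):
    """p-sampling: keep each vertex independently with probability p,
    then take the induced subgraph."""
    keep = np.flatnonzero(rng.random(adj.shape[0]) < p)
    return keep, adj[np.ix_(keep, keep)]

def edge_loss_and_grad(emb, keep, sub_adj):
    """Logistic loss over all vertex pairs of the sampled subgraph,
    scored by embedding dot products; returns loss and full gradient."""
    z = emb[keep]
    scores = z @ z.T
    probs = 1.0 / (1.0 + np.exp(-scores))
    loss = -(sub_adj * np.log(probs + 1e-12)
             + (1 - sub_adj) * np.log(1 - probs + 1e-12)).mean()
    g_scores = (probs - sub_adj) / sub_adj.size   # dL/d(scores)
    grad = np.zeros_like(emb)
    grad[keep] = (g_scores + g_scores.T) @ z      # chain rule through z z^T
    return loss, grad

# Toy graph and the relational-ERM SGD loop
n, d = 50, 8
adj = np.triu((rng.random((n, n)) < 0.1).astype(float), 1)
adj += adj.T
emb = 0.1 * rng.standard_normal((n, d))
for step in range(200):
    keep, sub = p_sample(adj, p=0.4)
    if keep.size < 2:
        continue
    loss, grad = edge_loss_and_grad(emb, keep, sub)
    emb -= 0.5 * grad
```

Swapping `p_sample` for, say, a random-walk sampler changes the model and not just the optimizer — that is the sense in which the sampling scheme is a modeling choice.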
Legendre Decomposition for Tensors
Title | Legendre Decomposition for Tensors |
Authors | Mahito Sugiyama, Hiroyuki Nakahara, Koji Tsuda |
Abstract | We present a novel nonnegative tensor decomposition method, called Legendre decomposition, which factorizes an input tensor into a multiplicative combination of parameters. Thanks to the well-developed theory of information geometry, the reconstructed tensor is unique and always minimizes the KL divergence from an input tensor. We empirically show that Legendre decomposition can more accurately reconstruct tensors than other nonnegative tensor decomposition methods. |
Tasks | |
Published | 2018-02-13 |
URL | http://arxiv.org/abs/1802.04502v2 |
http://arxiv.org/pdf/1802.04502v2.pdf | |
PWC | https://paperswithcode.com/paper/legendre-decomposition-for-tensors |
Repo | https://github.com/mahito-sugiyama/Legendre-decomposition |
Framework | none |
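In our notation (reconstructed from the abstract and the standard information-geometric setup, so treat the details as an assumption), Legendre decomposition fits a tensor from an exponential family by convex minimization of the KL divergence, which is why the reconstruction is unique:

```latex
\min_{\theta}\; D_{\mathrm{KL}}\!\left(P \,\middle\|\, Q_\theta\right)
  \;=\; \sum_{v} P(v)\,\log\frac{P(v)}{Q_\theta(v)},
\qquad
Q_\theta(v) \;=\; \exp\!\Big(\sum_{u \le v,\; u \in B} \theta(u)\Big),
```

where $v$ ranges over index tuples of the (normalized, nonnegative) input tensor $P$, $B$ is a chosen basis of parameter positions, and $\le$ is the componentwise partial order. Convexity of the objective in $\theta$ yields the unique, always-KL-minimizing reconstruction the abstract refers to.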
Towards Understanding Regularization in Batch Normalization
Title | Towards Understanding Regularization in Batch Normalization |
Authors | Ping Luo, Xinjiang Wang, Wenqi Shao, Zhanglin Peng |
Abstract | Batch Normalization (BN) improves both convergence and generalization in training neural networks. This work studies these phenomena theoretically. We analyze BN using a basic building block of neural networks, consisting of a kernel layer, a BN layer, and a nonlinear activation function. This basic network helps us understand the impact of BN in three respects. First, viewing BN as an implicit regularizer, we decompose it into population normalization (PN) plus gamma decay as an explicit regularizer. Second, the learning dynamics of BN and this regularization show that training converges with a large maximum and effective learning rate. Third, the generalization behavior of BN is explored using statistical mechanics. Experiments demonstrate that BN in convolutional neural networks shares the regularization traits predicted by the above analyses. |
Tasks | |
Published | 2018-09-04 |
URL | http://arxiv.org/abs/1809.00846v4 |
http://arxiv.org/pdf/1809.00846v4.pdf | |
PWC | https://paperswithcode.com/paper/towards-understanding-regularization-in-batch |
Repo | https://github.com/darshansiddu01/CNN |
Framework | pytorch |
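A minimal sketch of the decomposition's two ingredients, batch statistics versus population statistics (illustrative code, not the paper's analysis): the paper's claim is that the gap between the two acts as an implicit regularizer, expressible as population normalization plus a gamma-decay term.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Standard BN: normalize with statistics of the current mini-batch.
    x: (batch, features)."""
    mu, var = x.mean(axis=0), x.var(axis=0)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

def population_norm(x, pop_mu, pop_var, gamma, beta, eps=1e-5):
    """PN: the same affine-normalize transform, but with fixed
    population statistics; the sampling noise that BN adds on top of
    this is what the paper identifies as gamma decay."""
    return gamma * (x - pop_mu) / np.sqrt(pop_var + eps) + beta
```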
On the iterative refinement of densely connected representation levels for semantic segmentation
Title | On the iterative refinement of densely connected representation levels for semantic segmentation |
Authors | Arantxa Casanova, Guillem Cucurull, Michal Drozdzal, Adriana Romero, Yoshua Bengio |
Abstract | State-of-the-art semantic segmentation approaches increase the receptive field of their models by using either a downsampling path composed of poolings/strided convolutions or successive dilated convolutions. However, it is not clear which operation leads to the best results. In this paper, we systematically study the differences introduced by distinct receptive field enlargement methods and their impact on the performance of a novel architecture, called Fully Convolutional DenseResNet (FC-DRN). FC-DRN has a densely connected backbone composed of residual networks. Following standard image segmentation architectures, receptive field enlargement operations that change the representation level are interleaved among the residual networks. This allows the model to exploit the benefits of both residual and dense connectivity patterns, namely: gradient flow, iterative refinement of representations, multi-scale feature combination and deep supervision. To highlight the potential of our model, we test it on the challenging CamVid urban scene understanding benchmark and make the following observations: 1) downsampling operations outperform dilations when the model is trained from scratch, 2) dilations are useful during the finetuning step of the model, 3) coarser representations require fewer refinement steps, and 4) ResNets (by model construction) are good regularizers, since they can reduce the model capacity when needed. Finally, we compare our architecture to alternative methods and report state-of-the-art results on the CamVid dataset, with fewer than half the parameters of the alternatives. |
Tasks | Scene Understanding, Semantic Segmentation |
Published | 2018-04-30 |
URL | http://arxiv.org/abs/1804.11332v1 |
http://arxiv.org/pdf/1804.11332v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-iterative-refinement-of-densely |
Repo | https://github.com/ArantxaCasanova/fc-drn |
Framework | pytorch |
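The two receptive-field enlargement methods the paper compares can be made concrete with standard receptive-field arithmetic (a sketch under the usual conv-arithmetic assumptions; the layer stacks below are illustrative, not FC-DRN's):

```python
def receptive_field(layers):
    """Track receptive field size through a stack of layers.
    Each layer is (kernel, stride, dilation); both strided/pooled and
    dilated convolutions enlarge the receptive field."""
    rf, jump = 1, 1
    for k, s, d in layers:
        rf += (k - 1) * d * jump   # dilation widens the kernel's span
        jump *= s                  # stride multiplies the step size
    return rf

# Two routes to the same receptive field:
print(receptive_field([(3, 2, 1)] * 4))                     # 4 strided 3x3 convs -> 31
print(receptive_field([(3, 1, d) for d in (1, 2, 4, 8)]))   # dilated 3x3 convs  -> 31
```

Both stacks reach a receptive field of 31, but only the dilated stack preserves spatial resolution — which is why the trade-off the paper studies is nontrivial.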
Generating Realistic Geology Conditioned on Physical Measurements with Generative Adversarial Networks
Title | Generating Realistic Geology Conditioned on Physical Measurements with Generative Adversarial Networks |
Authors | Emilien Dupont, Tuanfeng Zhang, Peter Tilke, Lin Liang, William Bailey |
Abstract | An important problem in geostatistics is to build models of the subsurface of the Earth given physical measurements at sparse spatial locations. Typically, this is done using spatial interpolation methods or by reproducing patterns from a reference image. However, these algorithms fail to produce realistic patterns and do not exhibit the wide range of uncertainty inherent in the prediction of geology. In this paper, we show how semantic inpainting with Generative Adversarial Networks can be used to generate varied realizations of geology which honor physical measurements while matching the expected geological patterns. In contrast to other algorithms, our method scales well with the number of data points and mimics a distribution of patterns as opposed to a single pattern or image. The generated conditional samples are state of the art. |
Tasks | |
Published | 2018-02-08 |
URL | http://arxiv.org/abs/1802.03065v3 |
http://arxiv.org/pdf/1802.03065v3.pdf | |
PWC | https://paperswithcode.com/paper/generating-realistic-geology-conditioned-on |
Repo | https://github.com/amoodie/StratGAN |
Framework | tf |
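A sketch of the conditioning step as we read it: with a trained generator held fixed, semantic inpainting searches the latent space for realizations whose outputs honor the sparse measurements. The linear-`tanh` "generator" below is a stand-in for the trained GAN, and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in generator; in the paper this is a trained GAN generator.
W = rng.standard_normal((64, 16))
def G(z):
    return np.tanh(W @ z)

# Sparse "physical measurements": values known at a few locations.
mask = np.zeros(64); mask[[3, 17, 42]] = 1.0
target = np.zeros(64); target[[3, 17, 42]] = [0.8, -0.2, 0.5]

# Latent-space search: minimize the masked reconstruction loss so the
# generated realization honors the measurements.
z = rng.standard_normal(16)
for _ in range(500):
    r = mask * (G(z) - target)                   # residual at measured points
    grad = W.T @ ((1 - np.tanh(W @ z) ** 2) * r) # chain rule through tanh
    z -= 0.05 * grad
```

Different random restarts of `z` give the varied realizations the abstract mentions.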
MNIST Dataset Classification Utilizing k-NN Classifier with Modified Sliding-window Metric
Title | MNIST Dataset Classification Utilizing k-NN Classifier with Modified Sliding-window Metric |
Authors | Divas Grover, Behrad Toghi |
Abstract | The MNIST dataset of handwritten digits is one of the most commonly used datasets in machine learning and computer vision research. We study this widely applicable classification problem and apply a simple yet efficient k-nearest-neighbor classifier with an enhanced heuristic. We evaluate the performance of the k-nearest-neighbor classification algorithm on the MNIST dataset, comparing the $L_2$ Euclidean distance metric to a modified distance metric that utilizes a sliding-window technique to avoid performance degradation due to slight spatial misalignments. Accuracy and confusion matrices are used as performance indicators to compare the baseline algorithm against the enhanced sliding-window method; results show a significant improvement with the proposed method. |
Tasks | |
Published | 2018-09-18 |
URL | http://arxiv.org/abs/1809.06846v4 |
http://arxiv.org/pdf/1809.06846v4.pdf | |
PWC | https://paperswithcode.com/paper/mnist-dataset-classification-utilizing-k-nn |
Repo | https://github.com/BehradToghi/kNN_SWin |
Framework | none |
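A hedged sketch of the modified metric (the exact windowing in the paper may differ, and `np.roll` wraps around where a real implementation would pad): the distance between two digit images is the minimum L2 distance over small spatial shifts, which absorbs slight misalignments.

```python
import numpy as np

def sliding_window_distance(a, b, shifts=(-1, 0, 1)):
    """Shift-tolerant squared-L2 distance between two 2-D images:
    take the minimum over small translations of one image."""
    best = np.inf
    for dy in shifts:
        for dx in shifts:
            shifted = np.roll(np.roll(b, dy, axis=0), dx, axis=1)
            best = min(best, np.sum((a - shifted) ** 2))
    return best

def knn_predict(x, train_images, train_labels, k=3):
    """k-NN vote using the shift-tolerant metric.
    train_images / train_labels are assumed to be numpy arrays."""
    d = np.array([sliding_window_distance(x, t) for t in train_images])
    nearest = np.argsort(d)[:k]
    vals, counts = np.unique(train_labels[nearest], return_counts=True)
    return vals[np.argmax(counts)]
```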
FewRel: A Large-Scale Supervised Few-Shot Relation Classification Dataset with State-of-the-Art Evaluation
Title | FewRel: A Large-Scale Supervised Few-Shot Relation Classification Dataset with State-of-the-Art Evaluation |
Authors | Xu Han, Hao Zhu, Pengfei Yu, Ziyun Wang, Yuan Yao, Zhiyuan Liu, Maosong Sun |
Abstract | We present a Few-Shot Relation Classification Dataset (FewRel), consisting of 70,000 sentences covering 100 relations, derived from Wikipedia and annotated by crowdworkers. The relation of each sentence is first recognized by distant supervision methods and then filtered by crowdworkers. We adapt the most recent state-of-the-art few-shot learning methods for relation classification and conduct a thorough evaluation of these methods. Empirical results show that even the most competitive few-shot learning models struggle on this task, especially as compared with humans. We also show that a range of different reasoning skills are needed to solve our task. These results indicate that few-shot relation classification remains an open problem and still requires further research. Our detailed analysis points out multiple directions for future research. All details and resources about the dataset and baselines are released at http://zhuhao.me/fewrel. |
Tasks | Few-Shot Learning, Few-Shot Relation Classification, Relation Classification, Relation Extraction |
Published | 2018-10-24 |
URL | http://arxiv.org/abs/1810.10147v2 |
http://arxiv.org/pdf/1810.10147v2.pdf | |
PWC | https://paperswithcode.com/paper/fewrel-a-large-scale-supervised-few-shot |
Repo | https://github.com/ProKil/FewRel |
Framework | pytorch |
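The evaluation protocol behind the benchmark is N-way K-shot episode sampling; a minimal sketch (our helper, with `data` assumed to map each relation name to a list of encoded sentences):

```python
import numpy as np

def sample_episode(data, N=5, K=5, Q=1, rng=np.random.default_rng()):
    """Sample one N-way K-shot episode: pick N relations, then K support
    and Q query sentences per relation. Returns support set, query set,
    and query labels (indices into the sampled relations)."""
    relations = rng.choice(list(data), N, replace=False)
    support, query, labels = [], [], []
    for i, r in enumerate(relations):
        picks = rng.choice(len(data[r]), K + Q, replace=False)
        support += [data[r][p] for p in picks[:K]]
        query += [data[r][p] for p in picks[K:]]
        labels += [i] * Q
    return support, query, labels
```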
Non-local U-Net for Biomedical Image Segmentation
Title | Non-local U-Net for Biomedical Image Segmentation |
Authors | Zhengyang Wang, Na Zou, Dinggang Shen, Shuiwang Ji |
Abstract | Deep learning has shown great promise in various biomedical image segmentation tasks. Existing models are typically based on U-Net and rely on an encoder-decoder architecture with stacked local operators to aggregate long-range information gradually. However, using only local operators limits efficiency and effectiveness. In this work, we propose non-local U-Nets, which are equipped with flexible global aggregation blocks, for biomedical image segmentation. These blocks can be inserted into U-Net as size-preserving processes, as well as down-sampling and up-sampling layers. We perform thorough experiments on the 3D multimodality isointense infant brain MR image segmentation task to evaluate the non-local U-Nets. Results show that our proposed models achieve top performance with fewer parameters and faster computation. |
Tasks | Brain Image Segmentation, Semantic Segmentation |
Published | 2018-12-10 |
URL | https://arxiv.org/abs/1812.04103v2 |
https://arxiv.org/pdf/1812.04103v2.pdf | |
PWC | https://paperswithcode.com/paper/global-deep-learning-methods-for |
Repo | https://github.com/zhengyang-wang/3D-Unet--Tensorflow |
Framework | tf |
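A sketch of the idea behind a global aggregation block (simplified to plain self-attention over flattened positions; the paper's blocks also come in size-preserving and down-/up-sampling variants): every position aggregates information from all others in one step, instead of growing the receptive field through stacked local convolutions.

```python
import numpy as np

def global_aggregation(x, wq, wk, wv):
    """x: (positions, channels); wq/wk/wv: projection matrices with
    illustrative shapes (channels, d). Returns globally mixed features."""
    q, k, v = x @ wq, x @ wk, x @ wv
    att = q @ k.T / np.sqrt(k.shape[1])
    att = np.exp(att - att.max(axis=1, keepdims=True))  # stable softmax
    att /= att.sum(axis=1, keepdims=True)
    return att @ v   # each position is a weighted sum over all positions
```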
Automatic Skin Lesion Segmentation Using Deep Fully Convolutional Networks
Title | Automatic Skin Lesion Segmentation Using Deep Fully Convolutional Networks |
Authors | Hongming Xu, Tae Hyun Hwang |
Abstract | This paper summarizes our method and validation results for the ISIC Challenge 2018 - Skin Lesion Analysis Towards Melanoma Detection - Task 1: Lesion Segmentation |
Tasks | Lesion Segmentation |
Published | 2018-07-17 |
URL | http://arxiv.org/abs/1807.06466v1 |
http://arxiv.org/pdf/1807.06466v1.pdf | |
PWC | https://paperswithcode.com/paper/automatic-skin-lesion-segmentation-using-deep |
Repo | https://github.com/RegulusReggie/CS259 |
Framework | none |
Deep $k$-Means: Re-Training and Parameter Sharing with Harder Cluster Assignments for Compressing Deep Convolutions
Title | Deep $k$-Means: Re-Training and Parameter Sharing with Harder Cluster Assignments for Compressing Deep Convolutions |
Authors | Junru Wu, Yue Wang, Zhenyu Wu, Zhangyang Wang, Ashok Veeraraghavan, Yingyan Lin |
Abstract | The current trend of pushing CNNs deeper with convolutions has created a pressing demand for higher compression gains on CNNs where convolutions dominate the computation and parameter count (e.g., GoogLeNet, ResNet and Wide ResNet). Further, the high energy consumption of convolutions limits their deployment on mobile devices. To this end, we propose a simple yet effective scheme for compressing convolutions by applying k-means clustering to the weights; compression is achieved through weight sharing, by recording only $K$ cluster centers and per-weight assignment indexes. We then introduce a novel spectrally relaxed $k$-means regularization, which tends to make hard assignments of convolutional layer weights to the $K$ learned cluster centers during re-training. We additionally propose an improved set of metrics to estimate the energy consumption of CNN hardware implementations, whose estimates are verified to be consistent with a previously proposed energy estimation tool extrapolated from actual hardware measurements. We finally evaluate Deep $k$-Means across several CNN models in terms of both compression ratio and energy consumption reduction, observing promising results without incurring accuracy loss. The code is available at https://github.com/Sandbox3aster/Deep-K-Means |
Tasks | |
Published | 2018-06-24 |
URL | http://arxiv.org/abs/1806.09228v1 |
http://arxiv.org/pdf/1806.09228v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-k-means-re-training-and-parameter-1 |
Repo | https://github.com/Sandbox3aster/Deep-K-Means |
Framework | pytorch |
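The compression step is easy to sketch: cluster a layer's weights with k-means and store only the K centers plus per-weight indexes. This is plain Lloyd's algorithm, not the paper's exact pipeline — the spectrally relaxed regularization used during re-training is not shown:

```python
import numpy as np

def kmeans_quantize(weights, K=16, iters=20, seed=0):
    """Weight-sharing compression: replace every weight with its
    nearest of K cluster centers. Returns the quantized weights,
    the centers, and the per-weight index table."""
    rng = np.random.default_rng(seed)
    w = weights.ravel()
    centers = rng.choice(w, K, replace=False)
    for _ in range(iters):
        idx = np.abs(w[:, None] - centers[None, :]).argmin(axis=1)
        for k in range(K):
            if np.any(idx == k):
                centers[k] = w[idx == k].mean()  # Lloyd update
    return centers[idx].reshape(weights.shape), centers, idx
```

With 32-bit weights and K=16, each stored index needs only 4 bits, which is where the compression gain comes from.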
QDEE: Question Difficulty and Expertise Estimation in Community Question Answering Sites
Title | QDEE: Question Difficulty and Expertise Estimation in Community Question Answering Sites |
Authors | Jiankai Sun, Sobhan Moosavi, Rajiv Ramnath, Srinivasan Parthasarathy |
Abstract | In this paper, we present a framework for Question Difficulty and Expertise Estimation (QDEE) in Community Question Answering sites (CQAs) such as Yahoo! Answers and Stack Overflow, which tackles a fundamental challenge in crowdsourcing: how to appropriately route and assign questions to users with the suitable expertise. This problem domain has been the subject of much research and includes both language-agnostic and language-conscious solutions. We bring to bear a key language-agnostic insight: that users gain expertise and therefore tend to ask as well as answer more difficult questions over time. We use this insight within the popular competition (directed) graph model to estimate question difficulty and user expertise by identifying key hierarchical structure within said model. An important and novel contribution here is the application of “social agony” to this problem domain. Difficulty levels of newly posted questions (the cold-start problem) are estimated using our QDEE framework and additional textual features. We also propose a model to route newly posted questions to appropriate users based on the difficulty level of the question and the expertise of the user. Extensive experiments on real-world CQAs such as Yahoo! Answers and Stack Overflow demonstrate the improved efficacy of our approach over contemporary state-of-the-art models. The QDEE framework also allows us to characterize user expertise in novel ways by identifying interesting patterns and roles played by different users in such CQAs. |
Tasks | Community Question Answering, Question Answering |
Published | 2018-03-31 |
URL | http://arxiv.org/abs/1804.00109v2 |
http://arxiv.org/pdf/1804.00109v2.pdf | |
PWC | https://paperswithcode.com/paper/qdee-question-difficulty-and-expertise |
Repo | https://github.com/zhenv5/QDEE |
Framework | none |
DropFilter: Dropout for Convolutions
Title | DropFilter: Dropout for Convolutions |
Authors | Zhengsu Chen, Jianwei Niu, Qi Tian |
Abstract | Using a large number of parameters, deep neural networks have achieved remarkable performance on computer vision and natural language processing tasks. However, networks with many parameters usually suffer from overfitting. Dropout is a widely used method for dealing with overfitting. Although dropout can significantly regularize densely connected layers in neural networks, it leads to suboptimal results when used for convolutional layers. To tackle this problem, we propose DropFilter, a new dropout method for convolutional layers. DropFilter randomly suppresses the outputs of some filters, motivated by the observation that co-adaptations are more likely to occur between filters than within filters in convolutional layers. Using DropFilter, we remarkably improve the performance of convolutional networks on CIFAR and ImageNet. |
Tasks | |
Published | 2018-10-23 |
URL | http://arxiv.org/abs/1810.09849v1 |
http://arxiv.org/pdf/1810.09849v1.pdf | |
PWC | https://paperswithcode.com/paper/dropfilter-dropout-for-convolutions |
Repo | https://github.com/harirajeev/Reference |
Framework | pytorch |
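The mechanism reduces to one masking step — zero out whole filters (channels) rather than individual activations. A minimal sketch with inverted-dropout rescaling (the rescaling is our assumption; the abstract does not specify it):

```python
import numpy as np

rng = np.random.default_rng(0)

def drop_filter(x, p=0.3, training=True):
    """DropFilter sketch: suppress entire output channels at random.
    x: (batch, channels, H, W); each channel is kept with prob. 1 - p
    and, as in inverted dropout, surviving channels are rescaled."""
    if not training or p == 0:
        return x
    keep = (rng.random(x.shape[1]) >= p).astype(x.dtype)
    return x * keep[None, :, None, None] / (1 - p)
```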
Adversarial Defense of Image Classification Using a Variational Auto-Encoder
Title | Adversarial Defense of Image Classification Using a Variational Auto-Encoder |
Authors | Yi Luo, Henry Pfister |
Abstract | Deep neural networks are known to be vulnerable to adversarial attacks. This exposes them to potential exploits in security-sensitive applications and highlights their lack of robustness. This paper uses a variational auto-encoder (VAE) to defend against adversarial attacks on image classification tasks. This VAE defense has a few nice properties: (1) it is quite flexible, and its use of randomness makes it harder to attack; (2) it can learn disentangled representations that prevent blurry reconstruction; and (3) a patch-wise VAE defense strategy is used that does not require retraining for different image sizes. For moderate to severe attacks, this system outperforms or closely matches the performance of JPEG compression with its best quality parameter, and it offers more flexibility and potential for improvement via training. |
Tasks | Adversarial Defense, Image Classification |
Published | 2018-12-07 |
URL | http://arxiv.org/abs/1812.02891v1 |
http://arxiv.org/pdf/1812.02891v1.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-defense-of-image-classification |
Repo | https://github.com/Roy-YL/VAE-Adversarial-Defense |
Framework | tf |
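The patch-wise strategy can be sketched independently of the VAE itself: purify each fixed-size patch with the trained VAE's encode/decode and reassemble, so one model serves any image size. `vae_reconstruct` is an assumed callable standing in for the trained VAE, and the patch size is assumed to divide the image size:

```python
import numpy as np

def patchwise_purify(image, vae_reconstruct, patch=8):
    """Run each patch through the VAE before classification, stripping
    adversarial perturbations while keeping the model size-agnostic."""
    h, w = image.shape[:2]
    out = np.zeros_like(image)
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            out[i:i + patch, j:j + patch] = vae_reconstruct(
                image[i:i + patch, j:j + patch])
    return out
```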
Masking: A New Perspective of Noisy Supervision
Title | Masking: A New Perspective of Noisy Supervision |
Authors | Bo Han, Jiangchao Yao, Gang Niu, Mingyuan Zhou, Ivor Tsang, Ya Zhang, Masashi Sugiyama |
Abstract | It is important to learn various types of classifiers given training data with noisy labels. Noisy labels, in the most popular noise model hitherto, are corrupted from ground-truth labels by an unknown noise transition matrix. Thus, by estimating this matrix, classifiers can escape from overfitting those noisy labels. However, such estimation is practically difficult, due to either the indirect nature of two-step approaches or data that is not large enough to support end-to-end approaches. In this paper, we propose a human-assisted approach called Masking that conveys human cognition of invalid class transitions and naturally speculates the structure of the noise transition matrix. To this end, we derive a structure-aware probabilistic model incorporating a structure prior, and solve the challenges of structure extraction and structure alignment. Thanks to Masking, we only estimate unmasked noise transition probabilities, and the burden of estimation is tremendously reduced. We conduct extensive experiments on CIFAR-10 and CIFAR-100 with three noise structures, as well as the industrial-level Clothing1M with agnostic noise structure, and the results show that Masking can improve the robustness of classifiers significantly. |
Tasks | Image Classification |
Published | 2018-05-21 |
URL | http://arxiv.org/abs/1805.08193v2 |
http://arxiv.org/pdf/1805.08193v2.pdf | |
PWC | https://paperswithcode.com/paper/masking-a-new-perspective-of-noisy |
Repo | https://github.com/bhanML/Co-teaching |
Framework | pytorch |
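The key reduction is easy to state in code: given a human-specified mask of valid class transitions, only the unmasked entries of the noise transition matrix need estimating. A sketch of just the masking arithmetic (the paper's structure-aware probabilistic model is not shown):

```python
import numpy as np

def masked_transition(counts, mask):
    """counts: raw estimates of label-flip frequencies; mask: 1 where a
    transition is considered valid (keep the diagonal unmasked).
    Invalid transitions are fixed to zero and rows renormalized."""
    T = counts * mask
    return T / T.sum(axis=1, keepdims=True)

# Example: tri-diagonal structure -- noise only between adjacent classes
mask = np.eye(4) + np.eye(4, k=1) + np.eye(4, k=-1)
counts = np.full((4, 4), 0.1) + np.eye(4)
print(masked_transition(counts, mask))
```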
Incorporating Literals into Knowledge Graph Embeddings
Title | Incorporating Literals into Knowledge Graph Embeddings |
Authors | Agustinus Kristiadi, Mohammad Asif Khan, Denis Lukovnikov, Jens Lehmann, Asja Fischer |
Abstract | Knowledge graphs, on top of entities and their relationships, contain other important elements: literals. Literals encode interesting properties (e.g. the height) of entities that are not captured by links between entities alone. Most of the existing work on embedding (or latent feature) based knowledge graph analysis focuses mainly on the relations between entities. In this work, we study the effect of incorporating literal information into existing link prediction methods. Our approach, which we name LiteralE, is an extension that can be plugged into existing latent feature methods. LiteralE merges entity embeddings with their literal information using a learnable, parametrized function, such as a simple linear or nonlinear transformation, or a multilayer neural network. We extend several popular embedding models based on LiteralE and evaluate their performance on the task of link prediction. Despite its simplicity, LiteralE proves to be an effective way to incorporate literal information into existing embedding based methods, improving their performance on different standard datasets, which we augment with their literals and provide as a testbed for further research. |
Tasks | Entity Embeddings, Knowledge Graph Embeddings, Knowledge Graphs, Link Prediction |
Published | 2018-02-03 |
URL | https://arxiv.org/abs/1802.00934v3 |
https://arxiv.org/pdf/1802.00934v3.pdf | |
PWC | https://paperswithcode.com/paper/incorporating-literals-into-knowledge-graph |
Repo | https://github.com/SmartDataAnalytics/LiteralE |
Framework | pytorch |
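A sketch of the merging function — a gated combination in the spirit of LiteralE, where the parameter names and exact form are our assumption (the paper also evaluates simpler linear and deeper variants): the gate decides, per dimension, how much literal-derived signal to mix into the entity embedding.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def literal_e_gate(e, l, Wz, Wh, bz):
    """Merge entity embedding e (d,) with literal features l (dl,)
    via a learnable gate; Wz, Wh have shape (d, d + dl), bz shape (d,).
    Returns a literal-enriched embedding of the same size as e."""
    x = np.concatenate([e, l])
    z = sigmoid(Wz @ x + bz)     # gate: how much literal signal to admit
    h = np.tanh(Wh @ x)          # candidate literal-aware embedding
    return z * h + (1 - z) * e   # gated mix with the original embedding

# Illustrative usage with random parameters
d, dl = 4, 2
rng = np.random.default_rng(0)
e, l = rng.standard_normal(d), rng.standard_normal(dl)
Wz, Wh = rng.standard_normal((d, d + dl)), rng.standard_normal((d, d + dl))
enriched = literal_e_gate(e, l, Wz, Wh, np.zeros(d))
```

Because the output keeps the entity embedding's dimensionality, the merged vector can be dropped into any existing scoring function, which is what makes the extension pluggable.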