Paper Group AWR 287
Visual Referring Expression Recognition: What Do Systems Actually Learn?. Correspondence of Deep Neural Networks and the Brain for Visual Textures. Short-term Load Forecasting with Deep Residual Networks. Attention-based Deep Multiple Instance Learning. Stroke Controllable Fast Style Transfer with Adaptive Receptive Fields. Sliced-Wasserstein Flows …
Visual Referring Expression Recognition: What Do Systems Actually Learn?
Title | Visual Referring Expression Recognition: What Do Systems Actually Learn? |
Authors | Volkan Cirik, Louis-Philippe Morency, Taylor Berg-Kirkpatrick |
Abstract | We present an empirical analysis of the state-of-the-art systems for referring expression recognition – the task of identifying the object in an image referred to by a natural language expression – with the goal of gaining insight into how these systems reason about language and vision. Surprisingly, we find strong evidence that even sophisticated and linguistically-motivated models for this task may ignore the linguistic structure, instead relying on shallow correlations introduced by unintended biases in the data selection and annotation process. For example, we show that a system trained and tested on the input image $\textit{without the input referring expression}$ can achieve a precision of 71.2% in top-2 predictions. Furthermore, a system that predicts only the object category given the input can achieve a precision of 84.2% in top-2 predictions. These surprisingly positive results for what should be deficient prediction scenarios suggest that careful analysis of what our models are learning – and further, how our data is constructed – is critical as we seek to make substantive progress on grounded language tasks. |
Tasks | |
Published | 2018-05-30 |
URL | http://arxiv.org/abs/1805.11818v1 |
PDF | http://arxiv.org/pdf/1805.11818v1.pdf |
PWC | https://paperswithcode.com/paper/visual-referring-expression-recognition-what |
Repo | https://github.com/volkancirik/neural-sieves-refexp |
Framework | pytorch |
Correspondence of Deep Neural Networks and the Brain for Visual Textures
Title | Correspondence of Deep Neural Networks and the Brain for Visual Textures |
Authors | Md Nasir Uddin Laskar, Luis G Sanchez Giraldo, Odelia Schwartz |
Abstract | Deep convolutional neural networks (CNNs) trained on objects and scenes have shown intriguing ability to predict some response properties of visual cortical neurons. However, the factors and computations that give rise to such ability, and the role of intermediate processing stages in explaining changes that develop across areas of the cortical hierarchy, are poorly understood. We focused on the sensitivity to textures as a paradigmatic example, since recent neurophysiology experiments provide rich data pointing to texture sensitivity in secondary but not primary visual cortex. We developed a quantitative approach for selecting a subset of the neural unit population from the CNN that best describes the brain neural recordings. We found that the first two layers of the CNN showed qualitative and quantitative correspondence to the cortical data across a number of metrics. This compatibility was reduced for the architecture alone rather than the learned weights, for some other related hierarchical models, and only mildly in the absence of a nonlinear computation akin to local divisive normalization. Our results show that the CNN class of model is effective for capturing changes that develop across early areas of cortex, and has the potential to facilitate understanding of the computations that give rise to hierarchical processing in the brain. |
Tasks | |
Published | 2018-06-07 |
URL | http://arxiv.org/abs/1806.02888v1 |
PDF | http://arxiv.org/pdf/1806.02888v1.pdf |
PWC | https://paperswithcode.com/paper/correspondence-of-deep-neural-networks-and |
Repo | https://github.com/nasirml/DeepNetAndBrain |
Framework | none |
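The nonlinear computation the abstract singles out, local divisive normalization, can be sketched as a simple population nonlinearity. This is a minimal illustration, assuming a (channels, height, width) response tensor and pooling over channels at each spatial location; it is not the paper's exact model.

```python
import numpy as np

def divisive_normalization(responses, sigma=1.0):
    """Local divisive normalization: each unit's linear response is divided by
    a pooled measure of population activity (here, all channels at the same
    spatial location), a canonical nonlinearity in models of visual cortex.

    responses: array of shape (channels, height, width).
    """
    pooled = np.sqrt(sigma ** 2 + (responses ** 2).sum(axis=0, keepdims=True))
    return responses / pooled
```

Because the pool includes each unit's own squared response, the normalized outputs are bounded, which is one reason this operation is often credited with stabilizing responses across the hierarchy.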
Short-term Load Forecasting with Deep Residual Networks
Title | Short-term Load Forecasting with Deep Residual Networks |
Authors | Kunjin Chen, Kunlong Chen, Qin Wang, Ziyu He, Jun Hu, Jinliang He |
Abstract | In this paper, we present a model for forecasting short-term power loads based on deep residual networks. The proposed model is able to integrate domain knowledge and researchers’ understanding of the task by virtue of different neural network building blocks. Specifically, a modified deep residual network is formulated to improve the forecast results. Further, a two-stage ensemble strategy is used to enhance the generalization capability of the proposed model. We also apply the proposed model to probabilistic load forecasting using Monte Carlo dropout. Three public datasets are used to demonstrate the effectiveness of the proposed model. Multiple test cases and comparisons with existing models show that the proposed model provides accurate load forecasting results and has high generalization capability. |
Tasks | Load Forecasting |
Published | 2018-05-30 |
URL | http://arxiv.org/abs/1805.11956v1 |
PDF | http://arxiv.org/pdf/1805.11956v1.pdf |
PWC | https://paperswithcode.com/paper/short-term-load-forecasting-with-deep |
Repo | https://github.com/yalickj/load-forecasting-resnet |
Framework | none |
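The probabilistic forecasting step via Monte Carlo dropout can be sketched with a toy two-layer network; the layer sizes, dropout rate, and random weights below are illustrative assumptions, not the paper's residual architecture.

```python
import numpy as np

def mc_dropout_forecast(x, W1, W2, p=0.5, n_samples=200, seed=0):
    """Monte Carlo dropout: keep dropout stochastic at prediction time and run
    many forward passes, so the spread of the outputs approximates a
    predictive distribution for probabilistic load forecasting."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_samples):
        h = np.maximum(W1 @ x, 0.0)            # hidden layer with ReLU
        mask = rng.random(h.shape) >= p        # dropout stays ON at test time
        h = h * mask / (1.0 - p)               # inverted-dropout rescaling
        preds.append(W2 @ h)
    preds = np.asarray(preds)
    # point forecast plus an empirical 90% prediction interval
    return preds.mean(axis=0), np.percentile(preds, [5, 95], axis=0)
```

The interval width directly reflects how much the stochastic forward passes disagree, which is what makes the approach attractive for load forecasting, where quantifying risk matters as much as the point estimate.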
Attention-based Deep Multiple Instance Learning
Title | Attention-based Deep Multiple Instance Learning |
Authors | Maximilian Ilse, Jakub M. Tomczak, Max Welling |
Abstract | Multiple instance learning (MIL) is a variation of supervised learning where a single class label is assigned to a bag of instances. In this paper, we state the MIL problem as learning the Bernoulli distribution of the bag label where the bag label probability is fully parameterized by neural networks. Furthermore, we propose a neural network-based permutation-invariant aggregation operator that corresponds to the attention mechanism. Notably, an application of the proposed attention-based operator provides insight into the contribution of each instance to the bag label. We show empirically that our approach achieves comparable performance to the best MIL methods on benchmark MIL datasets and that it outperforms other methods on an MNIST-based MIL dataset and two real-life histopathology datasets without sacrificing interpretability. |
Tasks | Multiple Instance Learning |
Published | 2018-02-13 |
URL | http://arxiv.org/abs/1802.04712v4 |
PDF | http://arxiv.org/pdf/1802.04712v4.pdf |
PWC | https://paperswithcode.com/paper/attention-based-deep-multiple-instance |
Repo | https://github.com/mv-lab/youtube8m-19 |
Framework | tf |
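The permutation-invariant attention aggregation can be sketched in a few lines of numpy; plain tanh attention with arbitrary parameters stands in here for the paper's learned (and optionally gated) attention module.

```python
import numpy as np

def attention_mil_pool(H, V, w):
    """Permutation-invariant attention pooling over a bag of instances.

    H: (K, d) instance embeddings; V: (L, d) and w: (L,) play the role of the
    learned attention parameters. Returns the bag embedding and the
    per-instance attention weights.
    """
    scores = w @ np.tanh(V @ H.T)        # (K,) unnormalized attention scores
    a = np.exp(scores - scores.max())
    a /= a.sum()                         # softmax over the bag
    return a @ H, a                      # weighted average of instances
```

Because the bag embedding is a convex combination of instance embeddings, the weights `a` can be read off directly as per-instance contributions to the bag label, which is the interpretability property the abstract highlights.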
Stroke Controllable Fast Style Transfer with Adaptive Receptive Fields
Title | Stroke Controllable Fast Style Transfer with Adaptive Receptive Fields |
Authors | Yongcheng Jing, Yang Liu, Yezhou Yang, Zunlei Feng, Yizhou Yu, Dacheng Tao, Mingli Song |
Abstract | Fast style transfer methods have recently been proposed to transfer a photograph to an artistic style in real time. This task involves controlling the stroke size in the stylized results, which remains an open challenge. In this paper, we present a stroke controllable style transfer network that can achieve continuous and spatial stroke size control. By analyzing the factors that influence the stroke size, we propose to explicitly account for the receptive field and the style image scales. We propose a StrokePyramid module to endow the network with adaptive receptive fields, and two training strategies that respectively speed up convergence and augment new stroke sizes upon a trained model. By combining the proposed runtime control strategies, our network can achieve continuous changes in stroke size and produce distinct stroke sizes in different spatial regions within the same output image. |
Tasks | Style Transfer |
Published | 2018-02-20 |
URL | http://arxiv.org/abs/1802.07101v4 |
PDF | http://arxiv.org/pdf/1802.07101v4.pdf |
PWC | https://paperswithcode.com/paper/stroke-controllable-fast-style-transfer-with |
Repo | https://github.com/LouieYang/stroke-controllable-fast-style-transfer |
Framework | tf |
Sliced-Wasserstein Flows: Nonparametric Generative Modeling via Optimal Transport and Diffusions
Title | Sliced-Wasserstein Flows: Nonparametric Generative Modeling via Optimal Transport and Diffusions |
Authors | Antoine Liutkus, Umut Şimşekli, Szymon Majewski, Alain Durmus, Fabian-Robert Stöter |
Abstract | By building upon the recent theory that established the connection between implicit generative modeling (IGM) and optimal transport, in this study, we propose a novel parameter-free algorithm for learning the underlying distributions of complicated datasets and sampling from them. The proposed algorithm is based on a functional optimization problem, which aims at finding a measure that is as close to the data distribution as possible while remaining expressive enough for generative modeling purposes. We formulate the problem as a gradient flow in the space of probability measures. The connections between gradient flows and stochastic differential equations let us develop a computationally efficient algorithm for solving the optimization problem. We provide formal theoretical analysis where we prove finite-time error guarantees for the proposed algorithm. To the best of our knowledge, the proposed algorithm is the first nonparametric IGM algorithm with explicit theoretical guarantees. Our experimental results support our theory and show that our algorithm is able to successfully capture the structure of different types of data distributions. |
Tasks | |
Published | 2018-06-21 |
URL | https://arxiv.org/abs/1806.08141v2 |
PDF | https://arxiv.org/pdf/1806.08141v2.pdf |
PWC | https://paperswithcode.com/paper/sliced-wasserstein-flows-nonparametric |
Repo | https://github.com/aliutkus/swf |
Framework | pytorch |
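The sliced-Wasserstein distance at the heart of the flow can be estimated by projecting both point clouds onto random directions, where the 1D Wasserstein distance has a closed form via sorting. This sketches only the distance, not the paper's SDE-based sampling algorithm.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_projections=64, seed=0):
    """Monte Carlo estimate of the sliced 2-Wasserstein distance between two
    equal-sized point clouds X, Y of shape (n, d). Each random projection
    reduces the problem to 1D, where W2 is computed from sorted samples."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_projections):
        theta = rng.normal(size=X.shape[1])
        theta /= np.linalg.norm(theta)        # uniform direction on the sphere
        px, py = np.sort(X @ theta), np.sort(Y @ theta)
        total += np.mean((px - py) ** 2)      # squared 1D W2 of the projections
    return np.sqrt(total / n_projections)
```

Averaging cheap 1D transport problems over random slices is what makes the distance tractable enough to drive a gradient flow in the space of measures.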
Face-MagNet: Magnifying Feature Maps to Detect Small Faces
Title | Face-MagNet: Magnifying Feature Maps to Detect Small Faces |
Authors | Pouya Samangouei, Mahyar Najibi, Larry Davis, Rama Chellappa |
Abstract | In this paper, we introduce the Face Magnifier Network (Face-MagNet), a face detector based on the Faster-RCNN framework which enables the flow of discriminative information of small-scale faces to the classifier without any skip or residual connections. To achieve this, Face-MagNet deploys a set of ConvTranspose (also known as deconvolution) layers in the Region Proposal Network (RPN) and another set before the Region of Interest (RoI) pooling layer to facilitate detection of finer faces. In addition, we design, train, and evaluate three other well-tuned architectures that represent the conventional solutions to the scale problem: context pooling, skip connections, and scale partitioning. Each of these three networks achieves results comparable to the state-of-the-art face detectors. With extensive experiments, we show that Face-MagNet based on a VGG16 architecture achieves better results than the recently proposed ResNet101-based HR method on the task of face detection on the WIDER dataset, and also achieves results similar to those of our other method, SSH, on the hard set. |
Tasks | Face Detection |
Published | 2018-03-14 |
URL | http://arxiv.org/abs/1803.05258v1 |
PDF | http://arxiv.org/pdf/1803.05258v1.pdf |
PWC | https://paperswithcode.com/paper/face-magnet-magnifying-feature-maps-to-detect |
Repo | https://github.com/po0ya/face-magnet |
Framework | none |
Locating Objects Without Bounding Boxes
Title | Locating Objects Without Bounding Boxes |
Authors | Javier Ribera, David Güera, Yuhao Chen, Edward J. Delp |
Abstract | Recent advances in convolutional neural networks (CNN) have achieved remarkable results in locating objects in images. In these networks, the training procedure usually requires providing bounding boxes or the maximum number of expected objects. In this paper, we address the task of estimating object locations without annotated bounding boxes, which are typically hand-drawn and time-consuming to label. We propose a loss function that can be used in any fully convolutional network (FCN) to estimate object locations. This loss function is a modification of the average Hausdorff distance between two unordered sets of points. The proposed method has no notion of bounding boxes, region proposals, or sliding windows. We evaluate our method with three datasets designed to locate people’s heads, pupil centers, and plant centers. We outperform state-of-the-art generic object detectors and methods fine-tuned for pupil tracking. |
Tasks | Object Localization |
Published | 2018-06-20 |
URL | http://arxiv.org/abs/1806.07564v2 |
PDF | http://arxiv.org/pdf/1806.07564v2.pdf |
PWC | https://paperswithcode.com/paper/weighted-hausdorff-distance-a-loss-function |
Repo | https://github.com/N0vel/weighted-hausdorff-distance-tensorflow-keras-loss |
Framework | tf |
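The loss proposed in the paper is a differentiable weighted variant of the average Hausdorff distance; the plain metric it modifies can be written directly (a sketch, treating both directions symmetrically):

```python
import numpy as np

def average_hausdorff(A, B):
    """Average Hausdorff distance between point sets A (m, d) and B (n, d):
    the mean nearest-neighbor distance from A to B plus the one from B to A.
    Unlike the classical max-based Hausdorff distance, averaging makes the
    measure robust to individual outlier points."""
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)  # (m, n) pairwise
    return D.min(axis=1).mean() + D.min(axis=0).mean()
```

Because the measure compares unordered point sets directly, a network trained against it needs no notion of bounding boxes, region proposals, or sliding windows.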
Additive Margin Softmax for Face Verification
Title | Additive Margin Softmax for Face Verification |
Authors | Feng Wang, Weiyang Liu, Haijun Liu, Jian Cheng |
Abstract | In this paper, we propose a conceptually simple and geometrically interpretable objective function, i.e. additive margin Softmax (AM-Softmax), for deep face verification. In general, the face verification task can be viewed as a metric learning problem, so learning large-margin face features whose intra-class variation is small and inter-class difference is large is of great importance in order to achieve good performance. Recently, Large-margin Softmax and Angular Softmax have been proposed to incorporate the angular margin in a multiplicative manner. In this work, we introduce a novel additive angular margin for the Softmax loss, which is intuitively appealing and more interpretable than the existing works. We also emphasize and discuss the importance of feature normalization in the paper. Most importantly, our experiments on LFW BLUFR and MegaFace show that our additive margin softmax loss consistently performs better than the current state-of-the-art methods using the same network architecture and training dataset. Our code has also been made available at https://github.com/happynear/AMSoftmax |
Tasks | Face Verification, Metric Learning |
Published | 2018-01-17 |
URL | http://arxiv.org/abs/1801.05599v4 |
PDF | http://arxiv.org/pdf/1801.05599v4.pdf |
PWC | https://paperswithcode.com/paper/additive-margin-softmax-for-face-verification |
Repo | https://github.com/happynear/AMSoftmax |
Framework | tf |
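The additive margin amounts to a small change to ordinary softmax cross-entropy: logits become cosine similarities, and the target-class cosine is reduced by m before scaling. The values s=30 and m=0.35 are common choices in this line of work; treat the rest as an illustrative numpy sketch rather than the released code.

```python
import numpy as np

def am_softmax_loss(features, W, labels, s=30.0, m=0.35):
    """AM-Softmax: L2-normalize features and class weight vectors so logits
    become cosines, subtract the additive margin m from the target-class
    cosine only, scale by s, then apply standard cross-entropy."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = W / np.linalg.norm(W, axis=0, keepdims=True)
    cos = f @ w                                  # (n, n_classes) cosine logits
    idx = np.arange(len(labels))
    cos[idx, labels] -= m                        # enforce the additive margin
    logits = s * cos
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -logp[idx, labels].mean()
```

Subtracting m only from the target cosine forces correct classifications to win by a fixed cosine gap, which is what shrinks intra-class variation relative to inter-class differences.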
Monte Carlo Dependency Estimation
Title | Monte Carlo Dependency Estimation |
Authors | Edouard Fouché, Klemens Böhm |
Abstract | Estimating the dependency of variables is a fundamental task in data analysis. Identifying the relevant attributes in databases leads to better data understanding and also improves the performance of learning algorithms, both in terms of runtime and quality. In data streams, dependency monitoring provides key insights into the underlying process, but is challenging. In this paper, we propose Monte Carlo Dependency Estimation (MCDE), a theoretical framework to estimate multivariate dependency in static and dynamic data. MCDE quantifies dependency as the average discrepancy between marginal and conditional distributions via Monte Carlo simulations. Based on this framework, we present Mann-Whitney P (MWP), a novel dependency estimator. We show that MWP satisfies a number of desirable properties and can accommodate any kind of numerical data. We demonstrate the superiority of our estimator by comparing it to the state-of-the-art multivariate dependency measures. |
Tasks | |
Published | 2018-10-04 |
URL | http://arxiv.org/abs/1810.02112v1 |
PDF | http://arxiv.org/pdf/1810.02112v1.pdf |
PWC | https://paperswithcode.com/paper/monte-carlo-dependency-estimation |
Repo | https://github.com/edouardfouche/mcde |
Framework | none |
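The MCDE recipe, repeatedly comparing a marginal against a randomly sliced conditional with a Mann-Whitney-style rank statistic, can be sketched as follows. The slicing scheme and normalization are simplified assumptions and not the exact MWP estimator.

```python
import numpy as np

def mwp_contrast(data, n_iter=100, slice_frac=0.5, seed=0):
    """Simplified MCDE-style dependency score: in each Monte Carlo iteration,
    slice the data on one randomly chosen attribute, then measure how far the
    rank distribution of another attribute inside the slice deviates from its
    marginal, via a normalized Mann-Whitney U statistic."""
    rng = np.random.default_rng(seed)
    n, d = data.shape
    total = 0.0
    for _ in range(n_iter):
        ref, cond = rng.choice(d, size=2, replace=False)
        order = np.argsort(data[:, cond])       # random contiguous slice on cond
        k = int(n * slice_frac)
        start = rng.integers(0, n - k + 1)
        in_slice = np.zeros(n, dtype=bool)
        in_slice[order[start:start + k]] = True
        ranks = np.argsort(np.argsort(data[:, ref])) + 1
        u = ranks[in_slice].sum() - k * (k + 1) / 2       # Mann-Whitney U
        mu = k * (n - k) / 2
        sigma = np.sqrt(k * (n - k) * (n + 1) / 12.0)
        total += abs(u - mu) / sigma            # deviation in standard units
    return total / n_iter
```

For independent attributes the deviation stays near what chance predicts; for dependent attributes, slicing on one attribute shifts the rank distribution of the other, so the averaged score grows.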
Improving Distantly Supervised Relation Extraction using Word and Entity Based Attention
Title | Improving Distantly Supervised Relation Extraction using Word and Entity Based Attention |
Authors | Sharmistha Jat, Siddhesh Khandelwal, Partha Talukdar |
Abstract | Relation extraction is the problem of classifying the relationship between two entities in a given sentence. Distant Supervision (DS) is a popular technique for developing relation extractors starting with limited supervision. We note that most of the sentences in the distant supervision relation extraction setting are very long and may benefit from word attention for better sentence representation. Our contributions in this paper are threefold. Firstly, we propose two novel word attention models for distantly-supervised relation extraction, (1) a Bi-directional Gated Recurrent Unit (Bi-GRU) based word attention model (BGWA) and (2) an entity-centric attention model (EA), as well as (3) a combination model that combines multiple complementary models using a weighted voting method for improved relation extraction. Secondly, we introduce GDS, a new distant-supervision dataset for relation extraction. GDS removes test-data noise present in all previous distant-supervision benchmark datasets, making credible automatic evaluation possible. Thirdly, through extensive experiments on multiple real-world datasets, we demonstrate the effectiveness of the proposed methods. |
Tasks | Relation Extraction, Relationship Extraction (Distant Supervised) |
Published | 2018-04-19 |
URL | http://arxiv.org/abs/1804.06987v1 |
PDF | http://arxiv.org/pdf/1804.06987v1.pdf |
PWC | https://paperswithcode.com/paper/improving-distantly-supervised-relation-1 |
Repo | https://github.com/JiyangZhang/learning-to-reweight-in-relation-extraion |
Framework | pytorch |
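The weighted-voting combination of complementary models can be sketched generically; the abstract does not fix the weighting scheme, so the normalized weights below (e.g. validation accuracies) are an assumption.

```python
import numpy as np

def weighted_voting(prob_list, weights):
    """Combine per-relation probability vectors from complementary models
    (e.g. BGWA and EA) by a normalized weighted vote; returns the winning
    relation index and the combined distribution."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                       # normalize the model weights
    combined = sum(wi * np.asarray(p) for wi, p in zip(w, prob_list))
    return int(np.argmax(combined)), combined
```

Since the weights are normalized, the combined vector stays a valid probability distribution whenever the inputs are.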
Dual Ask-Answer Network for Machine Reading Comprehension
Title | Dual Ask-Answer Network for Machine Reading Comprehension |
Authors | Han Xiao, Feng Wang, Jianfeng Yan, Jingyao Zheng |
Abstract | There are three modalities in the reading comprehension setting: question, answer and context. The task of question answering or question generation aims to infer an answer or a question when given the counterpart based on context. We present a novel two-way neural sequence transduction model that connects three modalities, allowing it to learn two tasks simultaneously and mutually benefit one another. During training, the model receives question-context-answer triplets as input and captures the cross-modal interaction via a hierarchical attention process. Unlike previous joint learning paradigms that leverage the duality of question generation and question answering at data level, we solve such dual tasks at the architecture level by mirroring the network structure and partially sharing components at different layers. This enables the knowledge to be transferred from one task to another, helping the model to find a general representation for each modality. The evaluation on four public datasets shows that our dual-learning model outperforms the mono-learning counterpart as well as the state-of-the-art joint models on both question answering and question generation tasks. |
Tasks | Machine Reading Comprehension, Question Answering, Question Generation, Reading Comprehension |
Published | 2018-09-06 |
URL | http://arxiv.org/abs/1809.01997v2 |
PDF | http://arxiv.org/pdf/1809.01997v2.pdf |
PWC | https://paperswithcode.com/paper/dual-ask-answer-network-for-machine-reading |
Repo | https://github.com/hanxiao/daanet |
Framework | tf |
Revisiting the Inverted Indices for Billion-Scale Approximate Nearest Neighbors
Title | Revisiting the Inverted Indices for Billion-Scale Approximate Nearest Neighbors |
Authors | Dmitry Baranchuk, Artem Babenko, Yury Malkov |
Abstract | This work addresses the problem of billion-scale nearest neighbor search. The state-of-the-art retrieval systems for billion-scale databases are currently based on the inverted multi-index, the recently proposed generalization of the inverted index structure. The multi-index provides a very fine-grained partition of the feature space that allows extracting concise and accurate short-lists of candidates for the search queries. In this paper, we argue that the potential of the simple inverted index was not fully exploited in previous works and advocate its usage both for the highly-entangled deep descriptors and relatively disentangled SIFT descriptors. We introduce a new retrieval system that is based on the inverted index and outperforms the multi-index by a large margin for the same memory consumption and construction complexity. For example, our system achieves the state-of-the-art recall rates several times faster on the dataset of one billion deep descriptors compared to the efficient implementation of the inverted multi-index from the FAISS library. |
Tasks | |
Published | 2018-02-07 |
URL | http://arxiv.org/abs/1802.02422v2 |
PDF | http://arxiv.org/pdf/1802.02422v2.pdf |
PWC | https://paperswithcode.com/paper/revisiting-the-inverted-indices-for-billion |
Repo | https://github.com/nmslib/hnsw |
Framework | mxnet |
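A toy inverted-file (IVF) index conveys the underlying idea: a coarse quantizer partitions the database into cells, and a query ranks only the candidates from its few closest cells. The cell count and bare-bones k-means below are illustrative; the paper's billion-scale system is far more elaborate.

```python
import numpy as np

def build_ivf(data, n_cells=16, n_iter=10, seed=0):
    """Toy inverted-file index: a coarse k-means quantizer partitions the
    database, and each cell's inverted list stores the ids assigned to it."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), n_cells, replace=False)].copy()
    for _ in range(n_iter):
        assign = np.linalg.norm(data[:, None] - centroids[None], axis=-1).argmin(axis=1)
        for c in range(n_cells):
            members = data[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    assign = np.linalg.norm(data[:, None] - centroids[None], axis=-1).argmin(axis=1)
    lists = [np.flatnonzero(assign == c) for c in range(n_cells)]
    return centroids, lists

def ivf_search(query, data, centroids, lists, n_probe=4, k=5):
    """Visit only the n_probe cells whose centroids are closest to the query,
    then rank the resulting short-list of candidates by exact distance."""
    order = np.linalg.norm(centroids - query, axis=1).argsort()[:n_probe]
    cand = np.concatenate([lists[c] for c in order])
    dists = np.linalg.norm(data[cand] - query, axis=1)
    return cand[dists.argsort()[:k]]
```

The trade-off the paper revisits lives in `n_cells` and `n_probe`: finer partitions yield shorter candidate lists per cell, while probing more cells improves recall at the cost of more distance computations.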
A New Ensemble Learning Framework for 3D Biomedical Image Segmentation
Title | A New Ensemble Learning Framework for 3D Biomedical Image Segmentation |
Authors | Hao Zheng, Yizhe Zhang, Lin Yang, Peixian Liang, Zhuo Zhao, Chaoli Wang, Danny Z. Chen |
Abstract | 3D image segmentation plays an important role in biomedical image analysis. Many 2D and 3D deep learning models have achieved state-of-the-art segmentation performance on 3D biomedical image datasets. Yet, 2D and 3D models have their own strengths and weaknesses, and by unifying them together, one may be able to achieve more accurate results. In this paper, we propose a new ensemble learning framework for 3D biomedical image segmentation that combines the merits of 2D and 3D models. First, we develop a fully convolutional network based meta-learner to learn how to improve the results from 2D and 3D models (base-learners). Then, to minimize over-fitting for our sophisticated meta-learner, we devise a new training method that uses the results of the base-learners as multiple versions of “ground truths”. Furthermore, since our new meta-learner training scheme does not depend on manual annotation, it can utilize abundant unlabeled 3D image data to further improve the model. Extensive experiments on two public datasets (the HVSMR 2016 Challenge dataset and the mouse piriform cortex dataset) show that our approach is effective under fully-supervised, semi-supervised, and transductive settings, and attains superior performance over state-of-the-art image segmentation methods. |
Tasks | 3D Medical Imaging Segmentation, Semantic Segmentation |
Published | 2018-12-10 |
URL | http://arxiv.org/abs/1812.03945v1 |
PDF | http://arxiv.org/pdf/1812.03945v1.pdf |
PWC | https://paperswithcode.com/paper/a-new-ensemble-learning-framework-for-3d |
Repo | https://github.com/HaoZheng94/Ensemble |
Framework | none |
Comparative Studies of Detecting Abusive Language on Twitter
Title | Comparative Studies of Detecting Abusive Language on Twitter |
Authors | Younghun Lee, Seunghyun Yoon, Kyomin Jung |
Abstract | The context-dependent nature of online aggression makes annotating large collections of data extremely difficult. Previously studied datasets in abusive language detection have been insufficient in size to efficiently train deep learning models. Recently, Hate and Abusive Speech on Twitter, a dataset much greater in size and reliability, has been released. However, this dataset has not yet been studied to its full potential. In this paper, we conduct the first comparative study of various learning models on Hate and Abusive Speech on Twitter, and discuss the possibility of using additional features and context data for improvement. Experimental results show that a bidirectional GRU network trained on word-level features with Latent Topic Clustering modules is the most accurate model, scoring 0.805 F1. |
Tasks | Hate Speech Detection, Sentiment Analysis, Twitter Sentiment Analysis |
Published | 2018-08-30 |
URL | http://arxiv.org/abs/1808.10245v1 |
PDF | http://arxiv.org/pdf/1808.10245v1.pdf |
PWC | https://paperswithcode.com/paper/comparative-studies-of-detecting-abusive |
Repo | https://github.com/younggns/comparative-abusive-lang |
Framework | tf |