October 20, 2019


Paper Group AWR 287


Visual Referring Expression Recognition: What Do Systems Actually Learn?


Title	Visual Referring Expression Recognition: What Do Systems Actually Learn?
Authors	Volkan Cirik, Louis-Philippe Morency, Taylor Berg-Kirkpatrick
Abstract	We present an empirical analysis of the state-of-the-art systems for referring expression recognition – the task of identifying the object in an image referred to by a natural language expression – with the goal of gaining insight into how these systems reason about language and vision. Surprisingly, we find strong evidence that even sophisticated and linguistically-motivated models for this task may ignore the linguistic structure, instead relying on shallow correlations introduced by unintended biases in the data selection and annotation process. For example, we show that a system trained and tested on the input image $\textit{without the input referring expression}$ can achieve a precision of 71.2% in top-2 predictions. Furthermore, a system that predicts only the object category given the input can achieve a precision of 84.2% in top-2 predictions. These surprisingly positive results for what should be deficient prediction scenarios suggest that careful analysis of what our models are learning – and further, how our data is constructed – is critical as we seek to make substantive progress on grounded language tasks.
Tasks
Published	2018-05-30
URL	http://arxiv.org/abs/1805.11818v1
PDF	http://arxiv.org/pdf/1805.11818v1.pdf
PWC	https://paperswithcode.com/paper/visual-referring-expression-recognition-what
Repo	https://github.com/volkancirik/neural-sieves-refexp
Framework	pytorch

Correspondence of Deep Neural Networks and the Brain for Visual Textures


Title	Correspondence of Deep Neural Networks and the Brain for Visual Textures
Authors	Md Nasir Uddin Laskar, Luis G Sanchez Giraldo, Odelia Schwartz
Abstract	Deep convolutional neural networks (CNNs) trained on objects and scenes have shown intriguing ability to predict some response properties of visual cortical neurons. However, the factors and computations that give rise to such ability, and the role of intermediate processing stages in explaining changes that develop across areas of the cortical hierarchy, are poorly understood. We focused on the sensitivity to textures as a paradigmatic example, since recent neurophysiology experiments provide rich data pointing to texture sensitivity in secondary but not primary visual cortex. We developed a quantitative approach for selecting a subset of the neural unit population from the CNN that best describes the brain neural recordings. We found that the first two layers of the CNN showed qualitative and quantitative correspondence to the cortical data across a number of metrics. This compatibility was reduced for the architecture alone rather than the learned weights, for some other related hierarchical models, and only mildly in the absence of a nonlinear computation akin to local divisive normalization. Our results show that the CNN class of model is effective for capturing changes that develop across early areas of cortex, and has the potential to facilitate understanding of the computations that give rise to hierarchical processing in the brain.
Tasks
Published	2018-06-07
URL	http://arxiv.org/abs/1806.02888v1
PDF	http://arxiv.org/pdf/1806.02888v1.pdf
PWC	https://paperswithcode.com/paper/correspondence-of-deep-neural-networks-and
Repo	https://github.com/nasirml/DeepNetAndBrain
Framework	none

Short-term Load Forecasting with Deep Residual Networks


Title	Short-term Load Forecasting with Deep Residual Networks
Authors	Kunjin Chen, Kunlong Chen, Qin Wang, Ziyu He, Jun Hu, Jinliang He
Abstract	We present in this paper a model for forecasting short-term power loads based on deep residual networks. The proposed model is able to integrate domain knowledge and researchers’ understanding of the task by virtue of different neural network building blocks. Specifically, a modified deep residual network is formulated to improve the forecast results. Further, a two-stage ensemble strategy is used to enhance the generalization capability of the proposed model. We also apply the proposed model to probabilistic load forecasting using Monte Carlo dropout. Three public datasets are used to prove the effectiveness of the proposed model. Multiple test cases and comparison with existing models show that the proposed model is able to provide accurate load forecasting results and has high generalization capability.
Tasks	Load Forecasting
Published	2018-05-30
URL	http://arxiv.org/abs/1805.11956v1
PDF	http://arxiv.org/pdf/1805.11956v1.pdf
PWC	https://paperswithcode.com/paper/short-term-load-forecasting-with-deep
Repo	https://github.com/yalickj/load-forecasting-resnet
Framework	none
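
The abstract above mentions probabilistic load forecasting with Monte Carlo dropout. As a rough illustration of that general technique (not the authors' residual network), the sketch below keeps dropout active at prediction time and summarizes many stochastic passes by their mean and standard deviation. The single linear layer, the function name `mc_dropout_forecast`, and all parameter values are hypothetical and chosen only to keep the example small.

```python
import random
import statistics


def mc_dropout_forecast(features, weights, bias, drop_prob=0.2, passes=200, seed=0):
    """Return (mean, std) of repeated stochastic forward passes.

    The "model" is a toy linear layer with dropout on its inputs; it stands in
    for a real network and is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    keep = 1.0 - drop_prob
    samples = []
    for _ in range(passes):
        total = bias
        for x, w in zip(features, weights):
            # Dropout stays ON at prediction time; kept inputs are rescaled
            # by 1/keep (inverted dropout) so the expected output is unchanged.
            if rng.random() < keep:
                total += w * x / keep
        samples.append(total)
    # The spread across passes serves as a rough uncertainty estimate.
    return statistics.mean(samples), statistics.stdev(samples)


mean_load, load_std = mc_dropout_forecast([1.0, 2.0, 3.0], [0.5, -0.2, 0.1], bias=0.3)
```

With `drop_prob=0.0` every pass is identical, so the standard deviation collapses to zero and the mean equals the deterministic prediction.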

Attention-based Deep Multiple Instance Learning


Title	Attention-based Deep Multiple Instance Learning
Authors	Maximilian Ilse, Jakub M. Tomczak, Max Welling
Abstract	Multiple instance learning (MIL) is a variation of supervised learning where a single class label is assigned to a bag of instances. In this paper, we state the MIL problem as learning the Bernoulli distribution of the bag label where the bag label probability is fully parameterized by neural networks. Furthermore, we propose a neural network-based permutation-invariant aggregation operator that corresponds to the attention mechanism. Notably, an application of the proposed attention-based operator provides insight into the contribution of each instance to the bag label. We show empirically that our approach achieves comparable performance to the best MIL methods on benchmark MIL datasets and it outperforms other methods on a MNIST-based MIL dataset and two real-life histopathology datasets without sacrificing interpretability.
Tasks	Multiple Instance Learning
Published	2018-02-13
URL	http://arxiv.org/abs/1802.04712v4
PDF	http://arxiv.org/pdf/1802.04712v4.pdf
PWC	https://paperswithcode.com/paper/attention-based-deep-multiple-instance
Repo	https://github.com/mv-lab/youtube8m-19
Framework	tf
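
The abstract above describes a permutation-invariant aggregation operator based on attention. A minimal sketch of one common form of attention pooling follows: each instance gets a score from a small tanh layer, the scores are normalized with a softmax over the bag, and the bag embedding is the weighted sum of the instances. The parameter names `V` and `w`, their shapes, and the example values are assumptions for illustration, not the authors' released code.

```python
import math


def attention_pool(instances, V, w):
    """Attention-weighted sum of instance embeddings.

    instances: list of K embeddings, each a list of D floats
    V: hypothetical L x D projection matrix (list of rows)
    w: hypothetical list of L scoring weights
    """
    scores = []
    for h in instances:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wl * ul for wl, ul in zip(w, hidden)))
    # Numerically stable softmax over the instances in the bag.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(instances[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return pooled, weights


bag = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
pooled, weights = attention_pool(bag, V=[[0.5, -0.3], [0.1, 0.8]], w=[1.0, -1.0])
```

The returned `weights` sum to one and can be read as each instance's contribution to the bag representation; shuffling the bag changes only their order, not the pooled vector.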

Stroke Controllable Fast Style Transfer with Adaptive Receptive Fields


Title	Stroke Controllable Fast Style Transfer with Adaptive Receptive Fields
Authors	Yongcheng Jing, Yang Liu, Yezhou Yang, Zunlei Feng, Yizhou Yu, Dacheng Tao, Mingli Song
Abstract	The Fast Style Transfer methods have been recently proposed to transfer a photograph to an artistic style in real-time. This task involves controlling the stroke size in the stylized results, which remains an open challenge. In this paper, we present a stroke controllable style transfer network that can achieve continuous and spatial stroke size control. By analyzing the factors that influence the stroke size, we propose to explicitly account for the receptive field and the style image scales. We propose a StrokePyramid module to endow the network with adaptive receptive fields, and two training strategies to achieve faster convergence and augment new stroke sizes upon a trained model respectively. By combining the proposed runtime control strategies, our network can achieve continuous changes in stroke sizes and produce distinct stroke sizes in different spatial regions within the same output image.
Tasks	Style Transfer
Published	2018-02-20
URL	http://arxiv.org/abs/1802.07101v4
PDF	http://arxiv.org/pdf/1802.07101v4.pdf
PWC	https://paperswithcode.com/paper/stroke-controllable-fast-style-transfer-with
Repo	https://github.com/LouieYang/stroke-controllable-fast-style-transfer
Framework	tf

Sliced-Wasserstein Flows: Nonparametric Generative Modeling via Optimal Transport and Diffusions


Title	Sliced-Wasserstein Flows: Nonparametric Generative Modeling via Optimal Transport and Diffusions
Authors	Antoine Liutkus, Umut Şimşekli, Szymon Majewski, Alain Durmus, Fabian-Robert Stöter
Abstract	By building upon the recent theory that established the connection between implicit generative modeling (IGM) and optimal transport, in this study, we propose a novel parameter-free algorithm for learning the underlying distributions of complicated datasets and sampling from them. The proposed algorithm is based on a functional optimization problem, which aims at finding a measure that is close to the data distribution as much as possible and also expressive enough for generative modeling purposes. We formulate the problem as a gradient flow in the space of probability measures. The connections between gradient flows and stochastic differential equations let us develop a computationally efficient algorithm for solving the optimization problem. We provide formal theoretical analysis where we prove finite-time error guarantees for the proposed algorithm. To the best of our knowledge, the proposed algorithm is the first nonparametric IGM algorithm with explicit theoretical guarantees. Our experimental results support our theory and show that our algorithm is able to successfully capture the structure of different types of data distributions.
Tasks
Published	2018-06-21
URL	https://arxiv.org/abs/1806.08141v2
PDF	https://arxiv.org/pdf/1806.08141v2.pdf
PWC	https://paperswithcode.com/paper/sliced-wasserstein-flows-nonparametric
Repo	https://github.com/aliutkus/swf
Framework	pytorch

Face-MagNet: Magnifying Feature Maps to Detect Small Faces


Title	Face-MagNet: Magnifying Feature Maps to Detect Small Faces
Authors	Pouya Samangouei, Mahyar Najibi, Larry Davis, Rama Chellappa
Abstract	In this paper, we introduce the Face Magnifier Network (Face-MagNet), a face detector based on the Faster-RCNN framework which enables the flow of discriminative information of small scale faces to the classifier without any skip or residual connections. To achieve this, Face-MagNet deploys a set of ConvTranspose, also known as deconvolution, layers in the Region Proposal Network (RPN) and another set before the Region of Interest (RoI) pooling layer to facilitate detection of finer faces. In addition, we also design, train, and evaluate three other well-tuned architectures that represent the conventional solutions to the scale problem: context pooling, skip connections, and scale partitioning. Each of these three networks achieves comparable results to the state-of-the-art face detectors. With extensive experiments, we show that Face-MagNet based on a VGG16 architecture achieves better results than the recently proposed ResNet101-based HR method on the task of face detection on WIDER dataset and also achieves similar results on the hard set as our other method SSH.
Tasks	Face Detection
Published	2018-03-14
URL	http://arxiv.org/abs/1803.05258v1
PDF	http://arxiv.org/pdf/1803.05258v1.pdf
PWC	https://paperswithcode.com/paper/face-magnet-magnifying-feature-maps-to-detect
Repo	https://github.com/po0ya/face-magnet
Framework	none

Locating Objects Without Bounding Boxes


Title	Locating Objects Without Bounding Boxes
Authors	Javier Ribera, David Güera, Yuhao Chen, Edward J. Delp
Abstract	Recent advances in convolutional neural networks (CNN) have achieved remarkable results in locating objects in images. In these networks, the training procedure usually requires providing bounding boxes or the maximum number of expected objects. In this paper, we address the task of estimating object locations without annotated bounding boxes which are typically hand-drawn and time consuming to label. We propose a loss function that can be used in any fully convolutional network (FCN) to estimate object locations. This loss function is a modification of the average Hausdorff distance between two unordered sets of points. The proposed method has no notion of bounding boxes, region proposals, or sliding windows. We evaluate our method with three datasets designed to locate people’s heads, pupil centers and plant centers. We outperform state-of-the-art generic object detectors and methods fine-tuned for pupil tracking.
Tasks	Object Localization
Published	2018-06-20
URL	http://arxiv.org/abs/1806.07564v2
PDF	http://arxiv.org/pdf/1806.07564v2.pdf
PWC	https://paperswithcode.com/paper/weighted-hausdorff-distance-a-loss-function
Repo	https://github.com/N0vel/weighted-hausdorff-distance-tensorflow-keras-loss
Framework	tf
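
The abstract above builds its loss on the average Hausdorff distance between two unordered point sets. The sketch below computes only that plain, unmodified distance (the mean nearest-neighbor distance from each set to the other, summed over both directions); it does not include the paper's modification that makes it usable as a network loss. The function name and the example points are illustrative assumptions.

```python
import math


def average_hausdorff(xs, ys):
    """Average Hausdorff distance between two non-empty sets of points."""
    if not xs or not ys:
        raise ValueError("both point sets must be non-empty")

    def mean_min_dist(src, dst):
        # For every point in src, distance to its nearest point in dst, averaged.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    # Sum both directions so the measure is symmetric in its arguments.
    return mean_min_dist(xs, ys) + mean_min_dist(ys, xs)


predicted = [(0.0, 0.0), (4.0, 4.0)]
annotated = [(0.0, 1.0), (4.0, 4.0), (9.0, 9.0)]
distance = average_hausdorff(predicted, annotated)
```

Because the two sets may have different sizes and no ordering, the measure needs no matching between predicted and annotated points, which is what makes it attractive when bounding boxes are unavailable.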

Additive Margin Softmax for Face Verification


Title	Additive Margin Softmax for Face Verification
Authors	Feng Wang, Weiyang Liu, Haijun Liu, Jian Cheng
Abstract	In this paper, we propose a conceptually simple and geometrically interpretable objective function, i.e. additive margin Softmax (AM-Softmax), for deep face verification. In general, the face verification task can be viewed as a metric learning problem, so learning large-margin face features whose intra-class variation is small and inter-class difference is large is of great importance in order to achieve good performance. Recently, Large-margin Softmax and Angular Softmax have been proposed to incorporate the angular margin in a multiplicative manner. In this work, we introduce a novel additive angular margin for the Softmax loss, which is intuitively appealing and more interpretable than the existing works. We also emphasize and discuss the importance of feature normalization in the paper. Most importantly, our experiments on LFW BLUFR and MegaFace show that our additive margin softmax loss consistently performs better than the current state-of-the-art methods using the same network architecture and training dataset. Our code has also been made available at https://github.com/happynear/AMSoftmax
Tasks	Face Verification, Metric Learning
Published	2018-01-17
URL	http://arxiv.org/abs/1801.05599v4
PDF	http://arxiv.org/pdf/1801.05599v4.pdf
PWC	https://paperswithcode.com/paper/additive-margin-softmax-for-face-verification
Repo	https://github.com/happynear/AMSoftmax
Framework	tf
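
The abstract above describes an additive margin applied to the Softmax loss together with feature normalization. A minimal single-sample sketch under those assumptions: features and class weights are L2-normalized so the logits are cosines, a margin `m` is subtracted from the target-class cosine, and everything is scaled by `s` before the usual cross-entropy. The default values of `s` and `m` and the function name are illustrative assumptions; consult the authors' repository for their actual implementation.

```python
import math


def am_softmax_loss(feature, class_weights, target, s=30.0, m=0.35):
    """Cross-entropy over scaled cosine logits with an additive margin on the target."""

    def unit(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    f = unit(feature)
    # Cosine similarity between the normalized feature and each normalized class weight.
    cosines = [sum(a * b for a, b in zip(f, unit(wj))) for wj in class_weights]
    # Only the target class is penalized by the margin m.
    logits = [s * (c - m) if j == target else s * c for j, c in enumerate(cosines)]
    # Stable log-sum-exp, then negative log-likelihood of the target class.
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[target]


loss = am_softmax_loss([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]], target=0)
```

Setting `m=0.0` recovers a plain normalized Softmax loss, so comparing the two values shows how the margin makes the same sample harder to classify during training.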

Monte Carlo Dependency Estimation


Title	Monte Carlo Dependency Estimation
Authors	Edouard Fouché, Klemens Böhm
Abstract	Estimating the dependency of variables is a fundamental task in data analysis. Identifying the relevant attributes in databases leads to better data understanding and also improves the performance of learning algorithms, both in terms of runtime and quality. In data streams, dependency monitoring provides key insights into the underlying process, but is challenging. In this paper, we propose Monte Carlo Dependency Estimation (MCDE), a theoretical framework to estimate multivariate dependency in static and dynamic data. MCDE quantifies dependency as the average discrepancy between marginal and conditional distributions via Monte Carlo simulations. Based on this framework, we present Mann-Whitney P (MWP), a novel dependency estimator. We show that MWP satisfies a number of desirable properties and can accommodate any kind of numerical data. We demonstrate the superiority of our estimator by comparing it to the state-of-the-art multivariate dependency measures.
Tasks
Published	2018-10-04
URL	http://arxiv.org/abs/1810.02112v1
PDF	http://arxiv.org/pdf/1810.02112v1.pdf
PWC	https://paperswithcode.com/paper/monte-carlo-dependency-estimation
Repo	https://github.com/edouardfouche/mcde
Framework	none

Improving Distantly Supervised Relation Extraction using Word and Entity Based Attention


Title	Improving Distantly Supervised Relation Extraction using Word and Entity Based Attention
Authors	Sharmistha Jat, Siddhesh Khandelwal, Partha Talukdar
Abstract	Relation extraction is the problem of classifying the relationship between two entities in a given sentence. Distant Supervision (DS) is a popular technique for developing relation extractors starting with limited supervision. We note that most of the sentences in the distant supervision relation extraction setting are very long and may benefit from word attention for better sentence representation. Our contributions in this paper are threefold. Firstly, we propose two novel word attention models for distantly-supervised relation extraction: (1) a Bi-directional Gated Recurrent Unit (Bi-GRU) based word attention model (BGWA), (2) an entity-centric attention model (EA), and (3) a combination model which combines multiple complementary models using weighted voting method for improved relation extraction. Secondly, we introduce GDS, a new distant supervision dataset for relation extraction. GDS removes test data noise present in all previous distant-supervision benchmark datasets, making credible automatic evaluation possible. Thirdly, through extensive experiments on multiple real-world datasets, we demonstrate the effectiveness of the proposed methods.
Tasks	Relation Extraction, Relationship Extraction (Distant Supervised)
Published	2018-04-19
URL	http://arxiv.org/abs/1804.06987v1
PDF	http://arxiv.org/pdf/1804.06987v1.pdf
PWC	https://paperswithcode.com/paper/improving-distantly-supervised-relation-1
Repo	https://github.com/JiyangZhang/learning-to-reweight-in-relation-extraion
Framework	pytorch

Dual Ask-Answer Network for Machine Reading Comprehension


Title	Dual Ask-Answer Network for Machine Reading Comprehension
Authors	Han Xiao, Feng Wang, Jianfeng Yan, Jingyao Zheng
Abstract	There are three modalities in the reading comprehension setting: question, answer and context. The task of question answering or question generation aims to infer an answer or a question when given the counterpart based on context. We present a novel two-way neural sequence transduction model that connects three modalities, allowing it to learn two tasks simultaneously and mutually benefit one another. During training, the model receives question-context-answer triplets as input and captures the cross-modal interaction via a hierarchical attention process. Unlike previous joint learning paradigms that leverage the duality of question generation and question answering at data level, we solve such dual tasks at the architecture level by mirroring the network structure and partially sharing components at different layers. This enables the knowledge to be transferred from one task to another, helping the model to find a general representation for each modality. The evaluation on four public datasets shows that our dual-learning model outperforms the mono-learning counterpart as well as the state-of-the-art joint models on both question answering and question generation tasks.
Tasks	Machine Reading Comprehension, Question Answering, Question Generation, Reading Comprehension
Published	2018-09-06
URL	http://arxiv.org/abs/1809.01997v2
PDF	http://arxiv.org/pdf/1809.01997v2.pdf
PWC	https://paperswithcode.com/paper/dual-ask-answer-network-for-machine-reading
Repo	https://github.com/hanxiao/daanet
Framework	tf

Revisiting the Inverted Indices for Billion-Scale Approximate Nearest Neighbors


Title	Revisiting the Inverted Indices for Billion-Scale Approximate Nearest Neighbors
Authors	Dmitry Baranchuk, Artem Babenko, Yury Malkov
Abstract	This work addresses the problem of billion-scale nearest neighbor search. The state-of-the-art retrieval systems for billion-scale databases are currently based on the inverted multi-index, the recently proposed generalization of the inverted index structure. The multi-index provides a very fine-grained partition of the feature space that allows extracting concise and accurate short-lists of candidates for the search queries. In this paper, we argue that the potential of the simple inverted index was not fully exploited in previous works and advocate its usage both for the highly-entangled deep descriptors and relatively disentangled SIFT descriptors. We introduce a new retrieval system that is based on the inverted index and outperforms the multi-index by a large margin for the same memory consumption and construction complexity. For example, our system achieves the state-of-the-art recall rates several times faster on the dataset of one billion deep descriptors compared to the efficient implementation of the inverted multi-index from the FAISS library.
Tasks
Published	2018-02-07
URL	http://arxiv.org/abs/1802.02422v2
PDF	http://arxiv.org/pdf/1802.02422v2.pdf
PWC	https://paperswithcode.com/paper/revisiting-the-inverted-indices-for-billion
Repo	https://github.com/nmslib/hnsw
Framework	mxnet
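
The abstract above argues for the simple inverted index over the multi-index. As a toy illustration of the plain inverted-index idea only (not the authors' system), the sketch below assigns each vector to the list of its nearest coarse centroid and answers a query by scanning the lists of the `nprobe` closest centroids. The class name, the fixed hand-picked centroids, and the absence of any vector compression are all simplifying assumptions.

```python
import math


class InvertedIndex:
    """Toy inverted index over fixed coarse centroids (illustrative only)."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.lists = [[] for _ in centroids]

    def _ranked_cells(self, vec):
        # Cell ids ordered from nearest to farthest centroid.
        return sorted(range(len(self.centroids)),
                      key=lambda c: math.dist(vec, self.centroids[c]))

    def add(self, item_id, vec):
        # Store the vector in the list of its nearest centroid.
        self.lists[self._ranked_cells(vec)[0]].append((item_id, vec))

    def search(self, query, k=1, nprobe=1):
        # Scan only the nprobe nearest lists instead of the whole database.
        candidates = []
        for cell in self._ranked_cells(query)[:nprobe]:
            candidates.extend(self.lists[cell])
        candidates.sort(key=lambda item: math.dist(query, item[1]))
        return [item_id for item_id, _ in candidates[:k]]


index = InvertedIndex([(0.0, 0.0), (10.0, 10.0)])
index.add("a", (0.5, 0.2))
index.add("b", (9.0, 9.5))
index.add("c", (1.0, 1.0))
nearest = index.search((9.0, 9.0), k=2, nprobe=2)
```

Raising `nprobe` trades speed for recall: more lists are scanned, so true neighbors that fell into an adjacent cell are less likely to be missed.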

A New Ensemble Learning Framework for 3D Biomedical Image Segmentation


Title	A New Ensemble Learning Framework for 3D Biomedical Image Segmentation
Authors	Hao Zheng, Yizhe Zhang, Lin Yang, Peixian Liang, Zhuo Zhao, Chaoli Wang, Danny Z. Chen
Abstract	3D image segmentation plays an important role in biomedical image analysis. Many 2D and 3D deep learning models have achieved state-of-the-art segmentation performance on 3D biomedical image datasets. Yet, 2D and 3D models have their own strengths and weaknesses, and by unifying them together, one may be able to achieve more accurate results. In this paper, we propose a new ensemble learning framework for 3D biomedical image segmentation that combines the merits of 2D and 3D models. First, we develop a fully convolutional network based meta-learner to learn how to improve the results from 2D and 3D models (base-learners). Then, to minimize over-fitting for our sophisticated meta-learner, we devise a new training method that uses the results of the base-learners as multiple versions of “ground truths”. Furthermore, since our new meta-learner training scheme does not depend on manual annotation, it can utilize abundant unlabeled 3D image data to further improve the model. Extensive experiments on two public datasets (the HVSMR 2016 Challenge dataset and the mouse piriform cortex dataset) show that our approach is effective under fully-supervised, semi-supervised, and transductive settings, and attains superior performance over state-of-the-art image segmentation methods.
Tasks	3D Medical Imaging Segmentation, Semantic Segmentation
Published	2018-12-10
URL	http://arxiv.org/abs/1812.03945v1
PDF	http://arxiv.org/pdf/1812.03945v1.pdf
PWC	https://paperswithcode.com/paper/a-new-ensemble-learning-framework-for-3d
Repo	https://github.com/HaoZheng94/Ensemble
Framework	none

Comparative Studies of Detecting Abusive Language on Twitter


Title	Comparative Studies of Detecting Abusive Language on Twitter
Authors	Younghun Lee, Seunghyun Yoon, Kyomin Jung
Abstract	The context-dependent nature of online aggression makes annotating large collections of data extremely difficult. Previously studied datasets in abusive language detection have been insufficient in size to efficiently train deep learning models. Recently, Hate and Abusive Speech on Twitter, a dataset much greater in size and reliability, has been released. However, this dataset has not been comprehensively studied to its full potential. In this paper, we conduct the first comparative study of various learning models on Hate and Abusive Speech on Twitter, and discuss the possibility of using additional features and context data for improvements. Experimental results show that a bidirectional GRU network trained on word-level features, with Latent Topic Clustering modules, is the most accurate model, scoring 0.805 F1.
Tasks	Hate Speech Detection, Sentiment Analysis, Twitter Sentiment Analysis
Published	2018-08-30
URL	http://arxiv.org/abs/1808.10245v1
PDF	http://arxiv.org/pdf/1808.10245v1.pdf
PWC	https://paperswithcode.com/paper/comparative-studies-of-detecting-abusive
Repo	https://github.com/younggns/comparative-abusive-lang
Framework	tf