February 1, 2020


Paper Group AWR 161

Enhancing high-content imaging for studying microtubule networks at large-scale. Spatiotemporal Attention Networks for Wind Power Forecasting. Learning to Remember More with Less Memorization. A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks. Clustering with Fairness Constraints: A Flexible and Scalable Approach. Modeling …

Enhancing high-content imaging for studying microtubule networks at large-scale


Title	Enhancing high-content imaging for studying microtubule networks at large-scale
Authors	Hao-Chih Lee, Sarah T Cherng, Riccardo Miotto, Joel T Dudley
Abstract	Given the crucial role of microtubules for cell survival, many researchers have found success using microtubule-targeting agents in the search for effective cancer therapeutics. Understanding microtubule responses to targeted interventions requires that the microtubule network within cells can be consistently observed across a large sample of images. However, fluorescence noise sources captured simultaneously with biological signals while using wide-field microscopes can obfuscate fine microtubule structures. Such requirements are particularly challenging for high-throughput imaging, where researchers must make decisions related to the trade-off between imaging quality and speed. Here, we propose a computational framework to enhance the quality of high-throughput imaging data to achieve fast speed and high quality simultaneously. Using CycleGAN, we learn an image model from low-throughput, high-resolution images to enhance features, such as microtubule networks, in high-throughput, low-resolution images. We show that CycleGAN is effective in identifying microtubules with 0.93+ AUC-ROC and that these results are robust to different kinds of image noise. We further apply CycleGAN to quantify the changes in microtubule density as a result of the application of drug compounds, and show that the quantified responses correspond well with known drug effects.
Tasks
Published	2019-10-01
URL	https://arxiv.org/abs/1910.00662v1
PDF	https://arxiv.org/pdf/1910.00662v1.pdf
PWC	https://paperswithcode.com/paper/enhancing-high-content-imaging-for-studying
Repo	https://github.com/DudleyLab/widefield2confocal
Framework	none

Spatiotemporal Attention Networks for Wind Power Forecasting


Title	Spatiotemporal Attention Networks for Wind Power Forecasting
Authors	Xingbo Fu, Feng Gao, Jiang Wu, Xinyu Wei, Fangwei Duan
Abstract	Wind power is one of the most important renewable energy sources and accurate wind power forecasting is very significant for reliable and economic power system operation and control strategies. This paper proposes a novel framework with spatiotemporal attention networks (STAN) for wind power forecasting. This model captures spatial correlations among wind farms and temporal dependencies of wind power time series. First of all, we employ a multi-head self-attention mechanism to extract spatial correlations among wind farms. Then, temporal dependencies are captured by the Sequence-to-Sequence (Seq2Seq) model with a global attention mechanism. Finally, experimental results demonstrate that our model achieves better performance than other baseline approaches. Our work provides useful insights to capture non-Euclidean spatial correlations.
Tasks	Time Series
Published	2019-09-14
URL	https://arxiv.org/abs/1909.07369v2
PDF	https://arxiv.org/pdf/1909.07369v2.pdf
PWC	https://paperswithcode.com/paper/spatiotemporal-attention-networks-for-wind
Repo	https://github.com/xbfu/Spatiotemporal-Attention-Networks
Framework	pytorch
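
The spatial step described in the abstract can be illustrated with a minimal sketch of self-attention across wind farms. This is not the authors' implementation: it uses a single head with identity query/key/value projections (an assumption made to keep it short), whereas a multi-head version would run several such heads with learned projections. The `farms` feature vectors are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Single-head scaled dot-product self-attention with identity projections.

    features: one feature vector per wind farm (hypothetical inputs).
    Returns (outputs, weights); weights[i][j] is how strongly farm i attends to farm j.
    """
    d = len(features[0])
    scale = math.sqrt(d)
    weights = []
    for query in features:
        scores = [sum(a * b for a, b in zip(query, key)) / scale for key in features]
        weights.append(softmax(scores))
    # Each output is the attention-weighted average of all farms' features.
    outputs = [
        [sum(w * value[c] for w, value in zip(row, features)) for c in range(d)]
        for row in weights
    ]
    return outputs, weights


# Three hypothetical farms; the first two have similar feature vectors.
farms = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(farms)
```

Because the attention weights are computed from pairwise similarity rather than from a fixed grid, this kind of layer can relate farms that are not neighbours in any regular spatial layout.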

Learning to Remember More with Less Memorization


Title	Learning to Remember More with Less Memorization
Authors	Hung Le, Truyen Tran, Svetha Venkatesh
Abstract	Memory-augmented neural networks consisting of a neural controller and an external memory have shown potential in long-term sequential learning. Current RAM-like memory models access memory at every timestep, thus they do not effectively leverage the short-term memory held in the controller. We hypothesize that this scheme of writing is suboptimal in memory utilization and introduces redundant computation. To validate our hypothesis, we derive a theoretical bound on the amount of information stored in a RAM-like system and formulate an optimization problem that maximizes the bound. The proposed solution dubbed Uniform Writing is proved to be optimal under the assumption of equal timestep contributions. To relax this assumption, we introduce modifications to the original solution, resulting in a solution termed Cached Uniform Writing. This method aims to balance between maximizing memorization and forgetting via overwriting mechanisms. Through an extensive set of experiments, we empirically demonstrate the advantages of our solutions over other recurrent architectures, claiming the state of the art in various sequential modeling tasks.
Tasks	Sentiment Analysis, Sequential Image Classification, Text Classification
Published	2019-01-05
URL	http://arxiv.org/abs/1901.01347v2
PDF	http://arxiv.org/pdf/1901.01347v2.pdf
PWC	https://paperswithcode.com/paper/learning-to-remember-more-with-less
Repo	https://github.com/thaihungle/UW-DNC
Framework	tf
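
As a rough illustration of the idea of writing to memory at regular intervals instead of at every timestep, the sketch below picks evenly spaced write positions for a sequence. The function name and the interval rule (`seq_len // num_slots`) are assumptions for illustration only; the paper's exact schedule may differ.

```python
def uniform_write_steps(seq_len, num_slots):
    """Return evenly spaced timesteps (1-indexed) at which to write to memory.

    Assumption: one write per memory slot, spaced seq_len // num_slots apart.
    Between writes, the controller alone carries the short-term state.
    """
    interval = max(1, seq_len // num_slots)
    steps = list(range(interval, seq_len + 1, interval))
    return steps[:num_slots]


write_steps = uniform_write_steps(100, 4)
print(write_steps)  # [25, 50, 75, 100]
```

With 100 timesteps and 4 slots this performs 4 writes instead of 100, which is the kind of saving in memory access the abstract refers to.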

A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks


Title	A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks
Authors	Umut Simsekli, Levent Sagun, Mert Gurbuzbalaban
Abstract	The gradient noise (GN) in the stochastic gradient descent (SGD) algorithm is often considered to be Gaussian in the large data regime by assuming that the classical central limit theorem (CLT) kicks in. This assumption is often made for mathematical convenience, since it enables SGD to be analyzed as a stochastic differential equation (SDE) driven by a Brownian motion. We argue that the Gaussianity assumption might fail to hold in deep learning settings and hence render the Brownian motion-based analyses inappropriate. Inspired by non-Gaussian natural phenomena, we consider the GN in a more general context and invoke the generalized CLT (GCLT), which suggests that the GN converges to a heavy-tailed $\alpha$-stable random variable. Accordingly, we propose to analyze SGD as an SDE driven by a Lévy motion. Such SDEs can incur 'jumps', which force the SDE transition from narrow minima to wider minima, as proven by existing metastability theory. To validate the $\alpha$-stable assumption, we conduct extensive experiments on common deep learning architectures and show that in all settings, the GN is highly non-Gaussian and admits heavy-tails. We further investigate the tail behavior in varying network architectures and sizes, loss functions, and datasets. Our results open up a different perspective and shed more light on the belief that SGD prefers wide minima.
Tasks
Published	2019-01-18
URL	http://arxiv.org/abs/1901.06053v1
PDF	http://arxiv.org/pdf/1901.06053v1.pdf
PWC	https://paperswithcode.com/paper/a-tail-index-analysis-of-stochastic-gradient
Repo	https://github.com/umutsimsekli/sgd_tail_index
Framework	pytorch
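
To make the notion of a tail index concrete, here is a small sketch that estimates it from samples. Note the substitution: it uses the classical Hill estimator, not the estimator used in the paper, and it is run on synthetic Pareto and Gaussian samples rather than on real gradient noise. The sample sizes and the choice of `k` are arbitrary assumptions.

```python
import math
import random


def hill_tail_index(samples, k):
    """Hill estimator of the tail index from the k largest absolute values."""
    ordered = sorted(abs(x) for x in samples)
    threshold = ordered[-k - 1]  # the (k+1)-th largest absolute value
    top = ordered[-k:]           # the k largest absolute values
    mean_log_excess = sum(math.log(x / threshold) for x in top) / k
    return 1.0 / mean_log_excess


rng = random.Random(0)
# Heavy-tailed samples: Pareto with tail index 1.5.
heavy = [rng.paretovariate(1.5) for _ in range(20000)]
# Light-tailed samples: standard Gaussian.
light = [rng.gauss(0.0, 1.0) for _ in range(20000)]

alpha_heavy = hill_tail_index(heavy, 2000)
alpha_light = hill_tail_index(light, 2000)
print(alpha_heavy, alpha_light)
```

A small estimated index signals heavy tails; for Gaussian-like data the estimate comes out much larger, which is the qualitative contrast the abstract draws between the heavy-tailed and Gaussian views of gradient noise.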

Clustering with Fairness Constraints: A Flexible and Scalable Approach


Title	Clustering with Fairness Constraints: A Flexible and Scalable Approach
Authors	Imtiaz Masud Ziko, Eric Granger, Jing Yuan, Ismail Ben Ayed
Abstract	This study investigates a general variational formulation of fair clustering, which can integrate fairness constraints with a large class of clustering objectives. Unlike the existing methods, our formulation can impose any desired (target) demographic proportions within each cluster. Furthermore, it enables control of the trade-off between the fairness and clustering terms. We derive an auxiliary function (tight upper bound) of our KL-based fairness penalty via its concave-convex decomposition and Lipschitz-gradient property. Our upper bound can be optimized jointly with various clustering objectives, including prototype-based, such as K-means and K-median, or graph-based such as Normalized Cut. Interestingly, at each iteration, our general fair-clustering algorithm performs an independent update for each assignment variable, while guaranteeing convergence. Therefore, it can be easily distributed for large-scale data sets. Such scalability is important as it enables exploring different trade-off levels between the fairness and clustering objectives. Unlike fairness-constrained spectral clustering, our formulation does not require storing an affinity matrix and computing its eigenvalue decomposition. We show the effectiveness, flexibility and scalability of our approach through comprehensive evaluations and comparisons to the existing methods over several data sets.
Tasks
Published	2019-06-19
URL	https://arxiv.org/abs/1906.08207v3
PDF	https://arxiv.org/pdf/1906.08207v3.pdf
PWC	https://paperswithcode.com/paper/clustering-with-fairness-constraints-a
Repo	https://github.com/imtiazziko/Clustering-with-fairness-constraints
Framework	none
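
A KL-based fairness penalty of the kind mentioned in the abstract can be sketched as follows: for each cluster, compare the target demographic proportions with the proportions actually observed in that cluster. This is an illustrative sketch only; the direction of the divergence (target versus cluster proportions), the hard assignments, the `eps` clipping, and all function names are assumptions, not the authors' code.

```python
import math


def cluster_proportions(assignments, groups, num_clusters, num_groups):
    """Fraction of each demographic group inside each cluster."""
    counts = [[0] * num_groups for _ in range(num_clusters)]
    for cluster, group in zip(assignments, groups):
        counts[cluster][group] += 1
    props = []
    for row in counts:
        total = sum(row)
        props.append([c / total if total else 0.0 for c in row])
    return props


def kl_fairness_penalty(target, props, eps=1e-12):
    """Sum over clusters of KL(target || cluster proportions)."""
    penalty = 0.0
    for cluster_props in props:
        penalty += sum(
            u * math.log(u / max(p, eps))
            for u, p in zip(target, cluster_props)
            if u > 0
        )
    return penalty


groups = [0, 1, 0, 1]       # demographic group of each point (hypothetical)
target = [0.5, 0.5]         # desired proportions within every cluster
balanced = cluster_proportions([0, 0, 1, 1], groups, 2, 2)
segregated = cluster_proportions([0, 1, 0, 1], groups, 2, 2)
fair_penalty = kl_fairness_penalty(target, balanced)
unfair_penalty = kl_fairness_penalty(target, segregated)
```

The penalty is zero when every cluster matches the target proportions and grows as clusters become demographically skewed, so adding it to a clustering objective with a weight gives the trade-off the abstract describes.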

Modeling Multi-Vehicle Interaction Scenarios Using Gaussian Random Field


Title	Modeling Multi-Vehicle Interaction Scenarios Using Gaussian Random Field
Authors	Yaohui Guo, Vinay Varma Kalidindi, Mansur Arief, Wenshuo Wang, Jiacheng Zhu, Huei Peng, Ding Zhao
Abstract	Autonomous vehicles are expected to navigate in complex traffic scenarios with multiple surrounding vehicles. The correlations between road users vary over time, the degree of which, in theory, could be infinitely large, thus posing a great challenge in modeling and predicting the driving environment. In this paper, we propose a method to model multi-vehicle interactions using a stochastic vector field model and apply non-parametric Bayesian learning to extract the underlying motion patterns from a large quantity of naturalistic traffic data. We then use this model to reproduce the high-dimensional driving scenarios in a finitely tractable form. We use a Gaussian process to model multi-vehicle motion, and a Dirichlet process to assign each observation to a specific scenario. We verify the effectiveness of the proposed method on highway and intersection datasets from the NGSIM project, in which complex multi-vehicle interactions are prevalent. The results show that the proposed method can capture motion patterns from both settings, without imposing a heroic prior, and hence demonstrate the potential application for a wide array of traffic situations. The proposed modeling method could enable simulation platforms and other testing methods designed for autonomous vehicle evaluation to easily model and generate traffic scenarios emulating large-scale driving data.
Tasks	Autonomous Vehicles
Published	2019-06-25
URL	https://arxiv.org/abs/1906.10307v2
PDF	https://arxiv.org/pdf/1906.10307v2.pdf
PWC	https://paperswithcode.com/paper/modeling-multi-vehicle-interaction-scenarios
Repo	https://github.com/mxu34/DPGP
Framework	none

Neural Blind Deconvolution Using Deep Priors


Title	Neural Blind Deconvolution Using Deep Priors
Authors	Dongwei Ren, Kai Zhang, Qilong Wang, Qinghua Hu, Wangmeng Zuo
Abstract	Blind deconvolution is a classical yet challenging low-level vision problem with many real-world applications. Traditional maximum a posteriori (MAP) based methods rely heavily on fixed and handcrafted priors that certainly are insufficient in characterizing clean images and blur kernels, and usually adopt specially designed alternating minimization to avoid trivial solutions. In contrast, existing deep motion deblurring networks learn from massive training images the mapping to clean image or blur kernel, but are limited in handling various complex and large size blur kernels. To connect MAP and deep models, in this paper we present two generative networks for respectively modeling the deep priors of clean image and blur kernel, and propose an unconstrained neural optimization solution to blind deconvolution. In particular, we adopt an asymmetric Autoencoder with skip connections for generating latent clean image, and a fully-connected network (FCN) for generating blur kernel. Moreover, the SoftMax nonlinearity is applied to the output layer of FCN to meet the non-negative and equality constraints. The process of neural optimization can be explained as a kind of “zero-shot” self-supervised learning of the generative networks, and thus our proposed method is dubbed SelfDeblur. Experimental results show that our SelfDeblur can achieve notable quantitative gains as well as more visually plausible deblurring results in comparison to state-of-the-art blind deconvolution methods on benchmark datasets and real-world blurry images. The source code is available at https://github.com/csdwren/SelfDeblur
Tasks	Deblurring
Published	2019-08-06
URL	https://arxiv.org/abs/1908.02197v2
PDF	https://arxiv.org/pdf/1908.02197v2.pdf
PWC	https://paperswithcode.com/paper/neural-blind-deconvolution-using-deep-priors
Repo	https://github.com/csdwren/SelfDeblur
Framework	pytorch
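
The role of the SoftMax output layer can be shown with a small sketch: applying softmax to unconstrained network outputs yields a blur kernel whose entries are non-negative and sum to one, so no explicit constraint handling is needed during optimization. The `raw` logits and the 3x3 kernel size below are hypothetical stand-ins for the output of the fully-connected network; this is not the SelfDeblur code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_kernel(logits, size):
    """Map size*size unconstrained values to a size x size blur kernel."""
    flat = softmax(logits)
    return [flat[r * size:(r + 1) * size] for r in range(size)]


# Hypothetical unconstrained outputs of the kernel-generating network.
raw = [0.3, -1.2, 2.0, 0.0, 0.5, -0.7, 1.1, -2.0, 0.4]
kernel = logits_to_kernel(raw, 3)
kernel_sum = sum(sum(row) for row in kernel)
```

Whatever values the network produces, the resulting kernel is a valid one in this sense, which is what makes an unconstrained optimization over the network weights possible.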

Can neural networks understand monotonicity reasoning?


Title	Can neural networks understand monotonicity reasoning?
Authors	Hitomi Yanaka, Koji Mineshima, Daisuke Bekki, Kentaro Inui, Satoshi Sekine, Lasha Abzianidze, Johan Bos
Abstract	Monotonicity reasoning is one of the important reasoning skills for any intelligent natural language inference (NLI) model in that it requires the ability to capture the interaction between lexical and syntactic structures. Since no test set has been developed for monotonicity reasoning with wide coverage, it is still unclear whether neural models can perform monotonicity reasoning in a proper way. To investigate this issue, we introduce the Monotonicity Entailment Dataset (MED). Performance by state-of-the-art NLI models on the new test set is substantially worse, under 55%, especially on downward reasoning. In addition, analysis using a monotonicity-driven data augmentation method showed that these models might be limited in their generalization ability in upward and downward reasoning.
Tasks	Data Augmentation, Natural Language Inference
Published	2019-06-15
URL	https://arxiv.org/abs/1906.06448v2
PDF	https://arxiv.org/pdf/1906.06448v2.pdf
PWC	https://paperswithcode.com/paper/can-neural-networks-understand-monotonicity
Repo	https://github.com/verypluming/MED
Framework	none

Bad Global Minima Exist and SGD Can Reach Them


Title	Bad Global Minima Exist and SGD Can Reach Them
Authors	Shengchao Liu, Dimitris Papailiopoulos, Dimitris Achlioptas
Abstract	Several recent works have aimed to explain why severely overparameterized models generalize well when trained by Stochastic Gradient Descent (SGD). The emergent consensus explanation has two parts: the first is that there are “no bad local minima”, while the second is that SGD performs implicit regularization by having a bias towards low complexity models. We revisit both of these ideas in the context of image classification with common deep neural network architectures. Our first finding is that there exist bad global minima, i.e., models that fit the training set perfectly, yet have poor generalization. Our second finding is that given only unlabeled training data, we can easily construct initializations that will cause SGD to quickly converge to such bad global minima. For example, on CIFAR, CINIC10, and (Restricted) ImageNet, this can be achieved by starting SGD at a model derived by fitting random labels on the training data: while subsequent SGD training (with the correct labels) will reach zero training error, the resulting model will exhibit a test accuracy degradation of up to 40% compared to training from a random initialization. Finally, we show that regularization seems to provide SGD with an escape route: once heuristics such as data augmentation are used, starting from a complex model (adversarial initialization) has no effect on the test accuracy.
Tasks	Data Augmentation, Image Classification
Published	2019-06-06
URL	https://arxiv.org/abs/1906.02613v1
PDF	https://arxiv.org/pdf/1906.02613v1.pdf
PWC	https://paperswithcode.com/paper/bad-global-minima-exist-and-sgd-can-reach
Repo	https://github.com/chao1224/BadGlobalMinima
Framework	pytorch

Grid Saliency for Context Explanations of Semantic Segmentation


Title	Grid Saliency for Context Explanations of Semantic Segmentation
Authors	Lukas Hoyer, Mauricio Munoz, Prateek Katiyar, Anna Khoreva, Volker Fischer
Abstract	Recently, there has been a growing interest in developing saliency methods that provide visual explanations of network predictions. Still, the usability of existing methods is limited to image classification models. To overcome this limitation, we extend the existing approaches to generate grid saliencies, which provide spatially coherent visual explanations for (pixel-level) dense prediction networks. As the proposed grid saliency allows the object and its context to be spatially disentangled, we specifically explore its potential to produce context explanations for semantic segmentation networks, discovering which context most influences the class predictions inside a target object area. We investigate the effectiveness of grid saliency on a synthetic dataset with an artificially induced bias between objects and their context as well as on the real-world Cityscapes dataset using state-of-the-art segmentation networks. Our results show that grid saliency can be successfully used to provide easily interpretable context explanations and, moreover, can be employed for detecting and localizing contextual biases present in the data.
Tasks	Image Classification, Semantic Segmentation
Published	2019-07-30
URL	https://arxiv.org/abs/1907.13054v2
PDF	https://arxiv.org/pdf/1907.13054v2.pdf
PWC	https://paperswithcode.com/paper/grid-saliency-for-context-explanations-of
Repo	https://github.com/boschresearch/GridSaliency-ToyDatasetGen
Framework	none

Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image


Title	Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image
Authors	Gyeongsik Moon, Ju Yong Chang, Kyoung Mu Lee
Abstract	Although significant improvement has been achieved recently in 3D human pose estimation, most of the previous methods only treat a single-person case. In this work, we propose the first fully learning-based, camera distance-aware top-down approach for 3D multi-person pose estimation from a single RGB image. The pipeline of the proposed system consists of human detection, absolute 3D human root localization, and root-relative 3D single-person pose estimation modules. Our system achieves comparable results with the state-of-the-art 3D single-person pose estimation models without any groundtruth information and significantly outperforms previous 3D multi-person pose estimation methods on publicly available datasets. The code is available at https://github.com/mks0601/3DMPPE_ROOTNET_RELEASE , https://github.com/mks0601/3DMPPE_POSENET_RELEASE.
Tasks	3D Absolute Human Pose Estimation, 3D Human Pose Estimation, 3D Multi-person Pose Estimation, 3D Multi-person Pose Estimation (absolute), 3D Multi-person Pose Estimation (root-relative), Multi-Person Pose Estimation, Pose Estimation
Published	2019-07-26
URL	https://arxiv.org/abs/1907.11346v2
PDF	https://arxiv.org/pdf/1907.11346v2.pdf
PWC	https://paperswithcode.com/paper/camera-distance-aware-top-down-approach-for
Repo	https://github.com/mks0601/3DMPPE_ROOTNET_RELEASE
Framework	pytorch
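
The last step of such a pipeline, combining an absolute root location with a root-relative pose, can be sketched in a few lines. This is a simplification under stated assumptions: both the root and the relative joints are taken to be already expressed in camera coordinates (in millimetres), and the function name and example values are hypothetical. The released code may represent and combine these quantities differently.

```python
def to_absolute_pose(root_xyz, relative_joints):
    """Shift root-relative joints by the absolute root position.

    root_xyz: (x, y, z) of the person's root joint in camera coordinates.
    relative_joints: list of (x, y, z) offsets of each joint from the root.
    """
    rx, ry, rz = root_xyz
    return [(rx + x, ry + y, rz + z) for x, y, z in relative_joints]


# Hypothetical person standing 3 m from the camera.
root = (100.0, -50.0, 3000.0)
relative = [(0.0, 0.0, 0.0), (0.0, -450.0, 20.0), (150.0, -400.0, -30.0)]
absolute = to_absolute_pose(root, relative)
```

Running this once per detected person places every skeleton in a shared camera-centred space, which is what distinguishes multi-person absolute pose estimation from the root-relative single-person case.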

ChainerRL: A Deep Reinforcement Learning Library


Title	ChainerRL: A Deep Reinforcement Learning Library
Authors	Yasuhiro Fujita, Toshiki Kataoka, Prabhat Nagarajan, Takahiro Ishikawa
Abstract	In this paper, we introduce ChainerRL, an open-source Deep Reinforcement Learning (DRL) library built using Python and the Chainer deep learning framework. ChainerRL implements a comprehensive set of DRL algorithms and techniques drawn from the state-of-the-art research in the field. To foster reproducible research, and for instructional purposes, ChainerRL provides scripts that closely replicate the original papers’ experimental settings and reproduce published benchmark results for several algorithms. Lastly, ChainerRL offers a visualization tool that enables the qualitative inspection of trained agents. The ChainerRL source code can be found on GitHub: https://github.com/chainer/chainerrl .
Tasks
Published	2019-12-09
URL	https://arxiv.org/abs/1912.03905v1
PDF	https://arxiv.org/pdf/1912.03905v1.pdf
PWC	https://paperswithcode.com/paper/chainerrl-a-deep-reinforcement-learning
Repo	https://github.com/chainer/chainerrl
Framework	none

Alternative Weighting Schemes for ELMo Embeddings


Title	Alternative Weighting Schemes for ELMo Embeddings
Authors	Nils Reimers, Iryna Gurevych
Abstract	ELMo embeddings (Peters et al., 2018) had a huge impact on the NLP community and many recent publications use these embeddings to boost the performance for downstream NLP tasks. However, integration of ELMo embeddings in existing NLP architectures is not straightforward. In contrast to traditional word embeddings, like GloVe or word2vec embeddings, the bi-directional language model of ELMo produces three 1024-dimensional vectors per token in a sentence. Peters et al. proposed to learn a task-specific weighting of these three vectors for downstream tasks. However, this proposed weighting scheme is not feasible for certain tasks, and, as we will show, it does not necessarily yield optimal performance. We evaluate different methods that combine the three vectors from the language model in order to achieve the best possible performance in downstream NLP tasks. We notice that the third layer of the published language model often decreases the performance. By learning a weighted average of only the first two layers, we are able to improve the performance for many datasets. Due to the reduced complexity of the language model, we have a training speed-up of 19-44% for the downstream task.
Tasks	Language Modelling, Word Embeddings
Published	2019-04-05
URL	http://arxiv.org/abs/1904.02954v1
PDF	http://arxiv.org/pdf/1904.02954v1.pdf
PWC	https://paperswithcode.com/paper/alternative-weighting-schemes-for-elmo
Repo	https://github.com/UKPLab/elmo-bilstm-cnn-crf
Framework	tf
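
A weighted average of layer vectors, as discussed in the abstract, can be sketched as a softmax-normalized mix of per-layer vectors for one token. The toy 4-dimensional vectors stand in for the 1024-dimensional ELMo outputs, and the raw weights and `gamma` scale are hypothetical values that would normally be learned; this is not the code from the linked repository.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_average(layers, raw_weights, gamma=1.0):
    """Combine per-layer token vectors with softmax-normalized scalar weights."""
    weights = softmax(raw_weights)
    dim = len(layers[0])
    return [
        gamma * sum(w * layer[i] for w, layer in zip(weights, layers))
        for i in range(dim)
    ]


# Toy stand-ins for the three layer outputs of one token.
token_layers = [
    [1.0, 0.0, 2.0, 4.0],
    [3.0, 2.0, 0.0, 0.0],
    [9.0, 9.0, 9.0, 9.0],
]
# Mix only the first two layers, here with equal raw weights.
mixed = weighted_layer_average(token_layers[:2], [0.0, 0.0])
print(mixed)  # [2.0, 1.0, 1.0, 2.0]
```

Dropping the third layer from the mix also means it never has to be computed, which is consistent with the training speed-up the abstract reports.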

CLEARumor at SemEval-2019 Task 7: ConvoLving ELMo Against Rumors


Title	CLEARumor at SemEval-2019 Task 7: ConvoLving ELMo Against Rumors
Authors	Ipek Baris, Lukas Schmelzeisen, Steffen Staab
Abstract	This paper describes our submission to SemEval-2019 Task 7: RumourEval: Determining Rumor Veracity and Support for Rumors. We participated in both subtasks. The goal of subtask A is to classify the type of interaction between a rumorous social media post and a reply post as support, query, deny, or comment. The goal of subtask B is to predict the veracity of a given rumor. For subtask A, we implement a CNN-based neural architecture using ELMo embeddings of post text combined with auxiliary features and achieve an F1-score of 44.6%. For subtask B, we employ an MLP neural network leveraging our estimates for subtask A and achieve an F1-score of 30.1% (second place in the competition). We provide results and analysis of our system performance and present ablation experiments.
Tasks	Rumour Detection
Published	2019-04-05
URL	http://arxiv.org/abs/1904.03084v1
PDF	http://arxiv.org/pdf/1904.03084v1.pdf
PWC	https://paperswithcode.com/paper/clearumor-at-semeval-2019-task-7-convolving
Repo	https://github.com/Institute-Web-Science-and-Technologies/CLEARumor
Framework	pytorch

Fine-Grained Named Entity Recognition using ELMo and Wikidata


Title	Fine-Grained Named Entity Recognition using ELMo and Wikidata
Authors	Cihan Dogan, Aimore Dutra, Adam Gara, Alfredo Gemma, Lei Shi, Michael Sigamani, Ella Walters
Abstract	Fine-grained Named Entity Recognition is a task whereby we detect and classify entity mentions to a large set of types. These types can span diverse domains such as finance, healthcare, and politics. We observe that when the type set spans several domains the accuracy of the entity detection becomes a limitation for supervised learning models. The primary reason is the lack of datasets in which entity boundaries are properly annotated while covering a large spectrum of entity types. Furthermore, many named entity systems suffer when considering the categorization of fine-grained entity types. Our work attempts to address these issues, in part, by combining state-of-the-art deep learning models (ELMo) with an expansive knowledge base (Wikidata). Using our framework, we cross-validate our model on the 112 fine-grained entity types based on the hierarchy given from the Wiki(gold) dataset.
Tasks	Named Entity Recognition
Published	2019-04-23
URL	http://arxiv.org/abs/1904.10503v1
PDF	http://arxiv.org/pdf/1904.10503v1.pdf
PWC	https://paperswithcode.com/paper/fine-grained-named-entity-recognition-using
Repo	https://github.com/sigamani/ner
Framework	none