Paper Group AWR 161
Enhancing high-content imaging for studying microtubule networks at large-scale. Spatiotemporal Attention Networks for Wind Power Forecasting. Learning to Remember More with Less Memorization. A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks. Clustering with Fairness Constraints: A Flexible and Scalable Approach. Modeling Multi-Vehicle Interaction Scenarios Using Gaussian Random Field. Neural Blind Deconvolution Using Deep Priors. Can neural networks understand monotonicity reasoning? Bad Global Minima Exist and SGD Can Reach Them. Grid Saliency for Context Explanations of Semantic Segmentation. Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image. ChainerRL: A Deep Reinforcement Learning Library. Alternative Weighting Schemes for ELMo Embeddings. CLEARumor at SemEval-2019 Task 7: ConvoLving ELMo Against Rumors. Fine-Grained Named Entity Recognition using ELMo and Wikidata.
Enhancing high-content imaging for studying microtubule networks at large-scale
Title | Enhancing high-content imaging for studying microtubule networks at large-scale |
Authors | Hao-Chih Lee, Sarah T Cherng, Riccardo Miotto, Joel T Dudley |
Abstract | Given the crucial role of microtubules for cell survival, many researchers have found success using microtubule-targeting agents in the search for effective cancer therapeutics. Understanding microtubule responses to targeted interventions requires that the microtubule network within cells be consistently observed across a large sample of images. However, fluorescence noise sources captured simultaneously with biological signals while using wide-field microscopes can obfuscate fine microtubule structures. These requirements are particularly challenging to meet in high-throughput imaging, where researchers must trade off imaging quality against speed. Here, we propose a computational framework to enhance the quality of high-throughput imaging data and achieve fast speed and high quality simultaneously. Using CycleGAN, we learn an image model from low-throughput, high-resolution images to enhance features, such as microtubule networks, in high-throughput, low-resolution images. We show that CycleGAN is effective in identifying microtubules with 0.93+ AUC-ROC and that these results are robust to different kinds of image noise. We further apply CycleGAN to quantify the changes in microtubule density as a result of the application of drug compounds, and show that the quantified responses correspond well with known drug effects. |
Tasks | |
Published | 2019-10-01 |
URL | https://arxiv.org/abs/1910.00662v1 |
https://arxiv.org/pdf/1910.00662v1.pdf | |
PWC | https://paperswithcode.com/paper/enhancing-high-content-imaging-for-studying |
Repo | https://github.com/DudleyLab/widefield2confocal |
Framework | none |
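As a rough illustration of the unpaired image-to-image translation at the heart of this approach, the sketch below wires up the standard CycleGAN generator objective (adversarial loss plus cycle consistency). The tiny networks, λ weight, and tensor shapes are illustrative stand-ins, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Toy stand-ins for the two generators and one discriminator (assumptions).
G = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(16, 1, 3, padding=1))   # low-res -> high-res domain
F_ = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                   nn.Conv2d(16, 1, 3, padding=1))  # high-res -> low-res domain
D_high = nn.Sequential(nn.Conv2d(1, 16, 3, stride=2), nn.ReLU(),
                       nn.Flatten(), nn.LazyLinear(1))

l1, bce = nn.L1Loss(), nn.BCEWithLogitsLoss()

def generator_loss(x_low, x_high, lam=10.0):
    fake_high = G(x_low)
    adv = bce(D_high(fake_high), torch.ones(x_low.size(0), 1))  # fool the discriminator
    cyc = l1(F_(fake_high), x_low) + l1(G(F_(x_high)), x_high)  # cycle consistency
    return adv + lam * cyc

x_low, x_high = torch.randn(4, 1, 64, 64), torch.randn(4, 1, 64, 64)
print(generator_loss(x_low, x_high))
```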
Spatiotemporal Attention Networks for Wind Power Forecasting
Title | Spatiotemporal Attention Networks for Wind Power Forecasting |
Authors | Xingbo Fu, Feng Gao, Jiang Wu, Xinyu Wei, Fangwei Duan |
Abstract | Wind power is one of the most important renewable energy sources, and accurate wind power forecasting is essential for reliable and economic power system operation and control. This paper proposes a novel framework with spatiotemporal attention networks (STAN) for wind power forecasting. The model captures spatial correlations among wind farms and temporal dependencies of wind power time series. First, we employ a multi-head self-attention mechanism to extract spatial correlations among wind farms. Then, temporal dependencies are captured by a Sequence-to-Sequence (Seq2Seq) model with a global attention mechanism. Experimental results demonstrate that our model achieves better performance than baseline approaches. Our work provides useful insights into capturing non-Euclidean spatial correlations. |
Tasks | Time Series |
Published | 2019-09-14 |
URL | https://arxiv.org/abs/1909.07369v2 |
https://arxiv.org/pdf/1909.07369v2.pdf | |
PWC | https://paperswithcode.com/paper/spatiotemporal-attention-networks-for-wind |
Repo | https://github.com/xbfu/Spatiotemporal-Attention-Networks |
Framework | pytorch |
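A minimal sketch of the spatial block as we read the abstract: treat the wind farms at one time step as a set and let multi-head self-attention mix them, with the attention weights exposing farm-to-farm correlations. Dimensions and the module layout are assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, farms):          # farms: (batch, n_farms, d_model)
        mixed, weights = self.attn(farms, farms, farms)
        return mixed, weights          # weights ~ learned spatial correlations

x = torch.randn(8, 25, 64)             # 8 samples, 25 farms, 64-dim features
out, w = SpatialAttention()(x)          # the Seq2Seq temporal part would consume `out`
```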
Learning to Remember More with Less Memorization
Title | Learning to Remember More with Less Memorization |
Authors | Hung Le, Truyen Tran, Svetha Venkatesh |
Abstract | Memory-augmented neural networks consisting of a neural controller and an external memory have shown potential in long-term sequential learning. Current RAM-like memory models access memory at every timestep and thus fail to effectively leverage the short-term memory held in the controller. We hypothesize that this writing scheme is suboptimal in memory utilization and introduces redundant computation. To validate our hypothesis, we derive a theoretical bound on the amount of information stored in a RAM-like system and formulate an optimization problem that maximizes the bound. The proposed solution, dubbed Uniform Writing, is proven optimal under the assumption of equal timestep contributions. To relax this assumption, we introduce modifications to the original solution, resulting in a method termed Cached Uniform Writing, which balances maximizing memorization against forgetting via an overwriting mechanism. Through an extensive set of experiments, we empirically demonstrate the advantages of our solutions over other recurrent architectures, achieving state-of-the-art results on various sequence modeling tasks. |
Tasks | Sentiment Analysis, Sequential Image Classification, Text Classification |
Published | 2019-01-05 |
URL | http://arxiv.org/abs/1901.01347v2 |
http://arxiv.org/pdf/1901.01347v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-remember-more-with-less |
Repo | https://github.com/thaihungle/UW-DNC |
Framework | tf |
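To make the core idea concrete, here is a toy reading of Uniform Writing: with a budget of D memory writes over a sequence of length T, write at equally spaced timesteps instead of every timestep. The exact spacing and indexing convention in the paper may differ; this is an illustrative sketch only.

```python
def uniform_write_steps(T, D):
    """Equally spaced write timesteps (0-indexed) for D writes over T steps."""
    interval = T / (D + 1)
    return [round(interval * (i + 1)) - 1 for i in range(D)]

# e.g. T=10 timesteps with a budget of D=4 writes -> [1, 3, 5, 7]
print(uniform_write_steps(10, 4))
```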
A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks
Title | A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks |
Authors | Umut Simsekli, Levent Sagun, Mert Gurbuzbalaban |
Abstract | The gradient noise (GN) in the stochastic gradient descent (SGD) algorithm is often considered to be Gaussian in the large data regime by assuming that the classical central limit theorem (CLT) kicks in. This assumption is often made for mathematical convenience, since it enables SGD to be analyzed as a stochastic differential equation (SDE) driven by a Brownian motion. We argue that the Gaussianity assumption might fail to hold in deep learning settings and hence render the Brownian motion-based analyses inappropriate. Inspired by non-Gaussian natural phenomena, we consider the GN in a more general context and invoke the generalized CLT (GCLT), which suggests that the GN converges to a heavy-tailed $\alpha$-stable random variable. Accordingly, we propose to analyze SGD as an SDE driven by a Lévy motion. Such SDEs can incur “jumps”, which force the SDE to transition from narrow minima to wider minima, as proven by existing metastability theory. To validate the $\alpha$-stable assumption, we conduct extensive experiments on common deep learning architectures and show that in all settings, the GN is highly non-Gaussian and exhibits heavy tails. We further investigate the tail behavior in varying network architectures and sizes, loss functions, and datasets. Our results open up a different perspective and shed more light on the belief that SGD prefers wide minima. |
Tasks | |
Published | 2019-01-18 |
URL | http://arxiv.org/abs/1901.06053v1 |
http://arxiv.org/pdf/1901.06053v1.pdf | |
PWC | https://paperswithcode.com/paper/a-tail-index-analysis-of-stochastic-gradient |
Repo | https://github.com/umutsimsekli/sgd_tail_index |
Framework | pytorch |
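For intuition, here is a sketch of a block-based tail-index estimator in the spirit of the moment estimator (Mohammadi et al., 2015) that this line of work builds on; the exact form and indexing used in the paper may differ, so treat this as an assumption to check against the repo.

```python
import numpy as np
from scipy.stats import levy_stable

def tail_index(x, k2=10):
    """Estimate alpha via 1/alpha ≈ (mean log|block sums| - mean log|x|) / log k2."""
    x = np.asarray(x)
    k1 = len(x) // k2
    x = x[: k1 * k2]
    y = x.reshape(k1, k2).sum(axis=1)          # sums of k2 consecutive samples
    inv_alpha = (np.log(np.abs(y)).mean() - np.log(np.abs(x)).mean()) / np.log(k2)
    return 1.0 / inv_alpha

noise = levy_stable.rvs(alpha=1.5, beta=0.0, size=50_000)
print(tail_index(noise))   # should land near 1.5; alpha = 2 would indicate Gaussian
```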
Clustering with Fairness Constraints: A Flexible and Scalable Approach
Title | Clustering with Fairness Constraints: A Flexible and Scalable Approach |
Authors | Imtiaz Masud Ziko, Eric Granger, Jing Yuan, Ismail Ben Ayed |
Abstract | This study investigates a general variational formulation of fair clustering, which can integrate fairness constraints with a large class of clustering objectives. Unlike existing methods, our formulation can impose any desired (target) demographic proportions within each cluster. Furthermore, it enables controlling the trade-off between the fairness and clustering terms. We derive an auxiliary function (tight upper bound) of our KL-based fairness penalty via its concave-convex decomposition and Lipschitz-gradient property. Our upper bound can be optimized jointly with various clustering objectives, including prototype-based objectives such as K-means and K-median, and graph-based objectives such as Normalized Cut. Interestingly, at each iteration, our general fair-clustering algorithm performs an independent update for each assignment variable, while guaranteeing convergence. Therefore, it can be easily distributed for large-scale data sets. Such scalability is important as it enables exploring different trade-off levels between the fairness and clustering objectives. Unlike fairness-constrained spectral clustering, our formulation does not require storing an affinity matrix and computing its eigenvalue decomposition. We show the effectiveness, flexibility and scalability of our approach through comprehensive evaluations and comparisons to existing methods over several data sets. |
Tasks | |
Published | 2019-06-19 |
URL | https://arxiv.org/abs/1906.08207v3 |
https://arxiv.org/pdf/1906.08207v3.pdf | |
PWC | https://paperswithcode.com/paper/clustering-with-fairness-constraints-a |
Repo | https://github.com/imtiazziko/Clustering-with-fairness-constraints |
Framework | none |
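A minimal sketch of the KL fairness term as we read the abstract: for each cluster, penalize the divergence between the target demographic proportions and the proportions induced by the current (soft) assignments. The bound-optimization machinery and the joint clustering objective are omitted; shapes and names are illustrative.

```python
import numpy as np

def kl_fairness_penalty(assign, groups, target, eps=1e-12):
    """assign: (n_points, n_clusters) soft assignments; groups: (n_points,) group ids;
    target: (J,) desired demographic proportions per cluster."""
    penalty = 0.0
    for k in range(assign.shape[1]):
        mass = assign[:, k].sum() + eps
        props = np.array([assign[groups == j, k].sum()
                          for j in range(len(target))]) / mass
        penalty += np.sum(target * np.log((target + eps) / (props + eps)))
    return penalty

assign = np.random.dirichlet(np.ones(3), size=100)   # 100 points, 3 clusters
groups = np.random.randint(0, 2, size=100)           # a binary demographic attribute
print(kl_fairness_penalty(assign, groups, target=np.array([0.5, 0.5])))
```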
Modeling Multi-Vehicle Interaction Scenarios Using Gaussian Random Field
Title | Modeling Multi-Vehicle Interaction Scenarios Using Gaussian Random Field |
Authors | Yaohui Guo, Vinay Varma Kalidindi, Mansur Arief, Wenshuo Wang, Jiacheng Zhu, Huei Peng, Ding Zhao |
Abstract | Autonomous vehicles are expected to navigate in complex traffic scenarios with multiple surrounding vehicles. The correlations between road users vary over time, the degree of which, in theory, could be infinitely large, thus posing a great challenge in modeling and predicting the driving environment. In this paper, we propose a method to model multi-vehicle interactions using a stochastic vector field model and apply non-parametric Bayesian learning to extract the underlying motion patterns from a large quantity of naturalistic traffic data. We then use this model to reproduce the high-dimensional driving scenarios in a finitely tractable form. We use a Gaussian process to model multi-vehicle motion, and a Dirichlet process to assign each observation to a specific scenario. We verify the effectiveness of the proposed method on highway and intersection datasets from the NGSIM project, in which complex multi-vehicle interactions are prevalent. The results show that the proposed method can capture motion patterns from both settings without imposing strong priors, demonstrating its potential application to a wide array of traffic situations. The proposed modeling method could enable simulation platforms and other testing methods designed for autonomous vehicle evaluation to easily model and generate traffic scenarios emulating large-scale driving data. |
Tasks | Autonomous Vehicles |
Published | 2019-06-25 |
URL | https://arxiv.org/abs/1906.10307v2 |
https://arxiv.org/pdf/1906.10307v2.pdf | |
PWC | https://paperswithcode.com/paper/modeling-multi-vehicle-interaction-scenarios |
Repo | https://github.com/mxu34/DPGP |
Framework | none |
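A sketch of one motion pattern as we read the abstract: a Gaussian-process vector field mapping position (x, y) to velocity (vx, vy), fit per component. The Dirichlet-process assignment of observations to patterns is omitted, and the synthetic data below merely stands in for NGSIM trajectories.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

pos = np.random.rand(200, 2) * 100.0                       # positions in meters (toy)
vel = np.stack([np.sin(pos[:, 0] / 20), np.cos(pos[:, 1] / 20)], axis=1)

kernel = RBF(length_scale=10.0) + WhiteKernel(noise_level=0.1)
field = [GaussianProcessRegressor(kernel=kernel).fit(pos, vel[:, d]) for d in (0, 1)]

# Query the learned vector field at a new position.
vx, vy = (gp.predict(np.array([[50.0, 50.0]])) for gp in field)
print(float(vx), float(vy))
```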
Neural Blind Deconvolution Using Deep Priors
Title | Neural Blind Deconvolution Using Deep Priors |
Authors | Dongwei Ren, Kai Zhang, Qilong Wang, Qinghua Hu, Wangmeng Zuo |
Abstract | Blind deconvolution is a classical yet challenging low-level vision problem with many real-world applications. Traditional maximum a posteriori (MAP) based methods rely heavily on fixed and handcrafted priors that are certainly insufficient in characterizing clean images and blur kernels, and usually adopt specially designed alternating minimization to avoid trivial solutions. In contrast, existing deep motion deblurring networks learn the mapping to a clean image or blur kernel from massive training images, but are limited in handling various complex and large blur kernels. To connect MAP and deep models, we present two generative networks for respectively modeling the deep priors of the clean image and the blur kernel, and propose an unconstrained neural optimization solution to blind deconvolution. In particular, we adopt an asymmetric Autoencoder with skip connections for generating the latent clean image, and a fully-connected network (FCN) for generating the blur kernel. Moreover, a SoftMax nonlinearity is applied to the output layer of the FCN to meet the non-negativity and equality constraints. The process of neural optimization can be explained as a kind of “zero-shot” self-supervised learning of the generative networks, and thus our proposed method is dubbed SelfDeblur. Experimental results show that SelfDeblur achieves notable quantitative gains as well as more visually plausible deblurring results in comparison to state-of-the-art blind deconvolution methods on benchmark datasets and real-world blurry images. The source code is available at https://github.com/csdwren/SelfDeblur |
Tasks | Deblurring |
Published | 2019-08-06 |
URL | https://arxiv.org/abs/1908.02197v2 |
https://arxiv.org/pdf/1908.02197v2.pdf | |
PWC | https://paperswithcode.com/paper/neural-blind-deconvolution-using-deep-priors |
Repo | https://github.com/csdwren/SelfDeblur |
Framework | pytorch |
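The joint neural optimization described in the abstract can be sketched as below: a generator for the latent image, an FCN with a SoftMax output for the kernel, and a reblur-and-compare loss optimized end to end. The toy networks and sizes are assumptions; see the repo for the authors' actual architectures.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

k_size = 17
Gx = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                   nn.Conv2d(32, 1, 3, padding=1))       # latent-image generator (toy)
Gk = nn.Sequential(nn.Linear(200, 1000), nn.ReLU(),
                   nn.Linear(1000, k_size * k_size))     # blur-kernel generator (FCN)

zx = torch.randn(1, 1, 64, 64)       # fixed noise inputs to the generators
zk = torch.randn(1, 200)
y = torch.randn(1, 1, 64, 64)        # stand-in for the observed blurry image

opt = torch.optim.Adam(list(Gx.parameters()) + list(Gk.parameters()), lr=1e-3)
for _ in range(100):
    # SoftMax keeps the kernel non-negative and summing to one.
    k = torch.softmax(Gk(zk), dim=1).view(1, 1, k_size, k_size)
    x = Gx(zx)
    loss = F.mse_loss(F.conv2d(x, k, padding=k_size // 2), y)  # reblur and compare
    opt.zero_grad(); loss.backward(); opt.step()
```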
Can neural networks understand monotonicity reasoning?
Title | Can neural networks understand monotonicity reasoning? |
Authors | Hitomi Yanaka, Koji Mineshima, Daisuke Bekki, Kentaro Inui, Satoshi Sekine, Lasha Abzianidze, Johan Bos |
Abstract | Monotonicity reasoning is an important reasoning skill for any intelligent natural language inference (NLI) model, in that it requires capturing the interaction between lexical and syntactic structures. Since no wide-coverage test set for monotonicity reasoning has been developed, it remains unclear whether neural models can perform monotonicity reasoning properly. To investigate this issue, we introduce the Monotonicity Entailment Dataset (MED). Performance of state-of-the-art NLI models on this new test set is substantially worse (below 55%), especially on downward reasoning. In addition, analysis using a monotonicity-driven data augmentation method shows that these models may be limited in their ability to generalize across upward and downward reasoning. |
Tasks | Data Augmentation, Natural Language Inference |
Published | 2019-06-15 |
URL | https://arxiv.org/abs/1906.06448v2 |
https://arxiv.org/pdf/1906.06448v2.pdf | |
PWC | https://paperswithcode.com/paper/can-neural-networks-understand-monotonicity |
Repo | https://github.com/verypluming/MED |
Framework | none |
Bad Global Minima Exist and SGD Can Reach Them
Title | Bad Global Minima Exist and SGD Can Reach Them |
Authors | Shengchao Liu, Dimitris Papailiopoulos, Dimitris Achlioptas |
Abstract | Several recent works have aimed to explain why severely overparameterized models generalize well when trained by Stochastic Gradient Descent (SGD). The emergent consensus explanation has two parts: the first is that there are “no bad local minima”, while the second is that SGD performs implicit regularization by having a bias towards low-complexity models. We revisit both of these ideas in the context of image classification with common deep neural network architectures. Our first finding is that there exist bad global minima, i.e., models that fit the training set perfectly, yet have poor generalization. Our second finding is that given only unlabeled training data, we can easily construct initializations that will cause SGD to quickly converge to such bad global minima. For example, on CIFAR, CINIC10, and (Restricted) ImageNet, this can be achieved by starting SGD at a model derived by fitting random labels on the training data: while subsequent SGD training (with the correct labels) will reach zero training error, the resulting model will exhibit a test accuracy degradation of up to 40% compared to training from a random initialization. Finally, we show that regularization seems to provide SGD with an escape route: once heuristics such as data augmentation are used, starting from a complex model (adversarial initialization) has no effect on the test accuracy. |
Tasks | Data Augmentation, Image Classification |
Published | 2019-06-06 |
URL | https://arxiv.org/abs/1906.02613v1 |
https://arxiv.org/pdf/1906.02613v1.pdf | |
PWC | https://paperswithcode.com/paper/bad-global-minima-exist-and-sgd-can-reach |
Repo | https://github.com/chao1224/BadGlobalMinima |
Framework | pytorch |
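A schematic of the adversarial initialization described in the abstract (not the authors' code; the full-batch loop and names are illustrative): first fit one fixed random labeling of the training inputs, then resume training from that state with the correct labels.

```python
import torch

def adversarial_init(model, inputs, opt, num_classes, epochs=100):
    """Phase 1: memorize a fixed random labeling to reach a 'complex' start."""
    y_rand = torch.randint(0, num_classes, (inputs.size(0),))  # fixed random labels
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(inputs), y_rand).backward()
        opt.step()
    return model  # Phase 2: train from here on the *correct* labels as usual
```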
Grid Saliency for Context Explanations of Semantic Segmentation
Title | Grid Saliency for Context Explanations of Semantic Segmentation |
Authors | Lukas Hoyer, Mauricio Munoz, Prateek Katiyar, Anna Khoreva, Volker Fischer |
Abstract | Recently, there has been a growing interest in developing saliency methods that provide visual explanations of network predictions. Still, the usability of existing methods is limited to image classification models. To overcome this limitation, we extend the existing approaches to generate grid saliencies, which provide spatially coherent visual explanations for (pixel-level) dense prediction networks. As the proposed grid saliency allows us to spatially disentangle the object and its context, we specifically explore its potential to produce context explanations for semantic segmentation networks, discovering which context most influences the class predictions inside a target object area. We investigate the effectiveness of grid saliency on a synthetic dataset with an artificially induced bias between objects and their context, as well as on the real-world Cityscapes dataset using state-of-the-art segmentation networks. Our results show that grid saliency can be successfully used to provide easily interpretable context explanations and, moreover, can be employed for detecting and localizing contextual biases present in the data. |
Tasks | Image Classification, Semantic Segmentation |
Published | 2019-07-30 |
URL | https://arxiv.org/abs/1907.13054v2 |
https://arxiv.org/pdf/1907.13054v2.pdf | |
PWC | https://paperswithcode.com/paper/grid-saliency-for-context-explanations-of |
Repo | https://github.com/boschresearch/GridSaliency-ToyDatasetGen |
Framework | none |
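One plausible reading of the method is perturbation-based mask optimization on a coarse grid: find a sparse low-resolution mask over the input that, when used to blend the image with a baseline, preserves the segmentation prediction inside the target region. All hyperparameters, the loss weighting, and the model interface below are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def grid_saliency(model, x, baseline, target_mask, target_class,
                  grid=(16, 16), steps=200, lam=0.05, lr=0.1):
    """x, baseline: (1, C, H, W); target_mask: (H, W) float mask of the object;
    model returns per-pixel logits (1, n_classes, H, W)."""
    m = torch.zeros(1, 1, *grid, requires_grad=True)       # low-res saliency grid
    opt = torch.optim.Adam([m], lr=lr)
    for _ in range(steps):
        mask = torch.sigmoid(m)
        mask_up = F.interpolate(mask, size=x.shape[-2:], mode='bilinear',
                                align_corners=False)
        x_pert = mask_up * x + (1 - mask_up) * baseline    # keep only masked context
        logp = F.log_softmax(model(x_pert), dim=1)[0, target_class]
        # Sparse mask that still preserves the prediction inside the object region.
        loss = lam * mask.mean() - (logp * target_mask).sum() / target_mask.sum()
        opt.zero_grad(); loss.backward(); opt.step()
    return torch.sigmoid(m).detach()
```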
Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image
Title | Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image |
Authors | Gyeongsik Moon, Ju Yong Chang, Kyoung Mu Lee |
Abstract | Although significant improvement has been achieved recently in 3D human pose estimation, most previous methods consider only the single-person case. In this work, we propose the first fully learning-based, camera-distance-aware top-down approach for 3D multi-person pose estimation from a single RGB image. The pipeline of the proposed system consists of human detection, absolute 3D human root localization, and root-relative 3D single-person pose estimation modules. Our system achieves results comparable to state-of-the-art 3D single-person pose estimation models without any ground-truth information and significantly outperforms previous 3D multi-person pose estimation methods on publicly available datasets. The code is available at https://github.com/mks0601/3DMPPE_ROOTNET_RELEASE and https://github.com/mks0601/3DMPPE_POSENET_RELEASE. |
Tasks | 3D Absolute Human Pose Estimation, 3D Human Pose Estimation, 3D Multi-person Pose Estimation, 3D Multi-person Pose Estimation (absolute), 3D Multi-person Pose Estimation (root-relative), Multi-Person Pose Estimation, Pose Estimation |
Published | 2019-07-26 |
URL | https://arxiv.org/abs/1907.11346v2 |
https://arxiv.org/pdf/1907.11346v2.pdf | |
PWC | https://paperswithcode.com/paper/camera-distance-aware-top-down-approach-for |
Repo | https://github.com/mks0601/3DMPPE_ROOTNET_RELEASE |
Framework | pytorch |
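The "camera distance-aware" ingredient, as we understand the RootNet component, derives an initial root depth from the ratio between an assumed real-world person area and the detected bounding-box area in pixels. The formula and the 2m x 2m constant below are our reading from memory and should be verified against the paper and repo.

```python
import math

def approx_root_depth(fx, fy, bbox_w, bbox_h, a_real_mm2=2000.0 * 2000.0):
    """Distance measure k = sqrt(fx * fy * A_real / A_img): larger boxes imply
    a closer person, scaled by the focal lengths (our reading, an assumption)."""
    a_img = bbox_w * bbox_h
    return math.sqrt(fx * fy * a_real_mm2 / a_img)

# A person filling a 200x400-pixel box under ~1500px focal length:
print(approx_root_depth(fx=1500.0, fy=1500.0, bbox_w=200, bbox_h=400))  # depth in mm
```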
ChainerRL: A Deep Reinforcement Learning Library
Title | ChainerRL: A Deep Reinforcement Learning Library |
Authors | Yasuhiro Fujita, Toshiki Kataoka, Prabhat Nagarajan, Takahiro Ishikawa |
Abstract | In this paper, we introduce ChainerRL, an open-source Deep Reinforcement Learning (DRL) library built using Python and the Chainer deep learning framework. ChainerRL implements a comprehensive set of DRL algorithms and techniques drawn from the state-of-the-art research in the field. To foster reproducible research, and for instructional purposes, ChainerRL provides scripts that closely replicate the original papers’ experimental settings and reproduce published benchmark results for several algorithms. Lastly, ChainerRL offers a visualization tool that enables the qualitative inspection of trained agents. The ChainerRL source code can be found on GitHub: https://github.com/chainer/chainerrl . |
Tasks | |
Published | 2019-12-09 |
URL | https://arxiv.org/abs/1912.03905v1 |
https://arxiv.org/pdf/1912.03905v1.pdf | |
PWC | https://paperswithcode.com/paper/chainerrl-a-deep-reinforcement-learning |
Repo | https://github.com/chainer/chainerrl |
Framework | none |
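To give a flavor of the library's API, here is a DQN-style agent setup adapted from memory of the ChainerRL quickstart; class names and keyword arguments should be checked against the repo, since the API evolved across versions.

```python
import chainer
import chainerrl
import gym

env = gym.make('CartPole-v0')
obs_size = env.observation_space.shape[0]
n_actions = env.action_space.n

q_func = chainerrl.q_functions.FCStateQFunctionWithDiscreteAction(
    obs_size, n_actions, n_hidden_channels=50, n_hidden_layers=2)
optimizer = chainer.optimizers.Adam(eps=1e-2)
optimizer.setup(q_func)

agent = chainerrl.agents.DoubleDQN(
    q_func, optimizer,
    chainerrl.replay_buffer.ReplayBuffer(capacity=10 ** 6),
    gamma=0.95,
    explorer=chainerrl.explorers.ConstantEpsilonGreedy(
        epsilon=0.3, random_action_func=env.action_space.sample),
    replay_start_size=500, target_update_interval=100)
```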
Alternative Weighting Schemes for ELMo Embeddings
Title | Alternative Weighting Schemes for ELMo Embeddings |
Authors | Nils Reimers, Iryna Gurevych |
Abstract | ELMo embeddings (Peters et al., 2018) had a huge impact on the NLP community, and many recent publications use these embeddings to boost performance on downstream NLP tasks. However, integrating ELMo embeddings into existing NLP architectures is not straightforward. In contrast to traditional word embeddings, like GloVe or word2vec embeddings, the bi-directional language model of ELMo produces three 1024-dimensional vectors per token in a sentence. Peters et al. proposed to learn a task-specific weighting of these three vectors for downstream tasks. However, this weighting scheme is not feasible for certain tasks and, as we show, does not necessarily yield optimal performance. We evaluate different methods of combining the three vectors from the language model in order to achieve the best possible performance on downstream NLP tasks. We observe that the third layer of the published language model often decreases performance. By learning a weighted average of only the first two layers, we are able to improve performance on many datasets. Due to the reduced complexity of the language model, we obtain a training speed-up of 19-44% on the downstream task. |
Tasks | Language Modelling, Word Embeddings |
Published | 2019-04-05 |
URL | http://arxiv.org/abs/1904.02954v1 |
http://arxiv.org/pdf/1904.02954v1.pdf | |
PWC | https://paperswithcode.com/paper/alternative-weighting-schemes-for-elmo |
Repo | https://github.com/UKPLab/elmo-bilstm-cnn-crf |
Framework | tf |
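The paper's finding, a learned weighted average over only the first two ELMo layers plus a global scale, can be sketched as a tiny module; the layer layout below is an assumption about how the ELMo output tensor is arranged.

```python
import torch
import torch.nn as nn

class TwoLayerScalarMix(nn.Module):
    """Learned softmax-weighted average of ELMo layers 0 and 1 (layer 2 dropped),
    scaled by a global gamma, in the spirit of the original ELMo scalar mix."""
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.zeros(2))
        self.gamma = nn.Parameter(torch.ones(1))

    def forward(self, layers):          # layers: (batch, tokens, 3, 1024)
        s = torch.softmax(self.w, dim=0)
        return self.gamma * (s[0] * layers[..., 0, :] + s[1] * layers[..., 1, :])

emb = TwoLayerScalarMix()(torch.randn(2, 12, 3, 1024))   # -> (2, 12, 1024)
```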
CLEARumor at SemEval-2019 Task 7: ConvoLving ELMo Against Rumors
Title | CLEARumor at SemEval-2019 Task 7: ConvoLving ELMo Against Rumors |
Authors | Ipek Baris, Lukas Schmelzeisen, Steffen Staab |
Abstract | This paper describes our submission to SemEval-2019 Task 7: RumourEval: Determining Rumor Veracity and Support for Rumors. We participated in both subtasks. The goal of subtask A is to classify the type of interaction between a rumorous social media post and a reply post as support, query, deny, or comment. The goal of subtask B is to predict the veracity of a given rumor. For subtask A, we implement a CNN-based neural architecture using ELMo embeddings of post text combined with auxiliary features and achieve an F1-score of 44.6%. For subtask B, we employ an MLP neural network leveraging our estimates from subtask A and achieve an F1-score of 30.1% (second place in the competition). We provide results and analysis of our system performance and present ablation experiments. |
Tasks | Rumour Detection |
Published | 2019-04-05 |
URL | http://arxiv.org/abs/1904.03084v1 |
http://arxiv.org/pdf/1904.03084v1.pdf | |
PWC | https://paperswithcode.com/paper/clearumor-at-semeval-2019-task-7-convolving |
Repo | https://github.com/Institute-Web-Science-and-Technologies/CLEARumor |
Framework | pytorch |
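A minimal sketch of the subtask A classifier as we read the abstract: a CNN over ELMo token embeddings, max-pooled and concatenated with auxiliary features, feeding a 4-way support/query/deny/comment head. All dimensions and the single-filter-size choice are illustrative assumptions.

```python
import torch
import torch.nn as nn

class StanceCNN(nn.Module):
    def __init__(self, emb_dim=1024, n_aux=10, n_filters=64):
        super().__init__()
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=3, padding=1)
        self.out = nn.Linear(n_filters + n_aux, 4)   # support/query/deny/comment

    def forward(self, elmo, aux):       # elmo: (B, T, emb_dim), aux: (B, n_aux)
        h = torch.relu(self.conv(elmo.transpose(1, 2)))   # (B, n_filters, T)
        pooled = h.max(dim=2).values                      # max-pool over tokens
        return self.out(torch.cat([pooled, aux], dim=1))

logits = StanceCNN()(torch.randn(8, 30, 1024), torch.randn(8, 10))
```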
Fine-Grained Named Entity Recognition using ELMo and Wikidata
Title | Fine-Grained Named Entity Recognition using ELMo and Wikidata |
Authors | Cihan Dogan, Aimore Dutra, Adam Gara, Alfredo Gemma, Lei Shi, Michael Sigamani, Ella Walters |
Abstract | Fine-grained Named Entity Recognition is the task of detecting and classifying entity mentions into a large set of types. These types can span diverse domains such as finance, healthcare, and politics. We observe that when the type set spans several domains, the accuracy of entity detection becomes a limitation for supervised learning models, primarily because of the lack of datasets in which entity boundaries are properly annotated while covering a large spectrum of entity types. Furthermore, many named entity systems suffer when categorizing fine-grained entity types. Our work attempts to address these issues, in part, by combining state-of-the-art deep learning models (ELMo) with an expansive knowledge base (Wikidata). Using our framework, we cross-validate our model on the 112 fine-grained entity types based on the hierarchy given by the Wiki(gold) dataset. |
Tasks | Named Entity Recognition |
Published | 2019-04-23 |
URL | http://arxiv.org/abs/1904.10503v1 |
http://arxiv.org/pdf/1904.10503v1.pdf | |
PWC | https://paperswithcode.com/paper/fine-grained-named-entity-recognition-using |
Repo | https://github.com/sigamani/ner |
Framework | none |