Paper Group ANR 665
Global Weighted Average Pooling Bridges Pixel-level Localization and Image-level Classification
Title | Global Weighted Average Pooling Bridges Pixel-level Localization and Image-level Classification |
Authors | Suo Qiu |
Abstract | In this work, we first tackle the problem of simultaneous pixel-level localization and image-level classification with only image-level labels for fully convolutional network training. We investigate the global pooling method, which plays a vital role in this task. Classical global max pooling and average pooling methods struggle to indicate the precise regions of objects. Therefore, we revisit the global weighted average pooling (GWAP) method for this task and propose the class-agnostic GWAP module and the class-specific GWAP module in this paper. We evaluate the classification and pixel-level localization ability on the ILSVRC benchmark dataset. Experimental results show that the proposed GWAP module can better capture the regions of the foreground objects. We further explore the knowledge transfer between the image classification task and the region-based object detection task. We propose a multi-task framework that combines our class-specific GWAP module with R-FCN. The framework is trained with few ground truth bounding boxes and large-scale image-level labels. We evaluate this framework on the PASCAL VOC dataset. Experimental results show that this framework can use the data with only image-level labels to improve the generalization of the object detection model. |
Tasks | Image Classification, Object Detection, Transfer Learning |
Published | 2018-09-21 |
URL | http://arxiv.org/abs/1809.08264v1 |
http://arxiv.org/pdf/1809.08264v1.pdf | |
PWC | https://paperswithcode.com/paper/global-weighted-average-pooling-bridges-pixel |
Repo | |
Framework | |
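To make the pooling idea in this abstract concrete, here is a minimal sketch of a generic weighted average pool over a feature map. It is not the paper's class-agnostic or class-specific GWAP module: the function name, the nested-list layout (channels x height x width) and the hand-written weight maps are assumptions chosen for illustration. In practice the spatial weights would be predicted by the network.

```python
from typing import List

Grid = List[List[float]]  # one H x W map


def global_weighted_average_pool(features: List[Grid], weights: Grid) -> List[float]:
    """Pool each channel to a scalar using one shared spatial weight map.

    Hypothetical helper: computes sum(w * f) / sum(w) per channel.
    """
    total = sum(sum(row) for row in weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    pooled = []
    for channel in features:
        acc = 0.0
        for f_row, w_row in zip(channel, weights):
            for f, w in zip(f_row, w_row):
                acc += f * w
        pooled.append(acc / total)
    return pooled


# Two channels on a 2 x 2 grid (toy values).
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
uniform = [[1.0, 1.0], [1.0, 1.0]]  # reduces to plain average pooling
focused = [[0.0, 0.0], [0.0, 1.0]]  # all weight on the bottom-right pixel

avg_pooled = global_weighted_average_pool(features, uniform)
focus_pooled = global_weighted_average_pool(features, focused)
```

With uniform weights the result equals global average pooling. Concentrating the weights on one location makes the pooled feature depend only on that region, which is why a learned weight map can double as a localization map.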
Deep learning systems as complex networks
Title | Deep learning systems as complex networks |
Authors | Alberto Testolin, Michele Piccolini, Samir Suweis |
Abstract | Thanks to the availability of large scale digital datasets and massive amounts of computational power, deep learning algorithms can learn representations of data by exploiting multiple levels of abstraction. These machine learning methods have greatly improved the state-of-the-art in many challenging cognitive tasks, such as visual object recognition, speech processing, natural language understanding and automatic translation. In particular, one class of deep learning models, known as deep belief networks, can discover intricate statistical structure in large data sets in a completely unsupervised fashion, by learning a generative model of the data using Hebbian-like learning mechanisms. Although these self-organizing systems can be conveniently formalized within the framework of statistical mechanics, their internal functioning remains opaque, because their emergent dynamics cannot be solved analytically. In this article we propose to study deep belief networks using techniques commonly employed in the study of complex networks, in order to gain some insights into the structural and functional properties of the computational graph resulting from the learning process. |
Tasks | Object Recognition |
Published | 2018-09-28 |
URL | http://arxiv.org/abs/1809.10941v1 |
http://arxiv.org/pdf/1809.10941v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-systems-as-complex-networks |
Repo | |
Framework | |
The Goldilocks zone: Towards better understanding of neural network loss landscapes
Title | The Goldilocks zone: Towards better understanding of neural network loss landscapes |
Authors | Stanislav Fort, Adam Scherlis |
Abstract | We explore the loss landscape of fully-connected and convolutional neural networks using random, low-dimensional hyperplanes and hyperspheres. Evaluating the Hessian, $H$, of the loss function on these hypersurfaces, we observe 1) an unusual excess of the number of positive eigenvalues of $H$, and 2) a large value of $\mathrm{Tr}(H) / \|H\|$ at a well defined range of configuration space radii, corresponding to a thick, hollow, spherical shell we refer to as the Goldilocks zone. We observe this effect for fully-connected neural networks over a range of network widths and depths on MNIST and CIFAR-10 datasets with the $\mathrm{ReLU}$ and $\tanh$ non-linearities, and a similar effect for convolutional networks. Using our observations, we demonstrate a close connection between the Goldilocks zone, measures of local convexity/prevalence of positive curvature, and the suitability of a network initialization. We show that the high and stable accuracy reached when optimizing on random, low-dimensional hypersurfaces is directly related to the overlap between the hypersurface and the Goldilocks zone, and as a corollary demonstrate that the notion of intrinsic dimension is initialization-dependent. We note that common initialization techniques initialize neural networks in this particular region of unusually high convexity/prevalence of positive curvature, and offer a geometric intuition for their success. Furthermore, we demonstrate that initializing a neural network at a number of points and selecting for high measures of local convexity such as $\mathrm{Tr}(H) / \|H\|$, number of positive eigenvalues of $H$, or low initial loss, leads to statistically significantly faster training on MNIST. Based on our observations, we hypothesize that the Goldilocks zone contains an unusually high density of suitable initialization configurations. |
Tasks | |
Published | 2018-07-06 |
URL | http://arxiv.org/abs/1807.02581v2 |
http://arxiv.org/pdf/1807.02581v2.pdf | |
PWC | https://paperswithcode.com/paper/the-goldilocks-zone-towards-better |
Repo | |
Framework | |
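The convexity measure mentioned in this abstract, the trace of the Hessian divided by its size, can be illustrated on a small matrix. The sketch below reads the denominator as a matrix norm and uses the Frobenius norm. Both the choice of norm and the function name are assumptions made here, not details confirmed by the listing.

```python
import math
from typing import List


def trace_over_norm(hessian: List[List[float]]) -> float:
    """Return Tr(H) / ||H||_F for a square matrix H.

    Assumption: the Frobenius norm stands in for the norm in the abstract.
    """
    n = len(hessian)
    if any(len(row) != n for row in hessian):
        raise ValueError("hessian must be square")
    trace = sum(hessian[i][i] for i in range(n))
    frobenius = math.sqrt(sum(v * v for row in hessian for v in row))
    if frobenius == 0.0:
        raise ValueError("hessian must be non-zero")
    return trace / frobenius


# All curvature positive: the ratio reaches its maximum, sqrt(n).
convex_ratio = trace_over_norm([[1.0, 0.0], [0.0, 1.0]])
# Equal positive and negative curvature: the ratio is zero.
saddle_ratio = trace_over_norm([[1.0, 0.0], [0.0, -1.0]])
```

Under this reading, a large positive ratio means positive curvature dominates, which matches the abstract's use of the quantity as a measure of local convexity.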
A Novel Technique for Evidence based Conditional Inference in Deep Neural Networks via Latent Feature Perturbation
Title | A Novel Technique for Evidence based Conditional Inference in Deep Neural Networks via Latent Feature Perturbation |
Authors | Dinesh Khandelwal, Suyash Agrawal, Parag Singla, Chetan Arora |
Abstract | Auxiliary information can be exploited in machine learning models using the paradigm of evidence based conditional inference. Multi-modal techniques in Deep Neural Networks (DNNs) can be seen as perturbing the latent feature representation for incorporating evidence from the auxiliary modality. However, they require training a specialized network which can map sparse evidence to a high dimensional latent space vector. Designing such a network, as well as collecting jointly labeled data for training is a non-trivial task. In this paper, we present a novel multi-task learning (MTL) based framework to perform evidence based conditional inference in DNNs which can overcome both these shortcomings. Our framework incorporates evidence as the output of secondary task(s), while modeling the original problem as the primary task of interest. During inference, we employ a novel Bayesian formulation to change the joint latent feature representation so as to maximize the probability of the observed evidence. Since our approach models evidence as prediction from a DNN, this can often be achieved using standard pre-trained backbones for popular tasks, eliminating the need for training altogether. Even when training is required, our MTL architecture ensures the same can be done without any need for jointly labeled data. Exploiting evidence using our framework, we show an improvement of 3.9% over the state-of-the-art, for predicting semantic segmentation given the image tags, and 2.8% for predicting instance segmentation given image captions. |
Tasks | Image Captioning, Instance Segmentation, Multi-Task Learning, Semantic Segmentation, Video Summarization |
Published | 2018-11-24 |
URL | https://arxiv.org/abs/1811.09796v6 |
https://arxiv.org/pdf/1811.09796v6.pdf | |
PWC | https://paperswithcode.com/paper/exploiting-test-time-evidence-to-improve |
Repo | |
Framework | |
RiTUAL-UH at TRAC 2018 Shared Task: Aggression Identification
Title | RiTUAL-UH at TRAC 2018 Shared Task: Aggression Identification |
Authors | Niloofar Safi Samghabadi, Deepthi Mave, Sudipta Kar, Thamar Solorio |
Abstract | This paper presents our system for the “TRAC 2018 Shared Task on Aggression Identification”. Our best systems for the English dataset use a combination of lexical and semantic features. However, for the Hindi data, using only lexical features gave us the best results. We obtained weighted F1-measures of 0.5921 for the English Facebook task (ranked 12th), 0.5663 for the English Social Media task (ranked 6th), 0.6292 for the Hindi Facebook task (ranked 1st), and 0.4853 for the Hindi Social Media task (ranked 2nd). |
Tasks | |
Published | 2018-07-31 |
URL | http://arxiv.org/abs/1807.11712v1 |
http://arxiv.org/pdf/1807.11712v1.pdf | |
PWC | https://paperswithcode.com/paper/ritual-uh-at-trac-2018-shared-task-aggression |
Repo | |
Framework | |
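The scores reported in this abstract are weighted F1-measures. As a reminder of how that metric is usually computed, here is a minimal sketch of the common definition: the per-class F1 averaged with weights proportional to each class's share of the true labels. The function name and the toy labels "a", "b", "c" are placeholders, not the shared task's label set or data.

```python
from collections import Counter
from typing import Sequence


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Support-weighted mean of per-class F1 scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (count / total) * f1
    return score


# Toy example with placeholder labels.
gold = ["a", "a", "b", "c"]
pred = ["a", "b", "b", "c"]
toy_score = weighted_f1(gold, pred)
```

Because each class is weighted by its frequency, a system can score well by doing well on the majority class, which is worth keeping in mind when comparing results across the four tracks.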
A Benchmark for Breast Ultrasound Image Segmentation (BUSIS)
Title | A Benchmark for Breast Ultrasound Image Segmentation (BUSIS) |
Authors | Min Xian, Yingtao Zhang, H. D. Cheng, Fei Xu, Kuan Huang, Boyu Zhang, Jianrui Ding, Chunping Ning, Ying Wang |
Abstract | Breast ultrasound (BUS) image segmentation is challenging and critical for BUS Computer-Aided Diagnosis (CAD) systems. Many BUS segmentation approaches have been proposed in the last two decades, but the performance of most approaches has been assessed using relatively small private datasets with different quantitative metrics, which leads to discrepancies in performance comparison. Therefore, there is a pressing need to build a benchmark that compares existing methods objectively on a public dataset, determines the performance of the best breast tumor segmentation algorithm available today, and investigates which segmentation strategies are valuable in clinical practice and theoretical study. In this work, we will publish a B-mode BUS image segmentation benchmark (BUSIS) with 562 images and compare the performance of five state-of-the-art BUS segmentation methods quantitatively. |
Tasks | Semantic Segmentation |
Published | 2018-01-09 |
URL | http://arxiv.org/abs/1801.03182v1 |
http://arxiv.org/pdf/1801.03182v1.pdf | |
PWC | https://paperswithcode.com/paper/a-benchmark-for-breast-ultrasound-image |
Repo | |
Framework | |
Neural Joking Machine : Humorous image captioning
Title | Neural Joking Machine : Humorous image captioning |
Authors | Kota Yoshida, Munetaka Minoguchi, Kenichiro Wani, Akio Nakamura, Hirokatsu Kataoka |
Abstract | What is an effective expression that draws laughter from human beings? In the present paper, in order to consider this question from an academic standpoint, we generate an image caption that draws a “laugh” by a computer. A system that outputs funny captions based on the image caption proposed in the computer vision field is constructed. Moreover, we also propose the Funny Score, which flexibly gives weights according to an evaluation database. The Funny Score more effectively brings out “laughter” to optimize a model. In addition, we build a self-collected BoketeDB, which contains a theme (image) and funny caption (text) posted on “Bokete”, which is an image Ogiri website. In an experiment, we use BoketeDB to verify the effectiveness of the proposed method by comparing the results obtained using the proposed method and those obtained using MS COCO Pre-trained CNN+LSTM, which is the baseline and idiot created by humans. We refer to the proposed method, which uses the BoketeDB pre-trained model, as the Neural Joking Machine (NJM). |
Tasks | Image Captioning |
Published | 2018-05-30 |
URL | http://arxiv.org/abs/1805.11850v1 |
http://arxiv.org/pdf/1805.11850v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-joking-machine-humorous-image |
Repo | |
Framework | |
Clinically Deployed Distributed Magnetic Resonance Imaging Reconstruction: Application to Pediatric Knee Imaging
Title | Clinically Deployed Distributed Magnetic Resonance Imaging Reconstruction: Application to Pediatric Knee Imaging |
Authors | Michael J. Anderson, Jonathan I. Tamir, Javier S. Turek, Marcus T. Alley, Theodore L. Willke, Shreyas S. Vasanawala, Michael Lustig |
Abstract | Magnetic resonance imaging is capable of producing volumetric images without ionizing radiation. Nonetheless, long acquisitions lead to prohibitively long exams. Compressed sensing (CS) can enable faster scanning via sub-sampling with reduced artifacts. However, CS requires significantly higher reconstruction computation, limiting current clinical applications to 2D/3D or limited-resolution dynamic imaging. Here we analyze the practical limitations of T2 Shuffling, a four-dimensional CS-based acquisition, which provides sharp 3D-isotropic-resolution and multi-contrast images in a single scan. Our improvements to the pipeline on a single machine provide a 3x overall reconstruction speedup, which allowed us to add algorithmic changes improving image quality. Using four machines, we achieved an additional 2.1x improvement through distributed parallelization. Our solution reduced the reconstruction time in the hospital to 90 seconds on a 4-node cluster, enabling its use clinically. To understand the implications of scaling this application, we simulated running our reconstructions with a multiple scanner setup typical in hospitals. |
Tasks | |
Published | 2018-09-11 |
URL | http://arxiv.org/abs/1809.04195v1 |
http://arxiv.org/pdf/1809.04195v1.pdf | |
PWC | https://paperswithcode.com/paper/clinically-deployed-distributed-magnetic |
Repo | |
Framework | |
Kitting in the Wild through Online Domain Adaptation
Title | Kitting in the Wild through Online Domain Adaptation |
Authors | Massimiliano Mancini, Hakan Karaoguz, Elisa Ricci, Patric Jensfelt, Barbara Caputo |
Abstract | Technological developments call for increasing perception and action capabilities of robots. Among other skills, vision systems that can adapt to any possible change in the working conditions are needed. Since these conditions are unpredictable, we need benchmarks that allow us to assess the generalization and robustness capabilities of our visual recognition algorithms. In this work we focus on robotic kitting in unconstrained scenarios. As a first contribution, we present a new visual dataset for the kitting task. Unlike standard object recognition datasets, we provide images of the same objects acquired under various conditions where camera, illumination and background are changed. This novel dataset allows for testing the robustness of robot visual recognition algorithms to a series of different domain shifts, both in isolation and combined. Our second contribution is a novel online adaptation algorithm for deep models, based on batch-normalization layers, which allows a model to be continuously adapted to the current working conditions. Unlike standard domain adaptation algorithms, it does not require any image from the target domain at training time. We benchmark the performance of the algorithm on the proposed dataset, showing its capability to fill the gap between the performance of a standard architecture and that of its counterpart adapted offline to the given target domain. |
Tasks | Domain Adaptation, Object Recognition |
Published | 2018-07-03 |
URL | http://arxiv.org/abs/1807.01028v1 |
http://arxiv.org/pdf/1807.01028v1.pdf | |
PWC | https://paperswithcode.com/paper/kitting-in-the-wild-through-online-domain |
Repo | |
Framework | |
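The abstract describes online adaptation built on batch-normalization layers. One generic way to adapt such a layer at deployment time is to keep refreshing its running statistics from incoming data, sketched below for a single scalar feature. This is an illustration only: the class name, the exponential-moving-average update and the momentum value are assumptions, and the paper's actual update rule is not given in this listing.

```python
import math
from typing import Sequence


class OnlineNorm:
    """Single-feature normalizer whose statistics track incoming data.

    Hypothetical sketch: statistics are updated with an exponential
    moving average, as batch-norm running statistics commonly are.
    """

    def __init__(self, mean: float, var: float, momentum: float = 0.1, eps: float = 1e-5):
        self.mean = mean          # statistics estimated on the source domain
        self.var = var
        self.momentum = momentum
        self.eps = eps

    def update(self, batch: Sequence[float]) -> None:
        """Blend the statistics of a new target-domain batch into the running ones."""
        if not batch:
            raise ValueError("batch must be non-empty")
        batch_mean = sum(batch) / len(batch)
        batch_var = sum((x - batch_mean) ** 2 for x in batch) / len(batch)
        m = self.momentum
        self.mean = (1.0 - m) * self.mean + m * batch_mean
        self.var = (1.0 - m) * self.var + m * batch_var

    def normalize(self, x: float) -> float:
        return (x - self.mean) / math.sqrt(self.var + self.eps)


# Source statistics (mean 0, variance 1), then one batch from a shifted domain.
norm = OnlineNorm(mean=0.0, var=1.0, momentum=0.5)
norm.update([2.0, 4.0])           # batch mean 3.0, batch variance 1.0
centered = norm.normalize(norm.mean)
```

No labels and no target images at training time are needed for this kind of update, which is consistent with the setting the abstract describes. The learned weights stay fixed and only the normalization statistics move.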
Curvature of Hypergraphs via Multi-Marginal Optimal Transport
Title | Curvature of Hypergraphs via Multi-Marginal Optimal Transport |
Authors | Shahab Asoodeh, Tingran Gao, James Evans |
Abstract | We introduce a novel definition of curvature for hypergraphs, a natural generalization of graphs, by introducing a multi-marginal optimal transport problem for a naturally defined random walk on the hypergraph. This curvature, termed coarse scalar curvature, generalizes a recent definition of Ricci curvature for Markov chains on metric spaces by Ollivier [Journal of Functional Analysis 256 (2009) 810-864], and is related to the scalar curvature when the hypergraph arises naturally from a Riemannian manifold. We investigate basic properties of the coarse scalar curvature and obtain several bounds. Empirical experiments indicate that coarse scalar curvatures are capable of detecting “bridges” across connected components in hypergraphs, suggesting it is an appropriate generalization of curvature on simple graphs. |
Tasks | |
Published | 2018-03-22 |
URL | http://arxiv.org/abs/1803.08584v1 |
http://arxiv.org/pdf/1803.08584v1.pdf | |
PWC | https://paperswithcode.com/paper/curvature-of-hypergraphs-via-multi-marginal |
Repo | |
Framework | |
Convergence of Learning Dynamics in Information Retrieval Games
Title | Convergence of Learning Dynamics in Information Retrieval Games |
Authors | Omer Ben-Porat, Itay Rosenberg, Moshe Tennenholtz |
Abstract | We consider a game-theoretic model of information retrieval with strategic authors. We examine two different utility schemes: authors who aim at maximizing exposure and authors who want to maximize active selection of their content (i.e. the number of clicks). We introduce the study of author learning dynamics in such contexts. We prove that under the probability ranking principle (PRP), which forms the basis of the current state of the art ranking methods, any better-response learning dynamics converges to a pure Nash equilibrium. We also show that other ranking methods induce a strategic environment under which such a convergence may not occur. |
Tasks | Information Retrieval |
Published | 2018-06-14 |
URL | http://arxiv.org/abs/1806.05359v3 |
http://arxiv.org/pdf/1806.05359v3.pdf | |
PWC | https://paperswithcode.com/paper/convergence-of-learning-dynamics-in |
Repo | |
Framework | |
Machine Learning Interpretability: A Science rather than a tool
Title | Machine Learning Interpretability: A Science rather than a tool |
Authors | Abdul Karim, Avinash Mishra, MA Hakim Newton, Abdul Sattar |
Abstract | The term “interpretability” is often used by machine learning researchers, each with their own intuitive understanding of it. There is no universally agreed-upon definition of interpretability in machine learning. Any scientific discipline is driven mainly by the set of questions it formulates rather than by the tools it uses; e.g., astrophysics is the discipline that studies the composition of stars, not the discipline that uses spectroscopes. Similarly, we propose that machine learning interpretability should be a discipline that answers specific questions related to interpretability. These questions can be of statistical, causal and counterfactual nature. Therefore, there is a need to look into the interpretability problem of machine learning in the context of questions that need to be addressed rather than different tools. We discuss a hypothetical interpretability framework driven by a question-based scientific approach rather than some specific machine learning model. Using a question-based notion of interpretability, we can step towards understanding the science of machine learning rather than its engineering. This notion will also help us understand any specific problem in more depth rather than relying solely on machine learning methods. |
Tasks | |
Published | 2018-07-18 |
URL | http://arxiv.org/abs/1807.06722v2 |
http://arxiv.org/pdf/1807.06722v2.pdf | |
PWC | https://paperswithcode.com/paper/machine-learning-interpretability-a-science |
Repo | |
Framework | |
Aggregation of binary feature descriptors for compact scene model representation in large scale structure-from-motion applications
Title | Aggregation of binary feature descriptors for compact scene model representation in large scale structure-from-motion applications |
Authors | Jacek Komorowski, Tomasz Trzcinski |
Abstract | In this paper we present an efficient method for aggregating binary feature descriptors to allow a compact representation of a 3D scene model in incremental structure-from-motion and SLAM applications. All feature descriptors linked with one 3D scene point or landmark are represented by a single low-dimensional real-valued vector called a prototype. The method allows a significant reduction of the memory required to store and process feature descriptors in large-scale structure-from-motion applications. Efficient approximate nearest-neighbour search methods suited for real-valued descriptors, such as FLANN, can be used on the resulting prototypes to speed up the matching of processed frames. |
Tasks | |
Published | 2018-09-28 |
URL | http://arxiv.org/abs/1809.11062v1 |
http://arxiv.org/pdf/1809.11062v1.pdf | |
PWC | https://paperswithcode.com/paper/aggregation-of-binary-feature-descriptors-for |
Repo | |
Framework | |
Perceiving Physical Equation by Observing Visual Scenarios
Title | Perceiving Physical Equation by Observing Visual Scenarios |
Authors | Siyu Huang, Zhi-Qi Cheng, Xi Li, Xiao Wu, Zhongfei Zhang, Alexander Hauptmann |
Abstract | Inferring universal laws of the environment is an important ability of human intelligence as well as a symbol of general AI. In this paper, we take a step toward this goal by introducing a new, challenging problem: inferring invariant physical equations from visual scenarios. An example is teaching a machine to automatically derive the gravitational acceleration formula by watching a free-falling object. To tackle this challenge, we present a novel pipeline comprising an Observer Engine and a Physicist Engine, which respectively imitate the actions of an observer and a physicist in the real world. Generally, the Observer Engine watches the visual scenarios and extracts the physical properties of objects. The Physicist Engine analyses these data and summarizes the inherent laws of object dynamics. Specifically, the learned laws are expressed as mathematical equations, so they are more interpretable than the results given by common probabilistic models. Experiments on synthetic videos show that our pipeline is able to discover physical equations in various physical worlds with different visual appearances. |
Tasks | |
Published | 2018-11-29 |
URL | http://arxiv.org/abs/1811.12238v1 |
http://arxiv.org/pdf/1811.12238v1.pdf | |
PWC | https://paperswithcode.com/paper/perceiving-physical-equation-by-observing |
Repo | |
Framework | |
Neural networks versus Logistic regression for 30 days all-cause readmission prediction
Title | Neural networks versus Logistic regression for 30 days all-cause readmission prediction |
Authors | Ahmed Allam, Mate Nagy, George Thoma, Michael Krauthammer |
Abstract | Heart failure (HF) is one of the leading causes of hospital admissions in the US. Readmission within 30 days after a HF hospitalization is both a recognized indicator for disease progression and a source of considerable financial burden to the healthcare system. Consequently, the identification of patients at risk for readmission is a key step in improving disease management and patient outcome. In this work, we used a large administrative claims dataset (1) to explore the systematic application of neural network-based models versus logistic regression for predicting 30 days all-cause readmission after discharge from a HF admission, and (2) to examine the additive value of patients’ hospitalization timelines on prediction performance. Based on data from 272,778 (49% female) patients with a mean (SD) age of 73 years (14) and 343,328 HF admissions (67% of total admissions), we trained and tested our predictive readmission models following a stratified 5-fold cross-validation scheme. Among the deep learning approaches, a recurrent neural network (RNN) combined with conditional random fields (CRF) model (RNNCRF) achieved the best performance in readmission prediction with 0.642 AUC (95% CI, 0.640-0.645). Other models, such as those based on RNN, convolutional neural networks and CRF alone, had lower performance, with a non-timeline based model (MLP) performing worst. A competitive model based on logistic regression with LASSO achieved a performance of 0.643 AUC (95% CI, 0.640-0.646). We conclude that data from patient timelines improve 30 day readmission prediction for neural network-based models, that a logistic regression with LASSO has equal performance to the best neural network model and that the use of administrative data results in competitive performance compared to published approaches based on richer clinical datasets. |
Tasks | Readmission Prediction |
Published | 2018-12-22 |
URL | http://arxiv.org/abs/1812.09549v1 |
http://arxiv.org/pdf/1812.09549v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-networks-versus-logistic-regression |
Repo | |
Framework | |