Paper Group ANR 945
Conceptual Content in Deep Convolutional Neural Networks: An analysis into multi-faceted properties of neurons. On Abruptly-Changing and Slowly-Varying Multiarmed Bandit Problems. Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced. Predicting Distresses using Deep Learning of Text Segments in Annual Re …
Conceptual Content in Deep Convolutional Neural Networks: An analysis into multi-faceted properties of neurons
Title | Conceptual Content in Deep Convolutional Neural Networks: An analysis into multi-faceted properties of neurons |
Authors | Zahra Sadeghi |
Abstract | In this paper, we analyze the convolutional layers of the VGG16 model pre-trained on ILSVRC2012. Our analysis is based on the responses of neurons to the images of all classes in the ImageNet database. We first propose a visualization method to illustrate the learned content of each neuron. Next, we investigate single- and multi-faceted neurons based on the diversity of neuron responses to different classes. Finally, we compute the neuronal similarity at each layer and compare the layers. Our results demonstrate that neurons in lower layers exhibit multi-faceted behavior, whereas the majority of neurons in higher layers exhibit a single-faceted property and tend to respond to a smaller number of classes. |
Tasks | |
Published | 2018-10-31 |
URL | https://arxiv.org/abs/1811.00161v2 |
https://arxiv.org/pdf/1811.00161v2.pdf | |
PWC | https://paperswithcode.com/paper/conceptual-content-in-deep-convolutional |
Repo | |
Framework | |
On Abruptly-Changing and Slowly-Varying Multiarmed Bandit Problems
Title | On Abruptly-Changing and Slowly-Varying Multiarmed Bandit Problems |
Authors | Lai Wei, Vaibhav Srivastava |
Abstract | We study the non-stationary stochastic multiarmed bandit (MAB) problem and propose two generic algorithms, namely, the limited memory deterministic sequencing of exploration and exploitation (LM-DSEE) and the Sliding-Window Upper Confidence Bound# (SW-UCB#). We rigorously analyze these algorithms in abruptly-changing and slowly-varying environments and characterize their performance. We show that the expected cumulative regret for these algorithms under either of the environments is upper bounded by sublinear functions of time, i.e., the time average of the regret asymptotically converges to zero. We complement our analytic results with numerical illustrations. |
Tasks | |
Published | 2018-02-23 |
URL | http://arxiv.org/abs/1802.08380v2 |
http://arxiv.org/pdf/1802.08380v2.pdf | |
PWC | https://paperswithcode.com/paper/on-abruptly-changing-and-slowly-varying |
Repo | |
Framework | |
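The sliding-window idea behind SW-UCB# can be sketched in a few lines. The code below implements a plain fixed-window UCB policy on a hypothetical two-armed Bernoulli environment with one abrupt change; the paper's SW-UCB# uses a growing window and carefully tuned constants, so this shows only the core mechanism, not the authors' algorithm.

```python
import math
import random

def sliding_window_ucb(arms, horizon, window, alpha=2.0, seed=0):
    """Fixed-window UCB: arm statistics are computed over the last
    `window` plays only, so the policy can track non-stationary means."""
    rng = random.Random(seed)
    history = []                                   # (arm, reward), oldest first
    total = 0.0
    for t in range(1, horizon + 1):
        recent = history[-window:]
        counts = [sum(1 for a, _ in recent if a == i) for i in range(len(arms))]
        if min(counts) == 0:                       # replay arms absent from the window
            arm = counts.index(0)
        else:
            ucb = [sum(r for a, r in recent if a == i) / counts[i]
                   + math.sqrt(alpha * math.log(min(t, window)) / counts[i])
                   for i in range(len(arms))]
            arm = ucb.index(max(ucb))
        reward = 1.0 if rng.random() < arms[arm](t) else 0.0   # Bernoulli pull
        history.append((arm, reward))
        total += reward
    return total

# Hypothetical abruptly-changing environment: the best arm switches at t = 1000.
arms = [lambda t: 0.9 if t <= 1000 else 0.1,
        lambda t: 0.1 if t <= 1000 else 0.9]
print(sliding_window_ucb(arms, horizon=2000, window=200))
```

Because old plays fall out of the window, the policy re-explores the abandoned arm and recovers within roughly one window length after the change, which is what yields the sublinear regret discussed in the abstract.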
Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced
Title | Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced |
Authors | Simon S. Du, Wei Hu, Jason D. Lee |
Abstract | We study the implicit regularization imposed by gradient descent for learning multi-layer homogeneous functions including feed-forward fully connected and convolutional deep neural networks with linear, ReLU or Leaky ReLU activation. We rigorously prove that gradient flow (i.e. gradient descent with infinitesimal step size) effectively enforces the differences between squared norms across different layers to remain invariant without any explicit regularization. This result implies that if the weights are initially small, gradient flow automatically balances the magnitudes of all layers. Using a discretization argument, we analyze gradient descent with positive step size for the non-convex low-rank asymmetric matrix factorization problem without any regularization. Inspired by our findings for gradient flow, we prove that gradient descent with step sizes $\eta_t = O\left(t^{-\left( \frac12+\delta\right)} \right)$ ($0<\delta\le\frac12$) automatically balances two low-rank factors and converges to a bounded global optimum. Furthermore, for rank-$1$ asymmetric matrix factorization we give a finer analysis showing gradient descent with constant step size converges to the global minimum at a globally linear rate. We believe that the idea of examining the invariance imposed by first order algorithms in learning homogeneous models could serve as a fundamental building block for studying optimization for learning deep models. |
Tasks | |
Published | 2018-06-04 |
URL | http://arxiv.org/abs/1806.00900v2 |
http://arxiv.org/pdf/1806.00900v2.pdf | |
PWC | https://paperswithcode.com/paper/algorithmic-regularization-in-learning-deep |
Repo | |
Framework | |
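The balancing phenomenon is easy to reproduce numerically. The sketch below runs gradient descent on a rank-1 asymmetric matrix factorization with decaying step sizes of the stated form and tracks the squared-norm gap between the two factors; the target matrix, dimensions, and constants are illustrative choices, not the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.normal(size=10), rng.normal(size=10)
M = 5.0 * np.outer(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))  # rank-1 target

u = 0.01 * rng.normal(size=10)        # small (hence nearly balanced) initialization
v = 0.02 * rng.normal(size=10)
gap0 = abs(u @ u - v @ v)             # squared-norm difference between "layers"

for t in range(1, 5001):
    r = np.outer(u, v) - M            # residual of the factorization
    gu, gv = r @ v, r.T @ u           # gradients of 0.5 * ||u v^T - M||_F^2
    eta = 0.05 * t ** -0.6            # eta_t = O(t^{-(1/2 + delta)}) with delta = 0.1
    u, v = u - eta * gu, v - eta * gv

loss = 0.5 * np.linalg.norm(np.outer(u, v) - M) ** 2
gap = abs(u @ u - v @ v)
print(loss, gap0, gap)
```

The invariance is visible in the update itself: u.(r@v) and v.(r.T@u) are the same scalar u^T r v, so the first-order change to u@u and v@v cancels and the gap drifts only at O(eta^2), even though the norms themselves grow from ~0.001 to ~5.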
Predicting Distresses using Deep Learning of Text Segments in Annual Reports
Title | Predicting Distresses using Deep Learning of Text Segments in Annual Reports |
Authors | Rastin Matin, Casper Hansen, Christian Hansen, Pia Mølgaard |
Abstract | Corporate distress models typically only employ the numerical financial variables in the firms’ annual reports. We develop a model that employs the unstructured textual data in the reports as well, namely the auditors’ reports and managements’ statements. Our model consists of a convolutional recurrent neural network which, when concatenated with the numerical financial variables, learns a descriptive representation of the text that is suited for corporate distress prediction. We find that the unstructured data provides a statistically significant enhancement of the distress prediction performance, in particular for large firms where accurate predictions are of the utmost importance. Furthermore, we find that auditors’ reports are more informative than managements’ statements and that a joint model including both managements’ statements and auditors’ reports displays no enhancement relative to a model including only auditors’ reports. Our model demonstrates a direct improvement over existing state-of-the-art models. |
Tasks | |
Published | 2018-11-13 |
URL | http://arxiv.org/abs/1811.05270v1 |
http://arxiv.org/pdf/1811.05270v1.pdf | |
PWC | https://paperswithcode.com/paper/predicting-distresses-using-deep-learning-of |
Repo | |
Framework | |
Enhancing Decision Making Capacity in Tourism Domain Using Social Media Analytics
Title | Enhancing Decision Making Capacity in Tourism Domain Using Social Media Analytics |
Authors | Supun Abeysinghe, Isura Manchanayake, Chamod Samarajeewa, Prabod Rathnayaka, Malaka J. Walpola, Rashmika Nawaratne, Tharindu Bandaragoda, Damminda Alahakoon |
Abstract | Social media has gained immense popularity over the last decade. People freely express opinions about their daily encounters on social media: the places they traveled, the hotels or restaurants they tried, and aspects related to tourism in general. Since people usually express their true experiences on social media, these opinions contain valuable information that can be used to generate business value and aid decision-making processes. Due to the large volume of data, manually going through every item to extract the information is not feasible. Hence, we propose a social media analytics platform that identifies discussion pathways and aspects, with their corresponding sentiment and deeper emotions, using machine learning techniques, together with a visualization tool that presents the extracted insights in a comprehensible and concise manner. The identified topic pathways and aspects give a decision maker insight into the most discussed topics about the entity, whereas the associated sentiments and emotions help identify the feedback. |
Tasks | Decision Making |
Published | 2018-12-19 |
URL | http://arxiv.org/abs/1812.08330v1 |
http://arxiv.org/pdf/1812.08330v1.pdf | |
PWC | https://paperswithcode.com/paper/enhancing-decision-making-capacity-in-tourism |
Repo | |
Framework | |
Learning by Unsupervised Nonlinear Diffusion
Title | Learning by Unsupervised Nonlinear Diffusion |
Authors | Mauro Maggioni, James M. Murphy |
Abstract | This paper proposes and analyzes a novel clustering algorithm that combines graph-based diffusion geometry with techniques based on density and mode estimation. The proposed method is suitable for data generated from mixtures of distributions with densities that are both multimodal and have nonlinear shapes. A crucial aspect of this algorithm is the use of time of a data-adapted diffusion process as a scale parameter that is different from the local spatial scale parameter used in many clustering algorithms. We prove estimates for the behavior of diffusion distances with respect to this time parameter under a flexible nonparametric data model, identifying a range of times in which the mesoscopic equilibria of the underlying process are revealed, corresponding to a gap between within-cluster and between-cluster diffusion distances. These structures can be missed by the top eigenvectors of the graph Laplacian, commonly used in spectral clustering. This analysis is leveraged to prove sufficient conditions guaranteeing the accuracy of the proposed \emph{learning by unsupervised nonlinear diffusion (LUND)} procedure. We implement LUND and confirm its theoretical properties on illustrative datasets, demonstrating the theoretical and empirical advantages over both spectral clustering and density-based clustering techniques. |
Tasks | |
Published | 2018-10-15 |
URL | http://arxiv.org/abs/1810.06702v2 |
http://arxiv.org/pdf/1810.06702v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-by-unsupervised-nonlinear-diffusion |
Repo | |
Framework | |
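The diffusion-distance ingredient of this analysis can be sketched directly: build a data-adapted Markov matrix, run it for a mesoscopic time t, and compare within-cluster to between-cluster diffusion distances. The dataset, kernel bandwidth, and diffusion time below are illustrative, and this is only the distance computation, not the full LUND procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),      # cluster A
               rng.normal(5.0, 0.3, (20, 2))])     # cluster B

d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / 0.5)                              # Gaussian kernel graph
P = W / W.sum(axis=1, keepdims=True)               # row-stochastic diffusion operator

t = 30                                             # a mesoscopic diffusion time
Pt = np.linalg.matrix_power(P, t)

def diffusion_distance(i, j):
    # D_t(i, j) = || P^t(i, .) - P^t(j, .) ||  (unweighted sketch)
    return float(np.linalg.norm(Pt[i] - Pt[j]))

within = diffusion_distance(0, 1)     # two points inside cluster A
between = diffusion_distance(0, 25)   # a point of A vs. a point of B
print(within, between)
```

At this time scale the walk has equilibrated inside each cluster but has barely leaked across the gap, so within-cluster distances collapse toward zero while between-cluster distances stay large — the "mesoscopic equilibria" gap the abstract refers to.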
A machine learning approach to reconstruction of heart surface potentials from body surface potentials
Title | A machine learning approach to reconstruction of heart surface potentials from body surface potentials |
Authors | Avinash Malik, Tommy Peng, Mark Trew |
Abstract | Invasive cardiac catheterisation is a common procedure carried out before surgical intervention. Yet invasive cardiac diagnostics carry significant risks, especially for young children. Decades of research have been conducted on the so-called inverse problem of electrocardiography, which can be used to reconstruct Heart Surface Potentials (HSPs) from Body Surface Potentials (BSPs) for non-invasive diagnostics. State-of-the-art solutions to the inverse problem are unsatisfactory, since the problem is known to be ill-posed. In this paper we propose a novel approach to reconstructing HSPs from BSPs using a Time-Delay Artificial Neural Network (TDANN). We first design the TDANN architecture, and then develop an iterative search-space algorithm to find the TDANN parameters that yield the best overall HSP prediction. We use real-world recorded BSPs and HSPs from individuals suffering from serious cardiac conditions to validate our TDANN. The results are encouraging: the coefficients obtained by correlating the predicted HSPs with the recorded patients' HSPs approach ideal values. |
Tasks | |
Published | 2018-01-19 |
URL | http://arxiv.org/abs/1802.02240v1 |
http://arxiv.org/pdf/1802.02240v1.pdf | |
PWC | https://paperswithcode.com/paper/a-machine-learning-approach-to-reconstruction |
Repo | |
Framework | |
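The abstract does not specify the network's internals, but the time-delay idea itself is simple: at each time step the network is fed a short window of delayed samples instead of a single one. A minimal sketch of that input construction (the signal and delay set are hypothetical stand-ins):

```python
import numpy as np

def time_delay_inputs(signal, delays):
    """Row t of the result holds signal[t - d] for each delay d, giving a
    memoryless feed-forward network a temporal window at every time step."""
    X = np.stack([np.roll(signal, d) for d in delays], axis=1)
    return X[max(delays):]          # drop rows polluted by wrap-around

bsp = np.sin(np.linspace(0.0, 4 * np.pi, 100))   # stand-in recorded BSP trace
X = time_delay_inputs(bsp, delays=(0, 1, 2, 3))
print(X.shape)                                   # (97, 4)
```

Each row of `X` would then be one training input for the network, paired with the HSP sample at the same time index.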
Light Propagation Prediction through Multimode Optical Fibers with a Deep Neural Network
Title | Light Propagation Prediction through Multimode Optical Fibers with a Deep Neural Network |
Authors | Pengfei Fan, Liang Deng, Lei Su |
Abstract | This work demonstrates a computational method for predicting light propagation through a single multimode fiber using a deep neural network. The experiment for gathering training and testing data is performed with a digital micro-mirror device that enables spatial light modulation. The modulated patterns on the device and the intensity-only images captured by the camera form the aligned data pairs. The trained deep neural network performs very well at directly inferring the intensity-only output delivered through a multimode fiber. The model is validated against three metrics: the mean squared error (MSE), the correlation coefficient (corr) and the structural similarity index (SSIM). |
Tasks | |
Published | 2018-12-06 |
URL | http://arxiv.org/abs/1812.02814v1 |
http://arxiv.org/pdf/1812.02814v1.pdf | |
PWC | https://paperswithcode.com/paper/light-propagation-prediction-through |
Repo | |
Framework | |
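The three validation metrics named in the abstract are standard and can be computed as below; note that the SSIM here is a single-window (global) variant for brevity, whereas library implementations usually average SSIM over local windows.

```python
import numpy as np

def mse(x, y):
    return float(np.mean((x - y) ** 2))

def corr(x, y):
    return float(np.corrcoef(x.ravel(), y.ravel())[0, 1])

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    # Single-window SSIM; libraries normally average over local windows.
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return float((2 * mx * my + c1) * (2 * cov + c2)
                 / ((mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2)))

rng = np.random.default_rng(0)
target = rng.random((32, 32))                       # stand-in fiber-output image
pred = target + 0.05 * rng.normal(size=(32, 32))    # a "good" prediction
print(mse(target, pred), corr(target, pred), ssim_global(target, pred))
```

A near-perfect prediction drives MSE toward 0 and both corr and SSIM toward 1, which is how the paper's three standards rank model quality.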
FutureMapping: The Computational Structure of Spatial AI Systems
Title | FutureMapping: The Computational Structure of Spatial AI Systems |
Authors | Andrew J. Davison |
Abstract | We discuss and predict the evolution of Simultaneous Localisation and Mapping (SLAM) into a general geometric and semantic `Spatial AI’ perception capability for intelligent embodied devices. A big gap remains between the visual perception performance that devices such as augmented reality eyewear or consumer robots will require and what is possible within the constraints imposed by real products. Co-design of algorithms, processors and sensors will be needed. We explore the computational structure of current and future Spatial AI algorithms and consider this within the landscape of ongoing hardware developments. |
Tasks | |
Published | 2018-03-29 |
URL | http://arxiv.org/abs/1803.11288v1 |
http://arxiv.org/pdf/1803.11288v1.pdf | |
PWC | https://paperswithcode.com/paper/futuremapping-the-computational-structure-of |
Repo | |
Framework | |
Multi-Cast Attention Networks for Retrieval-based Question Answering and Response Prediction
Title | Multi-Cast Attention Networks for Retrieval-based Question Answering and Response Prediction |
Authors | Yi Tay, Luu Anh Tuan, Siu Cheung Hui |
Abstract | Attention is typically used to select informative sub-phrases for prediction. This paper investigates the novel use of attention as a form of feature augmentation, i.e., casted attention. We propose Multi-Cast Attention Networks (MCAN), a new attention mechanism and general model architecture for a potpourri of ranking tasks in the conversational modeling and question answering domains. Our approach performs a series of soft attention operations, each time casting a scalar feature upon the inner word embeddings. The key idea is to provide a real-valued hint (feature) to a subsequent encoder layer, targeted at improving the representation learning process. This design has several advantages, e.g., it allows an arbitrary number of attention mechanisms to be casted, so that multiple attention types (e.g., co-attention, intra-attention) and attention variants (e.g., alignment-pooling, max-pooling, mean-pooling) can be executed simultaneously. This not only eliminates the costly need to tune the nature of the co-attention layer, but also provides greater explainability to practitioners. Via extensive experiments on four well-known benchmark datasets, we show that MCAN achieves state-of-the-art performance. On the Ubuntu Dialogue Corpus, MCAN outperforms existing state-of-the-art models by 9%. MCAN also achieves the best score to date on the well-studied TrecQA dataset. |
Tasks | Question Answering, Representation Learning, Word Embeddings |
Published | 2018-06-03 |
URL | http://arxiv.org/abs/1806.00778v1 |
http://arxiv.org/pdf/1806.00778v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-cast-attention-networks-for-retrieval |
Repo | |
Framework | |
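The casting step can be sketched as follows: compute a soft alignment matrix between the two sequences, pool it in several ways, compress each pooled result to one scalar per word, and concatenate those scalars onto the word embeddings. MCAN's actual compression functions (sum, neural, and factorization-machine variants) and its co-/intra-attention casts are omitted here; a simple dot-product compression stands in for them, and all tensors are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
q = rng.normal(size=(5, 8))            # query word embeddings (5 words, dim 8)
d = rng.normal(size=(7, 8))            # document word embeddings (7 words)

s = q @ d.T                            # soft alignment (affinity) matrix
attn = np.exp(s) / np.exp(s).sum(axis=1, keepdims=True)

max_feat = s.max(axis=1, keepdims=True)                # max-pooling cast
mean_feat = s.mean(axis=1, keepdims=True)              # mean-pooling cast
aligned = attn @ d                                     # alignment-pooling
align_feat = (q * aligned).sum(axis=1, keepdims=True)  # compress to one scalar

q_aug = np.concatenate([q, max_feat, mean_feat, align_feat], axis=1)
print(q_aug.shape)                     # each word embedding gains 3 scalar hints
```

The augmented embeddings `q_aug` would then be fed to the subsequent encoder layer, which is the "real-valued hint" mechanism the abstract describes.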
Defending Against Machine Learning Model Stealing Attacks Using Deceptive Perturbations
Title | Defending Against Machine Learning Model Stealing Attacks Using Deceptive Perturbations |
Authors | Taesung Lee, Benjamin Edwards, Ian Molloy, Dong Su |
Abstract | Machine learning models are vulnerable to simple model stealing attacks if the adversary can obtain output labels for chosen inputs. To protect against these attacks, it has been proposed to limit the information provided to the adversary by omitting probability scores, significantly impacting the utility of the provided service. In this work, we illustrate how a service provider can still provide useful, albeit misleading, class probability information, while significantly limiting the success of the attack. Our defense forces the adversary to discard the class probabilities, requiring significantly more queries before they can train a model with comparable performance. We evaluate several attack strategies, model architectures, and hyperparameters under varying adversarial models, and evaluate the efficacy of our defense against the strongest adversary. Finally, we quantify the amount of noise injected into the class probabilities to measure the loss in utility, e.g., adding 1.26 nats per query on CIFAR-10 and 3.27 on MNIST. Our evaluation shows our defense can degrade the accuracy of the stolen model by at least 20%, or require up to 64 times more queries, while keeping the accuracy of the protected model almost intact. |
Tasks | |
Published | 2018-05-31 |
URL | http://arxiv.org/abs/1806.00054v4 |
http://arxiv.org/pdf/1806.00054v4.pdf | |
PWC | https://paperswithcode.com/paper/defending-against-machine-learning-model |
Repo | |
Framework | |
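The flavor of the defense can be sketched as below: perturb the served class probabilities while preserving the argmax (so top-1 predictions, and hence service accuracy, are unchanged), and measure the injected noise in nats via the KL divergence. This illustrates the idea only; it is not the paper's perturbation scheme.

```python
import numpy as np

def deceptive_probs(probs, scale=0.5, rng=None):
    """Return a misleading probability vector whose argmax matches the
    original prediction, so the served label is preserved while the
    probability values are noisy (a sketch, not the paper's method)."""
    rng = rng or np.random.default_rng()
    noisy = probs * np.exp(scale * rng.normal(size=probs.shape))
    noisy /= noisy.sum()
    top, cur = int(probs.argmax()), int(noisy.argmax())
    if cur != top:                      # swap to restore the predicted label
        noisy[[top, cur]] = noisy[[cur, top]]
    return noisy

rng = np.random.default_rng(0)
p = np.array([0.7, 0.2, 0.1])
q = deceptive_probs(p, rng=rng)
kl = float(np.sum(p * np.log(p / q)))   # injected noise, measured in nats
print(q, kl)
```

An adversary training on `q` instead of `p` receives distorted soft labels, while a legitimate client still gets the correct top-1 class.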
Visual Font Pairing
Title | Visual Font Pairing |
Authors | Shuhui Jiang, Zhaowen Wang, Aaron Hertzmann, Hailin Jin, Yun Fu |
Abstract | This paper introduces the problem of automatic font pairing. Font pairing is an important design task that is difficult for novices. Given a font selection for one part of a document (e.g., header), our goal is to recommend a font to be used in another part (e.g., body) such that the two fonts used together look visually pleasing. There are three main challenges in font pairing. First, this is a fine-grained problem, in which the subtle distinctions between fonts may be important. Second, rules and conventions of font pairing given by human experts are difficult to formalize. Third, font pairing is an asymmetric problem in that the roles played by header and body fonts are not interchangeable. To address these challenges, we propose automatic font pairing through learning visual relationships from large-scale human-generated font pairs. We introduce a new database for font pairing constructed from millions of PDF documents available on the Internet. We propose two font pairing algorithms: dual-space k-NN and asymmetric similarity metric learning (ASML). These two methods automatically learn fine-grained relationships from large-scale data. We also investigate several baseline methods based on the rules from professional designers. Experiments and user studies demonstrate the effectiveness of our proposed dataset and methods. |
Tasks | Metric Learning |
Published | 2018-11-19 |
URL | http://arxiv.org/abs/1811.08015v1 |
http://arxiv.org/pdf/1811.08015v1.pdf | |
PWC | https://paperswithcode.com/paper/visual-font-pairing |
Repo | |
Framework | |
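A pairing-by-example baseline in the spirit of the paper's k-NN method can be sketched as follows. The font feature vectors and pair database are hypothetical, and plain Euclidean distance stands in for the learned dual-space or asymmetric metric.

```python
import numpy as np

def recommend_body_fonts(header_vec, pair_db, k=2):
    """Return the body fonts that designers paired with the k headers
    nearest to the query header (plain Euclidean k-NN sketch)."""
    headers = np.array([h for h, _ in pair_db])
    order = np.argsort(np.linalg.norm(headers - header_vec, axis=1))
    return [pair_db[i][1] for i in order[:k]]

# Hypothetical 2-D font feature vectors and designer-chosen pairings.
pair_db = [([0.9, 0.1], "Garamond"), ([0.8, 0.2], "Minion"),
           ([0.1, 0.9], "Roboto"),   ([0.2, 0.8], "Open Sans")]
print(recommend_body_fonts(np.array([0.88, 0.12]), pair_db))
```

The asymmetry the abstract stresses is captured by the database itself: queries are always headers and results are always bodies, so swapping the roles gives a different recommender.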
Flexible Neural Representation for Physics Prediction
Title | Flexible Neural Representation for Physics Prediction |
Authors | Damian Mrowca, Chengxu Zhuang, Elias Wang, Nick Haber, Li Fei-Fei, Joshua B. Tenenbaum, Daniel L. K. Yamins |
Abstract | Humans have a remarkable capacity to understand the physical dynamics of objects in their environment, flexibly capturing complex structures and interactions at multiple levels of detail. Inspired by this ability, we propose a hierarchical particle-based object representation that covers a wide variety of types of three-dimensional objects, including both arbitrary rigid geometrical shapes and deformable materials. We then describe the Hierarchical Relation Network (HRN), an end-to-end differentiable neural network based on hierarchical graph convolution, that learns to predict physical dynamics in this representation. Compared to other neural network baselines, the HRN accurately handles complex collisions and nonrigid deformations, generating plausible dynamics predictions at long time scales in novel settings, and scaling to large scene configurations. These results demonstrate an architecture with the potential to form the basis of next-generation physics predictors for use in computer vision, robotics, and quantitative cognitive science. |
Tasks | |
Published | 2018-06-21 |
URL | http://arxiv.org/abs/1806.08047v2 |
http://arxiv.org/pdf/1806.08047v2.pdf | |
PWC | https://paperswithcode.com/paper/flexible-neural-representation-for-physics |
Repo | |
Framework | |
Boosting Cooperative Coevolution for Large Scale Optimization with a Fine-Grained Computation Resource Allocation Strategy
Title | Boosting Cooperative Coevolution for Large Scale Optimization with a Fine-Grained Computation Resource Allocation Strategy |
Authors | Zhigang Ren, Yongsheng Liang, Aimin Zhang, Yang Yang, Zuren Feng, Lin Wang |
Abstract | Cooperative coevolution (CC) has shown great potential in solving large-scale optimization problems (LSOPs). However, traditional CC algorithms often waste part of the computation resource (CR), as they allocate CR equally among all the subproblems. The recently developed contribution-based CC (CBCC) algorithms improve on the traditional ones to a certain extent by adaptively allocating CR according to heuristic rules. Different from existing works, this study explicitly constructs a mathematical model for the CR allocation (CRA) problem in CC and proposes a novel fine-grained CRA (FCRA) strategy by fully considering both the theoretically optimal solution of the CRA model and the evolution characteristics of CC. FCRA takes a single iteration as the basic CRA unit and always selects the subproblem most likely to make the largest contribution to the total fitness improvement to undergo a new iteration, where the contribution of a subproblem at a new iteration is estimated from its current contribution and its current evolution status. We verified the efficiency of FCRA by combining it with SHADE, an excellent differential evolution variant that had never been employed in the CC framework. Experimental results on two benchmark suites for LSOPs demonstrate that FCRA significantly outperforms existing CRA strategies and that the resultant CC algorithm is highly competitive in solving LSOPs. |
Tasks | |
Published | 2018-02-27 |
URL | http://arxiv.org/abs/1802.09703v2 |
http://arxiv.org/pdf/1802.09703v2.pdf | |
PWC | https://paperswithcode.com/paper/boosting-cooperative-coevolution-for-large |
Repo | |
Framework | |
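The contribution-driven allocation idea can be illustrated with a toy model in which each subproblem's per-iteration improvement decays at its own rate. Greedily giving the next iteration to the subproblem with the largest last observed improvement (a crude stand-in for FCRA's contribution estimate) beats the equal allocation of traditional CC; the decay rates below are hypothetical.

```python
def optimize(improvers, budget, greedy=True):
    """Toy computation-resource allocation for cooperative coevolution.
    Greedy mode gives the next iteration to the subproblem whose previous
    iteration produced the largest fitness improvement; otherwise
    iterations are allocated round-robin, as in traditional CC."""
    k = len(improvers)
    counts = [0] * k
    last = [float("inf")] * k             # force one seed iteration per subproblem
    total = 0.0
    for step in range(budget):
        i = last.index(max(last)) if greedy else step % k
        gain = improvers[i](counts[i])    # improvement from this iteration
        counts[i] += 1
        last[i] = gain
        total += gain
    return total

# Subproblem improvements decay at very different rates (diminishing returns).
improvers = [lambda t, tau=tau: 2.0 ** (-t / tau) for tau in (1.0, 5.0, 50.0)]
print(optimize(improvers, 300, greedy=True),
      optimize(improvers, 300, greedy=False))
```

Under diminishing returns, this greedy rule approximates the water-filling optimum of the CRA model: it keeps iterating the slow-decaying subproblem while the fast-decaying ones are quickly starved of resources.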
From Pixels to Buildings: End-to-end Probabilistic Deep Networks for Large-scale Semantic Mapping
Title | From Pixels to Buildings: End-to-end Probabilistic Deep Networks for Large-scale Semantic Mapping |
Authors | Kaiyu Zheng, Andrzej Pronobis |
Abstract | We introduce TopoNets, end-to-end probabilistic deep networks for modeling semantic maps with structure reflecting the topology of large-scale environments. TopoNets build a unified deep network spanning multiple levels of abstraction and spatial scales, from pixels representing geometry of local places to high-level descriptions of semantics of buildings. To this end, TopoNets leverage complex spatial relations expressed in terms of arbitrary, dynamic graphs. We demonstrate how TopoNets can be used to perform end-to-end semantic mapping from partial sensory observations and noisy topological relations discovered by a robot exploring large-scale office spaces. Thanks to their probabilistic nature and generative properties, TopoNets extend the problem of semantic mapping beyond classification. We show that TopoNets successfully perform uncertain reasoning about yet unexplored space and detect novel and incongruent environment configurations unknown to the robot. Our implementation of TopoNets achieves real-time, tractable and exact inference, which makes these new deep models a promising, practical solution to mobile robot spatial understanding at scale. |
Tasks | |
Published | 2018-12-31 |
URL | https://arxiv.org/abs/1812.11866v8 |
https://arxiv.org/pdf/1812.11866v8.pdf | |
PWC | https://paperswithcode.com/paper/from-pixels-to-buildings-end-to-end |
Repo | |
Framework | |