July 27, 2019

Paper Group ANR 731

Cross-modal Common Representation Learning by Hybrid Transfer Network

Title Cross-modal Common Representation Learning by Hybrid Transfer Network
Authors Xin Huang, Yuxin Peng, Mingkuan Yuan
Abstract DNN-based cross-modal retrieval is a research hotspot for retrieving across different modalities such as image and text, but existing methods often face the challenge of insufficient cross-modal training data. In the single-modal scenario, a similar problem is usually relieved by transferring knowledge from large-scale auxiliary datasets (such as ImageNet). Knowledge from such single-modal datasets is also very useful for cross-modal retrieval, since it can provide rich general semantic information that can be shared across different modalities. However, it is challenging to transfer useful knowledge from a single-modal (e.g., image) source domain to a cross-modal (e.g., image/text) target domain. Knowledge in the source domain cannot be directly transferred to both modalities in the target domain, and the inherent cross-modal correlation contained in the target domain provides key hints for cross-modal retrieval, which should be preserved during the transfer process. This paper proposes the Cross-modal Hybrid Transfer Network (CHTN) with two subnetworks: the modal-sharing transfer subnetwork uses the modality present in both the source and target domains as a bridge for transferring knowledge to both modalities simultaneously; the layer-sharing correlation subnetwork preserves the inherent cross-modal semantic correlation to further adapt to the cross-modal retrieval task. Cross-modal data can be converted to a common representation by CHTN for retrieval, and comprehensive experiments on 3 datasets show its effectiveness.
Tasks Cross-Modal Retrieval, Representation Learning
Published 2017-06-01
URL http://arxiv.org/abs/1706.00153v2
PDF http://arxiv.org/pdf/1706.00153v2.pdf
PWC https://paperswithcode.com/paper/cross-modal-common-representation-learning-by
Repo
Framework
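
The abstract's last step, retrieval over a common representation, can be illustrated with a minimal sketch. It assumes both modalities have already been mapped to vectors of the same dimension; the vectors, captions, and the function names `cosine` and `retrieve` are hypothetical, and cosine similarity is an assumed ranking measure, not something the abstract specifies. CHTN itself is not implemented here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(query, gallery, top_k=2):
    """Rank gallery items (name -> vector) by similarity to the query vector."""
    scored = [(cosine(query, vec), name) for name, vec in gallery.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_k]]


# Hypothetical common-space embeddings: text captions in the gallery, one image as query.
text_gallery = {
    "a dog on grass": [0.9, 0.1, 0.0],
    "city skyline at night": [0.0, 0.2, 0.9],
    "puppy playing": [0.8, 0.3, 0.1],
}
image_query = [1.0, 0.2, 0.0]

print(retrieve(image_query, text_gallery))
```

Because both modalities live in one space, the same `retrieve` call works for text-to-image search by swapping the roles of query and gallery.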

The Expressive Power of Neural Networks: A View from the Width

Title The Expressive Power of Neural Networks: A View from the Width
Authors Zhou Lu, Hongming Pu, Feicheng Wang, Zhiqiang Hu, Liwei Wang
Abstract The expressive power of neural networks is important for understanding deep learning. Most existing works consider this problem from the view of the depth of a network. In this paper, we study how width affects the expressiveness of neural networks. Classical results state that depth-bounded (e.g. depth-$2$) networks with suitable activation functions are universal approximators. We show a universal approximation theorem for width-bounded ReLU networks: width-$(n+4)$ ReLU networks, where $n$ is the input dimension, are universal approximators. Moreover, except for a measure zero set, all functions cannot be approximated by width-$n$ ReLU networks, which exhibits a phase transition. Several recent works demonstrate the benefits of depth by proving the depth-efficiency of neural networks. That is, there are classes of deep networks which cannot be realized by any shallow network whose size is no more than an exponential bound. Here we pose the dual question on the width-efficiency of ReLU networks: Are there wide networks that cannot be realized by narrow networks whose size is not substantially larger? We show that there exist classes of wide networks which cannot be realized by any narrow network whose depth is no more than a polynomial bound. On the other hand, we demonstrate by extensive experiments that narrow networks whose size exceeds the polynomial bound by a constant factor can approximate wide and shallow networks with high accuracy. Our results provide more comprehensive evidence that depth is more effective than width for the expressiveness of ReLU networks.
Tasks
Published 2017-09-08
URL http://arxiv.org/abs/1709.02540v3
PDF http://arxiv.org/pdf/1709.02540v3.pdf
PWC https://paperswithcode.com/paper/the-expressive-power-of-neural-networks-a
Repo
Framework

Kill Two Birds with One Stone: Weakly-Supervised Neural Network for Image Annotation and Tag Refinement

Title Kill Two Birds with One Stone: Weakly-Supervised Neural Network for Image Annotation and Tag Refinement
Authors Junjie Zhang, Qi Wu, Jian Zhang, Chunhua Shen, Jianfeng Lu
Abstract The number of social images has exploded with the wide adoption of social networks, and people like to share their comments about them. These comments can describe the image as a whole, or some objects, attributes, or scenes in it, and are normally used as the user-provided tags. However, it is well known that user-provided tags are incomplete and imprecise to some extent. Directly using them can damage the performance of related applications, such as image annotation and retrieval. In this paper, we propose to learn an image annotation model and refine the user-provided tags simultaneously in a weakly-supervised manner. A deep neural network is used for image feature learning and as the backbone annotation model, while visual consistency, semantic dependency, and user-error sparsity are introduced as constraints at the batch level to alleviate the tag noise. Therefore, our model is highly flexible and stable when handling large-scale image sets. Experimental results on two benchmark datasets indicate that our proposed model achieves the best performance compared to the state-of-the-art methods.
Tasks
Published 2017-11-19
URL http://arxiv.org/abs/1711.06998v1
PDF http://arxiv.org/pdf/1711.06998v1.pdf
PWC https://paperswithcode.com/paper/kill-two-birds-with-one-stone-weakly
Repo
Framework

Critical Points of Neural Networks: Analytical Forms and Landscape Properties

Title Critical Points of Neural Networks: Analytical Forms and Landscape Properties
Authors Yi Zhou, Yingbin Liang
Abstract Due to the success of deep learning in solving a variety of challenging machine learning tasks, there is a rising interest in understanding loss functions for training neural networks from a theoretical aspect. Particularly, the properties of critical points and the landscape around them are of importance to determine the convergence performance of optimization algorithms. In this paper, we provide a full (necessary and sufficient) characterization of the analytical forms for the critical points (as well as global minimizers) of the square loss functions for various neural networks. We show that the analytical forms of the critical points characterize the values of the corresponding loss functions as well as the necessary and sufficient conditions to achieve a global minimum. Furthermore, we exploit the analytical forms of the critical points to characterize the landscape properties for the loss functions of these neural networks. One particular conclusion is that the loss function of linear networks has no spurious local minimum, while the loss function of one-hidden-layer nonlinear networks with ReLU activation function does have a local minimum that is not a global minimum.
Tasks
Published 2017-10-30
URL http://arxiv.org/abs/1710.11205v1
PDF http://arxiv.org/pdf/1710.11205v1.pdf
PWC https://paperswithcode.com/paper/critical-points-of-neural-networks-analytical
Repo
Framework

3DContextNet: K-d Tree Guided Hierarchical Learning of Point Clouds Using Local and Global Contextual Cues

Title 3DContextNet: K-d Tree Guided Hierarchical Learning of Point Clouds Using Local and Global Contextual Cues
Authors Wei Zeng, Theo Gevers
Abstract Classification and segmentation of 3D point clouds are important tasks in computer vision. Because of the irregular nature of point clouds, most of the existing methods convert point clouds into regular 3D voxel grids before they are used as input for ConvNets. Unfortunately, voxel representations are highly insensitive to the geometrical nature of 3D data. More recent methods encode point clouds to higher dimensional features to cover the global 3D space. However, these models are not able to sufficiently capture the local structures of point clouds. Therefore, in this paper, we propose a method that exploits both local and global contextual cues imposed by the k-d tree. The method is designed to learn representation vectors progressively along the tree structure. Experiments on challenging benchmarks show that the proposed model provides discriminative point set features. For the task of 3D scene semantic segmentation, our method significantly outperforms the state-of-the-art on the Stanford Large-Scale 3D Indoor Spaces Dataset (S3DIS).
Tasks Semantic Segmentation
Published 2017-11-30
URL http://arxiv.org/abs/1711.11379v3
PDF http://arxiv.org/pdf/1711.11379v3.pdf
PWC https://paperswithcode.com/paper/3dcontextnet-k-d-tree-guided-hierarchical
Repo
Framework
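
The k-d tree partitioning that the abstract relies on can be sketched as follows. This is a generic median-split k-d tree and not the authors' implementation: the function names `build_kd_tree` and `leaves`, the `leaf_size` parameter, and the cycling choice of split axis are assumptions made for illustration. The sketch only shows how a point cloud is grouped hierarchically; the feature learning along the tree is not shown.

```python
def build_kd_tree(points, depth=0, leaf_size=1):
    """Recursively split points at the median, cycling through the axes by depth."""
    if leaf_size < 1:
        raise ValueError("leaf_size must be at least 1")
    if len(points) <= leaf_size:
        return {"leaf": True, "points": list(points)}
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return {
        "leaf": False,
        "axis": axis,
        "split": ordered[mid][axis],
        "left": build_kd_tree(ordered[:mid], depth + 1, leaf_size),
        "right": build_kd_tree(ordered[mid:], depth + 1, leaf_size),
    }


def leaves(node):
    """Collect the local point groups stored at the leaves, left to right."""
    if node["leaf"]:
        return [node["points"]]
    return leaves(node["left"]) + leaves(node["right"])


# Hypothetical toy point cloud with eight 3D points.
cloud = [
    (0.1, 0.2, 0.3), (0.9, 0.8, 0.1), (0.4, 0.5, 0.6), (0.7, 0.1, 0.9),
    (0.2, 0.9, 0.4), (0.6, 0.3, 0.2), (0.3, 0.7, 0.8), (0.8, 0.6, 0.5),
]
tree = build_kd_tree(cloud, leaf_size=2)
for group in leaves(tree):
    print(group)
```

Each leaf holds a small, spatially coherent group of points (local context), while nodes closer to the root cover progressively larger regions (global context).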

Medical Image Analysis using Convolutional Neural Networks: A Review

Title Medical Image Analysis using Convolutional Neural Networks: A Review
Authors Syed Muhammad Anwar, Muhammad Majid, Adnan Qayyum, Muhammad Awais, Majdi Alnowami, Muhammad Khurram Khan
Abstract The science of solving clinical problems by analyzing images generated in clinical practice is known as medical image analysis. The aim is to extract information in an effective and efficient manner for improved clinical diagnosis. The recent advances in the field of biomedical engineering have made medical image analysis one of the top research and development areas. One of the reasons for this advancement is the application of machine learning techniques for the analysis of medical images. Deep learning is successfully used as a tool for machine learning, where a neural network is capable of automatically learning features. This is in contrast to those methods where traditionally hand-crafted features are used. The selection and calculation of these features is a challenging task. Among deep learning techniques, deep convolutional networks are actively used for the purpose of medical image analysis. This includes application areas such as segmentation, abnormality detection, disease classification, computer aided diagnosis and retrieval. In this study, a comprehensive review of the current state-of-the-art in medical image analysis using deep convolutional networks is presented. The challenges and potential of these techniques are also highlighted.
Tasks Anomaly Detection
Published 2017-09-04
URL https://arxiv.org/abs/1709.02250v2
PDF https://arxiv.org/pdf/1709.02250v2.pdf
PWC https://paperswithcode.com/paper/medical-image-analysis-using-convolutional
Repo
Framework

On SGD’s Failure in Practice: Characterizing and Overcoming Stalling

Title On SGD’s Failure in Practice: Characterizing and Overcoming Stalling
Authors Vivak Patel
Abstract Stochastic Gradient Descent (SGD) is widely used in machine learning problems to efficiently perform empirical risk minimization, yet, in practice, SGD is known to stall before reaching the actual minimizer of the empirical risk. SGD stalling has often been attributed to its sensitivity to the conditioning of the problem; however, as we demonstrate, SGD will stall even when applied to a simple linear regression problem with unity condition number for standard learning rates. Thus, in this work, we numerically demonstrate and mathematically argue that stalling is a crippling and generic limitation of SGD and its variants in practice. Once we have established the problem of stalling, we generalize an existing framework for hedging against its effects, which (1) deters SGD and its variants from stalling, (2) still provides convergence guarantees, and (3) makes SGD and its variants more practical methods for minimization.
Tasks
Published 2017-02-01
URL http://arxiv.org/abs/1702.00317v2
PDF http://arxiv.org/pdf/1702.00317v2.pdf
PWC https://paperswithcode.com/paper/on-sgds-failure-in-practice-characterizing
Repo
Framework
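
The setting described in the abstract can be explored with a small, self-contained experiment. The sketch below is an assumption-laden illustration and does not reproduce the paper's experiments or analysis: the features are standard Gaussian (so the population covariance is the identity, i.e. condition number 1), the step size follows a `base_lr / k` schedule chosen here only as an example of a common decaying rate, and the name `sgd_linear_regression` and all parameter values are hypothetical.

```python
import random


def sgd_linear_regression(dim=5, steps=2000, base_lr=0.1, noise=0.5, seed=0):
    """Run SGD on streaming least-squares data; return distance to the true weights per step."""
    rng = random.Random(seed)
    w_true = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    w = [0.0] * dim
    history = []
    for k in range(1, steps + 1):
        # One sample with identity feature covariance and additive Gaussian label noise.
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        y = sum(a * b for a, b in zip(w_true, x)) + rng.gauss(0.0, noise)
        residual = sum(a * b for a, b in zip(w, x)) - y
        lr = base_lr / k  # assumed decaying step-size schedule
        w = [wi - lr * residual * xi for wi, xi in zip(w, x)]
        history.append(sum((wi - ti) ** 2 for wi, ti in zip(w, w_true)) ** 0.5)
    return history


history = sgd_linear_regression()
for step in (1, 10, 100, 1000, 2000):
    print(step, round(history[step - 1], 4))
```

Printing the distance at logarithmically spaced steps makes it easy to see whether progress slows down long before the iterate reaches the minimizer, and how that changes with `base_lr` or the schedule.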

Multi-output Polynomial Networks and Factorization Machines

Title Multi-output Polynomial Networks and Factorization Machines
Authors Mathieu Blondel, Vlad Niculae, Takuma Otsuka, Naonori Ueda
Abstract Factorization machines and polynomial networks are supervised polynomial models based on an efficient low-rank decomposition. We extend these models to the multi-output setting, i.e., for learning vector-valued functions, with application to multi-class or multi-task problems. We cast this as the problem of learning a 3-way tensor whose slices share a common basis and propose a convex formulation of that problem. We then develop an efficient conditional gradient algorithm and prove its global convergence, despite the fact that it involves a non-convex basis selection step. On classification tasks, we show that our algorithm achieves excellent accuracy with much sparser models than existing methods. On recommendation system tasks, we show how to combine our algorithm with a reduction from ordinal regression to multi-output classification and show that the resulting algorithm outperforms simple baselines in terms of ranking accuracy.
Tasks
Published 2017-05-22
URL http://arxiv.org/abs/1705.07603v2
PDF http://arxiv.org/pdf/1705.07603v2.pdf
PWC https://paperswithcode.com/paper/multi-output-polynomial-networks-and
Repo
Framework
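
As background for the low-rank decomposition mentioned in the abstract, the sketch below evaluates a standard single-output, second-order factorization machine in two ways: a naive double loop over feature pairs and the well-known linear-time rewriting of the pairwise term. It does not implement the paper's multi-output extension or its conditional gradient algorithm; the function names and the toy parameters are hypothetical.

```python
def fm_predict_naive(x, w0, w, V):
    """Second-order FM: w0 + sum_i w_i x_i + sum_{i<j} <V_i, V_j> x_i x_j."""
    d = len(x)
    score = w0 + sum(w[i] * x[i] for i in range(d))
    for i in range(d):
        for j in range(i + 1, d):
            dot = sum(vi * vj for vi, vj in zip(V[i], V[j]))
            score += dot * x[i] * x[j]
    return score


def fm_predict_fast(x, w0, w, V):
    """Same model, using 0.5 * sum_f [(sum_i V_if x_i)^2 - sum_i (V_if x_i)^2]."""
    d = len(x)
    k = len(V[0])
    score = w0 + sum(w[i] * x[i] for i in range(d))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(d))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(d))
        score += 0.5 * (s * s - s_sq)
    return score


# Hypothetical parameters: 4 features, rank-2 factors.
x = [1.0, 0.0, 2.0, -1.0]
w0 = 0.5
w = [0.1, -0.2, 0.3, 0.4]
V = [[0.2, -0.1], [0.5, 0.3], [-0.4, 0.6], [0.1, 0.7]]

print(fm_predict_naive(x, w0, w, V), fm_predict_fast(x, w0, w, V))
```

The fast form costs O(kd) instead of O(kd^2), which is what makes the low-rank parameterization practical for high-dimensional sparse inputs.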

Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning

Title Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning
Authors Andrew Owens, Jiajun Wu, Josh H. McDermott, William T. Freeman, Antonio Torralba
Abstract The sound of crashing waves, the roar of fast-moving cars – sound conveys important information about the objects in our surroundings. In this work, we show that ambient sounds can be used as a supervisory signal for learning visual models. To demonstrate this, we train a convolutional neural network to predict a statistical summary of the sound associated with a video frame. We show that, through this process, the network learns a representation that conveys information about objects and scenes. We evaluate this representation on several recognition tasks, finding that its performance is comparable to that of other state-of-the-art unsupervised learning methods. Finally, we show through visualizations that the network learns units that are selective to objects that are often associated with characteristic sounds. This paper extends an earlier conference paper, Owens et al. 2016, with additional experiments and discussion.
Tasks
Published 2017-12-20
URL http://arxiv.org/abs/1712.07271v1
PDF http://arxiv.org/pdf/1712.07271v1.pdf
PWC https://paperswithcode.com/paper/learning-sight-from-sound-ambient-sound
Repo
Framework

Vector Space Model as Cognitive Space for Text Classification

Title Vector Space Model as Cognitive Space for Text Classification
Authors Barathi Ganesh HB, Anand Kumar M, Soman KP
Abstract In this era of digitization, knowing the user's sociolect aspects has become essential for building user-specific recommendation systems. These sociolect aspects can be found by mining the language the user shares as text in social media and reviews. This paper describes the experiment that was performed in the PAN Author Profiling 2017 shared task. The objective of the task is to find the sociolect aspects of the users from their tweets. The sociolect aspects considered in this experiment are the user's gender and native language information. Here, the user's tweets written in a different language from their native language are represented as a Document-Term Matrix with document frequency as the constraint. Further classification is done using the Support Vector Machine by taking gender and native language as target classes. This experiment attains an average accuracy of 73.42% in gender prediction and 76.26% in the native language identification task.
Tasks Gender Prediction, Language Identification, Native Language Identification, Recommendation Systems, Text Classification
Published 2017-08-21
URL http://arxiv.org/abs/1708.06068v1
PDF http://arxiv.org/pdf/1708.06068v1.pdf
PWC https://paperswithcode.com/paper/vector-space-model-as-cognitive-space-for
Repo
Framework
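
The representation step in the abstract, a document-term matrix restricted by document frequency, can be sketched with the standard library alone. The whitespace tokenizer, the `min_df` threshold, the function name `document_term_matrix`, and the toy tweets are assumptions for illustration; the abstract does not state which threshold or tokenizer was used, and the SVM classification step is not shown.

```python
from collections import Counter


def document_term_matrix(docs, min_df=2):
    """Build raw term counts, keeping only terms that occur in at least min_df documents."""
    tokenized = [doc.lower().split() for doc in docs]
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    vocab = sorted(term for term, count in doc_freq.items() if count >= min_df)
    index = {term: i for i, term in enumerate(vocab)}
    matrix = []
    for tokens in tokenized:
        row = [0] * len(vocab)
        for token in tokens:
            if token in index:
                row[index[token]] += 1
        matrix.append(row)
    return vocab, matrix


# Hypothetical tweets.
tweets = ["good morning everyone", "good night", "morning run good run"]
vocab, matrix = document_term_matrix(tweets, min_df=2)
print(vocab)
print(matrix)
```

Each row of `matrix` is the feature vector for one document and could be passed to any classifier that accepts dense numeric input.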

Multi-Task Learning for Mental Health using Social Media Text

Title Multi-Task Learning for Mental Health using Social Media Text
Authors Adrian Benton, Margaret Mitchell, Dirk Hovy
Abstract We introduce initial groundwork for estimating suicide risk and mental health in a deep learning framework. By modeling multiple conditions, the system learns to make predictions about suicide risk and mental health at a low false positive rate. Conditions are modeled as tasks in a multi-task learning (MTL) framework, with gender prediction as an additional auxiliary task. We demonstrate the effectiveness of multi-task learning by comparison to a well-tuned single-task baseline with the same number of parameters. Our best MTL model predicts potential suicide attempt, as well as the presence of atypical mental health, with AUC > 0.8. We also find additional large improvements using multi-task learning on mental health tasks with limited training data.
Tasks Gender Prediction, Multi-Task Learning
Published 2017-12-10
URL http://arxiv.org/abs/1712.03538v1
PDF http://arxiv.org/pdf/1712.03538v1.pdf
PWC https://paperswithcode.com/paper/multi-task-learning-for-mental-health-using
Repo
Framework

Added value of morphological features to breast lesion diagnosis in ultrasound

Title Added value of morphological features to breast lesion diagnosis in ultrasound
Authors Michał Byra, Katarzyna Dobruch-Sobczak, Hanna Piotrzkowska-Wróblewska, Andrzej Nowicki
Abstract Ultrasound imaging plays an important role in breast lesion differentiation. However, diagnostic accuracy depends on ultrasonographer experience. Various computer aided diagnosis systems have been developed to improve breast cancer detection and reduce the number of unnecessary biopsies. In this study, our aim was to improve breast lesion classification based on the BI-RADS (Breast Imaging - Reporting and Data System). This was accomplished by combining the BI-RADS with morphological features which assess lesion boundary. A dataset of 214 lesion images was used for analysis. 30 morphological features were extracted and a feature selection scheme was applied to find features which improve the BI-RADS classification performance. Additionally, the best performing morphological feature subset was indicated. We obtained a better classification by combining the BI-RADS with six morphological features. These features were the extent, overlap ratio, NRL entropy, circularity, elliptic-normalized circumference and the normalized residual value. The area under the receiver operating characteristic curve calculated with the use of the combined classifier was 0.986. The best performing morphological feature subset contained six features: the DWR, NRL entropy, normalized residual value, overlap ratio, extent and the morphological closing ratio. For this set, the area under the curve was 0.901. The combination of the radiologist's experience related to the BI-RADS and the morphological features leads to a more effective breast lesion classification.
Tasks Breast Cancer Detection, Feature Selection
Published 2017-06-06
URL http://arxiv.org/abs/1706.01855v1
PDF http://arxiv.org/pdf/1706.01855v1.pdf
PWC https://paperswithcode.com/paper/added-value-of-morphological-features-to
Repo
Framework
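
The reported figures of merit are areas under the ROC curve. As a reminder of what that number measures, the sketch below computes it from its pairwise-ranking definition: the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half. The function name `roc_auc`, the labels, and the scores are made up for illustration and are unrelated to the study's data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier outputs: 1 = malignant, 0 = benign.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, scores))  # 0.75
```

A value of 1.0 means every positive case is ranked above every negative case, and 0.5 corresponds to random ranking.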

Adaptive Quantization for Deep Neural Network

Title Adaptive Quantization for Deep Neural Network
Authors Yiren Zhou, Seyed-Mohsen Moosavi-Dezfooli, Ngai-Man Cheung, Pascal Frossard
Abstract In recent years Deep Neural Networks (DNNs) have been rapidly developed in various applications, together with increasingly complex architectures. The performance gain of these DNNs generally comes with high computational costs and large memory consumption, which may not be affordable for mobile platforms. Deep model quantization can be used for reducing the computation and memory costs of DNNs, and deploying complex DNNs on mobile equipment. In this work, we propose an optimization framework for deep model quantization. First, we propose a measurement to estimate the effect of parameter quantization errors in individual layers on the overall model prediction accuracy. Then, we propose an optimization process based on this measurement for finding the optimal quantization bit-width for each layer. This is the first work that theoretically analyses the relationship between parameter quantization errors of individual layers and model accuracy. Our new quantization algorithm outperforms previous quantization optimization methods, and achieves a 20-40% higher compression rate compared to equal bit-width quantization at the same model prediction accuracy.
Tasks Quantization
Published 2017-12-04
URL http://arxiv.org/abs/1712.01048v1
PDF http://arxiv.org/pdf/1712.01048v1.pdf
PWC https://paperswithcode.com/paper/adaptive-quantization-for-deep-neural-network
Repo
Framework
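
To make the notion of a per-layer bit-width concrete, the sketch below applies plain uniform quantization to a list of weights and reports the resulting error for several bit-widths. This is a generic scheme assumed for illustration; it is not the measurement or the optimization procedure proposed in the paper, and the names `quantize_uniform` and `mean_squared_error` as well as the weight values are hypothetical.

```python
def quantize_uniform(weights, bits):
    """Map each weight to the nearest of 2**bits evenly spaced levels between min and max."""
    if bits < 1:
        raise ValueError("bits must be at least 1")
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return list(weights)  # a constant layer needs no quantization
    step = (hi - lo) / (2 ** bits - 1)
    return [lo + round((w - lo) / step) * step for w in weights]


def mean_squared_error(a, b):
    """Average squared difference between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical weights of one layer.
layer = [-1.0, -0.37, 0.02, 0.41, 0.88, 1.0]
for bits in (2, 4, 8):
    quantized = quantize_uniform(layer, bits)
    print(bits, mean_squared_error(layer, quantized))
```

Storing a layer at `bits` bits per weight instead of 32 shrinks it by a factor of `32 / bits`, so choosing a smaller bit-width for layers that tolerate more error is what drives the compression gains discussed in the abstract.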

Efficient exploration with Double Uncertain Value Networks

Title Efficient exploration with Double Uncertain Value Networks
Authors Thomas M. Moerland, Joost Broekens, Catholijn M. Jonker
Abstract This paper studies directed exploration for reinforcement learning agents by tracking uncertainty about the value of each available action. We identify two sources of uncertainty that are relevant for exploration. The first originates from limited data (parametric uncertainty), while the second originates from the distribution of the returns (return uncertainty). We identify methods to learn these distributions with deep neural networks, where we estimate parametric uncertainty with Bayesian drop-out, while return uncertainty is propagated through the Bellman equation as a Gaussian distribution. Then, we identify that both can be jointly estimated in one network, which we call the Double Uncertain Value Network. The policy is directly derived from the learned distributions based on Thompson sampling. Experimental results show that both types of uncertainty may vastly improve learning in domains with a strong exploration challenge.
Tasks Efficient Exploration
Published 2017-11-29
URL http://arxiv.org/abs/1711.10789v1
PDF http://arxiv.org/pdf/1711.10789v1.pdf
PWC https://paperswithcode.com/paper/efficient-exploration-with-double-uncertain
Repo
Framework
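
The policy step in the abstract, Thompson sampling over learned value distributions, can be sketched in a few lines. The sketch assumes each action's value is summarized by a single Gaussian (mean, standard deviation); in the paper these distributions come from a network that combines parametric and return uncertainty, which is not modeled here. The function name `thompson_select`, the action names, and the numbers are hypothetical.

```python
import random
from collections import Counter


def thompson_select(action_stats, rng):
    """Draw one value sample per action and return the action with the largest sample."""
    samples = {
        action: rng.gauss(mean, std) for action, (mean, std) in action_stats.items()
    }
    return max(samples, key=samples.get)


# Hypothetical value estimates: similar means, but "right" is far more uncertain.
stats = {"left": (1.0, 0.1), "right": (0.9, 1.0)}
rng = random.Random(0)
counts = Counter(thompson_select(stats, rng) for _ in range(1000))
print(counts)
```

Actions with high uncertainty are still chosen regularly even when their mean estimate is lower, which is how the sampling step turns uncertainty estimates into directed exploration.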

Object-Level Context Modeling For Scene Classification with Context-CNN

Title Object-Level Context Modeling For Scene Classification with Context-CNN
Authors Syed Ashar Javed, Anil Kumar Nelakanti
Abstract Convolutional Neural Networks (CNNs) have been used extensively for computer vision tasks and produce rich feature representations for objects or parts of an image. But reasoning about scenes requires integration between the low-level feature representations and the high-level semantic information. We propose a deep network architecture which models the semantic context of scenes by capturing object-level information. We use Long Short-Term Memory (LSTM) units in conjunction with object proposals to incorporate object-object and object-scene relationships in an end-to-end trainable manner. We evaluate our model on the LSUN dataset and achieve results comparable to the state-of-the-art. We further show visualizations of the learned features and analyze the model with experiments to verify its ability to model context.
Tasks Scene Classification
Published 2017-05-11
URL http://arxiv.org/abs/1705.04358v2
PDF http://arxiv.org/pdf/1705.04358v2.pdf
PWC https://paperswithcode.com/paper/object-level-context-modeling-for-scene
Repo
Framework