October 20, 2019

3339 words 16 mins read

Paper Group ANR 37

Convex Class Model on Symmetric Positive Definite Manifolds. Don’t Classify, Translate: Multi-Level E-Commerce Product Categorization Via Machine Translation. Physics-based Learned Design: Optimized Coded-Illumination for Quantitative Phase Imaging. Towards the Targeted Environment-Specific Evolution of Robot Components. Sample Dropout for Audio Sc …

Convex Class Model on Symmetric Positive Definite Manifolds

Title Convex Class Model on Symmetric Positive Definite Manifolds
Authors Kun Zhao, Arnold Wiliem, Shaokang Chen, Brian C. Lovell
Abstract The effectiveness of Symmetric Positive Definite (SPD) manifold features has been proven in various computer vision tasks. However, due to the non-Euclidean geometry of these features, existing Euclidean machinery cannot be applied directly. In this paper, we tackle classification tasks with limited training data on SPD manifolds. Our proposed framework, named Manifold Convex Class Model, represents each class on the SPD manifold using a convex model, and classification is performed by computing distances to the convex models. We provide three methods, based on different metrics, to solve the optimization problem of finding the smallest distance from a point to a convex model on the SPD manifold. The efficacy of the proposed framework is demonstrated on synthetic data and on several computer vision tasks, including object recognition, texture classification, person re-identification and traffic scene classification.
Tasks Object Recognition, Person Re-Identification, Scene Classification, Texture Classification
Published 2018-06-14
URL https://arxiv.org/abs/1806.05343v2
PDF https://arxiv.org/pdf/1806.05343v2.pdf
PWC https://paperswithcode.com/paper/convex-class-model-on-symmetric-positive
Repo
Framework
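
The core classification rule above is easy to prototype. The sketch below is not the authors' implementation: it assumes a log-Euclidean approximation (the paper derives solvers for three SPD metrics) and uses a generic constrained optimizer to find the nearest point of each class's convex model.

```python
# Illustrative sketch (not the authors' code): nearest-convex-model classification
# of SPD matrices under a log-Euclidean approximation. The metric choice and the
# generic solver are assumptions; the paper derives dedicated SPD-metric solvers.
import numpy as np
from scipy.linalg import logm
from scipy.optimize import minimize

def log_vec(S):
    """Map an SPD matrix to its matrix logarithm, flattened to a vector."""
    return np.real(logm(S)).ravel()

def dist_to_convex_model(query, class_samples):
    """Smallest log-Euclidean distance from `query` to the convex hull of `class_samples`."""
    L = np.stack([log_vec(S) for S in class_samples])   # (n, d*d)
    q = log_vec(query)
    n = L.shape[0]

    def objective(w):
        return np.sum((L.T @ w - q) ** 2)

    cons = ({'type': 'eq', 'fun': lambda w: np.sum(w) - 1.0},)
    bounds = [(0.0, 1.0)] * n
    w0 = np.full(n, 1.0 / n)
    res = minimize(objective, w0, bounds=bounds, constraints=cons)
    return np.sqrt(res.fun)

def classify(query, class_dict):
    """Assign `query` to the class whose convex model is closest."""
    return min(class_dict, key=lambda c: dist_to_convex_model(query, class_dict[c]))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    def random_spd(d=4):
        A = rng.normal(size=(d, d))
        return A @ A.T + d * np.eye(d)
    classes = {c: [random_spd() for _ in range(5)] for c in ("A", "B")}
    print(classify(random_spd(), classes))
```

The convex-combination weights live on the probability simplex, which is what makes each class model "convex" rather than a single mean point.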

Don’t Classify, Translate: Multi-Level E-Commerce Product Categorization Via Machine Translation

Title Don’t Classify, Translate: Multi-Level E-Commerce Product Categorization Via Machine Translation
Authors Maggie Yundi Li, Stanley Kok, Liling Tan
Abstract E-commerce platforms categorize their products into a multi-level taxonomy tree with thousands of leaf categories. Conventional methods for product categorization are typically based on machine learning classification algorithms. These algorithms take product information as input (e.g., titles and descriptions) to classify a product into a leaf category. In this paper, we propose a new paradigm based on machine translation. In our approach, we translate a product’s natural language description into a sequence of tokens representing a root-to-leaf path in a product taxonomy. In our experiments on two large real-world datasets, we show that our approach achieves better predictive accuracy than a state-of-the-art classification system for product categorization. In addition, we demonstrate that our machine translation models can propose meaningful new paths between previously unconnected nodes in a taxonomy tree, thereby transforming the taxonomy into a directed acyclic graph (DAG). We discuss how the resultant taxonomy DAG promotes user-friendly navigation, and how it is more adaptable to new products.
Tasks Machine Translation, Product Categorization
Published 2018-12-14
URL http://arxiv.org/abs/1812.05774v1
PDF http://arxiv.org/pdf/1812.05774v1.pdf
PWC https://paperswithcode.com/paper/dont-classify-translate-multi-level-e
Repo
Framework
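
The paradigm shift here is mostly about how the training data is framed. A minimal sketch of that framing follows; the product texts and category paths are hypothetical, and the paper's actual tokenization and NMT system are not reproduced, only the source/target pairing that any seq2seq toolkit could consume.

```python
# Sketch of the data formulation only (an assumption about the exact tokenization):
# the product text is the "source sentence" and the root-to-leaf category path is
# the "target sentence", so an off-the-shelf seq2seq/NMT system can be trained on it.
def path_to_target(path, sep=" > "):
    """Linearize a root-to-leaf taxonomy path into a target token sequence."""
    return " ".join(level.replace(" ", "_") for level in path.split(sep))

examples = [
    ("wireless bluetooth over-ear headphones with mic",
     "Electronics > Audio > Headphones"),
    ("organic green tea bags 100 count",
     "Grocery > Beverages > Tea"),
]

parallel_corpus = [(src, path_to_target(tgt)) for src, tgt in examples]
for src, tgt in parallel_corpus:
    print(f"SRC: {src}\nTGT: {tgt}\n")
```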

Physics-based Learned Design: Optimized Coded-Illumination for Quantitative Phase Imaging

Title Physics-based Learned Design: Optimized Coded-Illumination for Quantitative Phase Imaging
Authors Michael R. Kellman, Emrah Bostan, Nicole Repina, Laura Waller
Abstract Coded-illumination can enable quantitative phase microscopy of transparent samples with minimal hardware requirements. Intensity images are captured with different source patterns and a non-linear phase retrieval optimization reconstructs the image. The non-linear nature of the processing makes optimizing the illumination pattern designs complicated. Traditional techniques for experimental design (e.g. condition number optimization, spectral analysis) consider only linear measurement formation models and linear reconstructions. Deep neural networks (DNNs) can efficiently represent the non-linear process and can be optimized via training in an end-to-end framework. However, DNNs typically require a large number of training examples and parameters to properly learn the phase retrieval process, without making use of the known physical models. Here, we aim to use both our knowledge of the physics and the power of machine learning together. We develop a new data-driven approach to optimizing coded-illumination patterns for an LED array microscope for a given phase reconstruction algorithm. Our method incorporates both the physics of the measurement scheme and the non-linearity of the reconstruction algorithm into the design problem. This enables efficient parameterization, which allows us to use only a small number of training examples to learn designs that generalize well in the experimental setting without retraining. We show experimental results for both a well-characterized phase target and mouse fibroblast cells using coded-illumination patterns optimized for a sparsity-based phase reconstruction algorithm. Our learned designs using 2 measurements demonstrate accuracy similar to Fourier Ptychography with 69 measurements.
Tasks
Published 2018-08-10
URL http://arxiv.org/abs/1808.03571v3
PDF http://arxiv.org/pdf/1808.03571v3.pdf
PWC https://paperswithcode.com/paper/physics-based-learned-design-optimized-coded
Repo
Framework
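
The key idea, differentiating through the reconstruction to learn the illumination code, can be illustrated with a heavily simplified toy. The sketch below assumes a linear forward model and plain unrolled gradient descent in place of the paper's non-linear phase-retrieval model and sparsity-based solver; the array sizes and the sigmoid parameterization of LED weights are arbitrary choices for illustration.

```python
# Toy sketch of "learning through the reconstruction": a linear forward model and
# plain unrolled gradient descent stand in for the paper's non-linear phase
# retrieval. Only the pattern of back-propagating through the solver is shown.
import torch

n_leds, n_pix, n_meas, n_unroll = 16, 64, 2, 10
A = torch.randn(n_leds, n_pix)                                 # per-LED responses (toy)
led_weights = torch.randn(n_meas, n_leds, requires_grad=True)  # learnable illumination codes

def forward_model(phase, weights):
    """Coded measurements: each pattern is a weighted combination of LED responses."""
    return torch.sigmoid(weights) @ (A @ phase)                # (n_meas,)

def reconstruct(y, weights, step=1e-3):
    """Unrolled gradient-descent reconstruction, differentiable w.r.t. the weights."""
    x = torch.zeros(n_pix)
    H = torch.sigmoid(weights) @ A                             # effective system matrix
    for _ in range(n_unroll):
        x = x - step * H.T @ (H @ x - y)
    return x

opt = torch.optim.Adam([led_weights], lr=1e-2)
for it in range(200):
    phase = torch.randn(n_pix)                                 # simulated training object
    y = forward_model(phase, led_weights)
    loss = torch.mean((reconstruct(y, led_weights) - phase) ** 2)
    opt.zero_grad(); loss.backward(); opt.step()
print("final training loss:", loss.item())
```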

Towards the Targeted Environment-Specific Evolution of Robot Components

Title Towards the Targeted Environment-Specific Evolution of Robot Components
Authors Jack Collins, Wade Geles, David Howard, Frederic Maire
Abstract This research considers the task of evolving the physical structure of a robot to enhance its performance in various environments, a significant problem in the field of Evolutionary Robotics. Inspired by the fields of evolutionary art and sculpture, we evolve only targeted parts of a robot, which simplifies the optimisation problem compared to traditional approaches that must simultaneously evolve both the (actuated) body and the brain. Exploration fidelity is emphasised in areas of the robot most likely to benefit from shape optimisation, whilst exploiting existing robot structure and control. Our approach uses a Genetic Algorithm to optimise collections of Bezier splines that together define the shape of a legged robot’s tibia, and leg performance is evaluated in parallel in a high-fidelity simulator. The leg is represented in the simulator as a 3D-printable file, and as such can be readily instantiated in reality. Provisional experiments in three distinct environments show the evolution of environment-specific leg structures that are both high-performing and notably different from those evolved in the other environments. This proof of concept represents an important step towards the environment-dependent optimisation of performance-critical components for a range of ubiquitous, standard, and already-capable robots that can carry out a wide variety of tasks.
Tasks
Published 2018-10-11
URL http://arxiv.org/abs/1810.04735v1
PDF http://arxiv.org/pdf/1810.04735v1.pdf
PWC https://paperswithcode.com/paper/towards-the-targeted-environment-specific
Repo
Framework
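
Stripped of the simulator, the optimisation loop is a standard Genetic Algorithm over spline control points. The sketch below is a toy under that assumption: the fitness function is a placeholder for the high-fidelity leg simulation, and the mutation/crossover operators are generic rather than those used in the paper.

```python
# Toy GA over Bezier control-point genomes. The fitness function is a placeholder
# for simulator-based leg evaluation; the operators and constants are assumptions.
import random

N_CTRL, POP, GENS = 8, 20, 30              # control points per spline, population, generations

def random_genome():
    """A genome is a flat list of (x, y) Bezier control-point coordinates."""
    return [random.uniform(-1.0, 1.0) for _ in range(2 * N_CTRL)]

def fitness(genome):
    """Placeholder for simulator-based evaluation (e.g. distance travelled by the leg)."""
    return -sum((g - 0.3) ** 2 for g in genome)   # toy objective with a known optimum

def mutate(genome, rate=0.1, sigma=0.1):
    return [g + random.gauss(0, sigma) if random.random() < rate else g for g in genome]

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

population = [random_genome() for _ in range(POP)]
for gen in range(GENS):
    population.sort(key=fitness, reverse=True)
    elite = population[: POP // 4]                 # truncation selection
    children = [mutate(crossover(random.choice(elite), random.choice(elite)))
                for _ in range(POP - len(elite))]
    population = elite + children
print("best fitness:", fitness(max(population, key=fitness)))
```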

Sample Dropout for Audio Scene Classification Using Multi-Scale Dense Connected Convolutional Neural Network

Title Sample Dropout for Audio Scene Classification Using Multi-Scale Dense Connected Convolutional Neural Network
Authors Dawei Feng, Kele Xu, Haibo Mi, Feifan Liao, Yan Zhou
Abstract Acoustic scene classification is an intricate problem for a machine. In this emerging field of research, deep Convolutional Neural Networks (CNNs) have achieved convincing results. In this paper, we explore the use of a multi-scale densely connected convolutional neural network (DenseNet) for the classification task, with the goal of improving classification performance, since multi-scale features can be extracted from the time-frequency representation of the audio signal. On the other hand, most previous CNN-based audio scene classification approaches aim to improve classification accuracy by employing different regularization techniques, such as dropout of hidden units and data augmentation, to reduce overfitting. It is widely known that outliers in the training set have a strong negative influence on the trained model, and that culling the outliers may improve classification performance, yet this is often under-explored in previous studies. In this paper, inspired by silence removal in speech signal processing, a novel sample dropout approach is proposed, which aims to remove outliers from the training dataset. Using the DCASE 2017 audio scene classification datasets, the experimental results demonstrate that the proposed multi-scale DenseNet provides superior performance compared with the traditional single-scale DenseNet, while the sample dropout method further improves the classification robustness of the multi-scale DenseNet.
Tasks Acoustic Scene Classification, Data Augmentation, Scene Classification
Published 2018-06-12
URL http://arxiv.org/abs/1806.04422v1
PDF http://arxiv.org/pdf/1806.04422v1.pdf
PWC https://paperswithcode.com/paper/sample-dropout-for-audio-scene-classification
Repo
Framework
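
The sample dropout step can be sketched independently of the DenseNet. The version below is an assumption: it simply drops the fraction of training clips with the largest loss under the current model, which captures the spirit of outlier culling but not necessarily the paper's exact criterion.

```python
# Sketch of a loss-based "sample dropout" pass: drop the highest-loss fraction of
# training samples before the next epoch. The criterion and drop rate are assumptions.
import numpy as np

def sample_dropout(features, labels, per_sample_loss, drop_frac=0.05):
    """Remove the `drop_frac` of training samples with the largest loss."""
    losses = np.asarray([per_sample_loss(x, y) for x, y in zip(features, labels)])
    keep = losses.argsort()[: int(len(losses) * (1.0 - drop_frac))]
    return features[keep], labels[keep]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(100, 40)), rng.integers(0, 15, size=100)
    dummy_loss = lambda x, label: float(np.abs(x).mean())   # stand-in for the model's loss
    X_kept, y_kept = sample_dropout(X, y, dummy_loss)
    print(X_kept.shape, y_kept.shape)
```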

Learning Effective Binary Visual Representations with Deep Networks

Title Learning Effective Binary Visual Representations with Deep Networks
Authors Jianxin Wu, Jian-Hao Luo
Abstract Although traditionally binary visual representations are mainly designed to reduce computational and storage costs in the image retrieval research, this paper argues that binary visual representations can be applied to large scale recognition and detection problems in addition to hashing in retrieval. Furthermore, the binary nature may make it generalize better than its real-valued counterparts. Existing binary hashing methods are either two-stage or hinging on loss term regularization or saturated functions, hence converge slowly and only emit soft binary values. This paper proposes Approximately Binary Clamping (ABC), which is non-saturating, end-to-end trainable, with fast convergence and can output true binary visual representations. ABC achieves comparable accuracy in ImageNet classification as its real-valued counterpart, and even generalizes better in object detection. On benchmark image retrieval datasets, ABC also outperforms existing hashing methods.
Tasks Image Retrieval, Object Detection
Published 2018-03-08
URL http://arxiv.org/abs/1803.03004v1
PDF http://arxiv.org/pdf/1803.03004v1.pdf
PWC https://paperswithcode.com/paper/learning-effective-binary-visual
Repo
Framework
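
For orientation, here is a generic end-to-end binarization head using a straight-through estimator. This is a stand-in, not the ABC layer itself: ABC is a specific non-saturating clamping function, which this sketch does not reproduce; only the usage pattern of emitting hard 0/1 codes from a backbone feature is shown.

```python
# Generic binary-code head with a straight-through estimator, used here only as a
# stand-in for the paper's Approximately Binary Clamping (ABC) layer.
import torch
import torch.nn as nn

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return (x > 0).float()                    # hard 0/1 codes at the output

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output                        # identity (straight-through) gradient

class BinaryHead(nn.Module):
    """Projects backbone features to `bits` hard-binary codes, trainable end to end."""
    def __init__(self, in_dim, bits):
        super().__init__()
        self.proj = nn.Linear(in_dim, bits)

    def forward(self, features):
        return BinarizeSTE.apply(self.proj(features))

if __name__ == "__main__":
    head = BinaryHead(in_dim=512, bits=64)
    codes = head(torch.randn(8, 512))
    print(codes.unique())                         # tensor([0., 1.])
```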

Response to Comment on “All-optical machine learning using diffractive deep neural networks”

Title Response to Comment on “All-optical machine learning using diffractive deep neural networks”
Authors Deniz Mengu, Yi Luo, Yair Rivenson, Xing Lin, Muhammed Veli, Aydogan Ozcan
Abstract In their Comment, Wei et al. (arXiv:1809.08360v1 [cs.LG]) claim that our original interpretation of Diffractive Deep Neural Networks (D2NN) represents a mischaracterization of the system due to linearity and passivity. In this Response, we detail how this mischaracterization claim is unwarranted and oblivious to several sections of our original manuscript (Science, DOI: 10.1126/science.aat8084) that specifically introduced and discussed optical nonlinearities and reconfigurability of D2NNs as part of our proposed framework to enhance its performance. To further refute the mischaracterization claim of Wei et al., we once again demonstrate the depth feature of optical D2NNs by showing that multiple diffractive layers operating collectively within a D2NN present additional degrees of freedom compared to a single diffractive layer, achieving better classification accuracy as well as improved output signal contrast and diffraction efficiency as the number of diffractive layers increases; this shows the deepness of a D2NN and its inherent depth advantage for improved performance. In summary, the Comment by Wei et al. does not provide an amendment to the teachings of our original manuscript, and all of our results, core conclusions and methodology of research reported in Science (DOI: 10.1126/science.aat8084) remain entirely valid.
Tasks
Published 2018-10-10
URL http://arxiv.org/abs/1810.04384v1
PDF http://arxiv.org/pdf/1810.04384v1.pdf
PWC https://paperswithcode.com/paper/response-to-comment-on-all-optical-machine
Repo
Framework

AID++: An Updated Version of AID on Scene Classification

Title AID++: An Updated Version of AID on Scene Classification
Authors Pu Jin, Gui-Song Xia, Fan Hu, Qikai Lu, Liangpei Zhang
Abstract Aerial image scene classification is a fundamental problem for understanding high-resolution remote sensing images and has become an active research task in the field of remote sensing due to its important role in a wide range of applications. However, the limitations of existing datasets for scene classification, such as their small scale and low diversity, severely hamper the potential usage of the new generation of deep convolutional neural networks (CNNs). Although huge efforts have been made very recently in building large-scale datasets, e.g., the Aerial Image Dataset (AID) which contains 10,000 image samples, they are still far from sufficient to fully train a high-capacity deep CNN model. To this end, we present a larger-scale dataset in this paper, named AID++, for aerial scene classification based on the AID dataset. The proposed AID++ consists of more than 400,000 image samples that are semi-automatically annotated using existing geographical data. We evaluate several prevalent CNN models on the proposed dataset, and the results show that our dataset can serve as a promising benchmark for scene classification.
Tasks Scene Classification
Published 2018-06-03
URL http://arxiv.org/abs/1806.00801v1
PDF http://arxiv.org/pdf/1806.00801v1.pdf
PWC https://paperswithcode.com/paper/aid-an-updated-version-of-aid-on-scene
Repo
Framework

Learning Personalized Representation for Inverse Problems in Medical Imaging Using Deep Neural Network

Title Learning Personalized Representation for Inverse Problems in Medical Imaging Using Deep Neural Network
Authors Kuang Gong, Kyungsang Kim, Jianan Cui, Ning Guo, Ciprian Catana, Jinyi Qi, Quanzheng Li
Abstract Recently, deep neural networks have been widely and successfully applied in computer vision tasks and have attracted growing interest in medical imaging. One barrier to the application of deep neural networks in medical imaging is the need for large amounts of prior training pairs, which is not always feasible in clinical practice. In this work we propose a personalized representation learning framework where no prior training pairs are needed, only the patient’s own prior images. The representation is expressed using a deep neural network with the patient’s prior images as network input. We then apply this novel image representation to inverse problems in medical imaging, in which the original inverse problem is formulated as a constrained optimization problem and solved using the alternating direction method of multipliers (ADMM) algorithm. Anatomically guided brain positron emission tomography (PET) image reconstruction and image denoising are employed as examples to demonstrate the effectiveness of the proposed framework. Quantification results based on simulation and real datasets show that the proposed personalized representation framework outperforms other widely adopted methods.
Tasks Denoising, Image Denoising, Image Reconstruction, Representation Learning
Published 2018-07-04
URL http://arxiv.org/abs/1807.01759v1
PDF http://arxiv.org/pdf/1807.01759v1.pdf
PWC https://paperswithcode.com/paper/learning-personalized-representation-for
Repo
Framework
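
A toy version of the constrained formulation helps make the ADMM splitting concrete. Everything below is an assumption for illustration: a tiny CNN as the personalized representation, a denoising problem with an identity forward operator, and standard scaled-form ADMM updates; the paper's PET reconstruction models are not reproduced.

```python
# Toy sketch of min_x ||x - y||^2  s.t.  x = f_theta(prior), solved by alternating
# network fitting (theta-update), a closed-form x-update, and a dual update.
import torch
import torch.nn as nn

torch.manual_seed(0)
prior = torch.rand(1, 1, 32, 32)                  # the patient's own prior image
clean = torch.roll(prior, shifts=2, dims=-1)      # "new" image, related to the prior
y = clean + 0.2 * torch.randn_like(clean)         # noisy measurement (A = identity)

net = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 1, 3, padding=1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

x = y.clone()
u = torch.zeros_like(y)                           # scaled dual variable
rho = 1.0
for it in range(30):                              # ADMM outer iterations
    # theta-update: fit the network output (on the prior) to the current x + u
    for _ in range(50):
        loss = torch.mean((net(prior) - (x + u).detach()) ** 2)
        opt.zero_grad(); loss.backward(); opt.step()
    fz = net(prior).detach()
    # x-update: closed form for the quadratic data term plus the ADMM penalty
    x = (y + rho * (fz - u)) / (1.0 + rho)
    # dual update
    u = u + x - fz
print("reconstruction MSE:", torch.mean((x - clean) ** 2).item())
```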

Automated Vision-based Bridge Component Extraction Using Multiscale Convolutional Neural Networks

Title Automated Vision-based Bridge Component Extraction Using Multiscale Convolutional Neural Networks
Authors Yasutaka Narazaki, Vedhus Hoskere, Tu A. Hoang, Billie F. Spencer Jr
Abstract Image data has great potential to help post-earthquake visual inspections of civil engineering structures due to the ease of data acquisition and the advantages in capturing visual information. A variety of techniques have been applied to detect damage automatically from a close-up image of a structural component. However, applying automatic damage detection methods becomes increasingly difficult when the image includes multiple components from different structures. To reduce inaccurate false-positive alarms, critical structural components need to be recognized first, and the damage alarms need to be cleaned using the component recognition results. To achieve this goal, this study aims at recognizing and extracting bridge components from images of urban scenes. The bridge component recognition begins with pixel-wise classification of an image into 10 scene classes. Then, the original image and the scene classification results are combined to classify the image pixels into five component classes. Multi-scale convolutional neural networks (multi-scale CNNs) are used to perform the pixel-wise classification, and the classification results are post-processed by averaging within superpixels and smoothing with conditional random fields (CRFs). The performance of the bridge component extraction is tested in terms of accuracy and consistency.
Tasks Scene Classification
Published 2018-05-15
URL http://arxiv.org/abs/1805.06042v1
PDF http://arxiv.org/pdf/1805.06042v1.pdf
PWC https://paperswithcode.com/paper/automated-vision-based-bridge-component
Repo
Framework
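
Of the post-processing steps mentioned, the superpixel averaging is simple to sketch. The snippet below assumes SLIC superpixels from scikit-image as a stand-in for whatever over-segmentation the authors use, and omits the CRF smoothing stage.

```python
# Sketch of the superpixel-averaging post-processing step (SLIC superpixels are an
# assumption; the CRF smoothing stage from the paper is omitted).
import numpy as np
from skimage.segmentation import slic

def average_within_superpixels(image, class_probs, n_segments=200):
    """Replace per-pixel class probabilities with their mean over each superpixel."""
    segments = slic(image, n_segments=n_segments, start_label=0)
    smoothed = np.empty_like(class_probs)
    for s in np.unique(segments):
        mask = segments == s
        smoothed[mask] = class_probs[mask].mean(axis=0)
    return smoothed.argmax(axis=-1), smoothed

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((120, 160, 3))
    probs = rng.random((120, 160, 5))             # 5 bridge-component classes
    probs /= probs.sum(axis=-1, keepdims=True)
    labels, _ = average_within_superpixels(img, probs)
    print(labels.shape)
```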

A Fine-to-Coarse Convolutional Neural Network for 3D Human Action Recognition

Title A Fine-to-Coarse Convolutional Neural Network for 3D Human Action Recognition
Authors Thao Minh Le, Nakamasa Inoue, Koichi Shinoda
Abstract This paper presents a new framework for human action recognition from a 3D skeleton sequence. Previous studies do not fully utilize the temporal relationships between video segments in a human action. Some studies successfully used very deep Convolutional Neural Network (CNN) models but often suffer from data insufficiency. In this study, we first segment a skeleton sequence into distinct temporal segments in order to exploit the correlations between them. The temporal and spatial features of a skeleton sequence are then extracted simultaneously by utilizing a fine-to-coarse (F2C) CNN architecture optimized for human skeleton sequences. We evaluate our proposed method on the NTU RGB+D and SBU Kinect Interaction datasets. It achieves accuracies of 79.6% and 84.6% on NTU RGB+D with the cross-subject and cross-view protocols, respectively, which are almost identical to the state-of-the-art performance. In addition, our method significantly improves the accuracy of actions involving two-person interactions.
Tasks 3D Human Action Recognition, Skeleton Based Action Recognition, Temporal Action Localization
Published 2018-05-30
URL http://arxiv.org/abs/1805.11790v2
PDF http://arxiv.org/pdf/1805.11790v2.pdf
PWC https://paperswithcode.com/paper/a-fine-to-coarse-convolutional-neural-network
Repo
Framework

Personalized Survival Prediction with Contextual Explanation Networks

Title Personalized Survival Prediction with Contextual Explanation Networks
Authors Maruan Al-Shedivat, Avinava Dubey, Eric P. Xing
Abstract Accurate and transparent prediction of cancer survival times at the level of individual patients can inform and improve patient care and treatment practices. In this paper, we design a model that concurrently learns to accurately predict patient-specific survival distributions and to explain its predictions in terms of patient attributes such as clinical tests or assessments. Our model is flexible: it is based on a recurrent network, can handle various modalities of data including temporal measurements, and yet constructs and uses simple explanations in the form of patient- and time-specific linear regression. For analysis, we use two publicly available datasets and show that our networks outperform a number of baselines in prediction while providing a way to inspect the reasons behind each prediction.
Tasks
Published 2018-01-30
URL http://arxiv.org/abs/1801.09810v1
PDF http://arxiv.org/pdf/1801.09810v1.pdf
PWC https://paperswithcode.com/paper/personalized-survival-prediction-with
Repo
Framework
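
The architectural idea, a context encoder that emits the weights of a per-patient linear model, can be sketched in a few lines. The toy below is an assumption-level illustration: a GRU encodes temporal context and outputs linear-regression weights over interpretable attributes; the paper's survival-specific likelihood and training procedure are omitted.

```python
# Toy contextual-explanation model: an RNN maps the patient's temporal context to
# the weights of a per-patient linear model over interpretable clinical attributes.
import torch
import torch.nn as nn

class ContextualLinearExplainer(nn.Module):
    def __init__(self, context_dim, attr_dim, hidden=32):
        super().__init__()
        self.encoder = nn.GRU(context_dim, hidden, batch_first=True)
        self.to_weights = nn.Linear(hidden, attr_dim + 1)   # attribute weights + bias

    def forward(self, context_seq, attributes):
        _, h = self.encoder(context_seq)                    # (1, batch, hidden)
        theta = self.to_weights(h.squeeze(0))               # per-patient linear model
        w, b = theta[:, :-1], theta[:, -1]
        score = (w * attributes).sum(dim=1) + b             # the explanation is (w, b)
        return score, w, b

if __name__ == "__main__":
    model = ContextualLinearExplainer(context_dim=6, attr_dim=4)
    ctx = torch.randn(8, 20, 6)                             # 8 patients, 20 time steps
    attrs = torch.randn(8, 4)                               # interpretable clinical attributes
    risk, weights, bias = model(ctx, attrs)
    print(risk.shape, weights.shape)
```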

Vision-based Automated Bridge Component Recognition Integrated With High-level Scene Understanding

Title Vision-based Automated Bridge Component Recognition Integrated With High-level Scene Understanding
Authors Yasutaka Narazaki, Vedhus Hoskere, Tu A. Hoang, Billie F. Spencer
Abstract Image data has great potential to help conventional visual inspections of civil engineering structures due to the ease of data acquisition and the advantages in capturing visual information. A variety of techniques have been proposed to detect damage, such as cracks and spalling, on a close-up image of a single component (columns and road surfaces, etc.). However, these techniques commonly suffer from severe false positives, especially when the image includes multiple components of different structures. To reduce the false positives and extract reliable information about the structures’ conditions, detection and localization of critical structural components are important first steps preceding the damage assessment. This study aims at recognizing bridge structural and non-structural components from images of urban scenes. During the bridge component recognition, every image pixel is classified into one of five classes (non-bridge, columns, beams and slabs, other structural, other non-structural) by multi-scale convolutional neural networks (multi-scale CNNs). To reduce false positives and obtain consistent labels, the component classifications are integrated with scene understanding by an additional classifier with 10 higher-level scene classes (building, greenery, person, pavement, signs and poles, vehicles, bridges, water, sky, and others). The bridge component recognition integrated with scene understanding is compared with the naive approach without scene classification in terms of accuracy, false positives and consistency to demonstrate the effectiveness of the integrated approach.
Tasks Scene Classification, Scene Understanding
Published 2018-05-15
URL http://arxiv.org/abs/1805.06041v1
PDF http://arxiv.org/pdf/1805.06041v1.pdf
PWC https://paperswithcode.com/paper/vision-based-automated-bridge-component
Repo
Framework

Adversarial Meta-Learning

Title Adversarial Meta-Learning
Authors Chengxiang Yin, Jian Tang, Zhiyuan Xu, Yanzhi Wang
Abstract Meta-learning enables a model to learn from very limited data to undertake a new task. In this paper, we study general meta-learning with adversarial samples. We present a meta-learning algorithm, ADML (ADversarial Meta-Learner), which leverages clean and adversarial samples to optimize the initialization of a learning model in an adversarial manner. ADML leads to the following desirable properties: 1) it turns out to be very effective even in the cases with only clean samples; 2) it is model-agnostic, i.e., it is compatible with any learning model that can be trained with gradient descent; and most importantly, 3) it is robust to adversarial samples, i.e., unlike other meta-learning methods, it only leads to a minor performance degradation when there are adversarial samples. We show via extensive experiments that ADML delivers the state-of-the-art performance on two widely-used image datasets, MiniImageNet and CIFAR100, in terms of both accuracy and robustness.
Tasks Meta-Learning
Published 2018-06-08
URL https://arxiv.org/abs/1806.03316v2
PDF https://arxiv.org/pdf/1806.03316v2.pdf
PWC https://paperswithcode.com/paper/adversarial-meta-learning
Repo
Framework
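
The ingredients, MAML-style inner adaptation plus adversarial samples, can be sketched as follows. This is a toy under stated assumptions: FGSM generates the adversarial query set, a single inner gradient step is shown, and the exact way ADML pairs clean and adversarial data across inner and outer updates is not reproduced.

```python
# Toy meta-training step with adversarial samples: FGSM perturbation plus one
# MAML-style inner step. The pairing of clean/adversarial data is an assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 5))
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def fgsm(x, y, eps=0.1):
    """One-step FGSM adversarial perturbation of a batch."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).detach()

def inner_adapt(x, y, lr=0.05):
    """One gradient step on the support set, returning adapted fast weights."""
    loss = F.cross_entropy(model(x), y)
    grads = torch.autograd.grad(loss, list(model.parameters()), create_graph=True)
    return [p - lr * g for p, g in zip(model.parameters(), grads)]

def functional_forward(x, fast_weights):
    """Linear classifier evaluated with the adapted (fast) weights."""
    w, b = fast_weights[0], fast_weights[1]
    return x.flatten(1) @ w.t() + b

# One meta-training step on a toy task: adapt on the clean support set, evaluate the
# adapted model on an adversarially perturbed query set, and back-propagate.
xs, ys = torch.randn(25, 1, 28, 28), torch.randint(0, 5, (25,))
xq, yq = torch.randn(25, 1, 28, 28), torch.randint(0, 5, (25,))
fast = inner_adapt(xs, ys)
meta_loss = F.cross_entropy(functional_forward(fgsm(xq, yq), fast), yq)
meta_opt.zero_grad(); meta_loss.backward(); meta_opt.step()
print("meta loss:", meta_loss.item())
```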

Partially-Supervised Image Captioning

Title Partially-Supervised Image Captioning
Authors Peter Anderson, Stephen Gould, Mark Johnson
Abstract Image captioning models are becoming increasingly successful at describing the content of images in restricted domains. However, if these models are to function in the wild - for example, as assistants for people with impaired vision - a much larger number and variety of visual concepts must be understood. To address this problem, we teach image captioning models new visual concepts from labeled images and object detection datasets. Since image labels and object classes can be interpreted as partial captions, we formulate this problem as learning from partially-specified sequence data. We then propose a novel algorithm for training sequence models, such as recurrent neural networks, on partially-specified sequences which we represent using finite state automata. In the context of image captioning, our method lifts the restriction that previously required image captioning models to be trained on paired image-sentence corpora only, or otherwise required specialized model architectures to take advantage of alternative data modalities. Applying our approach to an existing neural captioning model, we achieve state of the art results on the novel object captioning task using the COCO dataset. We further show that we can train a captioning model to describe new visual concepts from the Open Images dataset while maintaining competitive COCO evaluation scores.
Tasks Image Captioning, Object Detection
Published 2018-06-15
URL http://arxiv.org/abs/1806.06004v2
PDF http://arxiv.org/pdf/1806.06004v2.pdf
PWC https://paperswithcode.com/paper/partially-supervised-image-captioning
Repo
Framework
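
The phrase "partially-specified sequences represented as finite state automata" can be made concrete with a toy. The sketch below is an assumption-level illustration: an image label becomes a two-state automaton that accepts any caption mentioning it; the constrained training and decoding against such automata are omitted.

```python
# Toy "partial caption as FSA": an image label such as "zebra" becomes a two-state
# automaton that accepts any token sequence containing that word at least once.
class LabelFSA:
    """Accepts a caption iff it mentions the required label token at least once."""
    def __init__(self, required_token):
        self.required = required_token
        self.state = 0                        # 0 = not yet seen, 1 = seen (accepting)

    def step(self, token):
        if self.state == 0 and token == self.required:
            self.state = 1
        return self.state

    def accepts(self, tokens):
        self.state = 0
        for t in tokens:
            self.step(t)
        return self.state == 1

if __name__ == "__main__":
    fsa = LabelFSA("zebra")
    print(fsa.accepts("a zebra grazing in a field".split()))   # True
    print(fsa.accepts("a horse grazing in a field".split()))   # False
```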