July 26, 2019

Paper Group ANR 756

Deep Style Match for Complementary Recommendation. Towards Automatic Abdominal Multi-Organ Segmentation in Dual Energy CT using Cascaded 3D Fully Convolutional Network. On the Limitations of First-Order Approximation in GAN Dynamics. Learning Robust Object Recognition Using Composed Scenes from Generative Models. Deep Reinforcement Learning-based I …

Deep Style Match for Complementary Recommendation

Title Deep Style Match for Complementary Recommendation
Authors Kui Zhao, Xia Hu, Jiajun Bu, Can Wang
Abstract Humans develop a common sense of style compatibility between items based on their attributes. We seek to automatically answer questions like “Does this shirt go well with that pair of jeans?” In order to answer these kinds of questions, we attempt to model the human sense of style compatibility in this paper. The basic assumption of our approach is that most of the important attributes for a product in an online store are included in its title description. Therefore, it is feasible to learn style compatibility from these descriptions. We design a Siamese Convolutional Neural Network architecture and feed it with title pairs of items, which are either compatible or incompatible. Those pairs will be mapped from the original space of symbolic words into some embedded style space. Our approach takes only words as the input with little preprocessing, and there is no laborious and expensive feature engineering.
Tasks Common Sense Reasoning, Feature Engineering
Published 2017-08-26
URL http://arxiv.org/abs/1708.07938v1
PDF http://arxiv.org/pdf/1708.07938v1.pdf
PWC https://paperswithcode.com/paper/deep-style-match-for-complementary
Repo
Framework

Towards Automatic Abdominal Multi-Organ Segmentation in Dual Energy CT using Cascaded 3D Fully Convolutional Network

Title Towards Automatic Abdominal Multi-Organ Segmentation in Dual Energy CT using Cascaded 3D Fully Convolutional Network
Authors Shuqing Chen, Holger Roth, Sabrina Dorn, Matthias May, Alexander Cavallaro, Michael M. Lell, Marc Kachelrieß, Hirohisa Oda, Kensaku Mori, Andreas Maier
Abstract Automatic multi-organ segmentation of dual energy computed tomography (DECT) data can be beneficial for biomedical research and clinical applications. However, it is a challenging task. Recent advances in deep learning showed the feasibility of using 3D fully convolutional networks (FCN) for voxel-wise dense predictions in single energy computed tomography (SECT). In this paper, we propose a 3D FCN based method for automatic multi-organ segmentation in DECT. The work is based on a cascaded FCN and a general model for the major organs trained on a large set of SECT data. We preprocessed the DECT data by using linear weighting and fine-tuned the model for the DECT data. The method was evaluated using 42 torso DECT data sets acquired with a clinical dual-source CT system. Four abdominal organs (liver, spleen, left and right kidneys) were evaluated. Cross-validation was performed, and the effect of the weight on the accuracy was investigated. In all the tests, we achieved an average Dice coefficient of 93% for the liver, 90% for the spleen, 91% for the right kidney and 89% for the left kidney. The results show our method is feasible and promising.
Tasks
Published 2017-10-15
URL http://arxiv.org/abs/1710.05379v1
PDF http://arxiv.org/pdf/1710.05379v1.pdf
PWC https://paperswithcode.com/paper/towards-automatic-abdominal-multi-organ
Repo
Framework
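
The Dice coefficient and the linear weighting mentioned in the abstract above can be illustrated with a minimal sketch. The function names `dice_coefficient` and `blend_dect` are hypothetical, and the mixing form `weight * low + (1 - weight) * high` is an assumption about what "linear weighting" means here; the abstract does not give the exact formula.

```python
def dice_coefficient(pred, truth):
    """Dice overlap between two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def blend_dect(low_kv, high_kv, weight):
    """Mix low- and high-energy voxel values (assumed form, weight in [0, 1])."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * lo + (1.0 - weight) * hi for lo, hi in zip(low_kv, high_kv)]
```

A Dice value of 0.93 would correspond to the 93% reported for the liver; the blended volume would be what is passed to the segmentation network under this assumption.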

On the Limitations of First-Order Approximation in GAN Dynamics

Title On the Limitations of First-Order Approximation in GAN Dynamics
Authors Jerry Li, Aleksander Madry, John Peebles, Ludwig Schmidt
Abstract While Generative Adversarial Networks (GANs) have demonstrated promising performance on multiple vision tasks, their learning dynamics are not yet well understood, both in theory and in practice. To address this issue, we study GAN dynamics in a simple yet rich parametric model that exhibits several of the common problematic convergence behaviors such as vanishing gradients, mode collapse, and diverging or oscillatory behavior. In spite of the non-convex nature of our model, we are able to perform a rigorous theoretical analysis of its convergence behavior. Our analysis reveals an interesting dichotomy: a GAN with an optimal discriminator provably converges, while first order approximations of the discriminator steps lead to unstable GAN dynamics and mode collapse. Our result suggests that using first order discriminator steps (the de-facto standard in most existing GAN setups) might be one of the factors that makes GAN training challenging in practice.
Tasks
Published 2017-06-29
URL http://arxiv.org/abs/1706.09884v2
PDF http://arxiv.org/pdf/1706.09884v2.pdf
PWC https://paperswithcode.com/paper/on-the-limitations-of-first-order-1
Repo
Framework

Learning Robust Object Recognition Using Composed Scenes from Generative Models

Title Learning Robust Object Recognition Using Composed Scenes from Generative Models
Authors Hao Wang, Xingyu Lin, Yimeng Zhang, Tai Sing Lee
Abstract Recurrent feedback connections in the mammalian visual system have been hypothesized to play a role in synthesizing input in the theoretical framework of analysis by synthesis. The comparison of internally synthesized representation with that of the input provides a validation mechanism during perceptual inference and learning. Inspired by these ideas, we proposed that the synthesis machinery can compose new, unobserved images by imagination to train the network itself so as to increase the robustness of the system in novel scenarios. As a proof of concept, we investigated whether images composed by imagination could help an object recognition system to deal with occlusion, which is challenging for the current state-of-the-art deep convolutional neural networks. We fine-tuned a network on images containing objects in various occlusion scenarios, that are imagined or self-generated through a deep generator network. Trained on imagined occluded scenarios under the object persistence constraint, our network discovered more subtle and localized image features that were neglected by the original network for object classification, obtaining better separability of different object classes in the feature space. This leads to significant improvement of object recognition under occlusion for our network relative to the original network trained only on un-occluded images. In addition to providing practical benefits in object recognition under occlusion, this work demonstrates the use of self-generated composition of visual scenes through the synthesis loop, combined with the object persistence constraint, can provide opportunities for neural networks to discover new relevant patterns in the data, and become more flexible in dealing with novel situations.
Tasks Object Classification, Object Recognition
Published 2017-05-22
URL http://arxiv.org/abs/1705.07594v1
PDF http://arxiv.org/pdf/1705.07594v1.pdf
PWC https://paperswithcode.com/paper/learning-robust-object-recognition-using
Repo
Framework

Deep Reinforcement Learning-based Image Captioning with Embedding Reward

Title Deep Reinforcement Learning-based Image Captioning with Embedding Reward
Authors Zhou Ren, Xiaoyu Wang, Ning Zhang, Xutao Lv, Li-Jia Li
Abstract Image captioning is a challenging problem owing to the complexity in understanding the image content and diverse ways of describing it in natural language. Recent advances in deep neural networks have substantially improved the performance of this task. Most state-of-the-art approaches follow an encoder-decoder framework, which generates captions using a sequential recurrent prediction model. However, in this paper, we introduce a novel decision-making framework for image captioning. We utilize a “policy network” and a “value network” to collaboratively generate captions. The policy network serves as a local guidance by providing the confidence of predicting the next word according to the current state. Additionally, the value network serves as a global and lookahead guidance by evaluating all possible extensions of the current state. In essence, it adjusts the goal of predicting the correct words towards the goal of generating captions similar to the ground truth captions. We train both networks using an actor-critic reinforcement learning model, with a novel reward defined by visual-semantic embedding. Extensive experiments and analyses on the Microsoft COCO dataset show that the proposed framework outperforms state-of-the-art approaches across different evaluation metrics.
Tasks Decision Making, Image Captioning
Published 2017-04-12
URL http://arxiv.org/abs/1704.03899v1
PDF http://arxiv.org/pdf/1704.03899v1.pdf
PWC https://paperswithcode.com/paper/deep-reinforcement-learning-based-image
Repo
Framework

Estimating Historical Hourly Traffic Volumes via Machine Learning and Vehicle Probe Data: A Maryland Case Study

Title Estimating Historical Hourly Traffic Volumes via Machine Learning and Vehicle Probe Data: A Maryland Case Study
Authors Przemysław Sekuła, Nikola Marković, Zachary Vander Laan, Kaveh Farokhi Sadabadi
Abstract This paper focuses on the problem of estimating historical traffic volumes between sparsely-located traffic sensors, which transportation agencies need to accurately compute statewide performance measures. To this end, the paper examines applications of vehicle probe data, automatic traffic recorder counts, and neural network models to estimate hourly volumes in the Maryland highway network, and proposes a novel approach that combines neural networks with an existing profiling method. On average, the proposed approach yields 24% more accurate estimates than volume profiles, which are currently used by transportation agencies across the US to compute statewide performance measures. The paper also quantifies the value of using vehicle probe data in estimating hourly traffic volumes, which provides important managerial insights to transportation agencies interested in acquiring this type of data. For example, results show that volumes can be estimated with a mean absolute percent error of about 21% at locations where average number of observed probes is between 30 and 47 vehicles/hr, which provides a useful guideline for assessing the value of probe vehicle data from different vendors.
Tasks
Published 2017-11-02
URL http://arxiv.org/abs/1711.00721v2
PDF http://arxiv.org/pdf/1711.00721v2.pdf
PWC https://paperswithcode.com/paper/estimating-historical-hourly-traffic-volumes
Repo
Framework
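
The mean absolute percent error quoted in the abstract above is a standard metric; a minimal sketch follows. The function name and the sample volumes in the usage note are illustrative only and are not taken from the paper.

```python
def mean_absolute_percent_error(actual, estimated):
    """Return the MAPE, in percent, between observed and estimated volumes."""
    if not actual or len(actual) != len(estimated):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        # MAPE is undefined when an observed volume is zero.
        raise ValueError("actual values must be non-zero")
    total = sum(abs(a - e) / abs(a) for a, e in zip(actual, estimated))
    return 100.0 * total / len(actual)
```

For example, observed hourly volumes of 100 and 200 vehicles estimated as 80 and 240 give an error of 20% for each hour, so the MAPE is 20.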

L2 Regularization versus Batch and Weight Normalization

Title L2 Regularization versus Batch and Weight Normalization
Authors Twan van Laarhoven
Abstract Batch Normalization is a commonly used trick to improve the training of deep neural networks. These neural networks use L2 regularization, also called weight decay, ostensibly to prevent overfitting. However, we show that L2 regularization has no regularizing effect when combined with normalization. Instead, regularization has an influence on the scale of weights, and thereby on the effective learning rate. We investigate this dependence, both in theory, and experimentally. We show that popular optimization methods such as ADAM only partially eliminate the influence of normalization on the learning rate. This leads to a discussion on other ways to mitigate this issue.
Tasks L2 Regularization
Published 2017-06-16
URL http://arxiv.org/abs/1706.05350v1
PDF http://arxiv.org/pdf/1706.05350v1.pdf
PWC https://paperswithcode.com/paper/l2-regularization-versus-batch-and-weight
Repo
Framework
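
The claim in the abstract above rests on the fact that a normalized unit is invariant to the scale of its incoming weights. The sketch below checks this numerically for a single linear unit followed by batch normalization without learned scale or shift; the helper names are hypothetical and the setup is a simplification, not the paper's experiments.

```python
import math


def batch_normalize(values):
    """Subtract the batch mean and divide by the batch standard deviation."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    if var == 0.0:
        raise ValueError("batch has zero variance")
    return [(v - mean) / math.sqrt(var) for v in values]


def normalized_unit(weights, batch):
    """Apply one linear unit to every row of the batch, then normalize."""
    pre_activations = [sum(w * x for w, x in zip(weights, row)) for row in batch]
    return batch_normalize(pre_activations)
```

Calling `normalized_unit` with `weights` and with `[10 * w for w in weights]` returns the same outputs up to rounding, so shrinking the weights through an L2 penalty does not change the function the unit computes. What changes is the size of a gradient step relative to the weights.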

Progressive Joint Modeling in Unsupervised Single-channel Overlapped Speech Recognition

Title Progressive Joint Modeling in Unsupervised Single-channel Overlapped Speech Recognition
Authors Zhehuai Chen, Jasha Droppo, Jinyu Li, Wayne Xiong
Abstract Unsupervised single-channel overlapped speech recognition is one of the hardest problems in automatic speech recognition (ASR). Permutation invariant training (PIT) is a state of the art model-based approach, which applies a single neural network to solve this single-input, multiple-output modeling problem. We propose to advance the current state of the art by imposing a modular structure on the neural network, applying a progressive pretraining regimen, and improving the objective function with transfer learning and a discriminative training criterion. The modular structure splits the problem into three sub-tasks: frame-wise interpreting, utterance-level speaker tracing, and speech recognition. The pretraining regimen uses these modules to solve progressively harder tasks. Transfer learning leverages parallel clean speech to improve the training targets for the network. Our discriminative training formulation is a modification of standard formulations, that also penalizes competing outputs of the system. Experiments are conducted on the artificial overlapped Switchboard and hub5e-swb dataset. The proposed framework achieves over 30% relative improvement of WER over both a strong jointly trained system, PIT for ASR, and a separately optimized system, PIT for speech separation with clean speech ASR model. The improvement comes from better model generalization, training efficiency and the sequence level linguistic knowledge integration.
Tasks Speech Recognition, Speech Separation, Transfer Learning
Published 2017-07-21
URL http://arxiv.org/abs/1707.07048v2
PDF http://arxiv.org/pdf/1707.07048v2.pdf
PWC https://paperswithcode.com/paper/progressive-joint-modeling-in-unsupervised
Repo
Framework
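
Permutation invariant training, named in the abstract above as the baseline, scores the network outputs against the reference streams under every possible assignment and keeps the cheapest one. The sketch below shows that idea only; the squared-error pair loss is a placeholder assumption, and none of the paper's modular structure or pretraining is reproduced.

```python
from itertools import permutations


def squared_error(output, target):
    """Placeholder pair loss; a real system would use its own training criterion."""
    return sum((o - t) ** 2 for o, t in zip(output, target))


def pit_loss(outputs, targets, pair_loss=squared_error):
    """Return (loss, assignment) minimizing the summed loss over all permutations.

    assignment[i] is the index of the target matched to outputs[i].
    """
    if len(outputs) != len(targets):
        raise ValueError("need one target per output stream")
    best = None
    for perm in permutations(range(len(targets))):
        total = sum(pair_loss(outputs[i], targets[j]) for i, j in enumerate(perm))
        if best is None or total < best[0]:
            best = (total, perm)
    return best
```

The search is factorial in the number of speakers, which is acceptable for the two- or three-speaker mixtures that overlapped speech recognition typically considers.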

Convergence rate of a simulated annealing algorithm with noisy observations

Title Convergence rate of a simulated annealing algorithm with noisy observations
Authors Clément Bouttier, Ioana Gavra
Abstract In this paper, we propose a modified version of the simulated annealing algorithm for solving a stochastic global optimization problem. More precisely, we address the problem of finding a global minimizer of a function with noisy evaluations. We provide a rate of convergence and its optimized parametrization to ensure a minimal number of evaluations for a given accuracy and a confidence level close to 1. This work is completed with a set of numerical experiments that assess the practical performance both on benchmark test cases and on real-world examples.
Tasks
Published 2017-03-01
URL http://arxiv.org/abs/1703.00329v1
PDF http://arxiv.org/pdf/1703.00329v1.pdf
PWC https://paperswithcode.com/paper/convergence-rate-of-a-simulated-annealing
Repo
Framework
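
To make the setting in the abstract above concrete, the sketch below runs a plain simulated annealing loop on a function that can only be evaluated with noise. Averaging a fixed number of evaluations per comparison and the logarithmic cooling schedule are illustrative assumptions; they are not the modification or the parametrization analyzed in the paper.

```python
import math
import random


def noisy_simulated_annealing(noisy_f, start, neighbors, steps,
                              samples_per_eval=5, t0=1.0, seed=0):
    """Minimize a noisily observed function over a discrete state space."""
    rng = random.Random(seed)

    def estimate(x):
        # Average several noisy evaluations to reduce the variance.
        return sum(noisy_f(x, rng) for _ in range(samples_per_eval)) / samples_per_eval

    current = start
    for k in range(1, steps + 1):
        temperature = t0 / math.log(k + 1)  # assumed cooling schedule
        candidate = rng.choice(neighbors(current))
        delta = estimate(candidate) - estimate(current)
        # Always accept improvements; accept worse moves with Metropolis probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = candidate
    return current
```

Here `noisy_f(x, rng)` is expected to return one noisy observation of the objective at `x`, drawing its noise from the supplied generator so that runs are reproducible for a given seed.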

Weighted Orthogonal Components Regression Analysis

Title Weighted Orthogonal Components Regression Analysis
Authors Xiaogang Su, Yaa Wonkye, Pei Wang, Xiangrong Yin
Abstract In the multiple linear regression setting, we propose a general framework, termed weighted orthogonal components regression (WOCR), which encompasses many known methods as special cases, including ridge regression and principal components regression. WOCR makes use of the monotonicity inherent in orthogonal components to parameterize the weight function. The formulation allows for efficient determination of tuning parameters and hence is computationally advantageous. Moreover, WOCR offers insights for deriving new, better variants. Specifically, we advocate weighting components based on their correlations with the response, which leads to enhanced predictive performance. Both simulation studies and real data examples are provided to assess and illustrate the advantages of the proposed methods.
Tasks
Published 2017-09-13
URL http://arxiv.org/abs/1709.04135v2
PDF http://arxiv.org/pdf/1709.04135v2.pdf
PWC https://paperswithcode.com/paper/weighted-orthogonal-components-regression
Repo
Framework
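
The abstract above describes ridge regression and principal components regression as special cases of one weighting scheme. A minimal sketch of the two classical weight functions in the orthogonal (SVD) basis follows; the function names are hypothetical, and the parameterized weight functions that WOCR itself proposes are not reproduced here.

```python
def ridge_weights(singular_values, lam):
    """Ridge shrinkage factor d^2 / (d^2 + lam) for each orthogonal component."""
    if lam < 0:
        raise ValueError("lam must be non-negative")
    return [d * d / (d * d + lam) for d in singular_values]


def pcr_weights(singular_values, k):
    """Principal components regression: keep the first k components, drop the rest.

    Assumes the singular values are sorted in decreasing order.
    """
    return [1.0 if j < k else 0.0 for j in range(len(singular_values))]


def weight_components(component_scores, weights):
    """Scale each component's contribution to the fit by its weight."""
    return [w * s for w, s in zip(weights, component_scores)]
```

Both weight sequences are non-increasing in the component index when the singular values are sorted, which is the monotonicity the abstract refers to: ridge shrinks smoothly, while principal components regression applies a hard cutoff.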

Online Learning with Abstention

Title Online Learning with Abstention
Authors Corinna Cortes, Giulia DeSalvo, Claudio Gentile, Mehryar Mohri, Scott Yang
Abstract We present an extensive study of the key problem of online learning where algorithms are allowed to abstain from making predictions. In the adversarial setting, we show how existing online algorithms and guarantees can be adapted to this problem. In the stochastic setting, we first point out a bias problem that limits the straightforward extension of algorithms such as UCB-N to time-varying feedback graphs, as needed in this context. Next, we give a new algorithm, UCB-GT, that exploits historical data and is adapted to time-varying feedback graphs. We show that this algorithm benefits from more favorable regret guarantees than a possible, but limited, extension of UCB-N. We further report the results of a series of experiments demonstrating that UCB-GT largely outperforms that extension of UCB-N, as well as more standard baselines.
Tasks
Published 2017-03-09
URL https://arxiv.org/abs/1703.03478v3
PDF https://arxiv.org/pdf/1703.03478v3.pdf
PWC https://paperswithcode.com/paper/online-learning-with-abstention
Repo
Framework

Evolving Boxes for Fast Vehicle Detection

Title Evolving Boxes for Fast Vehicle Detection
Authors Li Wang, Yao Lu, Hong Wang, Yingbin Zheng, Hao Ye, Xiangyang Xue
Abstract We perform fast vehicle detection from traffic surveillance cameras. A novel deep learning framework, namely Evolving Boxes, is developed that proposes and refines the object boxes under different feature representations. Specifically, our framework is embedded with a light-weight proposal network to generate initial anchor boxes as well as to discard unlikely regions early; a fine-tuning network produces detailed features for these candidate boxes. We show intriguingly that by applying different feature fusion techniques, the initial boxes can be refined for both localization and recognition. We evaluate our network on the recent DETRAC benchmark and obtain a significant improvement over the state-of-the-art Faster RCNN by 9.5% mAP. Further, our network achieves 9-13 FPS detection speed on a moderate commercial GPU.
Tasks Fast Vehicle Detection
Published 2017-02-01
URL http://arxiv.org/abs/1702.00254v3
PDF http://arxiv.org/pdf/1702.00254v3.pdf
PWC https://paperswithcode.com/paper/evolving-boxes-for-fast-vehicle-detection
Repo
Framework

Enabling Smart Data: Noise filtering in Big Data classification

Title Enabling Smart Data: Noise filtering in Big Data classification
Authors Diego García-Gil, Julián Luengo, Salvador García, Francisco Herrera
Abstract In any knowledge discovery process the value of extracted knowledge is directly related to the quality of the data used. Big Data problems, generated by massive growth in the scale of data observed in recent years, also follow the same dictate. A common problem affecting data quality is the presence of noise, particularly in classification problems, where label noise refers to the incorrect labeling of training instances, and is known to be a very disruptive feature of data. However, in this Big Data era, the massive growth in the scale of the data poses a challenge to traditional proposals created to tackle noise, as they have difficulties coping with such a large amount of data. New algorithms need to be proposed to treat the noise in Big Data problems, providing high quality and clean data, also known as Smart Data. In this paper, two Big Data preprocessing approaches to remove noisy examples are proposed: a homogeneous ensemble and a heterogeneous ensemble filter, with special emphasis on their scalability and performance traits. The obtained results show that these proposals enable the practitioner to efficiently obtain a Smart Dataset from any Big Data classification problem.
Tasks
Published 2017-04-06
URL http://arxiv.org/abs/1704.01770v2
PDF http://arxiv.org/pdf/1704.01770v2.pdf
PWC https://paperswithcode.com/paper/enabling-smart-data-noise-filtering-in-big
Repo
Framework
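
An ensemble noise filter of the kind named in the abstract above can be sketched as a voting rule: an instance is flagged as noisy when enough classifiers disagree with its label. The majority and consensus strategies below are common choices assumed for illustration; the abstract does not say which voting rule the paper uses, and the distributed Big Data implementation is not covered.

```python
def ensemble_noise_filter(labels, predictions_per_model, strategy="majority"):
    """Return the indices of instances kept after filtering suspected label noise.

    predictions_per_model[m][i] is model m's prediction for instance i, ideally
    produced by a model that did not see instance i during training.
    """
    n_models = len(predictions_per_model)
    if n_models == 0:
        raise ValueError("at least one model is required")
    kept = []
    for i, label in enumerate(labels):
        wrong = sum(1 for preds in predictions_per_model if preds[i] != label)
        if strategy == "majority":
            noisy = wrong > n_models / 2
        elif strategy == "consensus":
            noisy = wrong == n_models
        else:
            raise ValueError("unknown strategy: " + strategy)
        if not noisy:
            kept.append(i)
    return kept
```

A homogeneous ensemble would fill `predictions_per_model` with several instances of one learning algorithm, and a heterogeneous ensemble with different algorithms; the voting step is the same in both cases under this sketch.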

Shape and Positional Geometry of Multi-Object Configurations

Title Shape and Positional Geometry of Multi-Object Configurations
Authors James Damon, Ellen Gasparovic
Abstract In previous work, we introduced a method for modeling a configuration of objects in 2D and 3D images using a mathematical “medial/skeletal linking structure.” In this paper, we show how these structures allow us to capture positional properties of a multi-object configuration in addition to the shape properties of the individual objects. In particular, we introduce numerical invariants for positional properties which measure the closeness of neighboring objects, including identifying the parts of the objects which are close, and the “relative significance” of objects compared with the other objects in the configuration. Using these numerical measures, we introduce a hierarchical ordering and relations between the individual objects, and quantitative criteria for identifying subconfigurations. In addition, the invariants provide a “proximity matrix” which yields a unique set of weightings measuring overall proximity of objects in the configuration. Furthermore, we show that these invariants, which are volumetrically defined and involve external regions, may be computed via integral formulas in terms of “skeletal linking integrals” defined on the internal skeletal structures of the objects.
Tasks
Published 2017-06-01
URL http://arxiv.org/abs/1706.00150v1
PDF http://arxiv.org/pdf/1706.00150v1.pdf
PWC https://paperswithcode.com/paper/shape-and-positional-geometry-of-multi-object
Repo
Framework

Two-pixel polarimetric camera by compressive sensing

Title Two-pixel polarimetric camera by compressive sensing
Authors Julien Fade, Estéban Perrotin, Jérôme Bobin
Abstract We propose an original concept of compressive sensing (CS) polarimetric imaging based on a digital micro-mirror (DMD) array and two single-pixel detectors. The polarimetric sensitivity of the proposed setup is due to an experimental imperfection of reflecting mirrors which is exploited here to form an original reconstruction problem, including a CS problem and a source separation task. We show that a two-step approach tackling each problem successively is outperformed by a dedicated combined reconstruction method, which is made explicit in this article and preferably implemented through a reweighted FISTA algorithm. The combined reconstruction approach is then further improved by including physical constraints specific to the polarimetric imaging context considered, which are implemented in an original constrained GFB algorithm. Numerical simulations demonstrate the efficiency of the 2-pixel CS polarimetric imaging setup to retrieve polarimetric contrast data with significant compression rate and good reconstruction quality. The influence of experimental imperfections of the DMD is also analyzed through numerical simulations, and 2D polarimetric imaging reconstruction results are finally presented.
Tasks Compressive Sensing
Published 2017-06-16
URL http://arxiv.org/abs/1707.03705v1
PDF http://arxiv.org/pdf/1707.03705v1.pdf
PWC https://paperswithcode.com/paper/two-pixel-polarimetric-camera-by-compressive
Repo
Framework