July 26, 2019

Paper Group ANR 756

Deep Style Match for Complementary Recommendation


Title	Deep Style Match for Complementary Recommendation
Authors	Kui Zhao, Xia Hu, Jiajun Bu, Can Wang
Abstract	Humans develop a common sense of style compatibility between items based on their attributes. We seek to automatically answer questions like “Does this shirt go well with that pair of jeans?” In order to answer these kinds of questions, we attempt to model the human sense of style compatibility in this paper. The basic assumption of our approach is that most of the important attributes for a product in an online store are included in its title description. Therefore it is feasible to learn style compatibility from these descriptions. We design a Siamese Convolutional Neural Network architecture and feed it with title pairs of items, which are either compatible or incompatible. Those pairs will be mapped from the original space of symbolic words into some embedded style space. Our approach takes only words as the input with little preprocessing and there is no laborious and expensive feature engineering.
Tasks	Common Sense Reasoning, Feature Engineering
Published	2017-08-26
URL	http://arxiv.org/abs/1708.07938v1
PDF	http://arxiv.org/pdf/1708.07938v1.pdf
PWC	https://paperswithcode.com/paper/deep-style-match-for-complementary
Repo
Framework

Towards Automatic Abdominal Multi-Organ Segmentation in Dual Energy CT using Cascaded 3D Fully Convolutional Network


Title	Towards Automatic Abdominal Multi-Organ Segmentation in Dual Energy CT using Cascaded 3D Fully Convolutional Network
Authors	Shuqing Chen, Holger Roth, Sabrina Dorn, Matthias May, Alexander Cavallaro, Michael M. Lell, Marc Kachelrieß, Hirohisa Oda, Kensaku Mori, Andreas Maier
Abstract	Automatic multi-organ segmentation of the dual energy computed tomography (DECT) data can be beneficial for biomedical research and clinical applications. However, it is a challenging task. Recent advances in deep learning showed the feasibility of using 3D fully convolutional networks (FCN) for voxel-wise dense predictions in single energy computed tomography (SECT). In this paper, we propose a 3D FCN-based method for automatic multi-organ segmentation in DECT. The work was based on a cascaded FCN and a general model for the major organs trained on a large set of SECT data. We preprocessed the DECT data by using linear weighting and fine-tuned the model for the DECT data. The method was evaluated using 42 torso DECT data acquired with a clinical dual-source CT system. Four abdominal organs (liver, spleen, left and right kidneys) were evaluated. Cross-validation was performed, and the effect of the weight on accuracy was investigated. In all the tests, we achieved an average Dice coefficient of 93% for the liver, 90% for the spleen, 91% for the right kidney and 89% for the left kidney, respectively. The results show our method is feasible and promising.
Tasks
Published	2017-10-15
URL	http://arxiv.org/abs/1710.05379v1
PDF	http://arxiv.org/pdf/1710.05379v1.pdf
PWC	https://paperswithcode.com/paper/towards-automatic-abdominal-multi-organ
Repo
Framework
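
The abstract above reports per-organ Dice coefficients and mentions linear weighting of the DECT data. As a rough illustration only, the sketch below computes the Dice coefficient for two binary masks and blends two energy channels with a single weight. The function names, the flat-list mask representation, and the blending form `alpha * low + (1 - alpha) * high` are assumptions made for this example, not details taken from the paper.

```python
def dice_coefficient(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def linear_blend(low_kv, high_kv, alpha):
    """Hypothetical linear weighting of two energy channels, voxel by voxel."""
    if len(low_kv) != len(high_kv):
        raise ValueError("channels must have the same number of voxels")
    return [alpha * lo + (1.0 - alpha) * hi for lo, hi in zip(low_kv, high_kv)]


# Toy example: the prediction marks two voxels, the ground truth marks one of them.
score = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
blended = linear_blend([100.0, 40.0], [50.0, 20.0], alpha=0.6)
```

In the toy example the overlap is one voxel and the masks contain three marked voxels in total, so the score is 2/3.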

On the Limitations of First-Order Approximation in GAN Dynamics


Title	On the Limitations of First-Order Approximation in GAN Dynamics
Authors	Jerry Li, Aleksander Madry, John Peebles, Ludwig Schmidt
Abstract	While Generative Adversarial Networks (GANs) have demonstrated promising performance on multiple vision tasks, their learning dynamics are not yet well understood, both in theory and in practice. To address this issue, we study GAN dynamics in a simple yet rich parametric model that exhibits several of the common problematic convergence behaviors such as vanishing gradients, mode collapse, and diverging or oscillatory behavior. In spite of the non-convex nature of our model, we are able to perform a rigorous theoretical analysis of its convergence behavior. Our analysis reveals an interesting dichotomy: a GAN with an optimal discriminator provably converges, while first order approximations of the discriminator steps lead to unstable GAN dynamics and mode collapse. Our result suggests that using first order discriminator steps (the de-facto standard in most existing GAN setups) might be one of the factors that makes GAN training challenging in practice.
Tasks
Published	2017-06-29
URL	http://arxiv.org/abs/1706.09884v2
PDF	http://arxiv.org/pdf/1706.09884v2.pdf
PWC	https://paperswithcode.com/paper/on-the-limitations-of-first-order-1
Repo
Framework

Learning Robust Object Recognition Using Composed Scenes from Generative Models


Title	Learning Robust Object Recognition Using Composed Scenes from Generative Models
Authors	Hao Wang, Xingyu Lin, Yimeng Zhang, Tai Sing Lee
Abstract	Recurrent feedback connections in the mammalian visual system have been hypothesized to play a role in synthesizing input in the theoretical framework of analysis by synthesis. The comparison of internally synthesized representation with that of the input provides a validation mechanism during perceptual inference and learning. Inspired by these ideas, we proposed that the synthesis machinery can compose new, unobserved images by imagination to train the network itself so as to increase the robustness of the system in novel scenarios. As a proof of concept, we investigated whether images composed by imagination could help an object recognition system to deal with occlusion, which is challenging for the current state-of-the-art deep convolutional neural networks. We fine-tuned a network on images containing objects in various occlusion scenarios that are imagined or self-generated through a deep generator network. Trained on imagined occluded scenarios under the object persistence constraint, our network discovered more subtle and localized image features that were neglected by the original network for object classification, obtaining better separability of different object classes in the feature space. This leads to significant improvement of object recognition under occlusion for our network relative to the original network trained only on un-occluded images. In addition to providing practical benefits in object recognition under occlusion, this work demonstrates that the use of self-generated composition of visual scenes through the synthesis loop, combined with the object persistence constraint, can provide opportunities for neural networks to discover new relevant patterns in the data, and become more flexible in dealing with novel situations.
Tasks	Object Classification, Object Recognition
Published	2017-05-22
URL	http://arxiv.org/abs/1705.07594v1
PDF	http://arxiv.org/pdf/1705.07594v1.pdf
PWC	https://paperswithcode.com/paper/learning-robust-object-recognition-using
Repo
Framework

Deep Reinforcement Learning-based Image Captioning with Embedding Reward


Title	Deep Reinforcement Learning-based Image Captioning with Embedding Reward
Authors	Zhou Ren, Xiaoyu Wang, Ning Zhang, Xutao Lv, Li-Jia Li
Abstract	Image captioning is a challenging problem owing to the complexity in understanding the image content and diverse ways of describing it in natural language. Recent advances in deep neural networks have substantially improved the performance of this task. Most state-of-the-art approaches follow an encoder-decoder framework, which generates captions using a sequential recurrent prediction model. However, in this paper, we introduce a novel decision-making framework for image captioning. We utilize a “policy network” and a “value network” to collaboratively generate captions. The policy network serves as a local guidance by providing the confidence of predicting the next word according to the current state. Additionally, the value network serves as a global and lookahead guidance by evaluating all possible extensions of the current state. In essence, it adjusts the goal of predicting the correct words towards the goal of generating captions similar to the ground truth captions. We train both networks using an actor-critic reinforcement learning model, with a novel reward defined by visual-semantic embedding. Extensive experiments and analyses on the Microsoft COCO dataset show that the proposed framework outperforms state-of-the-art approaches across different evaluation metrics.
Tasks	Decision Making, Image Captioning
Published	2017-04-12
URL	http://arxiv.org/abs/1704.03899v1
PDF	http://arxiv.org/pdf/1704.03899v1.pdf
PWC	https://paperswithcode.com/paper/deep-reinforcement-learning-based-image
Repo
Framework

Estimating Historical Hourly Traffic Volumes via Machine Learning and Vehicle Probe Data: A Maryland Case Study


Title	Estimating Historical Hourly Traffic Volumes via Machine Learning and Vehicle Probe Data: A Maryland Case Study
Authors	Przemysław Sekuła, Nikola Marković, Zachary Vander Laan, Kaveh Farokhi Sadabadi
Abstract	This paper focuses on the problem of estimating historical traffic volumes between sparsely-located traffic sensors, which transportation agencies need to accurately compute statewide performance measures. To this end, the paper examines applications of vehicle probe data, automatic traffic recorder counts, and neural network models to estimate hourly volumes in the Maryland highway network, and proposes a novel approach that combines neural networks with an existing profiling method. On average, the proposed approach yields 24% more accurate estimates than volume profiles, which are currently used by transportation agencies across the US to compute statewide performance measures. The paper also quantifies the value of using vehicle probe data in estimating hourly traffic volumes, which provides important managerial insights to transportation agencies interested in acquiring this type of data. For example, results show that volumes can be estimated with a mean absolute percent error of about 21% at locations where the average number of observed probes is between 30 and 47 vehicles/hr, which provides a useful guideline for assessing the value of probe vehicle data from different vendors.
Tasks
Published	2017-11-02
URL	http://arxiv.org/abs/1711.00721v2
PDF	http://arxiv.org/pdf/1711.00721v2.pdf
PWC	https://paperswithcode.com/paper/estimating-historical-hourly-traffic-volumes
Repo
Framework
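
The accuracy figure quoted above is a mean absolute percent error. A minimal sketch of that metric follows; the function name, the choice to skip hours with a zero true count, and the toy numbers are assumptions made for illustration and do not come from the paper.

```python
def mean_absolute_percent_error(actual, estimated):
    """MAPE in percent: the average of |actual - estimated| / |actual| over all hours."""
    if len(actual) != len(estimated):
        raise ValueError("inputs must have the same length")
    # Assumption: hours with a zero true count are skipped to avoid division by zero.
    pairs = [(a, e) for a, e in zip(actual, estimated) if a != 0]
    if not pairs:
        raise ValueError("no non-zero actual values to evaluate")
    return 100.0 * sum(abs(a - e) / abs(a) for a, e in pairs) / len(pairs)


# Toy hourly volumes (vehicles/hr): each estimate is off by 20% of the true count.
error = mean_absolute_percent_error([100, 200], [80, 240])
```

Both toy estimates miss by 20% of the true volume, so the resulting error is 20%.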

L2 Regularization versus Batch and Weight Normalization


Title	L2 Regularization versus Batch and Weight Normalization
Authors	Twan van Laarhoven
Abstract	Batch Normalization is a commonly used trick to improve the training of deep neural networks. These neural networks use L2 regularization, also called weight decay, ostensibly to prevent overfitting. However, we show that L2 regularization has no regularizing effect when combined with normalization. Instead, regularization has an influence on the scale of weights, and thereby on the effective learning rate. We investigate this dependence, both in theory, and experimentally. We show that popular optimization methods such as ADAM only partially eliminate the influence of normalization on the learning rate. This leads to a discussion on other ways to mitigate this issue.
Tasks	L2 Regularization
Published	2017-06-16
URL	http://arxiv.org/abs/1706.05350v1
PDF	http://arxiv.org/pdf/1706.05350v1.pdf
PWC	https://paperswithcode.com/paper/l2-regularization-versus-batch-and-weight
Repo
Framework
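
The central claim of the abstract above can be checked numerically on a toy case: when a layer's pre-activations are normalized, rescaling its weights leaves the output unchanged, while the gradient shrinks in proportion to the weight scale. The sketch below assumes a single linear unit, standardization over a small batch without an epsilon term, and a squared-error loss; all names and data are illustrative and not taken from the paper.

```python
import math


def normalized_output(w, xs):
    """Pre-activations w·x over a batch, standardized to zero mean and unit variance."""
    z = [sum(wi * xi for wi, xi in zip(w, x)) for x in xs]
    mean = sum(z) / len(z)
    var = sum((v - mean) ** 2 for v in z) / len(z)
    return [(v - mean) / math.sqrt(var) for v in z]


def loss(w, xs, targets):
    """Squared error between the normalized outputs and the targets."""
    y = normalized_output(w, xs)
    return sum((yi - ti) ** 2 for yi, ti in zip(y, targets))


def numerical_grad(w, xs, targets, h=1e-6):
    """Central finite-difference gradient of the loss with respect to w."""
    grad = []
    for i in range(len(w)):
        up, down = list(w), list(w)
        up[i] += h
        down[i] -= h
        grad.append((loss(up, xs, targets) - loss(down, xs, targets)) / (2 * h))
    return grad


xs = [[1.0, 2.0], [3.0, -1.0], [0.0, 4.0]]
targets = [1.0, 0.0, -1.0]
w = [0.5, -1.5]
w_scaled = [10.0 * wi for wi in w]  # same direction, ten times the norm

y_small = normalized_output(w, xs)
y_large = normalized_output(w_scaled, xs)  # matches y_small up to rounding

norm_small = math.sqrt(sum(g * g for g in numerical_grad(w, xs, targets)))
norm_large = math.sqrt(sum(g * g for g in numerical_grad(w_scaled, xs, targets)))
ratio = norm_small / norm_large  # close to 10: larger weights see smaller gradients
```

Because the function computed by the unit does not depend on the weight norm, an L2 penalty cannot constrain it. What the penalty does change is the weight scale, and through the gradient scaling shown by `ratio`, the size of each update relative to the weights.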

Progressive Joint Modeling in Unsupervised Single-channel Overlapped Speech Recognition


Title	Progressive Joint Modeling in Unsupervised Single-channel Overlapped Speech Recognition
Authors	Zhehuai Chen, Jasha Droppo, Jinyu Li, Wayne Xiong
Abstract	Unsupervised single-channel overlapped speech recognition is one of the hardest problems in automatic speech recognition (ASR). Permutation invariant training (PIT) is a state-of-the-art model-based approach, which applies a single neural network to solve this single-input, multiple-output modeling problem. We propose to advance the current state of the art by imposing a modular structure on the neural network, applying a progressive pretraining regimen, and improving the objective function with transfer learning and a discriminative training criterion. The modular structure splits the problem into three sub-tasks: frame-wise interpreting, utterance-level speaker tracing, and speech recognition. The pretraining regimen uses these modules to solve progressively harder tasks. Transfer learning leverages parallel clean speech to improve the training targets for the network. Our discriminative training formulation is a modification of standard formulations that also penalizes competing outputs of the system. Experiments are conducted on the artificial overlapped Switchboard and hub5e-swb datasets. The proposed framework achieves over 30% relative improvement of WER over both a strong jointly trained system, PIT for ASR, and a separately optimized system, PIT for speech separation with clean speech ASR model. The improvement comes from better model generalization, training efficiency and the sequence-level linguistic knowledge integration.
Tasks	Speech Recognition, Speech Separation, Transfer Learning
Published	2017-07-21
URL	http://arxiv.org/abs/1707.07048v2
PDF	http://arxiv.org/pdf/1707.07048v2.pdf
PWC	https://paperswithcode.com/paper/progressive-joint-modeling-in-unsupervised
Repo
Framework

Convergence rate of a simulated annealing algorithm with noisy observations


Title	Convergence rate of a simulated annealing algorithm with noisy observations
Authors	Clément Bouttier, Ioana Gavra
Abstract	In this paper we propose a modified version of the simulated annealing algorithm for solving a stochastic global optimization problem. More precisely, we address the problem of finding a global minimizer of a function with noisy evaluations. We provide a rate of convergence and its optimized parametrization to ensure a minimal number of evaluations for a given accuracy and a confidence level close to 1. This work is completed with a set of numerical experiments that assess the practical performance both on benchmark test cases and on real-world examples.
Tasks
Published	2017-03-01
URL	http://arxiv.org/abs/1703.00329v1
PDF	http://arxiv.org/pdf/1703.00329v1.pdf
PWC	https://paperswithcode.com/paper/convergence-rate-of-a-simulated-annealing
Repo
Framework
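
To make the setting above concrete, here is a generic sketch of simulated annealing when the objective can only be evaluated with noise. It is not the authors' algorithm. The logarithmic cooling schedule, the growing number of averaged evaluations per comparison, the Gaussian proposal, and all names are assumptions chosen for illustration.

```python
import math
import random


def noisy_simulated_annealing(noisy_f, x0, lo, hi, steps=2000, seed=0):
    """Minimize a 1-D function on [lo, hi] that is observed only through noisy_f(x, rng)."""
    rng = random.Random(seed)

    def estimate(x, n):
        # Average n independent noisy evaluations to reduce the noise variance.
        return sum(noisy_f(x, rng) for _ in range(n)) / n

    x = x0
    for k in range(1, steps + 1):
        temperature = 1.0 / math.log(k + 1)  # assumed logarithmic cooling
        n_samples = 1 + k // 100             # assumed slowly growing sample size
        # Gaussian proposal, clipped so the walk stays inside the search interval.
        candidate = min(hi, max(lo, x + rng.gauss(0.0, 0.5)))
        delta = estimate(candidate, n_samples) - estimate(x, n_samples)
        # Metropolis rule applied to the estimated cost difference.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            x = candidate
    return x


def noisy_quadratic(x, rng):
    """Toy objective: (x - 1)^2 observed with additive Gaussian noise."""
    return (x - 1.0) ** 2 + rng.gauss(0.0, 0.3)


result = noisy_simulated_annealing(noisy_quadratic, x0=-4.0, lo=-5.0, hi=5.0)
```

Averaging more evaluations as the temperature drops is one common way to keep the noise from dominating the acceptance test late in the run. How fast the sample size should grow for a given accuracy and confidence level is the kind of question the paper's convergence rate addresses.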

Weighted Orthogonal Components Regression Analysis


Title	Weighted Orthogonal Components Regression Analysis
Authors	Xiaogang Su, Yaa Wonkye, Pei Wang, Xiangrong Yin
Abstract	In the multiple linear regression setting, we propose a general framework, termed weighted orthogonal components regression (WOCR), which encompasses many known methods as special cases, including ridge regression and principal components regression. WOCR makes use of the monotonicity inherent in orthogonal components to parameterize the weight function. The formulation allows for efficient determination of tuning parameters and hence is computationally advantageous. Moreover, WOCR offers insights for deriving new better variants. Specifically, we advocate weighting components based on their correlations with the response, which leads to enhanced predictive performance. Both simulated studies and real data examples are provided to assess and illustrate the advantages of the proposed methods.
Tasks
Published	2017-09-13
URL	http://arxiv.org/abs/1709.04135v2
PDF	http://arxiv.org/pdf/1709.04135v2.pdf
PWC	https://paperswithcode.com/paper/weighted-orthogonal-components-regression
Repo
Framework
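
The abstract above says that ridge regression and principal components regression arise as special cases of one weighted-components template. A standard way to see this uses the SVD X = U diag(d) Vᵀ: the coefficient vector is V diag(w_j / d_j) Uᵀ y, with w_j = d_j² / (d_j² + λ) for ridge and w_j ∈ {0, 1} for principal components regression. The NumPy sketch below illustrates that identity on synthetic data. The function name and the data are assumptions, and the paper's own parameterization of the weight function may differ.

```python
import numpy as np


def weighted_components_fit(X, y, weight_fn):
    """Regression coefficients V diag(w_j / d_j) U^T y for weights w = weight_fn(d)."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    w = weight_fn(d)
    return Vt.T @ (w / d * (U.T @ y))


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=50)

lam = 3.0
# Ridge: smooth shrinkage of each component.
beta_ridge = weighted_components_fit(X, y, lambda d: d**2 / (d**2 + lam))
# Principal components regression: keep the first k components, drop the rest.
k = 2
beta_pcr = weighted_components_fit(X, y, lambda d: (np.arange(d.size) < k).astype(float))
# All weights equal to one recovers ordinary least squares.
beta_ols = weighted_components_fit(X, y, lambda d: np.ones_like(d))

# Closed-form ridge solution, used only as a cross-check.
beta_ridge_direct = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
```

Since only the weight function changes between methods, the SVD can be computed once and reused, which is consistent with the computational advantage the abstract mentions.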

Online Learning with Abstention


Title	Online Learning with Abstention
Authors	Corinna Cortes, Giulia DeSalvo, Claudio Gentile, Mehryar Mohri, Scott Yang
Abstract	We present an extensive study of the key problem of online learning where algorithms are allowed to abstain from making predictions. In the adversarial setting, we show how existing online algorithms and guarantees can be adapted to this problem. In the stochastic setting, we first point out a bias problem that limits the straightforward extension of algorithms such as UCB-N to time-varying feedback graphs, as needed in this context. Next, we give a new algorithm, UCB-GT, that exploits historical data and is adapted to time-varying feedback graphs. We show that this algorithm benefits from more favorable regret guarantees than a possible, but limited, extension of UCB-N. We further report the results of a series of experiments demonstrating that UCB-GT largely outperforms that extension of UCB-N, as well as more standard baselines.
Tasks
Published	2017-03-09
URL	https://arxiv.org/abs/1703.03478v3
PDF	https://arxiv.org/pdf/1703.03478v3.pdf
PWC	https://paperswithcode.com/paper/online-learning-with-abstention
Repo
Framework

Evolving Boxes for Fast Vehicle Detection


Title	Evolving Boxes for Fast Vehicle Detection
Authors	Li Wang, Yao Lu, Hong Wang, Yingbin Zheng, Hao Ye, Xiangyang Xue
Abstract	We perform fast vehicle detection from traffic surveillance cameras. A novel deep learning framework, namely Evolving Boxes, is developed that proposes and refines the object boxes under different feature representations. Specifically, our framework is embedded with a light-weight proposal network to generate initial anchor boxes as well as to discard unlikely regions early; a fine-tuning network produces detailed features for these candidate boxes. We show intriguingly that by applying different feature fusion techniques, the initial boxes can be refined for both localization and recognition. We evaluate our network on the recent DETRAC benchmark and obtain a significant improvement over the state-of-the-art Faster RCNN by 9.5% mAP. Further, our network achieves 9-13 FPS detection speed on a moderate commercial GPU.
Tasks	Fast Vehicle Detection
Published	2017-02-01
URL	http://arxiv.org/abs/1702.00254v3
PDF	http://arxiv.org/pdf/1702.00254v3.pdf
PWC	https://paperswithcode.com/paper/evolving-boxes-for-fast-vehicle-detection
Repo
Framework

Enabling Smart Data: Noise filtering in Big Data classification


Title	Enabling Smart Data: Noise filtering in Big Data classification
Authors	Diego García-Gil, Julián Luengo, Salvador García, Francisco Herrera
Abstract	In any knowledge discovery process the value of extracted knowledge is directly related to the quality of the data used. Big Data problems, generated by massive growth in the scale of data observed in recent years, also follow the same dictate. A common problem affecting data quality is the presence of noise, particularly in classification problems, where label noise refers to the incorrect labeling of training instances, and is known to be a very disruptive feature of data. However, in this Big Data era, the massive growth in the scale of the data poses a challenge to traditional proposals created to tackle noise, as they have difficulties coping with such a large amount of data. New algorithms need to be proposed to treat the noise in Big Data problems, providing high quality and clean data, also known as Smart Data. In this paper, two Big Data preprocessing approaches to remove noisy examples are proposed: a homogeneous ensemble and a heterogeneous ensemble filter, with special emphasis on their scalability and performance traits. The obtained results show that these proposals enable the practitioner to efficiently obtain a Smart Dataset from any Big Data classification problem.
Tasks
Published	2017-04-06
URL	http://arxiv.org/abs/1704.01770v2
PDF	http://arxiv.org/pdf/1704.01770v2.pdf
PWC	https://paperswithcode.com/paper/enabling-smart-data-noise-filtering-in-big
Repo
Framework

Shape and Positional Geometry of Multi-Object Configurations


Title	Shape and Positional Geometry of Multi-Object Configurations
Authors	James Damon, Ellen Gasparovic
Abstract	In previous work, we introduced a method for modeling a configuration of objects in 2D and 3D images using a mathematical “medial/skeletal linking structure.” In this paper, we show how these structures allow us to capture positional properties of a multi-object configuration in addition to the shape properties of the individual objects. In particular, we introduce numerical invariants for positional properties which measure the closeness of neighboring objects, including identifying the parts of the objects which are close, and the “relative significance” of objects compared with the other objects in the configuration. Using these numerical measures, we introduce a hierarchical ordering and relations between the individual objects, and quantitative criteria for identifying subconfigurations. In addition, the invariants provide a “proximity matrix” which yields a unique set of weightings measuring overall proximity of objects in the configuration. Furthermore, we show that these invariants, which are volumetrically defined and involve external regions, may be computed via integral formulas in terms of “skeletal linking integrals” defined on the internal skeletal structures of the objects.
Tasks
Published	2017-06-01
URL	http://arxiv.org/abs/1706.00150v1
PDF	http://arxiv.org/pdf/1706.00150v1.pdf
PWC	https://paperswithcode.com/paper/shape-and-positional-geometry-of-multi-object
Repo
Framework

Two-pixel polarimetric camera by compressive sensing


Title	Two-pixel polarimetric camera by compressive sensing
Authors	Julien Fade, Estéban Perrotin, Jérôme Bobin
Abstract	We propose an original concept of compressive sensing (CS) polarimetric imaging based on a digital micro-mirror (DMD) array and two single-pixel detectors. The polarimetric sensitivity of the proposed setup is due to an experimental imperfection of reflecting mirrors which is exploited here to form an original reconstruction problem, including a CS problem and a source separation task. We show that a two-step approach tackling each problem successively is outperformed by a dedicated combined reconstruction method, which is detailed in this article and preferably implemented through a reweighted FISTA algorithm. The combined reconstruction approach is then further improved by including physical constraints specific to the polarimetric imaging context considered, which are implemented in an original constrained GFB algorithm. Numerical simulations demonstrate the efficiency of the 2-pixel CS polarimetric imaging setup to retrieve polarimetric contrast data with significant compression rate and good reconstruction quality. The influence of experimental imperfections of the DMD is also analyzed through numerical simulations, and 2D polarimetric imaging reconstruction results are finally presented.
Tasks	Compressive Sensing
Published	2017-06-16
URL	http://arxiv.org/abs/1707.03705v1
PDF	http://arxiv.org/pdf/1707.03705v1.pdf
PWC	https://paperswithcode.com/paper/two-pixel-polarimetric-camera-by-compressive
Repo
Framework