Paper Group ANR 756
Deep Style Match for Complementary Recommendation
Title | Deep Style Match for Complementary Recommendation |
Authors | Kui Zhao, Xia Hu, Jiajun Bu, Can Wang |
Abstract | Humans develop a common sense of style compatibility between items based on their attributes. We seek to automatically answer questions like “Does this shirt go well with that pair of jeans?” In order to answer these kinds of questions, we attempt to model the human sense of style compatibility in this paper. The basic assumption of our approach is that most of the important attributes for a product in an online store are included in its title description. Therefore it is feasible to learn style compatibility from these descriptions. We design a Siamese Convolutional Neural Network architecture and feed it with title pairs of items, which are either compatible or incompatible. Those pairs will be mapped from the original space of symbolic words into some embedded style space. Our approach takes only words as the input with little preprocessing and there is no laborious and expensive feature engineering. |
Tasks | Common Sense Reasoning, Feature Engineering |
Published | 2017-08-26 |
URL | http://arxiv.org/abs/1708.07938v1 |
http://arxiv.org/pdf/1708.07938v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-style-match-for-complementary |
Repo | |
Framework | |
Towards Automatic Abdominal Multi-Organ Segmentation in Dual Energy CT using Cascaded 3D Fully Convolutional Network
Title | Towards Automatic Abdominal Multi-Organ Segmentation in Dual Energy CT using Cascaded 3D Fully Convolutional Network |
Authors | Shuqing Chen, Holger Roth, Sabrina Dorn, Matthias May, Alexander Cavallaro, Michael M. Lell, Marc Kachelrieß, Hirohisa Oda, Kensaku Mori, Andreas Maier |
Abstract | Automatic multi-organ segmentation of dual energy computed tomography (DECT) data can be beneficial for biomedical research and clinical applications. However, it is a challenging task. Recent advances in deep learning showed the feasibility of using 3-D fully convolutional networks (FCN) for voxel-wise dense predictions in single energy computed tomography (SECT). In this paper, we propose a 3D FCN based method for automatic multi-organ segmentation in DECT. The work is based on a cascaded FCN and a general model for the major organs trained on a large set of SECT data. We preprocessed the DECT data using linear weighting and fine-tuned the model for the DECT data. The method was evaluated using 42 torso DECT data sets acquired with a clinical dual-source CT system. Four abdominal organs (liver, spleen, left and right kidneys) were evaluated. Cross-validation was performed, and the effect of the weight on the accuracy was investigated. In all the tests, we achieved an average Dice coefficient of 93% for the liver, 90% for the spleen, 91% for the right kidney and 89% for the left kidney. The results show our method is feasible and promising. |
Tasks | |
Published | 2017-10-15 |
URL | http://arxiv.org/abs/1710.05379v1 |
http://arxiv.org/pdf/1710.05379v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-automatic-abdominal-multi-organ |
Repo | |
Framework | |
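The Dice coefficient reported in the segmentation entry above measures the overlap between a predicted mask and a reference mask. The following is a minimal sketch, assuming binary masks flattened into lists of 0/1 voxel labels; the function name `dice_coefficient` and the toy masks are illustrative and do not come from the paper.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice = 2 * |P intersect T| / (|P| + |T|) for binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example: 3 voxels predicted, 3 voxels in the reference, 2 shared.
pred = [1, 1, 1, 0, 0, 0]
truth = [0, 1, 1, 1, 0, 0]
score = dice_coefficient(pred, truth)
print(f"Dice = {score:.3f}")
```

In a multi-organ setting the same function would typically be applied once per organ label, which is how per-organ scores such as those quoted in the abstract are usually obtained.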
On the Limitations of First-Order Approximation in GAN Dynamics
Title | On the Limitations of First-Order Approximation in GAN Dynamics |
Authors | Jerry Li, Aleksander Madry, John Peebles, Ludwig Schmidt |
Abstract | While Generative Adversarial Networks (GANs) have demonstrated promising performance on multiple vision tasks, their learning dynamics are not yet well understood, both in theory and in practice. To address this issue, we study GAN dynamics in a simple yet rich parametric model that exhibits several of the common problematic convergence behaviors such as vanishing gradients, mode collapse, and diverging or oscillatory behavior. In spite of the non-convex nature of our model, we are able to perform a rigorous theoretical analysis of its convergence behavior. Our analysis reveals an interesting dichotomy: a GAN with an optimal discriminator provably converges, while first order approximations of the discriminator steps lead to unstable GAN dynamics and mode collapse. Our result suggests that using first order discriminator steps (the de-facto standard in most existing GAN setups) might be one of the factors that makes GAN training challenging in practice. |
Tasks | |
Published | 2017-06-29 |
URL | http://arxiv.org/abs/1706.09884v2 |
http://arxiv.org/pdf/1706.09884v2.pdf | |
PWC | https://paperswithcode.com/paper/on-the-limitations-of-first-order-1 |
Repo | |
Framework | |
Learning Robust Object Recognition Using Composed Scenes from Generative Models
Title | Learning Robust Object Recognition Using Composed Scenes from Generative Models |
Authors | Hao Wang, Xingyu Lin, Yimeng Zhang, Tai Sing Lee |
Abstract | Recurrent feedback connections in the mammalian visual system have been hypothesized to play a role in synthesizing input in the theoretical framework of analysis by synthesis. The comparison of internally synthesized representation with that of the input provides a validation mechanism during perceptual inference and learning. Inspired by these ideas, we propose that the synthesis machinery can compose new, unobserved images by imagination to train the network itself so as to increase the robustness of the system in novel scenarios. As a proof of concept, we investigated whether images composed by imagination could help an object recognition system to deal with occlusion, which is challenging for the current state-of-the-art deep convolutional neural networks. We fine-tuned a network on images containing objects in various occlusion scenarios that are imagined or self-generated through a deep generator network. Trained on imagined occluded scenarios under the object persistence constraint, our network discovered more subtle and localized image features that were neglected by the original network for object classification, obtaining better separability of different object classes in the feature space. This leads to significant improvement of object recognition under occlusion for our network relative to the original network trained only on un-occluded images. In addition to providing practical benefits in object recognition under occlusion, this work demonstrates that the use of self-generated composition of visual scenes through the synthesis loop, combined with the object persistence constraint, can provide opportunities for neural networks to discover new relevant patterns in the data, and become more flexible in dealing with novel situations. |
Tasks | Object Classification, Object Recognition |
Published | 2017-05-22 |
URL | http://arxiv.org/abs/1705.07594v1 |
http://arxiv.org/pdf/1705.07594v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-robust-object-recognition-using |
Repo | |
Framework | |
Deep Reinforcement Learning-based Image Captioning with Embedding Reward
Title | Deep Reinforcement Learning-based Image Captioning with Embedding Reward |
Authors | Zhou Ren, Xiaoyu Wang, Ning Zhang, Xutao Lv, Li-Jia Li |
Abstract | Image captioning is a challenging problem owing to the complexity in understanding the image content and diverse ways of describing it in natural language. Recent advances in deep neural networks have substantially improved the performance of this task. Most state-of-the-art approaches follow an encoder-decoder framework, which generates captions using a sequential recurrent prediction model. However, in this paper, we introduce a novel decision-making framework for image captioning. We utilize a “policy network” and a “value network” to collaboratively generate captions. The policy network serves as a local guidance by providing the confidence of predicting the next word according to the current state. Additionally, the value network serves as a global and lookahead guidance by evaluating all possible extensions of the current state. In essence, it adjusts the goal of predicting the correct words towards the goal of generating captions similar to the ground truth captions. We train both networks using an actor-critic reinforcement learning model, with a novel reward defined by visual-semantic embedding. Extensive experiments and analyses on the Microsoft COCO dataset show that the proposed framework outperforms state-of-the-art approaches across different evaluation metrics. |
Tasks | Decision Making, Image Captioning |
Published | 2017-04-12 |
URL | http://arxiv.org/abs/1704.03899v1 |
http://arxiv.org/pdf/1704.03899v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-reinforcement-learning-based-image |
Repo | |
Framework | |
Estimating Historical Hourly Traffic Volumes via Machine Learning and Vehicle Probe Data: A Maryland Case Study
Title | Estimating Historical Hourly Traffic Volumes via Machine Learning and Vehicle Probe Data: A Maryland Case Study |
Authors | Przemysław Sekuła, Nikola Marković, Zachary Vander Laan, Kaveh Farokhi Sadabadi |
Abstract | This paper focuses on the problem of estimating historical traffic volumes between sparsely-located traffic sensors, which transportation agencies need to accurately compute statewide performance measures. To this end, the paper examines applications of vehicle probe data, automatic traffic recorder counts, and neural network models to estimate hourly volumes in the Maryland highway network, and proposes a novel approach that combines neural networks with an existing profiling method. On average, the proposed approach yields 24% more accurate estimates than volume profiles, which are currently used by transportation agencies across the US to compute statewide performance measures. The paper also quantifies the value of using vehicle probe data in estimating hourly traffic volumes, which provides important managerial insights to transportation agencies interested in acquiring this type of data. For example, results show that volumes can be estimated with a mean absolute percent error of about 21% at locations where the average number of observed probes is between 30 and 47 vehicles/hr, which provides a useful guideline for assessing the value of probe vehicle data from different vendors. |
Tasks | |
Published | 2017-11-02 |
URL | http://arxiv.org/abs/1711.00721v2 |
http://arxiv.org/pdf/1711.00721v2.pdf | |
PWC | https://paperswithcode.com/paper/estimating-historical-hourly-traffic-volumes |
Repo | |
Framework | |
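The traffic-volume entry above reports accuracy as a mean absolute percent error (MAPE). As a small sketch of how that metric is commonly computed, assuming paired lists of observed and estimated hourly volumes: the function name `mape`, the choice to skip hours with zero observed volume, and the toy numbers are all assumptions made for illustration, not details taken from the paper.

```python
from typing import Sequence


def mape(actual: Sequence[float], estimated: Sequence[float]) -> float:
    """Mean absolute percent error, in percent, over hours with nonzero volume."""
    if len(actual) != len(estimated):
        raise ValueError("inputs must have the same length")
    # Assumption: hours with zero observed volume are skipped to avoid division by zero.
    pairs = [(a, e) for a, e in zip(actual, estimated) if a != 0]
    if not pairs:
        raise ValueError("no nonzero observations to evaluate")
    return 100.0 * sum(abs(a - e) / abs(a) for a, e in pairs) / len(pairs)


# Hypothetical hourly counts (vehicles/hr) and model estimates.
observed = [100.0, 200.0, 400.0]
predicted = [80.0, 220.0, 400.0]
error = mape(observed, predicted)
print(f"MAPE = {error:.1f}%")
```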
L2 Regularization versus Batch and Weight Normalization
Title | L2 Regularization versus Batch and Weight Normalization |
Authors | Twan van Laarhoven |
Abstract | Batch Normalization is a commonly used trick to improve the training of deep neural networks. These neural networks use L2 regularization, also called weight decay, ostensibly to prevent overfitting. However, we show that L2 regularization has no regularizing effect when combined with normalization. Instead, regularization has an influence on the scale of weights, and thereby on the effective learning rate. We investigate this dependence, both in theory, and experimentally. We show that popular optimization methods such as ADAM only partially eliminate the influence of normalization on the learning rate. This leads to a discussion on other ways to mitigate this issue. |
Tasks | L2 Regularization |
Published | 2017-06-16 |
URL | http://arxiv.org/abs/1706.05350v1 |
http://arxiv.org/pdf/1706.05350v1.pdf | |
PWC | https://paperswithcode.com/paper/l2-regularization-versus-batch-and-weight |
Repo | |
Framework | |
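The central observation in the entry above is that a normalized layer is insensitive to the scale of its incoming weights, so an L2 penalty changes the weight norm without changing the function being computed. A minimal sketch of that invariance follows, assuming a single linear unit followed by batch normalization with no epsilon term and no learned scale or shift; the helper names and the toy batch are hypothetical.

```python
import math
from typing import List, Sequence


def batch_normalize(values: Sequence[float]) -> List[float]:
    """Subtract the batch mean and divide by the batch standard deviation."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(variance) for v in values]


def normalized_layer(weights: Sequence[float], batch: Sequence[Sequence[float]]) -> List[float]:
    """Linear unit w . x for each example, followed by batch normalization."""
    pre_activations = [sum(w * x for w, x in zip(weights, row)) for row in batch]
    return batch_normalize(pre_activations)


def l2_penalty(weights: Sequence[float]) -> float:
    """Sum of squared weights, the quantity that weight decay shrinks."""
    return sum(w * w for w in weights)


batch = [[1.0, 2.0], [0.5, -1.0], [-2.0, 0.3], [1.5, 1.5]]
weights = [0.7, -1.2]
scaled = [10.0 * w for w in weights]  # same direction, ten times the norm

out_original = normalized_layer(weights, batch)
out_scaled = normalized_layer(scaled, batch)
max_gap = max(abs(a - b) for a, b in zip(out_original, out_scaled))

print("largest output difference:", max_gap)
print("L2 penalties:", l2_penalty(weights), l2_penalty(scaled))
```

The two weight vectors give the same normalized outputs up to floating-point error while their penalties differ by a factor of 100. Practical batch normalization adds a small epsilon inside the square root, in which case the invariance holds only approximately.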
Progressive Joint Modeling in Unsupervised Single-channel Overlapped Speech Recognition
Title | Progressive Joint Modeling in Unsupervised Single-channel Overlapped Speech Recognition |
Authors | Zhehuai Chen, Jasha Droppo, Jinyu Li, Wayne Xiong |
Abstract | Unsupervised single-channel overlapped speech recognition is one of the hardest problems in automatic speech recognition (ASR). Permutation invariant training (PIT) is a state-of-the-art model-based approach, which applies a single neural network to solve this single-input, multiple-output modeling problem. We propose to advance the current state of the art by imposing a modular structure on the neural network, applying a progressive pretraining regimen, and improving the objective function with transfer learning and a discriminative training criterion. The modular structure splits the problem into three sub-tasks: frame-wise interpreting, utterance-level speaker tracing, and speech recognition. The pretraining regimen uses these modules to solve progressively harder tasks. Transfer learning leverages parallel clean speech to improve the training targets for the network. Our discriminative training formulation is a modification of standard formulations that also penalizes competing outputs of the system. Experiments are conducted on the artificial overlapped Switchboard and hub5e-swb datasets. The proposed framework achieves over 30% relative improvement of WER over both a strong jointly trained system, PIT for ASR, and a separately optimized system, PIT for speech separation with a clean speech ASR model. The improvement comes from better model generalization, training efficiency and sequence-level linguistic knowledge integration. |
Tasks | Speech Recognition, Speech Separation, Transfer Learning |
Published | 2017-07-21 |
URL | http://arxiv.org/abs/1707.07048v2 |
http://arxiv.org/pdf/1707.07048v2.pdf | |
PWC | https://paperswithcode.com/paper/progressive-joint-modeling-in-unsupervised |
Repo | |
Framework | |
Convergence rate of a simulated annealing algorithm with noisy observations
Title | Convergence rate of a simulated annealing algorithm with noisy observations |
Authors | Clément Bouttier, Ioana Gavra |
Abstract | In this paper we propose a modified version of the simulated annealing algorithm for solving a stochastic global optimization problem. More precisely, we address the problem of finding a global minimizer of a function with noisy evaluations. We provide a rate of convergence and its optimized parametrization to ensure a minimal number of evaluations for a given accuracy and a confidence level close to 1. This work is completed with a set of numerical experiments that assess the practical performance both on benchmark test cases and on real world examples. |
Tasks | |
Published | 2017-03-01 |
URL | http://arxiv.org/abs/1703.00329v1 |
http://arxiv.org/pdf/1703.00329v1.pdf | |
PWC | https://paperswithcode.com/paper/convergence-rate-of-a-simulated-annealing |
Repo | |
Framework | |
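To make the setting of the entry above concrete, here is a generic sketch of simulated annealing when the objective can only be observed with noise. It is a textbook Metropolis scheme that averages a few noisy evaluations per state, not the authors' modified algorithm or their optimized parametrization; the test function, the logarithmic cooling schedule, the sample count and every name in the code are assumptions chosen for illustration.

```python
import math
import random


def noisy_objective(x: int, rng: random.Random) -> float:
    """Hypothetical test function: (x - 7)^2 observed with Gaussian noise."""
    return (x - 7) ** 2 + rng.gauss(0.0, 0.5)


def estimate(x: int, rng: random.Random, n_samples: int) -> float:
    """Average several noisy evaluations to reduce the variance of the estimate."""
    return sum(noisy_objective(x, rng) for _ in range(n_samples)) / n_samples


def noisy_simulated_annealing(start: int, steps: int, n_samples: int,
                              rng: random.Random) -> int:
    """Metropolis-style annealing on the integers 0..20 using noisy estimates."""
    x = start
    for t in range(1, steps + 1):
        temperature = 1.0 / math.log(1.0 + t)  # assumed logarithmic cooling
        candidate = min(20, max(0, x + rng.choice((-1, 1))))
        delta = estimate(candidate, rng, n_samples) - estimate(x, rng, n_samples)
        # Always accept apparent improvements; accept apparent worsening with
        # probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            x = candidate
    return x


rng = random.Random(0)
final_state = noisy_simulated_annealing(start=18, steps=5000, n_samples=4, rng=rng)
print("final state:", final_state)
```

The number of evaluations spent per state trades accuracy of each comparison against total evaluation budget, which is the kind of trade-off the abstract's convergence rate and parametrization address.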
Weighted Orthogonal Components Regression Analysis
Title | Weighted Orthogonal Components Regression Analysis |
Authors | Xiaogang Su, Yaa Wonkye, Pei Wang, Xiangrong Yin |
Abstract | In the multiple linear regression setting, we propose a general framework, termed weighted orthogonal components regression (WOCR), which encompasses many known methods as special cases, including ridge regression and principal components regression. WOCR makes use of the monotonicity inherent in orthogonal components to parameterize the weight function. The formulation allows for efficient determination of tuning parameters and hence is computationally advantageous. Moreover, WOCR offers insights for deriving new and better variants. Specifically, we advocate weighting components based on their correlations with the response, which leads to enhanced predictive performance. Both simulation studies and real data examples are provided to assess and illustrate the advantages of the proposed methods. |
Tasks | |
Published | 2017-09-13 |
URL | http://arxiv.org/abs/1709.04135v2 |
http://arxiv.org/pdf/1709.04135v2.pdf | |
PWC | https://paperswithcode.com/paper/weighted-orthogonal-components-regression |
Repo | |
Framework | |
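The claim in the entry above that ridge regression and principal components regression are special cases of weighting orthogonal components can be checked numerically. The sketch below relies on the standard singular value decomposition view of both methods: ridge weights component j by d_j^2 / (d_j^2 + lambda), while principal components regression uses weights of 1 for retained components and 0 otherwise. The synthetic data, the helper name `weighted_components_fit` and the parameter values are assumptions; the paper's own parameterization of the weight function is not given in the abstract.

```python
import numpy as np

# Synthetic, centered regression data (illustrative only).
rng = np.random.default_rng(0)
n, p = 50, 4
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)
y = X @ np.array([1.5, -2.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=n)
y -= y.mean()

# Orthogonal components of X: columns of U, with singular values d.
U, d, Vt = np.linalg.svd(X, full_matrices=False)


def weighted_components_fit(weights: np.ndarray) -> np.ndarray:
    """Fitted values sum_j w_j * u_j * (u_j^T y) over the orthogonal components."""
    return U @ (weights * (U.T @ y))


# Ridge regression as a smooth weighting of components.
lam = 2.0
ridge_weights = d**2 / (d**2 + lam)
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Principal components regression as a 0/1 weighting (keep the first k components).
k = 2
pcr_weights = (np.arange(p) < k).astype(float)
Z = X @ Vt[:k].T  # scores on the first k principal components
gamma = np.linalg.lstsq(Z, y, rcond=None)[0]

print("ridge matches:", np.allclose(weighted_components_fit(ridge_weights), X @ beta_ridge))
print("PCR matches:  ", np.allclose(weighted_components_fit(pcr_weights), Z @ gamma))
```

Other weighting rules, such as ones driven by each component's correlation with the response as the abstract advocates, would plug into the same `weighted_components_fit` helper with a different weight vector.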
Online Learning with Abstention
Title | Online Learning with Abstention |
Authors | Corinna Cortes, Giulia DeSalvo, Claudio Gentile, Mehryar Mohri, Scott Yang |
Abstract | We present an extensive study of the key problem of online learning where algorithms are allowed to abstain from making predictions. In the adversarial setting, we show how existing online algorithms and guarantees can be adapted to this problem. In the stochastic setting, we first point out a bias problem that limits the straightforward extension of algorithms such as UCB-N to time-varying feedback graphs, as needed in this context. Next, we give a new algorithm, UCB-GT, that exploits historical data and is adapted to time-varying feedback graphs. We show that this algorithm benefits from more favorable regret guarantees than a possible, but limited, extension of UCB-N. We further report the results of a series of experiments demonstrating that UCB-GT largely outperforms that extension of UCB-N, as well as more standard baselines. |
Tasks | |
Published | 2017-03-09 |
URL | https://arxiv.org/abs/1703.03478v3 |
https://arxiv.org/pdf/1703.03478v3.pdf | |
PWC | https://paperswithcode.com/paper/online-learning-with-abstention |
Repo | |
Framework | |
Evolving Boxes for Fast Vehicle Detection
Title | Evolving Boxes for Fast Vehicle Detection |
Authors | Li Wang, Yao Lu, Hong Wang, Yingbin Zheng, Hao Ye, Xiangyang Xue |
Abstract | We perform fast vehicle detection from traffic surveillance cameras. A novel deep learning framework, namely Evolving Boxes, is developed that proposes and refines the object boxes under different feature representations. Specifically, our framework is embedded with a light-weight proposal network to generate initial anchor boxes as well as to discard unlikely regions early; a fine-tuning network produces detailed features for these candidate boxes. We show intriguingly that by applying different feature fusion techniques, the initial boxes can be refined for both localization and recognition. We evaluate our network on the recent DETRAC benchmark and obtain a significant improvement over the state-of-the-art Faster RCNN by 9.5% mAP. Further, our network achieves 9-13 FPS detection speed on a moderate commercial GPU. |
Tasks | Fast Vehicle Detection |
Published | 2017-02-01 |
URL | http://arxiv.org/abs/1702.00254v3 |
http://arxiv.org/pdf/1702.00254v3.pdf | |
PWC | https://paperswithcode.com/paper/evolving-boxes-for-fast-vehicle-detection |
Repo | |
Framework | |
Enabling Smart Data: Noise filtering in Big Data classification
Title | Enabling Smart Data: Noise filtering in Big Data classification |
Authors | Diego García-Gil, Julián Luengo, Salvador García, Francisco Herrera |
Abstract | In any knowledge discovery process the value of extracted knowledge is directly related to the quality of the data used. Big Data problems, generated by massive growth in the scale of data observed in recent years, also follow the same dictate. A common problem affecting data quality is the presence of noise, particularly in classification problems, where label noise refers to the incorrect labeling of training instances, and is known to be a very disruptive feature of data. However, in this Big Data era, the massive growth in the scale of the data poses a challenge to traditional proposals created to tackle noise, as they have difficulties coping with such a large amount of data. New algorithms need to be proposed to treat the noise in Big Data problems, providing high quality and clean data, also known as Smart Data. In this paper, two Big Data preprocessing approaches to remove noisy examples are proposed: a homogeneous ensemble filter and a heterogeneous ensemble filter, with special emphasis on their scalability and performance traits. The obtained results show that these proposals enable the practitioner to efficiently obtain a Smart Dataset from any Big Data classification problem. |
Tasks | |
Published | 2017-04-06 |
URL | http://arxiv.org/abs/1704.01770v2 |
http://arxiv.org/pdf/1704.01770v2.pdf | |
PWC | https://paperswithcode.com/paper/enabling-smart-data-noise-filtering-in-big |
Repo | |
Framework | |
Shape and Positional Geometry of Multi-Object Configurations
Title | Shape and Positional Geometry of Multi-Object Configurations |
Authors | James Damon, Ellen Gasparovic |
Abstract | In previous work, we introduced a method for modeling a configuration of objects in 2D and 3D images using a mathematical “medial/skeletal linking structure.” In this paper, we show how these structures allow us to capture positional properties of a multi-object configuration in addition to the shape properties of the individual objects. In particular, we introduce numerical invariants for positional properties which measure the closeness of neighboring objects, including identifying the parts of the objects which are close, and the “relative significance” of objects compared with the other objects in the configuration. Using these numerical measures, we introduce a hierarchical ordering and relations between the individual objects, and quantitative criteria for identifying subconfigurations. In addition, the invariants provide a “proximity matrix” which yields a unique set of weightings measuring overall proximity of objects in the configuration. Furthermore, we show that these invariants, which are volumetrically defined and involve external regions, may be computed via integral formulas in terms of “skeletal linking integrals” defined on the internal skeletal structures of the objects. |
Tasks | |
Published | 2017-06-01 |
URL | http://arxiv.org/abs/1706.00150v1 |
http://arxiv.org/pdf/1706.00150v1.pdf | |
PWC | https://paperswithcode.com/paper/shape-and-positional-geometry-of-multi-object |
Repo | |
Framework | |
Two-pixel polarimetric camera by compressive sensing
Title | Two-pixel polarimetric camera by compressive sensing |
Authors | Julien Fade, Estéban Perrotin, Jérôme Bobin |
Abstract | We propose an original concept of compressive sensing (CS) polarimetric imaging based on a digital micro-mirror (DMD) array and two single-pixel detectors. The polarimetric sensitivity of the proposed setup is due to an experimental imperfection of reflecting mirrors which is exploited here to form an original reconstruction problem, including a CS problem and a source separation task. We show that a two-step approach tackling each problem successively is outperformed by a dedicated combined reconstruction method, which is made explicit in this article and preferably implemented through a reweighted FISTA algorithm. The combined reconstruction approach is then further improved by including physical constraints specific to the polarimetric imaging context considered, which are implemented in an original constrained GFB algorithm. Numerical simulations demonstrate the efficiency of the 2-pixel CS polarimetric imaging setup to retrieve polarimetric contrast data with significant compression rate and good reconstruction quality. The influence of experimental imperfections of the DMD is also analyzed through numerical simulations, and 2D polarimetric imaging reconstruction results are finally presented. |
Tasks | Compressive Sensing |
Published | 2017-06-16 |
URL | http://arxiv.org/abs/1707.03705v1 |
http://arxiv.org/pdf/1707.03705v1.pdf | |
PWC | https://paperswithcode.com/paper/two-pixel-polarimetric-camera-by-compressive |
Repo | |
Framework | |