Paper Group AWR 87
Beyond Part Models: Person Retrieval with Refined Part Pooling (and a Strong Convolutional Baseline)
Title | Beyond Part Models: Person Retrieval with Refined Part Pooling (and a Strong Convolutional Baseline) |
Authors | Yifan Sun, Liang Zheng, Yi Yang, Qi Tian, Shengjin Wang |
Abstract | Employing part-level features for pedestrian image description offers fine-grained information and has been verified as beneficial for person retrieval in very recent literature. A prerequisite of part discovery is that each part should be well located. Instead of using external cues, e.g., pose estimation, to directly locate parts, this paper lays emphasis on the content consistency within each part. Specifically, we aim to learn discriminative part-informed features for person retrieval and make two contributions. (i) A network named Part-based Convolutional Baseline (PCB). Given an image input, it outputs a convolutional descriptor consisting of several part-level features. With a uniform partition strategy, PCB achieves competitive results with the state-of-the-art methods, proving itself as a strong convolutional baseline for person retrieval. (ii) A refined part pooling (RPP) method. Uniform partition inevitably incurs outliers in each part, which are in fact more similar to other parts. RPP re-assigns these outliers to the parts they are closest to, resulting in refined parts with enhanced within-part consistency. Experiments confirm that RPP allows PCB to gain another round of performance boost. For instance, on the Market-1501 dataset, we achieve (77.4+4.2)% mAP and (92.3+1.5)% rank-1 accuracy, surpassing the state of the art by a large margin. |
Tasks | Person Re-Identification, Person Retrieval |
Published | 2017-11-26 |
URL | http://arxiv.org/abs/1711.09349v3, http://arxiv.org/pdf/1711.09349v3.pdf |
PWC | https://paperswithcode.com/paper/beyond-part-models-person-retrieval-with |
Repo | https://github.com/NIRVANALAN/reid_baseline |
Framework | pytorch |
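The uniform partition described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the parts are equal-height horizontal stripes of a backbone feature map and that each stripe is average-pooled into one vector; the function name `uniform_part_pooling` and the toy tensor shape are hypothetical, and the refinement step (RPP) is not modeled here.

```python
import numpy as np


def uniform_part_pooling(feature_map: np.ndarray, num_parts: int) -> np.ndarray:
    """Average-pool a (C, H, W) feature map into equal horizontal stripes.

    Returns an array of shape (num_parts, C): one feature vector per part.
    Assumption: parts are horizontal stripes of equal height.
    """
    channels, height, width = feature_map.shape
    if height % num_parts != 0:
        raise ValueError("height must be divisible by num_parts in this sketch")
    stripe = height // num_parts
    # Split the height axis into (num_parts, stripe), then average over
    # the rows of each stripe and over the full width.
    parts = feature_map.reshape(channels, num_parts, stripe, width).mean(axis=(2, 3))
    return parts.T


# Toy feature map: 2 channels, 6 rows, 4 columns.
fmap = np.arange(2 * 6 * 4, dtype=float).reshape(2, 6, 4)
part_features = uniform_part_pooling(fmap, num_parts=3)
print(part_features.shape)
```

Each row of `part_features` plays the role of one part-level feature; a full descriptor would concatenate them.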
Deep Learning with Domain Adaptation for Accelerated Projection-Reconstruction MR
Title | Deep Learning with Domain Adaptation for Accelerated Projection-Reconstruction MR |
Authors | Yo Seob Han, Jaejun Yoo, Jong Chul Ye |
Abstract | Purpose: The radial k-space trajectory is a well-established sampling trajectory used in conjunction with magnetic resonance imaging. However, the radial k-space trajectory requires a large number of radial lines for high-resolution reconstruction. Increasing the number of radial lines causes longer acquisition time, making it more difficult for routine clinical use. On the other hand, if we reduce the number of radial lines, streaking artifact patterns are unavoidable. To solve this problem, we propose a novel deep learning approach with domain adaptation to restore high-resolution MR images from under-sampled k-space data. Methods: The proposed deep network removes the streaking artifacts from the artifact-corrupted images. To address the situation given the limited available data, we propose a domain adaptation scheme that employs a pre-trained network using a large number of x-ray computed tomography (CT) or synthesized radial MR datasets, which is then fine-tuned with only a few radial MR datasets. Results: The proposed method outperforms existing compressed sensing algorithms, such as the total variation and PR-FOCUSS methods. In addition, the calculation time is several orders of magnitude faster than the total variation and PR-FOCUSS methods. Moreover, we found that pre-training using CT or MR data from a similar organ is more important than pre-training using data from the same modality for a different organ. Conclusion: We demonstrate the possibility of domain adaptation when only a limited amount of MR data is available. The proposed method surpasses the existing compressed sensing algorithms in terms of image quality and computation time. |
Tasks | Computed Tomography (CT), Domain Adaptation |
Published | 2017-03-03 |
URL | http://arxiv.org/abs/1703.01135v2, http://arxiv.org/pdf/1703.01135v2.pdf |
PWC | https://paperswithcode.com/paper/deep-learning-with-domain-adaptation-for |
Repo | https://github.com/jongcye/Domain.Adaptation.AcceleratedMR |
Framework | none |
Random Erasing Data Augmentation
Title | Random Erasing Data Augmentation |
Authors | Zhun Zhong, Liang Zheng, Guoliang Kang, Shaozi Li, Yi Yang |
Abstract | In this paper, we introduce Random Erasing, a new data augmentation method for training the convolutional neural network (CNN). In training, Random Erasing randomly selects a rectangle region in an image and erases its pixels with random values. In this process, training images with various levels of occlusion are generated, which reduces the risk of over-fitting and makes the model robust to occlusion. Random Erasing is parameter learning free, easy to implement, and can be integrated with most of the CNN-based recognition models. Albeit simple, Random Erasing is complementary to commonly used data augmentation techniques such as random cropping and flipping, and yields consistent improvement over strong baselines in image classification, object detection and person re-identification. Code is available at: https://github.com/zhunzhong07/Random-Erasing. |
Tasks | Data Augmentation, Image Augmentation, Image Classification, Object Detection, Person Re-Identification |
Published | 2017-08-16 |
URL | http://arxiv.org/abs/1708.04896v2, http://arxiv.org/pdf/1708.04896v2.pdf |
PWC | https://paperswithcode.com/paper/random-erasing-data-augmentation |
Repo | https://github.com/NVlabs/DG-Net |
Framework | pytorch |
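The erasing step described in the abstract (pick a random rectangle, overwrite it with random values) might look like the following NumPy sketch. The defaults for `prob`, `area_range`, and `min_aspect`, and the retry loop, are illustrative assumptions rather than values taken from the abstract; the function name `random_erase` is hypothetical.

```python
import numpy as np


def random_erase(image, rng, prob=0.5, area_range=(0.02, 0.4),
                 min_aspect=0.3, max_tries=100):
    """Return a copy of `image` with one random rectangle filled with random values.

    `image` is an (H, W) or (H, W, C) uint8 array. All default
    hyperparameters are assumptions chosen for illustration.
    """
    out = image.copy()
    if rng.random() >= prob:
        return out  # leave this sample unchanged
    height, width = image.shape[:2]
    for _ in range(max_tries):
        # Sample a target area and aspect ratio, then derive the rectangle size.
        area = rng.uniform(*area_range) * height * width
        aspect = rng.uniform(min_aspect, 1.0 / min_aspect)
        erase_h = int(round(np.sqrt(area * aspect)))
        erase_w = int(round(np.sqrt(area / aspect)))
        if 0 < erase_h < height and 0 < erase_w < width:
            top = int(rng.integers(0, height - erase_h + 1))
            left = int(rng.integers(0, width - erase_w + 1))
            noise = rng.integers(0, 256, size=(erase_h, erase_w) + image.shape[2:])
            out[top:top + erase_h, left:left + erase_w] = noise.astype(image.dtype)
            return out
    return out  # no valid rectangle found; return the unmodified copy


rng = np.random.default_rng(0)
image = np.zeros((32, 32, 3), dtype=np.uint8)
augmented = random_erase(image, rng, prob=1.0)
print(augmented.shape, int((augmented != 0).any(axis=-1).sum()), "pixels changed")
```

Because the function returns a copy, it can be dropped into a training data pipeline after cropping and flipping without mutating the cached source images.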
Camera Style Adaptation for Person Re-identification
Title | Camera Style Adaptation for Person Re-identification |
Authors | Zhun Zhong, Liang Zheng, Zhedong Zheng, Shaozi Li, Yi Yang |
Abstract | Being a cross-camera retrieval task, person re-identification suffers from image style variations caused by different cameras. Existing methods implicitly address this problem by learning a camera-invariant descriptor subspace. In this paper, we explicitly consider this challenge by introducing camera style (CamStyle) adaptation. CamStyle can serve as a data augmentation approach that smooths the camera style disparities. Specifically, with CycleGAN, labeled training images can be style-transferred to each camera, and, along with the original training samples, form the augmented training set. This method, while increasing data diversity against over-fitting, also incurs a considerable level of noise. In the effort to alleviate the impact of noise, label smoothing regularization (LSR) is adopted. The vanilla version of our method (without LSR) performs reasonably well on few-camera systems in which over-fitting often occurs. With LSR, we demonstrate consistent improvement in all systems regardless of the extent of over-fitting. We also report competitive accuracy compared with the state of the art. |
Tasks | Data Augmentation, Person Re-Identification |
Published | 2017-11-28 |
URL | http://arxiv.org/abs/1711.10295v2, http://arxiv.org/pdf/1711.10295v2.pdf |
PWC | https://paperswithcode.com/paper/camera-style-adaptation-for-person-re |
Repo | https://github.com/NIRVANALAN/reid_baseline |
Framework | pytorch |
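As a rough illustration of the LSR idea mentioned in the abstract, the sketch below builds softened targets in the common label-smoothing form: the true class receives 1 - epsilon + epsilon/K and every other class receives epsilon/K. The value of `epsilon`, the function names, and the choice of which samples get smoothed targets are assumptions for illustration, not details stated in the abstract.

```python
import numpy as np


def smoothed_targets(labels, num_classes, epsilon):
    """Label-smoothed target distributions, one row per sample."""
    labels = np.asarray(labels)
    targets = np.full((len(labels), num_classes), epsilon / num_classes)
    targets[np.arange(len(labels)), labels] += 1.0 - epsilon
    return targets


def soft_cross_entropy(logits, targets):
    """Mean cross-entropy between softmax(logits) and soft target rows."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(targets * log_probs).sum(axis=1).mean())


labels = np.array([0, 2])
logits = np.array([[2.0, 0.5, 0.1], [0.2, 0.3, 1.5]])
hard = smoothed_targets(labels, num_classes=3, epsilon=0.0)  # plain one-hot
soft = smoothed_targets(labels, num_classes=3, epsilon=0.1)  # smoothed
print(soft_cross_entropy(logits, hard), soft_cross_entropy(logits, soft))
```

With `epsilon=0.0` the targets reduce to one-hot labels, so the same loss function can serve both original and style-transferred samples.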
Unlabeled Samples Generated by GAN Improve the Person Re-identification Baseline in vitro
Title | Unlabeled Samples Generated by GAN Improve the Person Re-identification Baseline in vitro |
Authors | Zhedong Zheng, Liang Zheng, Yi Yang |
Abstract | The main contribution of this paper is a simple semi-supervised pipeline that only uses the original training set without collecting extra data. It is challenging in 1) how to obtain more training data only from the training set and 2) how to use the newly generated data. In this work, the generative adversarial network (GAN) is used to generate unlabeled samples. We propose the label smoothing regularization for outliers (LSRO). This method assigns a uniform label distribution to the unlabeled images, which regularizes the supervised model and improves the baseline. We verify the proposed method on a practical problem: person re-identification (re-ID). This task aims to retrieve a query person from other cameras. We adopt the deep convolutional generative adversarial network (DCGAN) for sample generation, and a baseline convolutional neural network (CNN) for representation learning. Experiments show that adding the GAN-generated data effectively improves the discriminative ability of learned CNN embeddings. On three large-scale datasets, Market-1501, CUHK03 and DukeMTMC-reID, we obtain +4.37%, +1.6% and +2.46% improvement in rank-1 precision over the baseline CNN, respectively. We additionally apply the proposed method to fine-grained bird recognition and achieve a +0.6% improvement over a strong baseline. The code is available at https://github.com/layumi/Person-reID_GAN. |
Tasks | Person Re-Identification, Representation Learning |
Published | 2017-01-26 |
URL | http://arxiv.org/abs/1701.07717v5, http://arxiv.org/pdf/1701.07717v5.pdf |
PWC | https://paperswithcode.com/paper/unlabeled-samples-generated-by-gan-improve |
Repo | https://github.com/layumi/Person-reID_GAN |
Framework | tf |
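The abstract states that LSRO assigns a uniform label distribution to the unlabeled (generated) images. A minimal sketch of that target construction follows, assuming real images keep one-hot labels and that the loss is an ordinary cross-entropy against these targets; the helper names and the toy batch are hypothetical.

```python
import numpy as np


def lsro_targets(labels, is_generated, num_classes):
    """One-hot rows for real samples, uniform rows for generated samples.

    `labels` entries for generated samples are ignored.
    """
    labels = np.asarray(labels)
    is_generated = np.asarray(is_generated, dtype=bool)
    targets = np.zeros((len(labels), num_classes))
    real_idx = np.flatnonzero(~is_generated)
    targets[real_idx, labels[real_idx]] = 1.0
    targets[is_generated] = 1.0 / num_classes
    return targets


def log_softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


# Toy batch: two real images (classes 1 and 3) and one GAN-generated image.
labels = np.array([1, 3, 0])
is_generated = np.array([False, False, True])
targets = lsro_targets(labels, is_generated, num_classes=4)
logits = np.array([[0.1, 2.0, 0.3, 0.2],
                   [0.0, 0.1, 0.2, 1.7],
                   [0.5, 0.4, 0.6, 0.5]])
per_sample_loss = -(targets * log_softmax(logits)).sum(axis=1)
print(targets)
print(per_sample_loss)
```

For a generated sample the loss is the average negative log-probability over all classes, which penalizes confident predictions on images that belong to no known identity.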
LAREX - A semi-automatic open-source Tool for Layout Analysis and Region Extraction on Early Printed Books
Title | LAREX - A semi-automatic open-source Tool for Layout Analysis and Region Extraction on Early Printed Books |
Authors | Christian Reul, Uwe Springmann, Frank Puppe |
Abstract | A semi-automatic open-source tool for layout analysis on early printed books is presented. LAREX uses a rule based connected components approach which is very fast, easily comprehensible for the user and allows an intuitive manual correction if necessary. The PageXML format is used to support integration into existing OCR workflows. Evaluations showed that LAREX provides an efficient and flexible way to segment pages of early printed books. |
Tasks | Optical Character Recognition |
Published | 2017-01-20 |
URL | http://arxiv.org/abs/1701.07396v1, http://arxiv.org/pdf/1701.07396v1.pdf |
PWC | https://paperswithcode.com/paper/larex-a-semi-automatic-open-source-tool-for |
Repo | https://github.com/chreul/LAREX |
Framework | none |
Case Study of a highly automated Layout Analysis and OCR of an incunabulum: ‘Der Heiligen Leben’ (1488)
Title | Case Study of a highly automated Layout Analysis and OCR of an incunabulum: ‘Der Heiligen Leben’ (1488) |
Authors | Christian Reul, Marco Dittrich, Martin Gruner |
Abstract | This paper provides the first thorough documentation of a high quality digitization process applied to an early printed book from the incunabulum period (1450-1500). The entire OCR related workflow including preprocessing, layout analysis and text recognition is illustrated in detail using the example of ‘Der Heiligen Leben’, printed in Nuremberg in 1488. For each step the required time expenditure was recorded. The character recognition yielded excellent results both on character (97.57%) and word (92.19%) level. Furthermore, a comparison of a highly automated (LAREX) and a manual (Aletheia) method for layout analysis was performed. By considerably automating the segmentation, the required human effort was reduced significantly from over 100 hours to less than six hours, resulting in only a slight drop in OCR accuracy. Realistic estimates for the human effort necessary for full text extraction from incunabula can be derived from this study. The printed pages of the complete work together with the OCR result are available online, ready to be inspected and downloaded. |
Tasks | Optical Character Recognition |
Published | 2017-01-20 |
URL | http://arxiv.org/abs/1701.07395v1, http://arxiv.org/pdf/1701.07395v1.pdf |
PWC | https://paperswithcode.com/paper/case-study-of-a-highly-automated-layout |
Repo | https://github.com/chreul/LAREX |
Framework | none |
Ensemble Adversarial Training: Attacks and Defenses
Title | Ensemble Adversarial Training: Attacks and Defenses |
Authors | Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, Patrick McDaniel |
Abstract | Adversarial examples are perturbed inputs designed to fool machine learning models. Adversarial training injects such examples into training data to increase robustness. To scale this technique to large datasets, perturbations are crafted using fast single-step methods that maximize a linear approximation of the model’s loss. We show that this form of adversarial training converges to a degenerate global minimum, wherein small curvature artifacts near the data points obfuscate a linear approximation of the loss. The model thus learns to generate weak perturbations, rather than defend against strong ones. As a result, we find that adversarial training remains vulnerable to black-box attacks, where we transfer perturbations computed on undefended models, as well as to a powerful novel single-step attack that escapes the non-smooth vicinity of the input data via a small random step. We further introduce Ensemble Adversarial Training, a technique that augments training data with perturbations transferred from other models. On ImageNet, Ensemble Adversarial Training yields models with strong robustness to black-box attacks. In particular, our most robust model won the first round of the NIPS 2017 competition on Defenses against Adversarial Attacks. |
Tasks | |
Published | 2017-05-19 |
URL | http://arxiv.org/abs/1705.07204v4, http://arxiv.org/pdf/1705.07204v4.pdf |
PWC | https://paperswithcode.com/paper/ensemble-adversarial-training-attacks-and |
Repo | https://github.com/sangxia/nips-2017-adversarial |
Framework | tf |
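The two single-step attacks mentioned in the abstract can be sketched on a toy model. The example below assumes a logistic-regression classifier so the input gradient has a closed form; the single-step method moves along the sign of that gradient, and the randomized variant first takes a small random sign step of size `alpha` and then a gradient-sign step of size `eps - alpha`. The model, its weights, and the step sizes are illustrative assumptions, and clipping to a valid input range is omitted.

```python
import numpy as np


def logistic_loss(x, y, w, b):
    """Binary cross-entropy of a logistic model on a single input vector."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)))


def loss_gradient(x, y, w, b):
    """Closed-form gradient of the loss above with respect to the input x."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    return (p - y) * w


def single_step_attack(x, y, w, b, eps):
    """One step of size eps along the sign of the input gradient."""
    return x + eps * np.sign(loss_gradient(x, y, w, b))


def random_start_attack(x, y, w, b, eps, alpha, rng):
    """Random sign step of size alpha, then a gradient-sign step of eps - alpha."""
    x_rand = x + alpha * np.sign(rng.standard_normal(x.shape))
    return x_rand + (eps - alpha) * np.sign(loss_gradient(x_rand, y, w, b))


w, b = np.array([1.0, -2.0, 0.5]), 0.1   # hypothetical model parameters
x, y = np.array([0.2, 0.4, 0.6]), 1      # hypothetical input and label
rng = np.random.default_rng(0)
x_single = single_step_attack(x, y, w, b, eps=0.1)
x_random = random_start_attack(x, y, w, b, eps=0.1, alpha=0.05, rng=rng)
print(logistic_loss(x, y, w, b), logistic_loss(x_single, y, w, b))
print(np.abs(x_random - x).max())
```

Both perturbations stay within an L-infinity ball of radius `eps` around `x`. Ensemble Adversarial Training, as the abstract describes it, would additionally craft such perturbations on other models and add them to the training data.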
Human Interaction with Recommendation Systems
Title | Human Interaction with Recommendation Systems |
Authors | Sven Schmit, Carlos Riquelme |
Abstract | Many recommendation algorithms rely on user data to generate recommendations. However, these recommendations also affect the data obtained from future users. This work aims to understand the effects of this dynamic interaction. We propose a simple model where users with heterogeneous preferences arrive over time. Based on this model, we prove that naive estimators, i.e. those which ignore this feedback loop, are not consistent. We show that consistent estimators are efficient in the presence of myopic agents. Our results are validated using extensive simulations. |
Tasks | Recommendation Systems |
Published | 2017-03-01 |
URL | http://arxiv.org/abs/1703.00535v3, http://arxiv.org/pdf/1703.00535v3.pdf |
PWC | https://paperswithcode.com/paper/human-interaction-with-recommendation-systems |
Repo | https://github.com/schmit/human_interaction |
Framework | none |
Depth Adaptive Deep Neural Network for Semantic Segmentation
Title | Depth Adaptive Deep Neural Network for Semantic Segmentation |
Authors | Byeongkeun Kang, Yeejin Lee, Truong Q. Nguyen |
Abstract | In this work, we present the depth-adaptive deep neural network using a depth map for semantic segmentation. Typical deep neural networks receive inputs at the predetermined locations regardless of the distance from the camera. This fixed receptive field presents a challenge to generalize the features of objects at various distances in neural networks. Specifically, the predetermined receptive fields are too small at a short distance, and vice versa. To overcome this challenge, we develop a neural network which is able to adapt the receptive field not only for each layer but also for each neuron at the spatial location. To adjust the receptive field, we propose the depth-adaptive multiscale (DaM) convolution layer consisting of the adaptive perception neuron and the in-layer multiscale neuron. The adaptive perception neuron is to adjust the receptive field at each spatial location using the corresponding depth information. The in-layer multiscale neuron is to apply the different size of the receptive field at each feature space to learn features at multiple scales. The proposed DaM convolution is applied to two fully convolutional neural networks. We demonstrate the effectiveness of the proposed neural networks on the publicly available RGB-D dataset for semantic segmentation and the novel hand segmentation dataset for hand-object interaction. The experimental results show that the proposed method outperforms the state-of-the-art methods without any additional layers or pre/post-processing. |
Tasks | Hand Segmentation, Semantic Segmentation |
Published | 2017-08-05 |
URL | http://arxiv.org/abs/1708.01818v2, http://arxiv.org/pdf/1708.01818v2.pdf |
PWC | https://paperswithcode.com/paper/depth-adaptive-deep-neural-network-for |
Repo | https://github.com/byeongkeun-kang/HOI-dataset |
Framework | none |
Amulet: Aggregating Multi-level Convolutional Features for Salient Object Detection
Title | Amulet: Aggregating Multi-level Convolutional Features for Salient Object Detection |
Authors | Pingping Zhang, Dong Wang, Huchuan Lu, Hongyu Wang, Xiang Ruan |
Abstract | Fully convolutional neural networks (FCNs) have shown outstanding performance in many dense labeling problems. One key pillar of these successes is mining relevant information from features in convolutional layers. However, how to better aggregate multi-level convolutional feature maps for salient object detection is underexplored. In this work, we present Amulet, a generic aggregating multi-level convolutional feature framework for salient object detection. Our framework first integrates multi-level feature maps into multiple resolutions, which simultaneously incorporate coarse semantics and fine details. Then it adaptively learns to combine these feature maps at each resolution and predict saliency maps with the combined features. Finally, the predicted results are efficiently fused to generate the final saliency map. In addition, to achieve accurate boundary inference and semantic enhancement, edge-aware feature maps in low-level layers and the predicted results of low resolution features are recursively embedded into the learning framework. By aggregating multi-level convolutional features in this efficient and flexible manner, the proposed saliency model provides accurate salient object labeling. Comprehensive experiments demonstrate that our method performs favorably against state-of-the-art approaches in terms of nearly all compared evaluation metrics. |
Tasks | Object Detection, Salient Object Detection |
Published | 2017-08-07 |
URL | http://arxiv.org/abs/1708.02001v1, http://arxiv.org/pdf/1708.02001v1.pdf |
PWC | https://paperswithcode.com/paper/amulet-aggregating-multi-level-convolutional |
Repo | https://github.com/Pchank/caffe-sal |
Framework | none |
Minimizing Polarization and Disagreement in Social Networks
Title | Minimizing Polarization and Disagreement in Social Networks |
Authors | Cameron Musco, Christopher Musco, Charalampos E. Tsourakakis |
Abstract | The rise of social media and online social networks has been a disruptive force in society. Opinions are increasingly shaped by interactions on online social media, and social phenomena including disagreement and polarization are now tightly woven into everyday life. In this work we initiate the study of the following question: given $n$ agents, each with its own initial opinion that reflects its core value on a topic, and an opinion dynamics model, what is the structure of a social network that minimizes polarization and disagreement simultaneously? This question is central to recommender systems: should a recommender system prefer a link suggestion between two online users with similar mindsets in order to keep disagreement low, or between two users with different opinions in order to expose each to the other’s viewpoint of the world, and decrease overall levels of polarization? Our contributions include a mathematical formalization of this question as an optimization problem and an exact, time-efficient algorithm. We also prove that there always exists a network with $O(n/\epsilon^2)$ edges that is a $(1+\epsilon)$ approximation to the optimum. For a fixed graph, we additionally show how to optimize our objective function over the agents’ innate opinions in polynomial time. We perform an empirical study of our proposed methods on synthetic and real-world data that verifies their value as mining tools to better understand the trade-off between disagreement and polarization. We find that there is a lot of room to reduce both polarization and disagreement in real-world networks; for instance, on a Reddit network where users exchange comments on politics, our methods achieve a $\sim 60,000$-fold reduction in polarization and disagreement. |
Tasks | Recommendation Systems |
Published | 2017-12-28 |
URL | http://arxiv.org/abs/1712.09948v1, http://arxiv.org/pdf/1712.09948v1.pdf |
PWC | https://paperswithcode.com/paper/minimizing-polarization-and-disagreement-in |
Repo | https://github.com/tsourolampis/polarization-disagreement |
Framework | none |
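The abstract does not name its opinion dynamics model, so the sketch below assumes the Friedkin-Johnsen model, a common choice in which equilibrium opinions are z = (I + L)^{-1} s for graph Laplacian L and innate opinions s. Under that assumption, polarization is taken as the squared norm of the mean-centered equilibrium opinions and disagreement as the weighted sum of squared opinion differences across edges. The function name and the toy graph are hypothetical.

```python
import numpy as np


def polarization_disagreement(adjacency, innate):
    """Polarization and disagreement at equilibrium for a weighted undirected graph.

    Assumes Friedkin-Johnsen dynamics: z = (I + L)^{-1} s.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    innate = np.asarray(innate, dtype=float)
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    n = len(innate)
    z = np.linalg.solve(np.eye(n) + laplacian, innate)
    centered = z - z.mean()
    polarization = float(centered @ centered)
    # z^T L z equals the sum over edges of w_ij * (z_i - z_j)^2.
    disagreement = float(z @ laplacian @ z)
    return polarization, disagreement


# Path graph on three agents with innate opinions spread over [0, 1].
path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]])
opinions = np.array([0.0, 0.5, 1.0])
print(polarization_disagreement(path, opinions))
# Same opinions with no edges: nobody interacts, so disagreement is zero.
print(polarization_disagreement(np.zeros((3, 3)), opinions))
```

Comparing the two printed pairs shows the trade-off the abstract refers to: adding edges pulls opinions together and lowers polarization, while introducing disagreement along the new edges.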
Sequence Modeling via Segmentations
Title | Sequence Modeling via Segmentations |
Authors | Chong Wang, Yining Wang, Po-Sen Huang, Abdelrahman Mohamed, Dengyong Zhou, Li Deng |
Abstract | Segmental structure is a common pattern in many types of sequences such as phrases in human languages. In this paper, we present a probabilistic model for sequences via their segmentations. The probability of a segmented sequence is calculated as the product of the probabilities of all its segments, where each segment is modeled using existing tools such as recurrent neural networks. Since the segmentation of a sequence is usually unknown in advance, we sum over all valid segmentations to obtain the final probability for the sequence. An efficient dynamic programming algorithm is developed for forward and backward computations without resorting to any approximation. We demonstrate our approach on text segmentation and speech recognition tasks. In addition to quantitative results, we also show that our approach can discover meaningful segments in their respective application contexts. |
Tasks | Speech Recognition |
Published | 2017-02-24 |
URL | http://arxiv.org/abs/1702.07463v7, http://arxiv.org/pdf/1702.07463v7.pdf |
PWC | https://paperswithcode.com/paper/sequence-modeling-via-segmentations |
Repo | https://github.com/Microsoft/NPMT |
Framework | torch |
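The sum over all valid segmentations described in the abstract can be computed with a forward dynamic program: the total probability of a prefix is the sum, over every possible last segment, of the probability of the shorter prefix times the probability of that segment. The sketch below uses a toy `toy_segment_prob` function as a stand-in for the learned segment model (the abstract mentions recurrent neural networks); both the function and the optional `max_len` cap are hypothetical.

```python
def sequence_probability(seq, segment_prob, max_len=None):
    """Sum, over all segmentations of `seq`, of the product of segment probabilities.

    alpha[t] holds the total probability of the prefix seq[:t].
    """
    n = len(seq)
    alpha = [0.0] * (n + 1)
    alpha[0] = 1.0  # the empty prefix has exactly one (empty) segmentation
    for end in range(1, n + 1):
        first = 0 if max_len is None else max(0, end - max_len)
        alpha[end] = sum(
            alpha[start] * segment_prob(seq[start:end])
            for start in range(first, end)
        )
    return alpha[n]


def toy_segment_prob(segment):
    """Hypothetical segment model: each symbol contributes a factor of 0.1."""
    return 0.1 ** len(segment)


# "abc" has 4 segmentations (a|b|c, ab|c, a|bc, abc), each with weight 0.1**3.
print(sequence_probability("abc", toy_segment_prob))
# Capping segments at length 1 leaves only a|b|c.
print(sequence_probability("abc", toy_segment_prob, max_len=1))
```

The recursion runs in O(n^2) segment evaluations (or O(n * max_len) with the cap) instead of enumerating all 2^(n-1) segmentations.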
Rgtsvm: Support Vector Machines on a GPU in R
Title | Rgtsvm: Support Vector Machines on a GPU in R |
Authors | Zhong Wang, Tinyi Chu, Lauren A Choate, Charles G Danko |
Abstract | Rgtsvm provides a fast and flexible support vector machine (SVM) implementation for the R language. The distinguishing feature of Rgtsvm is that support vector classification and support vector regression tasks are implemented on a graphical processing unit (GPU), allowing the libraries to scale to millions of examples with >100-fold improvement in performance over existing implementations. Nevertheless, Rgtsvm retains feature parity and has an interface that is compatible with the popular e1071 SVM package in R. Altogether, Rgtsvm enables large SVM models to be created by both experienced and novice practitioners. |
Tasks | |
Published | 2017-06-17 |
URL | http://arxiv.org/abs/1706.05544v1, http://arxiv.org/pdf/1706.05544v1.pdf |
PWC | https://paperswithcode.com/paper/rgtsvm-support-vector-machines-on-a-gpu-in-r |
Repo | https://github.com/Danko-Lab/Rgtsvm |
Framework | none |
Calibrating Energy-based Generative Adversarial Networks
Title | Calibrating Energy-based Generative Adversarial Networks |
Authors | Zihang Dai, Amjad Almahairi, Philip Bachman, Eduard Hovy, Aaron Courville |
Abstract | In this paper, we propose to equip Generative Adversarial Networks with the ability to produce direct energy estimates for samples. Specifically, we propose a flexible adversarial training framework, and prove this framework not only ensures the generator converges to the true data distribution, but also enables the discriminator to retain the density information at the global optimum. We derive the analytic form of the induced solution, and analyze the properties. In order to make the proposed framework trainable in practice, we introduce two effective approximation techniques. Empirically, the experiment results closely match our theoretical analysis, verifying the discriminator is able to recover the energy of data distribution. |
Tasks | Image Generation |
Published | 2017-02-06 |
URL | http://arxiv.org/abs/1702.01691v2, http://arxiv.org/pdf/1702.01691v2.pdf |
PWC | https://paperswithcode.com/paper/calibrating-energy-based-generative |
Repo | https://github.com/zihangdai/cegan_iclr2017 |
Framework | none |