July 29, 2019


Paper Group AWR 87


Beyond Part Models: Person Retrieval with Refined Part Pooling (and a Strong Convolutional Baseline)


Title	Beyond Part Models: Person Retrieval with Refined Part Pooling (and a Strong Convolutional Baseline)
Authors	Yifan Sun, Liang Zheng, Yi Yang, Qi Tian, Shengjin Wang
Abstract	Employing part-level features for pedestrian image description offers fine-grained information and has been verified as beneficial for person retrieval in very recent literature. A prerequisite of part discovery is that each part should be well located. Instead of using external cues, e.g., pose estimation, to directly locate parts, this paper lays emphasis on the content consistency within each part. Specifically, we aim to learn discriminative part-informed features for person retrieval and make two contributions. (i) A network named Part-based Convolutional Baseline (PCB). Given an image input, it outputs a convolutional descriptor consisting of several part-level features. With a uniform partition strategy, PCB achieves competitive results with the state-of-the-art methods, proving itself as a strong convolutional baseline for person retrieval. (ii) A refined part pooling (RPP) method. Uniform partition inevitably incurs outliers in each part, which are in fact more similar to other parts. RPP re-assigns these outliers to the parts they are closest to, resulting in refined parts with enhanced within-part consistency. Experiments confirm that RPP allows PCB to gain another round of performance boost. For instance, on the Market-1501 dataset, we achieve (77.4+4.2)% mAP and (92.3+1.5)% rank-1 accuracy, surpassing the state of the art by a large margin.
Tasks	Person Re-Identification, Person Retrieval
Published	2017-11-26
URL	http://arxiv.org/abs/1711.09349v3
PDF	http://arxiv.org/pdf/1711.09349v3.pdf
PWC	https://paperswithcode.com/paper/beyond-part-models-person-retrieval-with
Repo	https://github.com/NIRVANALAN/reid_baseline
Framework	pytorch
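
The uniform partition strategy described in the abstract can be illustrated with a small sketch. It assumes the parts are equal-height horizontal stripes of the feature map and that each stripe is average-pooled into one vector; the abstract does not spell out either detail, and the function name `uniform_part_pool` is hypothetical. Refined part pooling (RPP) is not shown.

```python
def uniform_part_pool(feature_map, num_parts):
    """Average-pool an H x W x C feature map (nested lists) into
    num_parts equal-height horizontal stripes.

    Assumption: parts are horizontal stripes and pooling is a plain mean.
    """
    height = len(feature_map)
    if height % num_parts != 0:
        raise ValueError("height must be divisible by num_parts")
    stripe_height = height // num_parts
    channels = len(feature_map[0][0])

    parts = []
    for p in range(num_parts):
        rows = feature_map[p * stripe_height:(p + 1) * stripe_height]
        # Flatten the stripe into a list of per-location channel vectors.
        cells = [cell for row in rows for cell in row]
        parts.append(
            [sum(cell[c] for cell in cells) / len(cells) for c in range(channels)]
        )
    return parts


# Toy 4 x 2 x 1 feature map, split into two stripes of two rows each.
fmap = [[[1.0], [1.0]], [[3.0], [3.0]], [[5.0], [5.0]], [[7.0], [7.0]]]
part_features = uniform_part_pool(fmap, num_parts=2)
```

Each entry of `part_features` is one part-level feature vector; concatenating them gives a descriptor of the kind the abstract refers to.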

Deep Learning with Domain Adaptation for Accelerated Projection-Reconstruction MR


Title	Deep Learning with Domain Adaptation for Accelerated Projection-Reconstruction MR
Authors	Yo Seob Han, Jaejun Yoo, Jong Chul Ye
Abstract	Purpose: The radial k-space trajectory is a well-established sampling trajectory used in conjunction with magnetic resonance imaging. However, the radial k-space trajectory requires a large number of radial lines for high-resolution reconstruction. Increasing the number of radial lines causes longer acquisition time, making it more difficult for routine clinical use. On the other hand, if we reduce the number of radial lines, streaking artifact patterns are unavoidable. To solve this problem, we propose a novel deep learning approach with domain adaptation to restore high-resolution MR images from under-sampled k-space data. Methods: The proposed deep network removes the streaking artifacts from the artifact corrupted images. To address the situation given the limited available data, we propose a domain adaptation scheme that employs a pre-trained network using a large number of x-ray computed tomography (CT) or synthesized radial MR datasets, which is then fine-tuned with only a few radial MR datasets. Results: The proposed method outperforms existing compressed sensing algorithms, such as the total variation and PR-FOCUSS methods. In addition, the calculation time is several orders of magnitude faster than the total variation and PR-FOCUSS methods. Moreover, we found that pre-training using CT or MR data from a similar organ is more important than pre-training using data from the same modality for a different organ. Conclusion: We demonstrate the possibility of domain adaptation when only a limited amount of MR data is available. The proposed method surpasses the existing compressed sensing algorithms in terms of the image quality and computation time.
Tasks	Computed Tomography (CT), Domain Adaptation
Published	2017-03-03
URL	http://arxiv.org/abs/1703.01135v2
PDF	http://arxiv.org/pdf/1703.01135v2.pdf
PWC	https://paperswithcode.com/paper/deep-learning-with-domain-adaptation-for
Repo	https://github.com/jongcye/Domain.Adaptation.AcceleratedMR
Framework	none

Random Erasing Data Augmentation


Title	Random Erasing Data Augmentation
Authors	Zhun Zhong, Liang Zheng, Guoliang Kang, Shaozi Li, Yi Yang
Abstract	In this paper, we introduce Random Erasing, a new data augmentation method for training the convolutional neural network (CNN). In training, Random Erasing randomly selects a rectangle region in an image and erases its pixels with random values. In this process, training images with various levels of occlusion are generated, which reduces the risk of over-fitting and makes the model robust to occlusion. Random Erasing is parameter learning free, easy to implement, and can be integrated with most of the CNN-based recognition models. Albeit simple, Random Erasing is complementary to commonly used data augmentation techniques such as random cropping and flipping, and yields consistent improvement over strong baselines in image classification, object detection and person re-identification. Code is available at: https://github.com/zhunzhong07/Random-Erasing.
Tasks	Data Augmentation, Image Augmentation, Image Classification, Object Detection, Person Re-Identification
Published	2017-08-16
URL	http://arxiv.org/abs/1708.04896v2
PDF	http://arxiv.org/pdf/1708.04896v2.pdf
PWC	https://paperswithcode.com/paper/random-erasing-data-augmentation
Repo	https://github.com/NVlabs/DG-Net
Framework	pytorch
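
The erasing step described in the abstract (pick a random rectangle, overwrite its pixels with random values) might look like the sketch below. The erasing probability `p`, the area range, the aspect-ratio bound and the retry limit are illustrative assumptions, not values taken from this listing, and the image is a plain grayscale grid rather than a tensor.

```python
import math
import random


def random_erase(image, p=0.5, area_range=(0.02, 0.4), min_aspect=0.3,
                 max_tries=100, rng=random):
    """Return a copy of `image` (H x W nested list, values in [0, 255])
    with at most one random rectangle overwritten by random values.

    All default parameter values are assumptions for illustration.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    if rng.random() >= p:
        return out  # leave this image unchanged

    for _ in range(max_tries):
        target_area = rng.uniform(*area_range) * height * width
        aspect = rng.uniform(min_aspect, 1.0 / min_aspect)
        h = int(round(math.sqrt(target_area * aspect)))
        w = int(round(math.sqrt(target_area / aspect)))
        if 0 < h <= height and 0 < w <= width:
            top = rng.randint(0, height - h)
            left = rng.randint(0, width - w)
            for y in range(top, top + h):
                for x in range(left, left + w):
                    out[y][x] = rng.randint(0, 255)
            return out
    return out  # no rectangle fit within max_tries


image = [[0] * 16 for _ in range(16)]
erased = random_erase(image, p=1.0, rng=random.Random(0))
untouched = random_erase(image, p=0.0, rng=random.Random(0))
```

Passing a seeded `random.Random` instance keeps the augmentation reproducible; the input image is never modified in place.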

Camera Style Adaptation for Person Re-identification


Title	Camera Style Adaptation for Person Re-identification
Authors	Zhun Zhong, Liang Zheng, Zhedong Zheng, Shaozi Li, Yi Yang
Abstract	Being a cross-camera retrieval task, person re-identification suffers from image style variations caused by different cameras. The art implicitly addresses this problem by learning a camera-invariant descriptor subspace. In this paper, we explicitly consider this challenge by introducing camera style (CamStyle) adaptation. CamStyle can serve as a data augmentation approach that smooths the camera style disparities. Specifically, with CycleGAN, labeled training images can be style-transferred to each camera, and, along with the original training samples, form the augmented training set. This method, while increasing data diversity against over-fitting, also incurs a considerable level of noise. In the effort to alleviate the impact of noise, the label smooth regularization (LSR) is adopted. The vanilla version of our method (without LSR) performs reasonably well on few-camera systems in which over-fitting often occurs. With LSR, we demonstrate consistent improvement in all systems regardless of the extent of over-fitting. We also report competitive accuracy compared with the state of the art.
Tasks	Data Augmentation, Person Re-Identification
Published	2017-11-28
URL	http://arxiv.org/abs/1711.10295v2
PDF	http://arxiv.org/pdf/1711.10295v2.pdf
PWC	https://paperswithcode.com/paper/camera-style-adaptation-for-person-re
Repo	https://github.com/NIRVANALAN/reid_baseline
Framework	pytorch
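
As a rough illustration of label smooth regularization (LSR), the sketch below builds a smoothed target distribution in the commonly used form: the true class receives 1 - epsilon + epsilon/K and every other class receives epsilon/K. The value epsilon = 0.1 and the function names are assumptions, and the abstract does not say which samples the smoothed targets are applied to.

```python
import math


def smoothed_targets(label, num_classes, epsilon=0.1):
    """Target distribution with label smoothing (epsilon is an assumed value)."""
    base = epsilon / num_classes
    return [base + (1.0 - epsilon if k == label else 0.0)
            for k in range(num_classes)]


def cross_entropy(probs, targets):
    """Cross-entropy between predicted probabilities and a target distribution."""
    return -sum(t * math.log(p) for p, t in zip(probs, targets) if t > 0.0)


targets = smoothed_targets(label=2, num_classes=4, epsilon=0.1)
loss = cross_entropy([0.1, 0.1, 0.7, 0.1], targets)
```

With `epsilon=0.0` the targets reduce to the usual one-hot vector, so the same loss function covers both the smoothed and the unsmoothed case.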

Unlabeled Samples Generated by GAN Improve the Person Re-identification Baseline in vitro


Title	Unlabeled Samples Generated by GAN Improve the Person Re-identification Baseline in vitro
Authors	Zhedong Zheng, Liang Zheng, Yi Yang
Abstract	The main contribution of this paper is a simple semi-supervised pipeline that only uses the original training set without collecting extra data. It is challenging in 1) how to obtain more training data only from the training set and 2) how to use the newly generated data. In this work, the generative adversarial network (GAN) is used to generate unlabeled samples. We propose the label smoothing regularization for outliers (LSRO). This method assigns a uniform label distribution to the unlabeled images, which regularizes the supervised model and improves the baseline. We verify the proposed method on a practical problem: person re-identification (re-ID). This task aims to retrieve a query person from other cameras. We adopt the deep convolutional generative adversarial network (DCGAN) for sample generation, and a baseline convolutional neural network (CNN) for representation learning. Experiments show that adding the GAN-generated data effectively improves the discriminative ability of learned CNN embeddings. On three large-scale datasets, Market-1501, CUHK03 and DukeMTMC-reID, we obtain +4.37%, +1.6% and +2.46% improvement in rank-1 precision over the baseline CNN, respectively. We additionally apply the proposed method to fine-grained bird recognition and achieve a +0.6% improvement over a strong baseline. The code is available at https://github.com/layumi/Person-reID_GAN.
Tasks	Person Re-Identification, Representation Learning
Published	2017-01-26
URL	http://arxiv.org/abs/1701.07717v5
PDF	http://arxiv.org/pdf/1701.07717v5.pdf
PWC	https://paperswithcode.com/paper/unlabeled-samples-generated-by-gan-improve
Repo	https://github.com/layumi/Person-reID_GAN
Framework	tf
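
The LSRO idea in the abstract, a uniform label distribution for the unlabeled generated images, can be sketched as a loss that switches on whether a label is present. This is a minimal illustration under the assumption that real images use ordinary one-hot cross-entropy; `lsro_loss` is a hypothetical name, not code from the linked repository.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lsro_loss(logits, label=None):
    """Cross-entropy for one sample.

    label given -> real image, one-hot target (assumption).
    label None  -> generated image, uniform target 1/K over all K classes.
    """
    log_probs = [math.log(p) for p in softmax(logits)]
    if label is not None:
        return -log_probs[label]
    return -sum(log_probs) / len(log_probs)


real_loss = lsro_loss([2.0, 0.0, 0.0], label=0)
generated_loss = lsro_loss([2.0, 0.0, 0.0])
```

For a generated sample the loss is smallest when the network predicts a flat distribution, which is how the uniform target acts as a regularizer.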

LAREX - A semi-automatic open-source Tool for Layout Analysis and Region Extraction on Early Printed Books


Title	LAREX - A semi-automatic open-source Tool for Layout Analysis and Region Extraction on Early Printed Books
Authors	Christian Reul, Uwe Springmann, Frank Puppe
Abstract	A semi-automatic open-source tool for layout analysis on early printed books is presented. LAREX uses a rule based connected components approach which is very fast, easily comprehensible for the user and allows an intuitive manual correction if necessary. The PageXML format is used to support integration into existing OCR workflows. Evaluations showed that LAREX provides an efficient and flexible way to segment pages of early printed books.
Tasks	Optical Character Recognition
Published	2017-01-20
URL	http://arxiv.org/abs/1701.07396v1
PDF	http://arxiv.org/pdf/1701.07396v1.pdf
PWC	https://paperswithcode.com/paper/larex-a-semi-automatic-open-source-tool-for
Repo	https://github.com/chreul/LAREX
Framework	none
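
To give a feel for what a connected components approach computes, the sketch below labels 4-connected foreground regions of a binarized page and returns their bounding boxes. It is a generic textbook routine, not LAREX's implementation; the choice of 4-connectivity and the function name are assumptions, and the rules LAREX applies on top of the components are not shown.

```python
from collections import deque


def connected_components(binary):
    """Return bounding boxes (top, left, bottom, right) of the 4-connected
    foreground regions in a binary image given as nested lists of 0/1."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not binary[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


page = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1],
]
boxes = connected_components(page)
```

Boxes come out in raster-scan order of each region's first pixel, which makes them easy to feed into simple size or position rules.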

Case Study of a highly automated Layout Analysis and OCR of an incunabulum: ‘Der Heiligen Leben’ (1488)


Title	Case Study of a highly automated Layout Analysis and OCR of an incunabulum: ‘Der Heiligen Leben’ (1488)
Authors	Christian Reul, Marco Dittrich, Martin Gruner
Abstract	This paper provides the first thorough documentation of a high quality digitization process applied to an early printed book from the incunabulum period (1450-1500). The entire OCR related workflow including preprocessing, layout analysis and text recognition is illustrated in detail using the example of ‘Der Heiligen Leben’, printed in Nuremberg in 1488. For each step the required time expenditure was recorded. The character recognition yielded excellent results both on character (97.57%) and word (92.19%) level. Furthermore, a comparison of a highly automated (LAREX) and a manual (Aletheia) method for layout analysis was performed. By considerably automating the segmentation the required human effort was reduced significantly from over 100 hours to less than six hours, resulting in only a slight drop in OCR accuracy. Realistic estimates for the human effort necessary for full text extraction from incunabula can be derived from this study. The printed pages of the complete work together with the OCR result are available online, ready to be inspected and downloaded.
Tasks	Optical Character Recognition
Published	2017-01-20
URL	http://arxiv.org/abs/1701.07395v1
PDF	http://arxiv.org/pdf/1701.07395v1.pdf
PWC	https://paperswithcode.com/paper/case-study-of-a-highly-automated-layout
Repo	https://github.com/chreul/LAREX
Framework	none

Ensemble Adversarial Training: Attacks and Defenses


Title	Ensemble Adversarial Training: Attacks and Defenses
Authors	Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, Patrick McDaniel
Abstract	Adversarial examples are perturbed inputs designed to fool machine learning models. Adversarial training injects such examples into training data to increase robustness. To scale this technique to large datasets, perturbations are crafted using fast single-step methods that maximize a linear approximation of the model’s loss. We show that this form of adversarial training converges to a degenerate global minimum, wherein small curvature artifacts near the data points obfuscate a linear approximation of the loss. The model thus learns to generate weak perturbations, rather than defend against strong ones. As a result, we find that adversarial training remains vulnerable to black-box attacks, where we transfer perturbations computed on undefended models, as well as to a powerful novel single-step attack that escapes the non-smooth vicinity of the input data via a small random step. We further introduce Ensemble Adversarial Training, a technique that augments training data with perturbations transferred from other models. On ImageNet, Ensemble Adversarial Training yields models with strong robustness to black-box attacks. In particular, our most robust model won the first round of the NIPS 2017 competition on Defenses against Adversarial Attacks.
Tasks
Published	2017-05-19
URL	http://arxiv.org/abs/1705.07204v4
PDF	http://arxiv.org/pdf/1705.07204v4.pdf
PWC	https://paperswithcode.com/paper/ensemble-adversarial-training-attacks-and
Repo	https://github.com/sangxia/nips-2017-adversarial
Framework	tf
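
The single-step attack with a small random step mentioned in the abstract can be sketched on a toy model. The code below assumes a logistic-regression classifier with an analytic input gradient, a random sign step of size `alpha` followed by one gradient-sign step of size `eps - alpha`, and labels in {-1, +1}; the model, the function names and the parameter values are all illustrative assumptions rather than the authors' setup.

```python
import math
import random


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def loss_grad(x, y, w, b):
    """Gradient w.r.t. x of the logistic loss log(1 + exp(-y * (w.x + b)))."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    scale = -y / (1.0 + math.exp(margin))
    return [scale * wi for wi in w]


def rand_fgsm(x, y, w, b, eps, alpha, rng):
    """Random step of size alpha, then one gradient-sign step of size
    eps - alpha, so the total perturbation stays within eps per coordinate."""
    x_rand = [xi + alpha * sign(rng.gauss(0.0, 1.0)) for xi in x]
    grad = loss_grad(x_rand, y, w, b)
    return [xi + (eps - alpha) * sign(g) for xi, g in zip(x_rand, grad)]


x = [0.5, -0.2, 0.1]
w = [1.0, -2.0, 0.5]
x_adv = rand_fgsm(x, y=1, w=w, b=0.0, eps=0.3, alpha=0.15,
                  rng=random.Random(0))
```

Setting `alpha=0.0` removes the random step and leaves a plain single gradient-sign step, which makes the two variants easy to compare on the same model.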

Human Interaction with Recommendation Systems


Title	Human Interaction with Recommendation Systems
Authors	Sven Schmit, Carlos Riquelme
Abstract	Many recommendation algorithms rely on user data to generate recommendations. However, these recommendations also affect the data obtained from future users. This work aims to understand the effects of this dynamic interaction. We propose a simple model where users with heterogeneous preferences arrive over time. Based on this model, we prove that naive estimators, i.e. those which ignore this feedback loop, are not consistent. We show that consistent estimators are efficient in the presence of myopic agents. Our results are validated using extensive simulations.
Tasks	Recommendation Systems
Published	2017-03-01
URL	http://arxiv.org/abs/1703.00535v3
PDF	http://arxiv.org/pdf/1703.00535v3.pdf
PWC	https://paperswithcode.com/paper/human-interaction-with-recommendation-systems
Repo	https://github.com/schmit/human_interaction
Framework	none

Depth Adaptive Deep Neural Network for Semantic Segmentation


Title	Depth Adaptive Deep Neural Network for Semantic Segmentation
Authors	Byeongkeun Kang, Yeejin Lee, Truong Q. Nguyen
Abstract	In this work, we present the depth-adaptive deep neural network using a depth map for semantic segmentation. Typical deep neural networks receive inputs at the predetermined locations regardless of the distance from the camera. This fixed receptive field presents a challenge to generalize the features of objects at various distances in neural networks. Specifically, the predetermined receptive fields are too small at a short distance, and vice versa. To overcome this challenge, we develop a neural network which is able to adapt the receptive field not only for each layer but also for each neuron at the spatial location. To adjust the receptive field, we propose the depth-adaptive multiscale (DaM) convolution layer consisting of the adaptive perception neuron and the in-layer multiscale neuron. The adaptive perception neuron is to adjust the receptive field at each spatial location using the corresponding depth information. The in-layer multiscale neuron is to apply the different size of the receptive field at each feature space to learn features at multiple scales. The proposed DaM convolution is applied to two fully convolutional neural networks. We demonstrate the effectiveness of the proposed neural networks on the publicly available RGB-D dataset for semantic segmentation and the novel hand segmentation dataset for hand-object interaction. The experimental results show that the proposed method outperforms the state-of-the-art methods without any additional layers or pre/post-processing.
Tasks	Hand Segmentation, Semantic Segmentation
Published	2017-08-05
URL	http://arxiv.org/abs/1708.01818v2
PDF	http://arxiv.org/pdf/1708.01818v2.pdf
PWC	https://paperswithcode.com/paper/depth-adaptive-deep-neural-network-for
Repo	https://github.com/byeongkeun-kang/HOI-dataset
Framework	none

Amulet: Aggregating Multi-level Convolutional Features for Salient Object Detection


Title	Amulet: Aggregating Multi-level Convolutional Features for Salient Object Detection
Authors	Pingping Zhang, Dong Wang, Huchuan Lu, Hongyu Wang, Xiang Ruan
Abstract	Fully convolutional neural networks (FCNs) have shown outstanding performance in many dense labeling problems. One key pillar of these successes is mining relevant information from features in convolutional layers. However, how to better aggregate multi-level convolutional feature maps for salient object detection is underexplored. In this work, we present Amulet, a generic aggregating multi-level convolutional feature framework for salient object detection. Our framework first integrates multi-level feature maps into multiple resolutions, which simultaneously incorporate coarse semantics and fine details. Then it adaptively learns to combine these feature maps at each resolution and predict saliency maps with the combined features. Finally, the predicted results are efficiently fused to generate the final saliency map. In addition, to achieve accurate boundary inference and semantic enhancement, edge-aware feature maps in low-level layers and the predicted results of low resolution features are recursively embedded into the learning framework. By aggregating multi-level convolutional features in this efficient and flexible manner, the proposed saliency model provides accurate salient object labeling. Comprehensive experiments demonstrate that our method performs favorably against state-of-the-art approaches in terms of nearly all compared evaluation metrics.
Tasks	Object Detection, Salient Object Detection
Published	2017-08-07
URL	http://arxiv.org/abs/1708.02001v1
PDF	http://arxiv.org/pdf/1708.02001v1.pdf
PWC	https://paperswithcode.com/paper/amulet-aggregating-multi-level-convolutional
Repo	https://github.com/Pchank/caffe-sal
Framework	none

Minimizing Polarization and Disagreement in Social Networks


Title	Minimizing Polarization and Disagreement in Social Networks
Authors	Cameron Musco, Christopher Musco, Charalampos E. Tsourakakis
Abstract	The rise of social media and online social networks has been a disruptive force in society. Opinions are increasingly shaped by interactions on online social media, and social phenomena including disagreement and polarization are now tightly woven into everyday life. In this work we initiate the study of the following question: given $n$ agents, each with its own initial opinion that reflects its core value on a topic, and an opinion dynamics model, what is the structure of a social network that minimizes polarization and disagreement simultaneously? This question is central to recommender systems: should a recommender system prefer a link suggestion between two online users with similar mindsets in order to keep disagreement low, or between two users with different opinions in order to expose each to the other’s viewpoint of the world, and decrease overall levels of polarization? Our contributions include a mathematical formalization of this question as an optimization problem and an exact, time-efficient algorithm. We also prove that there always exists a network with $O(n/\epsilon^2)$ edges that is a $(1+\epsilon)$ approximation to the optimum. For a fixed graph, we additionally show how to optimize our objective function over the agents’ innate opinions in polynomial time. We perform an empirical study of our proposed methods on synthetic and real-world data that verifies their value as mining tools to better understand the trade-off between disagreement and polarization. We find that there is a lot of space to reduce both polarization and disagreement in real-world networks; for instance, on a Reddit network where users exchange comments on politics, our methods achieve a $\sim 60,000$-fold reduction in polarization and disagreement.
Tasks	Recommendation Systems
Published	2017-12-28
URL	http://arxiv.org/abs/1712.09948v1
PDF	http://arxiv.org/pdf/1712.09948v1.pdf
PWC	https://paperswithcode.com/paper/minimizing-polarization-and-disagreement-in
Repo	https://github.com/tsourolampis/polarization-disagreement
Framework	none
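
To make the two quantities concrete, the sketch below computes them on a tiny graph. It assumes the opinion dynamics model is Friedkin-Johnsen (the abstract does not name the model), with equilibrium opinions found by fixed-point iteration, polarization measured as the sum of squared deviations from the mean opinion, and disagreement as the weighted sum of squared opinion differences across edges. These definitions and all function names are assumptions for illustration; the optimization algorithm itself is not shown.

```python
def equilibrium_opinions(innate, edges, iterations=200):
    """Iterate z_i = (s_i + sum_j w_ij * z_j) / (1 + sum_j w_ij) to a fixed
    point (assumed Friedkin-Johnsen dynamics on an undirected graph).

    innate: list of innate opinions s_i
    edges:  list of (i, j, weight) tuples
    """
    n = len(innate)
    neighbors = [[] for _ in range(n)]
    for i, j, weight in edges:
        neighbors[i].append((j, weight))
        neighbors[j].append((i, weight))
    degree = [sum(weight for _, weight in nbrs) for nbrs in neighbors]

    z = list(innate)
    for _step in range(iterations):
        z = [
            (innate[i] + sum(weight * z[j] for j, weight in neighbors[i]))
            / (1.0 + degree[i])
            for i in range(n)
        ]
    return z


def polarization(z):
    """Sum of squared deviations of the opinions from their mean."""
    mean = sum(z) / len(z)
    return sum((zi - mean) ** 2 for zi in z)


def disagreement(z, edges):
    """Weighted sum of squared opinion differences across edges."""
    return sum(weight * (z[i] - z[j]) ** 2 for i, j, weight in edges)


edges = [(0, 1, 1.0)]
z = equilibrium_opinions([0.0, 1.0], edges)
total = polarization(z) + disagreement(z, edges)
```

On this two-node example the opinions move from 0 and 1 toward each other, settling at 1/3 and 2/3, so both quantities can be checked by hand.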

Sequence Modeling via Segmentations


Title	Sequence Modeling via Segmentations
Authors	Chong Wang, Yining Wang, Po-Sen Huang, Abdelrahman Mohamed, Dengyong Zhou, Li Deng
Abstract	Segmental structure is a common pattern in many types of sequences such as phrases in human languages. In this paper, we present a probabilistic model for sequences via their segmentations. The probability of a segmented sequence is calculated as the product of the probabilities of all its segments, where each segment is modeled using existing tools such as recurrent neural networks. Since the segmentation of a sequence is usually unknown in advance, we sum over all valid segmentations to obtain the final probability for the sequence. An efficient dynamic programming algorithm is developed for forward and backward computations without resorting to any approximation. We demonstrate our approach on text segmentation and speech recognition tasks. In addition to quantitative results, we also show that our approach can discover meaningful segments in their respective application contexts.
Tasks	Speech Recognition
Published	2017-02-24
URL	http://arxiv.org/abs/1702.07463v7
PDF	http://arxiv.org/pdf/1702.07463v7.pdf
PWC	https://paperswithcode.com/paper/sequence-modeling-via-segmentations
Repo	https://github.com/Microsoft/NPMT
Framework	torch
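
The sum over all valid segmentations lends itself to a short dynamic-programming sketch. The forward recursion below accumulates, for each prefix, the total score of all ways to segment it; a brute-force enumeration is included only to check the recursion. The segment scoring function `toy_segment_prob` is a hypothetical stand-in for the recurrent networks mentioned in the abstract and is not a normalized distribution.

```python
from itertools import combinations


def sequence_probability(seq, segment_prob):
    """Forward recursion: alpha[t] = sum_{j < t} alpha[j] * segment_prob(seq[j:t]).
    alpha[len(seq)] sums the product of segment scores over all segmentations."""
    n = len(seq)
    alpha = [0.0] * (n + 1)
    alpha[0] = 1.0  # the empty prefix has exactly one (empty) segmentation
    for t in range(1, n + 1):
        alpha[t] = sum(alpha[j] * segment_prob(seq[j:t]) for j in range(t))
    return alpha[n]


def brute_force(seq, segment_prob):
    """Enumerate every segmentation explicitly (exponential; for checking only)."""
    n = len(seq)
    total = 0.0
    for num_cuts in range(n):
        for cuts in combinations(range(1, n), num_cuts):
            bounds = (0,) + cuts + (n,)
            prob = 1.0
            for start, end in zip(bounds, bounds[1:]):
                prob *= segment_prob(seq[start:end])
            total += prob
    return total


def toy_segment_prob(segment):
    """Hypothetical segment score; a real model would use a learned network."""
    return 0.5 ** len(segment) / len(segment)


dp_value = sequence_probability("abcd", toy_segment_prob)
bf_value = brute_force("abcd", toy_segment_prob)
```

The recursion makes a number of segment evaluations quadratic in the sequence length, whereas the enumeration grows exponentially, which is the practical reason for the dynamic program.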

Rgtsvm: Support Vector Machines on a GPU in R


Title	Rgtsvm: Support Vector Machines on a GPU in R
Authors	Zhong Wang, Tinyi Chu, Lauren A Choate, Charles G Danko
Abstract	Rgtsvm provides a fast and flexible support vector machine (SVM) implementation for the R language. The distinguishing feature of Rgtsvm is that support vector classification and support vector regression tasks are implemented on a graphical processing unit (GPU), allowing the libraries to scale to millions of examples with >100-fold improvement in performance over existing implementations. Nevertheless, Rgtsvm retains feature parity and has an interface that is compatible with the popular e1071 SVM package in R. Altogether, Rgtsvm enables large SVM models to be created by both experienced and novice practitioners.
Tasks
Published	2017-06-17
URL	http://arxiv.org/abs/1706.05544v1
PDF	http://arxiv.org/pdf/1706.05544v1.pdf
PWC	https://paperswithcode.com/paper/rgtsvm-support-vector-machines-on-a-gpu-in-r
Repo	https://github.com/Danko-Lab/Rgtsvm
Framework	none

Calibrating Energy-based Generative Adversarial Networks


Title	Calibrating Energy-based Generative Adversarial Networks
Authors	Zihang Dai, Amjad Almahairi, Philip Bachman, Eduard Hovy, Aaron Courville
Abstract	In this paper, we propose to equip Generative Adversarial Networks with the ability to produce direct energy estimates for samples. Specifically, we propose a flexible adversarial training framework, and prove this framework not only ensures the generator converges to the true data distribution, but also enables the discriminator to retain the density information at the global optimum. We derive the analytic form of the induced solution, and analyze the properties. In order to make the proposed framework trainable in practice, we introduce two effective approximation techniques. Empirically, the experimental results closely match our theoretical analysis, verifying the discriminator is able to recover the energy of data distribution.
Tasks	Image Generation
Published	2017-02-06
URL	http://arxiv.org/abs/1702.01691v2
PDF	http://arxiv.org/pdf/1702.01691v2.pdf
PWC	https://paperswithcode.com/paper/calibrating-energy-based-generative
Repo	https://github.com/zihangdai/cegan_iclr2017
Framework	none