Paper Group AWR 59
Using cameras for precise measurement of two-dimensional plant features: CASS
Title | Using cameras for precise measurement of two-dimensional plant features: CASS |
Authors | Amy Tabb, Germán A Holguín, Rachel Naegele |
Abstract | Images are used frequently in plant phenotyping to capture measurements. This chapter offers a repeatable method for capturing two-dimensional measurements of plant parts in field or laboratory settings using a variety of camera styles (cellular phone, DSLR), with the addition of a printed calibration pattern. The method is based on calibrating the camera using information available from the EXIF tags of the image, as well as visual information from the pattern. Code is provided to implement the method, as well as a dataset for testing. We include steps to verify protocol correctness by imaging an artifact. The use of this protocol for two-dimensional plant phenotyping will allow data capture from different cameras and environments, with comparison on the same physical scale. We abbreviate this method as CASS, for CAmera aS Scanner. Code and data are available at http://doi.org/10.5281/zenodo.3677473. |
Tasks | Calibration |
Published | 2019-04-30 |
URL | https://arxiv.org/abs/1904.13187v2 |
PDF | https://arxiv.org/pdf/1904.13187v2.pdf |
PWC | https://paperswithcode.com/paper/using-cameras-for-precise-measurement-of-two |
Repo | https://github.com/amy-tabb/aruco-pattern-write |
Framework | none |
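The core of CASS is recovering a physical scale from the printed calibration pattern; the authors' full pipeline (EXIF-based calibration plus rectification) lives in the linked repo. Below is a minimal, unofficial sketch of just the scale-recovery step using OpenCV ArUco markers — the marker size, file name, and fronto-parallel assumption are illustrative, not part of the paper.

```python
import cv2
import numpy as np

MARKER_MM = 30.0  # hypothetical printed ArUco marker side length, in mm

img = cv2.imread("leaf_with_pattern.jpg")  # illustrative file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
corners, ids, _ = cv2.aruco.detectMarkers(gray, dictionary)  # pre-OpenCV-4.7 API

# Mean marker side length in pixels, over all detected markers.
sides = [np.linalg.norm(c.reshape(4, 2)[i] - c.reshape(4, 2)[(i + 1) % 4])
         for c in corners for i in range(4)]
mm_per_px = MARKER_MM / np.mean(sides)
print(f"scale: {mm_per_px:.4f} mm/px")  # convert any in-plane pixel length to mm
```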
Person Re-identification with Bias-controlled Adversarial Training
Title | Person Re-identification with Bias-controlled Adversarial Training |
Authors | Sara Iodice, Krystian Mikolajczyk |
Abstract | Inspired by the effectiveness of adversarial training in the area of Generative Adversarial Networks, we present a new approach for learning feature representations in person re-identification. We investigate different types of bias that typically occur in re-ID scenarios, i.e., pose, body part and camera view, and propose a general approach to address them. We introduce an adversarial strategy for controlling bias, named the Bias-controlled Adversarial framework (BCA), with two complementary branches to reduce or to enhance bias-related features. The results and comparison to the state of the art on different benchmarks show that our framework is an effective strategy for person re-identification. Performance improves for both full and partial views of persons. |
Tasks | Person Re-Identification |
Published | 2019-03-30 |
URL | http://arxiv.org/abs/1904.00244v1 |
PDF | http://arxiv.org/pdf/1904.00244v1.pdf |
PWC | https://paperswithcode.com/paper/person-re-identification-with-bias-controlled |
Repo | https://github.com/iodicesara/person-re-identification-with-bias-controlled-adversarial-training |
Framework | none |
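BCA's bias-reducing branch is in the spirit of adversarial feature learning. A minimal sketch of one standard way to realise such a branch is a gradient-reversal layer feeding a bias classifier (here a camera-view head); this is a generic construction, not the authors' exact architecture, and all dimensions are made up.

```python
import torch
import torch.nn.functional as F
from torch.autograd import Function

class GradReverse(Function):
    """Identity in the forward pass; flips the gradient sign on backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

# Training a bias head (a camera-view classifier) through the reversal layer
# pushes the backbone features to discard camera-specific cues.
feat = torch.randn(8, 256, requires_grad=True)   # stand-in backbone features
bias_head = torch.nn.Linear(256, 6)              # e.g. 6 camera views (made up)
logits = bias_head(GradReverse.apply(feat, 0.5))
loss = F.cross_entropy(logits, torch.randint(0, 6, (8,)))
loss.backward()                                  # feat.grad now carries the reversed signal
```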
Unsupervised Tracklet Person Re-Identification
Title | Unsupervised Tracklet Person Re-Identification |
Authors | Minxian Li, Xiatian Zhu, Shaogang Gong |
Abstract | Most existing person re-identification (re-id) methods rely on supervised model learning on per-camera-pair manually labelled pairwise training data. This leads to poor scalability in practical re-id deployments, due to the lack of exhaustive identity labelling of positive and negative image pairs for every camera pair. In this work, we present an unsupervised re-id deep learning approach. It is capable of incrementally discovering and exploiting the underlying re-id discriminative information from automatically generated person tracklet data, end-to-end. We formulate an Unsupervised Tracklet Association Learning (UTAL) framework that jointly learns within-camera tracklet discrimination and cross-camera tracklet association, in order to maximise the discovery of tracklet identity matching both within and across camera views. Extensive experiments demonstrate the superiority of the proposed model over state-of-the-art unsupervised learning and domain adaptation person re-id methods on eight benchmarking datasets. |
Tasks | Domain Adaptation, Person Re-Identification |
Published | 2019-03-01 |
URL | http://arxiv.org/abs/1903.00535v1 |
PDF | http://arxiv.org/pdf/1903.00535v1.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-tracklet-person-re |
Repo | https://github.com/liminxian/DukeMTMC-SI-Tracklet |
Framework | none |
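One way to picture the within-camera half of UTAL is a shared backbone with a separate tracklet classifier per camera, so tracklet pseudo-labels never need cross-camera annotation. The sketch below illustrates that layout only; the head sizes are made up, the interface is assumed, and the cross-camera association objective is omitted.

```python
import torch
import torch.nn as nn

class PerCameraTrackletHeads(nn.Module):
    """One tracklet classifier per camera over a shared embedding, so
    tracklet labels stay local to each camera (interface assumed)."""
    def __init__(self, dim, tracklets_per_cam):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(dim, n) for n in tracklets_per_cam)

    def forward(self, feat, cam_id):
        return self.heads[cam_id](feat)  # logits over that camera's tracklets

heads = PerCameraTrackletHeads(2048, [500, 750, 620])  # made-up tracklet counts
logits = heads(torch.randn(16, 2048), cam_id=1)        # -> (16, 750)
```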
Non-Rigid Point Set Registration Networks
Title | Non-Rigid Point Set Registration Networks |
Authors | Lingjing Wang, Jianchun Chen, Xiang Li, Yi Fang |
Abstract | Point set registration is defined as the process of determining the spatial transformation from a source point set to a target one. Existing methods often iteratively search for the optimal geometric transformation to register a given pair of point sets, driven by minimizing a predefined alignment loss function. In contrast, the proposed point registration neural network (PR-Net) actively learns the registration pattern as a parametric function from a training dataset and consequently predicts the desired geometric transformation to align a pair of point sets. PR-Net can transfer the learned knowledge (i.e. registration pattern) from registering training pairs to testing ones without additional iterative optimization. Specifically, in this paper, we develop novel techniques to learn shape descriptors from point sets that help formulate a clear correlation between source and target point sets. With the defined correlation, PR-Net tends to predict the transformation so that the source and target point sets can be statistically aligned, which in turn leads to an optimal spatial geometric registration. PR-Net achieves robust and superior performance for non-rigid registration of point sets, even in the presence of Gaussian noise, outliers, and missing points, while requiring much less time to register a large number of pairs. More importantly, for a new pair of point sets, PR-Net is able to directly predict the desired transformation using the learned model, without a repetitive iterative optimization routine. Our code is available at https://github.com/Lingjing324/PR-Net. |
Tasks | |
Published | 2019-04-02 |
URL | http://arxiv.org/abs/1904.01428v1 |
PDF | http://arxiv.org/pdf/1904.01428v1.pdf |
PWC | https://paperswithcode.com/paper/non-rigid-point-set-registration-networks |
Repo | https://github.com/Lingjing324/PR-Net |
Framework | pytorch |
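Learning-based registration networks such as PR-Net are typically trained against a set-to-set alignment loss. The Chamfer distance below is a common choice for that role; whether PR-Net uses exactly this form should be checked against the authors' code, so treat it as a generic sketch.

```python
import torch

def chamfer_distance(src, tgt):
    """Symmetric Chamfer distance between point sets src (B, N, 3) and
    tgt (B, M, 3); a common alignment loss for learned registration."""
    d = torch.cdist(src, tgt)   # (B, N, M) pairwise Euclidean distances
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()

src = torch.rand(4, 1024, 3)
warped = src + 0.01 * torch.randn_like(src)  # stand-in for the network's output
print(chamfer_distance(warped, torch.rand(4, 1024, 3)))
```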
On Recurrent Neural Networks for Sequence-based Processing in Communications
Title | On Recurrent Neural Networks for Sequence-based Processing in Communications |
Authors | Daniel Tandler, Sebastian Dörner, Sebastian Cammerer, Stephan ten Brink |
Abstract | In this work, we analyze the capabilities and practical limitations of neural networks (NNs) for sequence-based signal processing, which can be seen as an omnipresent property of almost any modern communication system. In particular, we train multiple state-of-the-art recurrent neural network (RNN) structures to learn how to decode convolutional codes, allowing clear benchmarking against the corresponding maximum likelihood (ML) Viterbi decoder. We examine the decoding performance for various kinds of NN architectures, beginning with classical types like feedforward layers and gated recurrent unit (GRU) layers, up to more recently introduced architectures such as temporal convolutional networks (TCNs) and differentiable neural computers (DNCs) with external memory. As a key limitation, it turns out that the training complexity increases exponentially with the length of the encoding memory $\nu$ and, thus, practically limits the achievable bit error rate (BER) performance. To overcome this limitation, we introduce a new training method that gradually increases the number of ones within the training sequences, i.e., we constrain the set of possible training sequences in the beginning until first convergence. By consecutively adding more and more possible sequences to the training set, we finally achieve training success in cases that did not converge before via naive training. Further, we show that our network can learn to jointly detect and decode a quadrature phase shift keying (QPSK) modulated code with sub-optimal (anti-Gray) labeling in one shot, at a performance that would require iterations between demapper and decoder in classic detection schemes. |
Tasks | |
Published | 2019-05-24 |
URL | https://arxiv.org/abs/1905.09983v3 |
PDF | https://arxiv.org/pdf/1905.09983v3.pdf |
PWC | https://paperswithcode.com/paper/on-recurrent-neural-networks-for-sequence |
Repo | https://github.com/sdnr/RNN-Conv-Decoder |
Framework | none |
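The training trick described above — constraining how many ones appear in the information sequences and relaxing the constraint as training converges — can be sketched directly. The batch size, sequence length, and stage schedule below are illustrative assumptions; encoding, channel noise, and the RNN training step are elided.

```python
import numpy as np

def curriculum_batch(batch, seq_len, max_ones):
    """Random binary info sequences with at most `max_ones` ones each,
    constraining the training set as described in the abstract."""
    seqs = np.zeros((batch, seq_len), dtype=np.int8)
    for s in seqs:
        k = np.random.randint(0, max_ones + 1)
        s[np.random.choice(seq_len, size=k, replace=False)] = 1
    return seqs

# Assumed schedule: enlarge the admissible sequence set once the decoder
# converges at the current stage.
for max_ones in (2, 4, 8, 16):
    info_bits = curriculum_batch(256, 64, max_ones)
    # ... encode with the convolutional code, add noise, train the RNN ...
```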
Metric Attack and Defense for Person Re-identification
Title | Metric Attack and Defense for Person Re-identification |
Authors | Song Bai, Yingwei Li, Yuyin Zhou, Qizhu Li, Philip H. S. Torr |
Abstract | Person re-identification (re-ID) has attracted much attention recently due to its great importance in video surveillance. In general, distance metrics used to identify two person images are expected to be robust under various appearance changes. However, our work observes the extreme vulnerability of existing distance metrics to adversarial examples, generated by simply adding human-imperceptible perturbations to person images. Hence, the security danger is dramatically increased when deploying commercial re-ID systems in video surveillance. Although adversarial examples have been extensively applied to classification analysis, they have rarely been studied in metric analysis like person re-identification. The most likely reason is the natural gap between the training and testing of re-ID networks, that is, the predictions of a re-ID network cannot be directly used during testing without an effective metric. In this work, we bridge the gap by proposing Adversarial Metric Attack, a parallel methodology to adversarial classification attacks. Comprehensive experiments clearly reveal the adversarial effects in re-ID systems. Meanwhile, we also present an early attempt at training a metric-preserving network, thereby defending the metric against adversarial attacks. Finally, by benchmarking various adversarial settings, we expect that our work can facilitate the development of adversarial attack and defense in metric-based applications. |
Tasks | Adversarial Attack, Person Re-Identification |
Published | 2019-01-30 |
URL | http://arxiv.org/abs/1901.10650v2 |
PDF | http://arxiv.org/pdf/1901.10650v2.pdf |
PWC | https://paperswithcode.com/paper/adversarial-metric-attack-for-person-re |
Repo | https://github.com/SongBaiHust/Adversarial_Metric_Attack |
Framework | pytorch |
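An adversarial metric attack perturbs the probe image so that its embedding moves in metric space rather than across a class boundary. Below is a one-step FGSM-style sketch of that idea; the loss, iteration scheme, and epsilon used in the paper may differ, and `model` is assumed to map images to embeddings.

```python
import torch

def metric_fgsm(model, probe, gallery_feat, eps=8 / 255):
    """One-step metric attack sketch: nudge the probe image so that its
    embedding moves away from the matching gallery embedding."""
    probe = probe.clone().requires_grad_(True)
    dist = torch.norm(model(probe) - gallery_feat, dim=1).sum()
    dist.backward()                                   # ascend the distance
    return (probe + eps * probe.grad.sign()).clamp(0, 1).detach()
```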
Unsupervised Person Re-identification by Deep Asymmetric Metric Embedding
Title | Unsupervised Person Re-identification by Deep Asymmetric Metric Embedding |
Authors | Hong-Xing Yu, Ancong Wu, Wei-Shi Zheng |
Abstract | Person re-identification (Re-ID) aims to match identities across non-overlapping camera views. Researchers have proposed many supervised Re-ID models which require quantities of cross-view pairwise labelled data. This limits their scalability to the many applications where a large amount of data from multiple disjoint camera views is available but unlabelled. Although some unsupervised Re-ID models have been proposed to address the scalability problem, they often suffer from the view-specific bias problem, which is caused by dramatic variances across different camera views, e.g., different illumination, viewpoints and occlusion. These dramatic variances induce specific feature distortions in different camera views, which can be very disturbing when seeking cross-view discriminative information for Re-ID in the unsupervised scenario, since no label information is available to help alleviate the bias. We propose to explicitly address this problem by learning an unsupervised asymmetric distance metric based on cross-view clustering. The asymmetric distance metric allows specific feature transformations for each camera view to tackle the specific feature distortions. We then design a novel unsupervised loss function to embed the asymmetric metric into a deep neural network, and thereby develop a novel unsupervised deep framework named DEep Clustering-based Asymmetric MEtric Learning (DECAMEL). In such a way, DECAMEL jointly learns the feature representation and the unsupervised asymmetric metric. DECAMEL learns a compact cross-view cluster structure of Re-ID data, which helps alleviate the view-specific bias and facilitates mining the potential cross-view discriminative information for unsupervised Re-ID. Extensive experiments on seven benchmark datasets whose sizes span several orders of magnitude show the effectiveness of our framework. |
Tasks | Metric Learning, Person Re-Identification, Unsupervised Person Re-Identification |
Published | 2019-01-29 |
URL | http://arxiv.org/abs/1901.10177v1 |
PDF | http://arxiv.org/pdf/1901.10177v1.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-person-re-identification-by-deep |
Repo | https://github.com/KovenYu/DECAMEL |
Framework | none |
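The asymmetric metric amounts to giving each camera view its own feature transformation before distances are compared. The toy module below shows just that structural idea; the clustering objective, any regularisation tying the projections together, and all dimensions are omitted or assumed.

```python
import torch
import torch.nn as nn

class AsymmetricEmbedding(nn.Module):
    """One linear projection per camera view, letting each view absorb its
    own feature distortion before distances are compared (dims assumed)."""
    def __init__(self, dim, n_views):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(dim, dim, bias=False)
                                  for _ in range(n_views))

    def forward(self, feat, view_ids):
        return torch.stack([self.proj[v](f) for f, v in zip(feat, view_ids)])

emb = AsymmetricEmbedding(dim=128, n_views=2)
out = emb(torch.randn(4, 128), view_ids=[0, 1, 0, 1])  # (4, 128)
```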
Three-dimensional virtual refocusing of fluorescence microscopy images using deep learning
Title | Three-dimensional virtual refocusing of fluorescence microscopy images using deep learning |
Authors | Yichen Wu, Yair Rivenson, Hongda Wang, Yilin Luo, Eyal Ben-David, Laurent A. Bentolila, Christian Pritz, Aydogan Ozcan |
Abstract | Three-dimensional (3D) fluorescence microscopy in general requires axial scanning to capture images of a sample at different planes. Here we demonstrate that a deep convolutional neural network can be trained to virtually refocus a 2D fluorescence image onto user-defined 3D surfaces within the sample volume. With this data-driven computational microscopy framework, we imaged the neuronal activity of a Caenorhabditis elegans worm in 3D using a time sequence of fluorescence images acquired at a single focal plane, digitally increasing the depth-of-field of the microscope by 20-fold without any axial scanning, additional hardware, or a trade-off of imaging resolution or speed. Furthermore, we demonstrate that this learning-based approach can correct for sample drift, tilt, and other image aberrations, all digitally performed after the acquisition of a single fluorescence image. This unique framework also cross-connects different imaging modalities to each other, enabling 3D refocusing of a single wide-field fluorescence image to match confocal microscopy images acquired at different sample planes. This deep learning-based 3D image refocusing method could be transformative for imaging and tracking of 3D biological samples, especially over extended periods of time, mitigating the photo-toxicity, sample drift, aberration, and defocusing challenges associated with standard 3D fluorescence microscopy techniques. |
Tasks | |
Published | 2019-01-31 |
URL | https://arxiv.org/abs/1901.11252v2 |
PDF | https://arxiv.org/pdf/1901.11252v2.pdf |
PWC | https://paperswithcode.com/paper/three-dimensional-propagation-and-time |
Repo | https://github.com/puppy101puppy/Deep-Z |
Framework | none |
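Mechanically, Deep-Z steers refocusing by conditioning the network on the target surface through a "digital propagation matrix" (DPM) appended to the input image as an extra channel. The snippet below sketches only that input construction; the shapes and values are illustrative, and the network (a GAN in the paper) is elided.

```python
import torch

# A single in-focus fluorescence image plus a DPM: a per-pixel map of the
# desired axial refocusing distance, stacked as a second input channel.
img = torch.rand(1, 1, 256, 256)          # stand-in fluorescence frame
dpm = torch.full_like(img, 4.0)           # refocus the whole frame to +4 um;
                                          # non-uniform maps select 3D surfaces
net_input = torch.cat([img, dpm], dim=1)  # (1, 2, 256, 256) fed to the network
```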
Subword Language Model for Query Auto-Completion
Title | Subword Language Model for Query Auto-Completion |
Authors | Gyuwan Kim |
Abstract | Current neural query auto-completion (QAC) systems rely on character-level language models, but they slow down when queries are long. We show how to use subword language models for fast and accurate generation of query completion candidates. Representing queries with subwords shortens the decoding length significantly. To deal with the issues introduced by a subword language model, we develop a retrace algorithm and a reranking method by approximate marginalization. As a result, our model is up to 2.5 times faster while maintaining a similar quality of generated results compared to the character-level baseline. We also propose a new evaluation metric, mean recoverable length (MRL), which measures how many upcoming characters the model can complete correctly. It provides more explicit meaning and eliminates the need for prefix-length sampling required by existing rank-based metrics. Moreover, we perform a comprehensive analysis with an ablation study to assess the importance of each component. |
Tasks | Language Modelling |
Published | 2019-09-02 |
URL | https://arxiv.org/abs/1909.00599v1 |
PDF | https://arxiv.org/pdf/1909.00599v1.pdf |
PWC | https://paperswithcode.com/paper/subword-language-model-for-query-auto |
Repo | https://github.com/clovaai/subword-qac |
Framework | pytorch |
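The proposed MRL metric counts how many upcoming characters the model's completion gets right after a given prefix, then averages over a test set. A small sketch, with `model_complete` as a hypothetical prefix-to-completion interface (the paper's exact definition should be checked in the repo):

```python
def recoverable_length(model_complete, query, prefix_len):
    """Characters of the remaining query matched by the model's completion;
    MRL is this value averaged over a test set of (query, prefix) pairs."""
    rest = query[prefix_len:]
    pred = model_complete(query[:prefix_len])
    n = 0
    while n < min(len(rest), len(pred)) and pred[n] == rest[n]:
        n += 1
    return n

# Toy check: the completion recovers all 14 remaining characters.
print(recoverable_length(lambda p: "ather forecast", "weather forecast", 2))  # 14
```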
Discovering Underlying Person Structure Pattern with Relative Local Distance for Person Re-identification
Title | Discovering Underlying Person Structure Pattern with Relative Local Distance for Person Re-identification |
Authors | Guangcong Wang, Jianhuang Lai, Zhenyu Xie, Xiaohua Xie |
Abstract | Modeling the underlying person structure for person re-identification (re-ID) is difficult due to diverse deformable poses, changeable camera views and imperfect person detectors. How to exploit underlying person structure information without extra annotations to improve the performance of person re-ID remains largely unexplored. To address this problem, we propose a novel Relative Local Distance (RLD) method that integrates a relative local distance constraint into convolutional neural networks (CNNs) in an end-to-end way. It is the first time that a relative local constraint has been proposed to guide global feature representation learning. Specifically, a relative local distance matrix is computed from the feature maps and then regarded as a regularizer that guides the CNN to learn a structure-aware feature representation. With the discovered underlying person structure, the RLD method builds a bridge between the global and local feature representations and thus improves the capacity of feature representation for person re-ID. Furthermore, RLD also significantly accelerates deep network training compared with conventional methods. The experimental results show the effectiveness of RLD on the CUHK03, Market-1501, and DukeMTMC-reID datasets. Code is available at https://github.com/Wanggcong/RLD_codes. |
Tasks | Person Re-Identification, Representation Learning |
Published | 2019-01-29 |
URL | http://arxiv.org/abs/1901.10100v1 |
PDF | http://arxiv.org/pdf/1901.10100v1.pdf |
PWC | https://paperswithcode.com/paper/discovering-underlying-person-structure |
Repo | https://github.com/Wanggcong/RLD_codes |
Framework | pytorch |
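The ingredient RLD regularises is a matrix of pairwise distances between the local descriptors of a feature map. A minimal sketch of computing such a matrix follows; the relative/normalised form and how it enters the training loss are the paper's contribution and are not reproduced here.

```python
import torch

def local_distance_matrix(fmap):
    """Pairwise Euclidean distances between the local descriptors of a
    feature map: (B, C, H, W) -> (B, H*W, H*W)."""
    v = fmap.flatten(2).transpose(1, 2)  # (B, H*W, C) local feature vectors
    return torch.cdist(v, v)

d = local_distance_matrix(torch.randn(2, 256, 24, 8))  # (2, 192, 192)
```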
Hybrid Task Cascade for Instance Segmentation
Title | Hybrid Task Cascade for Instance Segmentation |
Authors | Kai Chen, Jiangmiao Pang, Jiaqi Wang, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jianping Shi, Wanli Ouyang, Chen Change Loy, Dahua Lin |
Abstract | Cascade is a classic yet powerful architecture that has boosted performance on various tasks. However, how to introduce cascade into instance segmentation remains an open question. A simple combination of Cascade R-CNN and Mask R-CNN brings only limited gain. In exploring a more effective approach, we find that the key to a successful instance segmentation cascade is to fully leverage the reciprocal relationship between detection and segmentation. In this work, we propose a new framework, Hybrid Task Cascade (HTC), which differs in two important aspects: (1) instead of performing cascaded refinement on the two tasks separately, it interweaves them for joint multi-stage processing; (2) it adopts a fully convolutional branch to provide spatial context, which helps distinguish hard foreground from cluttered background. Overall, this framework can learn more discriminative features progressively while integrating complementary features in each stage. Without bells and whistles, a single HTC obtains 38.4 mask AP, a 1.5-point improvement over a strong Cascade Mask R-CNN baseline, on the MSCOCO dataset. Moreover, our overall system achieves 48.6 mask AP on the test-challenge split, ranking 1st in the COCO 2018 Challenge Object Detection Task. Code is available at: https://github.com/open-mmlab/mmdetection. |
Tasks | Instance Segmentation, Object Detection, Semantic Segmentation |
Published | 2019-01-22 |
URL | http://arxiv.org/abs/1901.07518v2 |
PDF | http://arxiv.org/pdf/1901.07518v2.pdf |
PWC | https://paperswithcode.com/paper/hybrid-task-cascade-for-instance-segmentation |
Repo | https://github.com/amirassov/kaggle-imaterialist |
Framework | pytorch |
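HTC's two structural ideas — interleaving box and mask refinement within each stage, and flowing mask features (plus a fully convolutional semantic context) from stage to stage — can be caricatured with a runnable toy. Every head here is a stand-in linear layer on made-up shapes; the real implementation is in mmdetection.

```python
import torch
import torch.nn as nn

class ToyHTC(nn.Module):
    """Toy caricature of HTC: per stage, refine boxes, then run a mask head
    that also sees the previous stage's mask features and a semantic context."""
    def __init__(self, dim=64, n_stages=3):
        super().__init__()
        self.box_heads = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_stages))
        self.mask_heads = nn.ModuleList(nn.Linear(2 * dim, dim) for _ in range(n_stages))
        self.semantic = nn.Linear(dim, dim)  # stands in for the FCN context branch

    def forward(self, roi_feat):
        ctx = self.semantic(roi_feat)
        mask_feat = torch.zeros_like(roi_feat)
        for box_h, mask_h in zip(self.box_heads, self.mask_heads):
            roi_feat = roi_feat + box_h(roi_feat)                           # box refinement
            mask_feat = mask_h(torch.cat([roi_feat + ctx, mask_feat], -1))  # mask info flow
        return roi_feat, mask_feat

boxes, masks = ToyHTC()(torch.randn(2, 64))
```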
Backbone Can Not be Trained at Once: Rolling Back to Pre-trained Network for Person Re-Identification
Title | Backbone Can Not be Trained at Once: Rolling Back to Pre-trained Network for Person Re-Identification |
Authors | Youngmin Ro, Jongwon Choi, Dae Ung Jo, Byeongho Heo, Jongin Lim, Jin Young Choi |
Abstract | In the person re-identification (ReID) task, because of the shortage of trainable data, it is common to fine-tune a classification network pre-trained on a large dataset. However, it is relatively difficult to sufficiently fine-tune the low-level layers of the network due to the gradient vanishing problem. In this work, we propose a novel fine-tuning strategy that allows low-level layers to be sufficiently trained by rolling back the weights of high-level layers to their initial pre-trained weights. Our strategy alleviates the problem of gradient vanishing in low-level layers and robustly trains the low-level layers to fit the ReID dataset, thereby increasing the performance of ReID tasks. The improved performance of the proposed strategy is validated via several experiments. Furthermore, without any add-ons such as pose estimation or segmentation, our strategy exhibits state-of-the-art performance using only a vanilla deep convolutional neural network architecture. |
Tasks | Person Re-Identification, Pose Estimation |
Published | 2019-01-18 |
URL | http://arxiv.org/abs/1901.06140v1 |
PDF | http://arxiv.org/pdf/1901.06140v1.pdf |
PWC | https://paperswithcode.com/paper/backbone-can-not-be-trained-at-once-rolling |
Repo | https://github.com/youngminPIL/rollback |
Framework | pytorch |
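The rollback step itself is simple: after some fine-tuning, restore the high-level layers to their pre-trained weights while keeping the now better-trained low-level layers. A sketch with torchvision's ResNet-50; which blocks count as "high-level" and the fine-tune/rollback schedule are assumptions here, not the paper's exact recipe.

```python
import copy
import torchvision

model = torchvision.models.resnet50(weights="IMAGENET1K_V1")
pretrained = copy.deepcopy(model.state_dict())

HIGH_LEVEL = ("layer3", "layer4", "fc")  # assumed split into high-level blocks

def rollback(model, pretrained, prefixes=HIGH_LEVEL):
    """Restore high-level layers to their pre-trained weights, keeping the
    fine-tuned low-level layers untouched."""
    sd = model.state_dict()
    for k, v in pretrained.items():
        if k.startswith(prefixes):
            sd[k] = v.clone()
    model.load_state_dict(sd)

# Assumed schedule: fine-tune for a while, roll back, fine-tune again,
# possibly shrinking the rolled-back set each round.
```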
UrbanFM: Inferring Fine-Grained Urban Flows
Title | UrbanFM: Inferring Fine-Grained Urban Flows |
Authors | Yuxuan Liang, Kun Ouyang, Lin Jing, Sijie Ruan, Ye Liu, Junbo Zhang, David S. Rosenblum, Yu Zheng |
Abstract | Urban flow monitoring systems play important roles in smart city efforts around the world. However, the ubiquitous deployment of monitoring devices, such as CCTVs, incurs a long-lasting and enormous cost for maintenance and operation. This suggests the need for a technology that can reduce the number of deployed devices while preventing the degradation of data accuracy and granularity. In this paper, we aim to infer real-time and fine-grained crowd flows throughout a city based on coarse-grained observations. This task is challenging for two reasons: the spatial correlations between coarse- and fine-grained urban flows, and the complexities of external impacts. To tackle these issues, we develop a method called UrbanFM based on deep neural networks. Our model consists of two major parts: 1) an inference network that generates fine-grained flow distributions from coarse-grained inputs by using a feature extraction module and a novel distributional upsampling module; 2) a general fusion subnet that further boosts performance by considering the influences of different external factors. Extensive experiments on two real-world datasets, namely TaxiBJ and HappyValley, validate the effectiveness and efficiency of our method compared to seven baselines, demonstrating the state-of-the-art performance of our approach on the fine-grained urban flow inference problem. |
Tasks | |
Published | 2019-02-06 |
URL | http://arxiv.org/abs/1902.05377v1 |
PDF | http://arxiv.org/pdf/1902.05377v1.pdf |
PWC | https://paperswithcode.com/paper/urbanfm-inferring-fine-grained-urban-flows |
Repo | https://github.com/yoshall/UrbanFM |
Framework | pytorch |
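The distributional upsampling module can be read as: predict a softmax distribution over each coarse cell's s×s sub-cells and multiply by the coarse flow, so the sub-flows of every cell sum exactly back to it. A self-contained sketch of that constraint; the interface and shapes are assumed, and the inference network producing `logits` is elided.

```python
import torch

def distributional_upsample(coarse, logits, s):
    """Softmax over each coarse cell's s*s sub-cells, scaled by the coarse
    flow so sub-flows sum exactly back to it.
    coarse: (B, 1, H, W); logits: (B, 1, H*s, W*s)."""
    B, _, H, W = coarse.shape
    w = logits.view(B, H, s, W, s).permute(0, 1, 3, 2, 4)  # (B, H, W, s, s)
    w = torch.softmax(w.reshape(B, H, W, s * s), -1).view(B, H, W, s, s)
    fine = w * coarse.squeeze(1)[..., None, None]
    return fine.permute(0, 1, 3, 2, 4).reshape(B, 1, H * s, W * s)

coarse = torch.rand(2, 1, 8, 8) * 100
fine = distributional_upsample(coarse, torch.randn(2, 1, 32, 32), s=4)
# Each 4x4 block of fine flows sums back to its coarse cell.
assert torch.allclose(fine.view(2, 1, 8, 4, 8, 4).sum((3, 5)), coarse)
```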
Still a Pain in the Neck: Evaluating Text Representations on Lexical Composition
Title | Still a Pain in the Neck: Evaluating Text Representations on Lexical Composition |
Authors | Vered Shwartz, Ido Dagan |
Abstract | Building meaningful phrase representations is challenging because phrase meanings are not simply the sum of their constituent meanings. Lexical composition can shift the meanings of the constituent words and introduce implicit information. We tested a broad range of textual representations for their capacity to address these issues. We found that, as expected, contextualized word representations perform better than static word embeddings, more so at detecting meaning shift than at recovering implicit information, where their performance is still far from that of humans. Our evaluation suite, consisting of five tasks related to lexical composition effects, can serve future research aiming to improve such representations. |
Tasks | Word Embeddings |
Published | 2019-02-27 |
URL | https://arxiv.org/abs/1902.10618v2 |
PDF | https://arxiv.org/pdf/1902.10618v2.pdf |
PWC | https://paperswithcode.com/paper/still-a-pain-in-the-neck-evaluating-text |
Repo | https://github.com/vered1986/lexcomp |
Framework | none |
Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation
Title | Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation |
Authors | Zhi Tian, Tong He, Chunhua Shen, Youliang Yan |
Abstract | Recent semantic segmentation methods exploit encoder-decoder architectures to produce the desired pixel-wise segmentation prediction. The last layer of the decoders is typically a bilinear upsampling procedure that recovers the final pixel-wise prediction. We empirically show that this overly simple and data-independent bilinear upsampling may lead to sub-optimal results. In this work, we propose a data-dependent upsampling (DUpsampling) to replace bilinear upsampling, which takes advantage of the redundancy in the label space of semantic segmentation and is able to recover the pixel-wise prediction from low-resolution outputs of CNNs. The main advantage of the new upsampling layer lies in the fact that with a relatively low-resolution feature map, such as $\frac{1}{16}$ or $\frac{1}{32}$ of the input size, we can achieve even better segmentation accuracy while significantly reducing computation complexity. This is made possible by 1) the new upsampling layer's much improved reconstruction capability; and more importantly 2) the DUpsampling-based decoder's flexibility in leveraging almost arbitrary combinations of the CNN encoder's features. Experiments demonstrate that our proposed decoder outperforms the state-of-the-art decoder with only $\sim$20% of the computation. Finally, without any post-processing, the framework equipped with our proposed decoder achieves new state-of-the-art performance on two datasets: 88.1% mIOU on PASCAL VOC with 30% of the computation of the previously best model, and 52.5% mIOU on PASCAL Context. |
Tasks | Semantic Segmentation |
Published | 2019-03-05 |
URL | http://arxiv.org/abs/1903.02120v3 |
PDF | http://arxiv.org/pdf/1903.02120v3.pdf |
PWC | https://paperswithcode.com/paper/decoders-matter-for-semantic-segmentation |
Repo | https://github.com/LinZhuoChen/DUpsampling |
Framework | pytorch |
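Operationally, data-dependent upsampling replaces bilinear interpolation with a learned linear mapping from each low-resolution feature vector to an r×r block of class scores, which can be realised as a 1×1 convolution followed by pixel shuffle. The paper additionally derives/pre-trains this projection from the label space; the sketch below shows only the generic module, with made-up dimensions.

```python
import torch
import torch.nn as nn

class DUpsampling(nn.Module):
    """Generic data-dependent upsampling: a learned linear map from each
    low-resolution feature vector to an r x r block of class scores,
    realised as a 1x1 conv + pixel shuffle."""
    def __init__(self, in_ch, n_classes, r):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, n_classes * r * r, kernel_size=1)
        self.shuffle = nn.PixelShuffle(r)

    def forward(self, x):
        return self.shuffle(self.proj(x))

logits = DUpsampling(256, 21, r=16)(torch.randn(1, 256, 32, 32))  # (1, 21, 512, 512)
```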