Paper Group AWR 59
Using cameras for precise measurement of two-dimensional plant features: CASS
Title | Using cameras for precise measurement of two-dimensional plant features: CASS |
Authors | Amy Tabb, Germán A Holguín, Rachel Naegele |
Abstract | Images are used frequently in plant phenotyping to capture measurements. This chapter offers a repeatable method for capturing two-dimensional measurements of plant parts in field or laboratory settings using a variety of camera styles (cellular phone, DSLR), with the addition of a printed calibration pattern. The method is based on calibrating the camera using information available from the EXIF tags of the image, as well as visual information from the pattern. Code is provided to implement the method, as well as a dataset for testing. We include steps to verify protocol correctness by imaging an artifact. The use of this protocol for two-dimensional plant phenotyping will allow data capture from different cameras and environments, with comparison on the same physical scale. We abbreviate this method as CASS, for CAmera aS Scanner. Code and data are available at http://doi.org/10.5281/zenodo.3677473. |
Tasks | Calibration |
Published | 2019-04-30 |
URL | https://arxiv.org/abs/1904.13187v2 |
PDF | https://arxiv.org/pdf/1904.13187v2.pdf |
PWC | https://paperswithcode.com/paper/using-cameras-for-precise-measurement-of-two |
Repo | https://github.com/amy-tabb/aruco-pattern-write |
Framework | none |
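The core of CASS is recovering a physical scale from the printed calibration pattern; the authors' full pipeline (EXIF-based calibration plus rectification) lives in the linked repo. Below is a minimal, unofficial sketch of just the scale-recovery step using OpenCV ArUco markers — the marker size, file name, and fronto-parallel assumption are illustrative, not part of the paper.

```python
import cv2
import numpy as np

MARKER_MM = 30.0  # hypothetical printed ArUco marker side length, in mm

img = cv2.imread("leaf_with_pattern.jpg")  # illustrative file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
corners, ids, _ = cv2.aruco.detectMarkers(gray, dictionary)  # pre-OpenCV-4.7 API

# Mean marker side length in pixels, over all detected markers.
sides = [np.linalg.norm(c.reshape(4, 2)[i] - c.reshape(4, 2)[(i + 1) % 4])
         for c in corners for i in range(4)]
mm_per_px = MARKER_MM / np.mean(sides)
print(f"scale: {mm_per_px:.4f} mm/px")  # convert any in-plane pixel length to mm
```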
Person Re-identification with Bias-controlled Adversarial Training
Title | Person Re-identification with Bias-controlled Adversarial Training |
Authors | Sara Iodice, Krystian Mikolajczyk |
Abstract | Inspired by the effectiveness of adversarial training in the area of Generative Adversarial Networks, we present a new approach for learning feature representations in person re-identification. We investigate different types of bias that typically occur in re-ID scenarios, i.e., pose, body part and camera view, and propose a general approach to address them. We introduce an adversarial strategy for controlling bias, named the Bias-controlled Adversarial framework (BCA), with two complementary branches to reduce or to enhance bias-related features. The results and comparison to the state of the art on different benchmarks show that our framework is an effective strategy for person re-identification. Performance improves for both full and partial views of persons. |
Tasks | Person Re-Identification |
Published | 2019-03-30 |
URL | http://arxiv.org/abs/1904.00244v1 |
PDF | http://arxiv.org/pdf/1904.00244v1.pdf |
PWC | https://paperswithcode.com/paper/person-re-identification-with-bias-controlled |
Repo | https://github.com/iodicesara/person-re-identification-with-bias-controlled-adversarial-training |
Framework | none |
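BCA's bias-reducing branch is in the spirit of adversarial feature learning. A minimal sketch of one standard way to realise such a branch is a gradient-reversal layer feeding a bias classifier (here a camera-view head); this is a generic construction, not the authors' exact architecture, and all dimensions are made up.

```python
import torch
import torch.nn.functional as F
from torch.autograd import Function

class GradReverse(Function):
    """Identity in the forward pass; flips the gradient sign on backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

# Training a bias head (a camera-view classifier) through the reversal layer
# pushes the backbone features to discard camera-specific cues.
feat = torch.randn(8, 256, requires_grad=True)   # stand-in backbone features
bias_head = torch.nn.Linear(256, 6)              # e.g. 6 camera views (made up)
logits = bias_head(GradReverse.apply(feat, 0.5))
loss = F.cross_entropy(logits, torch.randint(0, 6, (8,)))
loss.backward()                                  # feat.grad now carries the reversed signal
```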
Unsupervised Tracklet Person Re-Identification
Title | Unsupervised Tracklet Person Re-Identification |
Authors | Minxian Li, Xiatian Zhu, Shaogang Gong |
Abstract | Most existing person re-identification (re-id) methods rely on supervised model learning on per-camera-pair manually labelled pairwise training data. This leads to poor scalability in practical re-id deployments, due to the lack of exhaustive identity labelling of positive and negative image pairs for every camera pair. In this work, we present an unsupervised re-id deep learning approach. It is capable of incrementally discovering and exploiting the underlying re-id discriminative information from automatically generated person tracklet data, end-to-end. We formulate an Unsupervised Tracklet Association Learning (UTAL) framework that jointly learns within-camera tracklet discrimination and cross-camera tracklet association, in order to maximise the discovery of tracklet identity matching both within and across camera views. Extensive experiments demonstrate the superiority of the proposed model over state-of-the-art unsupervised learning and domain adaptation person re-id methods on eight benchmarking datasets. |
Tasks | Domain Adaptation, Person Re-Identification |
Published | 2019-03-01 |
URL | http://arxiv.org/abs/1903.00535v1 |
PDF | http://arxiv.org/pdf/1903.00535v1.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-tracklet-person-re |
Repo | https://github.com/liminxian/DukeMTMC-SI-Tracklet |
Framework | none |
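One way to picture the within-camera half of UTAL is a shared backbone with a separate tracklet classifier per camera, so tracklet pseudo-labels never need cross-camera annotation. The sketch below illustrates that layout only; the head sizes are made up, the interface is assumed, and the cross-camera association objective is omitted.

```python
import torch
import torch.nn as nn

class PerCameraTrackletHeads(nn.Module):
    """One tracklet classifier per camera over a shared embedding, so
    tracklet labels stay local to each camera (interface assumed)."""
    def __init__(self, dim, tracklets_per_cam):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(dim, n) for n in tracklets_per_cam)

    def forward(self, feat, cam_id):
        return self.heads[cam_id](feat)  # logits over that camera's tracklets

heads = PerCameraTrackletHeads(2048, [500, 750, 620])  # made-up tracklet counts
logits = heads(torch.randn(16, 2048), cam_id=1)        # -> (16, 750)
```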
Non-Rigid Point Set Registration Networks
Title | Non-Rigid Point Set Registration Networks |
Authors | Lingjing Wang, Jianchun Chen, Xiang Li, Yi Fang |
Abstract | Point set registration is defined as the process of determining the spatial transformation from a source point set to a target one. Existing methods often iteratively search for the optimal geometric transformation to register a given pair of point sets, driven by minimizing a predefined alignment loss function. In contrast, the proposed point registration neural network (PR-Net) actively learns the registration pattern as a parametric function from a training dataset and consequently predicts the desired geometric transformation to align a pair of point sets. PR-Net can transfer the learned knowledge (i.e. registration pattern) from registering training pairs to testing ones without additional iterative optimization. Specifically, in this paper, we develop novel techniques to learn shape descriptors from point sets that help formulate a clear correlation between source and target point sets. With the defined correlation, PR-Net tends to predict the transformation so that the source and target point sets can be statistically aligned, which in turn leads to an optimal spatial geometric registration. PR-Net achieves robust and superior performance for non-rigid registration of point sets, even in the presence of Gaussian noise, outliers, and missing points, while requiring much less time to register a large number of pairs. More importantly, for a new pair of point sets, PR-Net is able to directly predict the desired transformation using the learned model, without a repetitive iterative optimization routine. Our code is available at https://github.com/Lingjing324/PR-Net. |
Tasks | |
Published | 2019-04-02 |
URL | http://arxiv.org/abs/1904.01428v1 |
PDF | http://arxiv.org/pdf/1904.01428v1.pdf |
PWC | https://paperswithcode.com/paper/non-rigid-point-set-registration-networks |
Repo | https://github.com/Lingjing324/PR-Net |
Framework | pytorch |
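Learning-based registration networks such as PR-Net are typically trained against a set-to-set alignment loss. The Chamfer distance below is a common choice for that role; whether PR-Net uses exactly this form should be checked against the authors' code, so treat it as a generic sketch.

```python
import torch

def chamfer_distance(src, tgt):
    """Symmetric Chamfer distance between point sets src (B, N, 3) and
    tgt (B, M, 3); a common alignment loss for learned registration."""
    d = torch.cdist(src, tgt)   # (B, N, M) pairwise Euclidean distances
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()

src = torch.rand(4, 1024, 3)
warped = src + 0.01 * torch.randn_like(src)  # stand-in for the network's output
print(chamfer_distance(warped, torch.rand(4, 1024, 3)))
```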
On Recurrent Neural Networks for Sequence-based Processing in Communications
Title | On Recurrent Neural Networks for Sequence-based Processing in Communications |
Authors | Daniel Tandler, Sebastian Dörner, Sebastian Cammerer, Stephan ten Brink |
Abstract | In this work, we analyze the capabilities and practical limitations of neural networks (NNs) for sequence-based signal processing, which can be seen as an omnipresent property of almost any modern communication system. In particular, we train multiple state-of-the-art recurrent neural network (RNN) structures to learn how to decode convolutional codes, allowing clear benchmarking against the corresponding maximum likelihood (ML) Viterbi decoder. We examine the decoding performance for various kinds of NN architectures, beginning with classical types like feedforward layers and gated recurrent unit (GRU) layers, up to more recently introduced architectures such as temporal convolutional networks (TCNs) and differentiable neural computers (DNCs) with external memory. As a key limitation, it turns out that the training complexity increases exponentially with the length of the encoding memory $\nu$ and, thus, practically limits the achievable bit error rate (BER) performance. To overcome this limitation, we introduce a new training method that gradually increases the number of ones within the training sequences, i.e., we constrain the set of possible training sequences in the beginning until first convergence. By consecutively adding more and more possible sequences to the training set, we finally achieve training success in cases that did not converge before via naive training. Further, we show that our network can learn to jointly detect and decode a quadrature phase shift keying (QPSK) modulated code with sub-optimal (anti-Gray) labeling in one shot, at a performance that would require iterations between demapper and decoder in classic detection schemes. |
Tasks | |
Published | 2019-05-24 |
URL | https://arxiv.org/abs/1905.09983v3 |
PDF | https://arxiv.org/pdf/1905.09983v3.pdf |
PWC | https://paperswithcode.com/paper/on-recurrent-neural-networks-for-sequence |
Repo | https://github.com/sdnr/RNN-Conv-Decoder |
Framework | none |
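The training trick described above — constraining how many ones appear in the information sequences and relaxing the constraint as training converges — can be sketched directly. The batch size, sequence length, and stage schedule below are illustrative assumptions; encoding, channel noise, and the RNN training step are elided.

```python
import numpy as np

def curriculum_batch(batch, seq_len, max_ones):
    """Random binary info sequences with at most `max_ones` ones each,
    constraining the training set as described in the abstract."""
    seqs = np.zeros((batch, seq_len), dtype=np.int8)
    for s in seqs:
        k = np.random.randint(0, max_ones + 1)
        s[np.random.choice(seq_len, size=k, replace=False)] = 1
    return seqs

# Assumed schedule: enlarge the admissible sequence set once the decoder
# converges at the current stage.
for max_ones in (2, 4, 8, 16):
    info_bits = curriculum_batch(256, 64, max_ones)
    # ... encode with the convolutional code, add noise, train the RNN ...
```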
Metric Attack and Defense for Person Re-identification
Title | Metric Attack and Defense for Person Re-identification |
Authors | Song Bai, Yingwei Li, Yuyin Zhou, Qizhu Li, Philip H. S. Torr |
Abstract | Person re-identification (re-ID) has attracted much attention recently due to its great importance in video surveillance. In general, distance metrics used to identify two person images are expected to be robust under various appearance changes. However, our work observes the extreme vulnerability of existing distance metrics to adversarial examples, generated by simply adding human-imperceptible perturbations to person images. Hence, the security danger is dramatically increased when deploying commercial re-ID systems in video surveillance. Although adversarial examples have been extensively applied to classification analysis, they have rarely been studied in metric analysis like person re-identification. The most likely reason is the natural gap between the training and testing of re-ID networks, that is, the predictions of a re-ID network cannot be directly used during testing without an effective metric. In this work, we bridge the gap by proposing Adversarial Metric Attack, a parallel methodology to adversarial classification attacks. Comprehensive experiments clearly reveal the adversarial effects in re-ID systems. Meanwhile, we also present an early attempt at training a metric-preserving network, thereby defending the metric against adversarial attacks. Finally, by benchmarking various adversarial settings, we expect that our work can facilitate the development of adversarial attack and defense in metric-based applications. |
Tasks | Adversarial Attack, Person Re-Identification |
Published | 2019-01-30 |
URL | http://arxiv.org/abs/1901.10650v2 |
PDF | http://arxiv.org/pdf/1901.10650v2.pdf |
PWC | https://paperswithcode.com/paper/adversarial-metric-attack-for-person-re |
Repo | https://github.com/SongBaiHust/Adversarial_Metric_Attack |
Framework | pytorch |
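An adversarial metric attack perturbs the probe image so that its embedding moves in metric space rather than across a class boundary. Below is a one-step FGSM-style sketch of that idea; the loss, iteration scheme, and epsilon used in the paper may differ, and `model` is assumed to map images to embeddings.

```python
import torch

def metric_fgsm(model, probe, gallery_feat, eps=8 / 255):
    """One-step metric attack sketch: nudge the probe image so that its
    embedding moves away from the matching gallery embedding."""
    probe = probe.clone().requires_grad_(True)
    dist = torch.norm(model(probe) - gallery_feat, dim=1).sum()
    dist.backward()                                   # ascend the distance
    return (probe + eps * probe.grad.sign()).clamp(0, 1).detach()
```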
Unsupervised Person Re-identification by Deep Asymmetric Metric Embedding
Title | Unsupervised Person Re-identification by Deep Asymmetric Metric Embedding |
Authors | Hong-Xing Yu, Ancong Wu, Wei-Shi Zheng |
Abstract | Person re-identification (Re-ID) aims to match identities across non-overlapping camera views. Researchers have proposed many supervised Re-ID models which require quantities of cross-view pairwise labelled data. This limits their scalability to the many applications where a large amount of data from multiple disjoint camera views is available but unlabelled. Although some unsupervised Re-ID models have been proposed to address the scalability problem, they often suffer from the view-specific bias problem, which is caused by dramatic variances across different camera views, e.g., different illumination, viewpoints and occlusion. These dramatic variances induce specific feature distortions in different camera views, which can be very disturbing when seeking cross-view discriminative information for Re-ID in the unsupervised scenario, since no label information is available to help alleviate the bias. We propose to explicitly address this problem by learning an unsupervised asymmetric distance metric based on cross-view clustering. The asymmetric distance metric allows specific feature transformations for each camera view to tackle the specific feature distortions. We then design a novel unsupervised loss function to embed the asymmetric metric into a deep neural network, and thereby develop a novel unsupervised deep framework named DEep Clustering-based Asymmetric MEtric Learning (DECAMEL). In such a way, DECAMEL jointly learns the feature representation and the unsupervised asymmetric metric. DECAMEL learns a compact cross-view cluster structure of Re-ID data, which helps alleviate the view-specific bias and facilitates mining the potential cross-view discriminative information for unsupervised Re-ID. Extensive experiments on seven benchmark datasets whose sizes span several orders of magnitude show the effectiveness of our framework. |
Tasks | Metric Learning, Person Re-Identification, Unsupervised Person Re-Identification |
Published | 2019-01-29 |
URL | http://arxiv.org/abs/1901.10177v1 |
PDF | http://arxiv.org/pdf/1901.10177v1.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-person-re-identification-by-deep |
Repo | https://github.com/KovenYu/DECAMEL |
Framework | none |
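The asymmetric metric amounts to giving each camera view its own feature transformation before distances are compared. The toy module below shows just that structural idea; the clustering objective, any regularisation tying the projections together, and all dimensions are omitted or assumed.

```python
import torch
import torch.nn as nn

class AsymmetricEmbedding(nn.Module):
    """One linear projection per camera view, letting each view absorb its
    own feature distortion before distances are compared (dims assumed)."""
    def __init__(self, dim, n_views):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(dim, dim, bias=False)
                                  for _ in range(n_views))

    def forward(self, feat, view_ids):
        return torch.stack([self.proj[v](f) for f, v in zip(feat, view_ids)])

emb = AsymmetricEmbedding(dim=128, n_views=2)
out = emb(torch.randn(4, 128), view_ids=[0, 1, 0, 1])  # (4, 128)
```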
Three-dimensional virtual refocusing of fluorescence microscopy images using deep learning
Title | Three-dimensional virtual refocusing of fluorescence microscopy images using deep learning |
Authors | Yichen Wu, Yair Rivenson, Hongda Wang, Yilin Luo, Eyal Ben-David, Laurent A. Bentolila, Christian Pritz, Aydogan Ozcan |
Abstract | Three-dimensional (3D) fluorescence microscopy in general requires axial scanning to capture images of a sample at different planes. Here we demonstrate that a deep convolutional neural network can be trained to virtually refocus a 2D fluorescence image onto user-defined 3D surfaces within the sample volume. With this data-driven computational microscopy framework, we imaged the neuronal activity of a Caenorhabditis elegans worm in 3D using a time sequence of fluorescence images acquired at a single focal plane, digitally increasing the depth-of-field of the microscope by 20-fold without any axial scanning, additional hardware, or a trade-off of imaging resolution or speed. Furthermore, we demonstrate that this learning-based approach can correct for sample drift, tilt, and other image aberrations, all digitally performed after the acquisition of a single fluorescence image. This unique framework also cross-connects different imaging modalities to each other, enabling 3D refocusing of a single wide-field fluorescence image to match confocal microscopy images acquired at different sample planes. This deep learning-based 3D image refocusing method could be transformative for imaging and tracking of 3D biological samples, especially over extended periods of time, mitigating the photo-toxicity, sample drift, aberration, and defocusing challenges associated with standard 3D fluorescence microscopy techniques. |
Tasks | |
Published | 2019-01-31 |
URL | https://arxiv.org/abs/1901.11252v2 |
PDF | https://arxiv.org/pdf/1901.11252v2.pdf |
PWC | https://paperswithcode.com/paper/three-dimensional-propagation-and-time |
Repo | https://github.com/puppy101puppy/Deep-Z |
Framework | none |
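Mechanically, Deep-Z steers refocusing by conditioning the network on the target surface through a "digital propagation matrix" (DPM) appended to the input image as an extra channel. The snippet below sketches only that input construction; the shapes and values are illustrative, and the network (a GAN in the paper) is elided.

```python
import torch

# A single in-focus fluorescence image plus a DPM: a per-pixel map of the
# desired axial refocusing distance, stacked as a second input channel.
img = torch.rand(1, 1, 256, 256)          # stand-in fluorescence frame
dpm = torch.full_like(img, 4.0)           # refocus the whole frame to +4 um;
                                          # non-uniform maps select 3D surfaces
net_input = torch.cat([img, dpm], dim=1)  # (1, 2, 256, 256) fed to the network
```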
Subword Language Model for Query Auto-Completion
Title | Subword Language Model for Query Auto-Completion |
Authors | Gyuwan Kim |
Abstract | Current neural query auto-completion (QAC) systems rely on character-level language models, but they slow down when queries are long. We show how to use subword language models for fast and accurate generation of query completion candidates. Representing queries with subwords shortens the decoding length significantly. To deal with the issues introduced by a subword language model, we develop a retrace algorithm and a reranking method by approximate marginalization. As a result, our model is up to 2.5 times faster while maintaining a similar quality of generated results compared to the character-level baseline. We also propose a new evaluation metric, mean recoverable length (MRL), which measures how many upcoming characters the model can complete correctly. It provides more explicit meaning and eliminates the need for prefix-length sampling required by existing rank-based metrics. Moreover, we perform a comprehensive analysis with an ablation study to assess the importance of each component. |
Tasks | Language Modelling |
Published | 2019-09-02 |
URL | https://arxiv.org/abs/1909.00599v1 |
PDF | https://arxiv.org/pdf/1909.00599v1.pdf |
PWC | https://paperswithcode.com/paper/subword-language-model-for-query-auto |
Repo | https://github.com/clovaai/subword-qac |
Framework | pytorch |
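The proposed MRL metric counts how many upcoming characters the model's completion gets right after a given prefix, then averages over a test set. A small sketch, with `model_complete` as a hypothetical prefix-to-completion interface (the paper's exact definition should be checked in the repo):

```python
def recoverable_length(model_complete, query, prefix_len):
    """Characters of the remaining query matched by the model's completion;
    MRL is this value averaged over a test set of (query, prefix) pairs."""
    rest = query[prefix_len:]
    pred = model_complete(query[:prefix_len])
    n = 0
    while n < min(len(rest), len(pred)) and pred[n] == rest[n]:
        n += 1
    return n

# Toy check: the completion recovers all 14 remaining characters.
print(recoverable_length(lambda p: "ather forecast", "weather forecast", 2))  # 14
```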
Discovering Underlying Person Structure Pattern with Relative Local Distance for Person Re-identification
Title | Discovering Underlying Person Structure Pattern with Relative Local Distance for Person Re-identification |
Authors | Guangcong Wang, Jianhuang Lai, Zhenyu Xie, Xiaohua Xie |
Abstract | Modeling the underlying person structure for person re-identification (re-ID) is difficult due to diverse deformable poses, changeable camera views and imperfect person detectors. How to exploit underlying person structure information without extra annotations to improve the performance of person re-ID remains largely unexplored. To address this problem, we propose a novel Relative Local Distance (RLD) method that integrates a relative local distance constraint into convolutional neural networks (CNNs) in an end-to-end way. It is the first time that a relative local constraint has been proposed to guide global feature representation learning. Specifically, a relative local distance matrix is computed from the feature maps and then regarded as a regularizer that guides the CNN to learn a structure-aware feature representation. With the discovered underlying person structure, the RLD method builds a bridge between the global and local feature representations and thus improves the capacity of feature representation for person re-ID. Furthermore, RLD also significantly accelerates deep network training compared with conventional methods. The experimental results show the effectiveness of RLD on the CUHK03, Market-1501, and DukeMTMC-reID datasets. Code is available at https://github.com/Wanggcong/RLD_codes. |
Tasks | Person Re-Identification, Representation Learning |
Published | 2019-01-29 |
URL | http://arxiv.org/abs/1901.10100v1 |
PDF | http://arxiv.org/pdf/1901.10100v1.pdf |
PWC | https://paperswithcode.com/paper/discovering-underlying-person-structure |
Repo | https://github.com/Wanggcong/RLD_codes |
Framework | pytorch |
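The ingredient RLD regularises is a matrix of pairwise distances between the local descriptors of a feature map. A minimal sketch of computing such a matrix follows; the relative/normalised form and how it enters the training loss are the paper's contribution and are not reproduced here.

```python
import torch

def local_distance_matrix(fmap):
    """Pairwise Euclidean distances between the local descriptors of a
    feature map: (B, C, H, W) -> (B, H*W, H*W)."""
    v = fmap.flatten(2).transpose(1, 2)  # (B, H*W, C) local feature vectors
    return torch.cdist(v, v)

d = local_distance_matrix(torch.randn(2, 256, 24, 8))  # (2, 192, 192)
```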
Hybrid Task Cascade for Instance Segmentation
Title | Hybrid Task Cascade for Instance Segmentation |
Authors | Kai Chen, Jiangmiao Pang, Jiaqi Wang, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jianping Shi, Wanli Ouyang, Chen Change Loy, Dahua Lin |
Abstract | Cascade is a classic yet powerful architecture that has boosted performance on various tasks. However, how to introduce cascade into instance segmentation remains an open question. A simple combination of Cascade R-CNN and Mask R-CNN brings only limited gain. In exploring a more effective approach, we find that the key to a successful instance segmentation cascade is to fully leverage the reciprocal relationship between detection and segmentation. In this work, we propose a new framework, Hybrid Task Cascade (HTC), which differs in two important aspects: (1) instead of performing cascaded refinement on the two tasks separately, it interweaves them for joint multi-stage processing; (2) it adopts a fully convolutional branch to provide spatial context, which helps distinguish hard foreground from cluttered background. Overall, this framework can learn more discriminative features progressively while integrating complementary features in each stage. Without bells and whistles, a single HTC obtains 38.4 mask AP, a 1.5-point improvement over a strong Cascade Mask R-CNN baseline, on the MSCOCO dataset. Moreover, our overall system achieves 48.6 mask AP on the test-challenge split, ranking 1st in the COCO 2018 Challenge Object Detection Task. Code is available at: https://github.com/open-mmlab/mmdetection. |
Tasks | Instance Segmentation, Object Detection, Semantic Segmentation |
Published | 2019-01-22 |
URL | http://arxiv.org/abs/1901.07518v2 |
PDF | http://arxiv.org/pdf/1901.07518v2.pdf |
PWC | https://paperswithcode.com/paper/hybrid-task-cascade-for-instance-segmentation |
Repo | https://github.com/amirassov/kaggle-imaterialist |
Framework | pytorch |
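HTC's two structural ideas — interleaving box and mask refinement within each stage, and flowing mask features (plus a fully convolutional semantic context) from stage to stage — can be caricatured with a runnable toy. Every head here is a stand-in linear layer on made-up shapes; the real implementation is in mmdetection.

```python
import torch
import torch.nn as nn

class ToyHTC(nn.Module):
    """Toy caricature of HTC: per stage, refine boxes, then run a mask head
    that also sees the previous stage's mask features and a semantic context."""
    def __init__(self, dim=64, n_stages=3):
        super().__init__()
        self.box_heads = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_stages))
        self.mask_heads = nn.ModuleList(nn.Linear(2 * dim, dim) for _ in range(n_stages))
        self.semantic = nn.Linear(dim, dim)  # stands in for the FCN context branch

    def forward(self, roi_feat):
        ctx = self.semantic(roi_feat)
        mask_feat = torch.zeros_like(roi_feat)
        for box_h, mask_h in zip(self.box_heads, self.mask_heads):
            roi_feat = roi_feat + box_h(roi_feat)                           # box refinement
            mask_feat = mask_h(torch.cat([roi_feat + ctx, mask_feat], -1))  # mask info flow
        return roi_feat, mask_feat

boxes, masks = ToyHTC()(torch.randn(2, 64))
```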
Backbone Can Not be Trained at Once: Rolling Back to Pre-trained Network for Person Re-Identification
Title | Backbone Can Not be Trained at Once: Rolling Back to Pre-trained Network for Person Re-Identification |
Authors | Youngmin Ro, Jongwon Choi, Dae Ung Jo, Byeongho Heo, Jongin Lim, Jin Young Choi |
Abstract | In the person re-identification (ReID) task, because of the shortage of trainable data, it is common to fine-tune a classification network pre-trained on a large dataset. However, it is relatively difficult to sufficiently fine-tune the low-level layers of the network due to the gradient vanishing problem. In this work, we propose a novel fine-tuning strategy that allows low-level layers to be sufficiently trained by rolling back the weights of high-level layers to their initial pre-trained weights. Our strategy alleviates the problem of gradient vanishing in low-level layers and robustly trains the low-level layers to fit the ReID dataset, thereby increasing the performance of ReID tasks. The improved performance of the proposed strategy is validated via several experiments. Furthermore, without any add-ons such as pose estimation or segmentation, our strategy exhibits state-of-the-art performance using only a vanilla deep convolutional neural network architecture. |
Tasks | Person Re-Identification, Pose Estimation |
Published | 2019-01-18 |
URL | http://arxiv.org/abs/1901.06140v1 |
PDF | http://arxiv.org/pdf/1901.06140v1.pdf |
PWC | https://paperswithcode.com/paper/backbone-can-not-be-trained-at-once-rolling |
Repo | https://github.com/youngminPIL/rollback |
Framework | pytorch |
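The rollback step itself is simple: after some fine-tuning, restore the high-level layers to their pre-trained weights while keeping the now better-trained low-level layers. A sketch with torchvision's ResNet-50; which blocks count as "high-level" and the fine-tune/rollback schedule are assumptions here, not the paper's exact recipe.

```python
import copy
import torchvision

model = torchvision.models.resnet50(weights="IMAGENET1K_V1")
pretrained = copy.deepcopy(model.state_dict())

HIGH_LEVEL = ("layer3", "layer4", "fc")  # assumed split into high-level blocks

def rollback(model, pretrained, prefixes=HIGH_LEVEL):
    """Restore high-level layers to their pre-trained weights, keeping the
    fine-tuned low-level layers untouched."""
    sd = model.state_dict()
    for k, v in pretrained.items():
        if k.startswith(prefixes):
            sd[k] = v.clone()
    model.load_state_dict(sd)

# Assumed schedule: fine-tune for a while, roll back, fine-tune again,
# possibly shrinking the rolled-back set each round.
```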
UrbanFM: Inferring Fine-Grained Urban Flows
Title | UrbanFM: Inferring Fine-Grained Urban Flows |
Authors | Yuxuan Liang, Kun Ouyang, Lin Jing, Sijie Ruan, Ye Liu, Junbo Zhang, David S. Rosenblum, Yu Zheng |
Abstract | Urban flow monitoring systems play important roles in smart city efforts around the world. However, the ubiquitous deployment of monitoring devices, such as CCTVs, incurs a long-lasting and enormous cost for maintenance and operation. This suggests the need for a technology that can reduce the number of deployed devices while preventing the degradation of data accuracy and granularity. In this paper, we aim to infer real-time and fine-grained crowd flows throughout a city based on coarse-grained observations. This task is challenging for two reasons: the spatial correlations between coarse- and fine-grained urban flows, and the complexities of external impacts. To tackle these issues, we develop a method called UrbanFM based on deep neural networks. Our model consists of two major parts: 1) an inference network that generates fine-grained flow distributions from coarse-grained inputs by using a feature extraction module and a novel distributional upsampling module; 2) a general fusion subnet that further boosts performance by considering the influences of different external factors. Extensive experiments on two real-world datasets, namely TaxiBJ and HappyValley, validate the effectiveness and efficiency of our method compared to seven baselines, demonstrating the state-of-the-art performance of our approach on the fine-grained urban flow inference problem. |
Tasks | |
Published | 2019-02-06 |
URL | http://arxiv.org/abs/1902.05377v1 |
PDF | http://arxiv.org/pdf/1902.05377v1.pdf |
PWC | https://paperswithcode.com/paper/urbanfm-inferring-fine-grained-urban-flows |
Repo | https://github.com/yoshall/UrbanFM |
Framework | pytorch |
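The distributional upsampling module can be read as: predict a softmax distribution over each coarse cell's s×s sub-cells and multiply by the coarse flow, so the sub-flows of every cell sum exactly back to it. A self-contained sketch of that constraint; the interface and shapes are assumed, and the inference network producing `logits` is elided.

```python
import torch

def distributional_upsample(coarse, logits, s):
    """Softmax over each coarse cell's s*s sub-cells, scaled by the coarse
    flow so sub-flows sum exactly back to it.
    coarse: (B, 1, H, W); logits: (B, 1, H*s, W*s)."""
    B, _, H, W = coarse.shape
    w = logits.view(B, H, s, W, s).permute(0, 1, 3, 2, 4)  # (B, H, W, s, s)
    w = torch.softmax(w.reshape(B, H, W, s * s), -1).view(B, H, W, s, s)
    fine = w * coarse.squeeze(1)[..., None, None]
    return fine.permute(0, 1, 3, 2, 4).reshape(B, 1, H * s, W * s)

coarse = torch.rand(2, 1, 8, 8) * 100
fine = distributional_upsample(coarse, torch.randn(2, 1, 32, 32), s=4)
# Each 4x4 block of fine flows sums back to its coarse cell.
assert torch.allclose(fine.view(2, 1, 8, 4, 8, 4).sum((3, 5)), coarse)
```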
Still a Pain in the Neck: Evaluating Text Representations on Lexical Composition
Title | Still a Pain in the Neck: Evaluating Text Representations on Lexical Composition |
Authors | Vered Shwartz, Ido Dagan |
Abstract | Building meaningful phrase representations is challenging because phrase meanings are not simply the sum of their constituent meanings. Lexical composition can shift the meanings of the constituent words and introduce implicit information. We tested a broad range of textual representations for their capacity to address these issues. We found that, as expected, contextualized word representations perform better than static word embeddings, more so at detecting meaning shift than at recovering implicit information, where their performance is still far from that of humans. Our evaluation suite, consisting of five tasks related to lexical composition effects, can serve future research aiming to improve such representations. |
Tasks | Word Embeddings |
Published | 2019-02-27 |
URL | https://arxiv.org/abs/1902.10618v2 |
PDF | https://arxiv.org/pdf/1902.10618v2.pdf |
PWC | https://paperswithcode.com/paper/still-a-pain-in-the-neck-evaluating-text |
Repo | https://github.com/vered1986/lexcomp |
Framework | none |
Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation
Title | Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation |
Authors | Zhi Tian, Tong He, Chunhua Shen, Youliang Yan |
Abstract | Recent semantic segmentation methods exploit encoder-decoder architectures to produce the desired pixel-wise segmentation prediction. The last layer of the decoders is typically a bilinear upsampling procedure that recovers the final pixel-wise prediction. We empirically show that this overly simple and data-independent bilinear upsampling may lead to sub-optimal results. In this work, we propose a data-dependent upsampling (DUpsampling) to replace bilinear upsampling, which takes advantage of the redundancy in the label space of semantic segmentation and is able to recover the pixel-wise prediction from low-resolution outputs of CNNs. The main advantage of the new upsampling layer lies in the fact that with a relatively low-resolution feature map, such as $\frac{1}{16}$ or $\frac{1}{32}$ of the input size, we can achieve even better segmentation accuracy while significantly reducing computation complexity. This is made possible by 1) the new upsampling layer's much improved reconstruction capability; and more importantly 2) the DUpsampling-based decoder's flexibility in leveraging almost arbitrary combinations of the CNN encoder's features. Experiments demonstrate that our proposed decoder outperforms the state-of-the-art decoder with only $\sim$20% of the computation. Finally, without any post-processing, the framework equipped with our proposed decoder achieves new state-of-the-art performance on two datasets: 88.1% mIOU on PASCAL VOC with 30% of the computation of the previously best model, and 52.5% mIOU on PASCAL Context. |
Tasks | Semantic Segmentation |
Published | 2019-03-05 |
URL | http://arxiv.org/abs/1903.02120v3 |
PDF | http://arxiv.org/pdf/1903.02120v3.pdf |
PWC | https://paperswithcode.com/paper/decoders-matter-for-semantic-segmentation |
Repo | https://github.com/LinZhuoChen/DUpsampling |
Framework | pytorch |
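Operationally, data-dependent upsampling replaces bilinear interpolation with a learned linear mapping from each low-resolution feature vector to an r×r block of class scores, which can be realised as a 1×1 convolution followed by pixel shuffle. The paper additionally derives/pre-trains this projection from the label space; the sketch below shows only the generic module, with made-up dimensions.

```python
import torch
import torch.nn as nn

class DUpsampling(nn.Module):
    """Generic data-dependent upsampling: a learned linear map from each
    low-resolution feature vector to an r x r block of class scores,
    realised as a 1x1 conv + pixel shuffle."""
    def __init__(self, in_ch, n_classes, r):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, n_classes * r * r, kernel_size=1)
        self.shuffle = nn.PixelShuffle(r)

    def forward(self, x):
        return self.shuffle(self.proj(x))

logits = DUpsampling(256, 21, r=16)(torch.randn(1, 256, 32, 32))  # (1, 21, 512, 512)
```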