April 3, 2020

3216 words 16 mins read

Paper Group AWR 78

Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation. Nmbr9 as a Constraint Programming Challenge. Cross-Modality Paired-Images Generation for RGB-Infrared Person Re-Identification. Learning Diverse Features with Part-Level Resolution for Person Re-Identification. SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint E …

Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation

Title Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation
Authors Min-Hung Chen, Baopu Li, Yingze Bao, Ghassan AlRegib, Zsolt Kira
Abstract Despite the recent progress of fully-supervised action segmentation techniques, the performance is still not fully satisfactory. One main challenge is the problem of spatiotemporal variations (e.g. different people may perform the same activity in various ways). Therefore, we exploit unlabeled videos to address this problem by reformulating the action segmentation task as a cross-domain problem with domain discrepancy caused by spatio-temporal variations. To reduce the discrepancy, we propose Self-Supervised Temporal Domain Adaptation (SSTDA), which contains two self-supervised auxiliary tasks (binary and sequential domain prediction) to jointly align cross-domain feature spaces embedded with local and global temporal dynamics, achieving better performance than other Domain Adaptation (DA) approaches. On three challenging benchmark datasets (GTEA, 50Salads, and Breakfast), SSTDA outperforms the current state-of-the-art method by large margins (e.g. for the F1@25 score, from 59.6% to 69.1% on Breakfast, from 73.4% to 81.5% on 50Salads, and from 83.6% to 89.1% on GTEA), and requires only 65% of the labeled training data for comparable performance, demonstrating the usefulness of adapting to unlabeled target videos across variations. The source code is available at https://github.com/cmhungsteve/SSTDA.
Tasks action segmentation, Domain Adaptation
Published 2020-03-05
URL https://arxiv.org/abs/2003.02824v3
PDF https://arxiv.org/pdf/2003.02824v3.pdf
PWC https://paperswithcode.com/paper/action-segmentation-with-joint-self
Repo https://github.com/cmhungsteve/SSTDA
Framework pytorch
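
The paper's key ingredient is self-supervised domain prediction on frame-level features. Below is a minimal, hedged sketch of the binary domain-prediction auxiliary task built on a gradient reversal layer, the standard mechanism for adversarial feature alignment; it is not the authors' full SSTDA (which adds sequential domain prediction and a specific segmentation backbone), and all names and sizes are illustrative.

```python
# Minimal sketch (not the authors' full SSTDA): a binary domain-prediction
# auxiliary head attached to frame-level features through a gradient reversal
# layer, the standard mechanism for adversarial domain alignment.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) gradients so the feature extractor learns
        # domain-confusing features while the head learns to separate domains.
        return -ctx.lambd * grad_output, None

class BinaryDomainHead(nn.Module):
    def __init__(self, feat_dim=64, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.classifier = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, 2))

    def forward(self, frame_feats):  # (batch, time, feat_dim)
        reversed_feats = GradReverse.apply(frame_feats, self.lambd)
        return self.classifier(reversed_feats)  # per-frame source/target logits

# Usage: add nn.CrossEntropyLoss() on these logits (label 0 = source video,
# 1 = target video) to the supervised segmentation loss on source videos only.
```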

Nmbr9 as a Constraint Programming Challenge

Title Nmbr9 as a Constraint Programming Challenge
Authors Mikael Zayenz Lagerkvist
Abstract Modern board games are a rich source of interesting and new challenges for combinatorial problems. The game Nmbr9 is a solitaire style puzzle game using polyominoes. The rules of the game are simple to explain, but modelling the game effectively using constraint programming is hard. This abstract presents the game, contributes new generalized variants of the game suitable for benchmarking and testing, and describes a model for the presented variants. The question of the top possible score in the standard game is an open challenge.
Tasks Board Games
Published 2020-01-13
URL https://arxiv.org/abs/2001.04238v1
PDF https://arxiv.org/pdf/2001.04238v1.pdf
PWC https://paperswithcode.com/paper/nmbr9-as-a-constraint-programming-challenge
Repo https://github.com/zayenz/cp-2019-nmbr9
Framework none
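
To make the modelling challenge concrete, here is a toy OR-Tools CP-SAT sketch of the kind of 2D non-overlap placement constraint such a model needs. It uses rectangles as a crude stand-in for the actual polyomino shapes and ignores scoring, layering, and adjacency rules, so it only illustrates the constraint-programming machinery, not the paper's model.

```python
# Toy sketch: place a few rectangular "tiles" on a board without overlap.
# Nmbr9 uses polyominoes and layered scoring, so this is only an illustration
# of the CP-SAT placement constraints, not the paper's model.
from ortools.sat.python import cp_model

model = cp_model.CpModel()
board = 20                                 # assumed board extent
tiles = [(2, 2), (3, 2), (2, 3)]           # (width, height) of simplified tiles

xs, ys, x_ivs, y_ivs = [], [], [], []
for i, (w, h) in enumerate(tiles):
    x = model.NewIntVar(0, board - w, f"x{i}")
    y = model.NewIntVar(0, board - h, f"y{i}")
    x_end = model.NewIntVar(0, board, f"xe{i}")
    y_end = model.NewIntVar(0, board, f"ye{i}")
    model.Add(x_end == x + w)
    model.Add(y_end == y + h)
    xs.append(x)
    ys.append(y)
    x_ivs.append(model.NewIntervalVar(x, w, x_end, f"xi{i}"))
    y_ivs.append(model.NewIntervalVar(y, h, y_end, f"yi{i}"))

model.AddNoOverlap2D(x_ivs, y_ivs)         # no two tiles may overlap

solver = cp_model.CpSolver()
if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    print([(solver.Value(x), solver.Value(y)) for x, y in zip(xs, ys)])
```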

Cross-Modality Paired-Images Generation for RGB-Infrared Person Re-Identification

Title Cross-Modality Paired-Images Generation for RGB-Infrared Person Re-Identification
Authors Guan-An Wang, Tianzhu Zhang, Yang Yang, Jian Cheng, Jianlong Chang, Xu Liang, Zengguang Hou
Abstract RGB-Infrared (IR) person re-identification is very challenging due to the large cross-modality variations between RGB and IR images. The key solution is to learn aligned features to bridge the RGB and IR modalities. However, due to the lack of correspondence labels between every pair of RGB and IR images, most methods try to alleviate the variations with set-level alignment by reducing the distance between the entire RGB and IR sets. However, this set-level alignment may lead to misalignment of some instances, which limits the performance for RGB-IR Re-ID. Different from existing methods, in this paper, we propose to generate cross-modality paired-images and perform both global set-level and fine-grained instance-level alignments. Our proposed method enjoys several merits. First, our method can perform set-level alignment by disentangling modality-specific and modality-invariant features. Compared with conventional methods, ours can explicitly remove the modality-specific features and the modality variation can be better reduced. Second, given cross-modality unpaired-images of a person, our method can generate cross-modality paired images from exchanged images. With them, we can directly perform instance-level alignment by minimizing distances of every pair of images. Extensive experimental results on two standard benchmarks demonstrate that the proposed model performs favourably against state-of-the-art methods. Especially, on the SYSU-MM01 dataset, our model can achieve a gain of 9.2% and 7.7% in terms of Rank-1 and mAP. Code is available at https://github.com/wangguanan/JSIA-ReID.
Tasks Person Re-Identification
Published 2020-02-10
URL https://arxiv.org/abs/2002.04114v2
PDF https://arxiv.org/pdf/2002.04114v2.pdf
PWC https://paperswithcode.com/paper/cross-modality-paired-images-generation-for
Repo https://github.com/wangguanan/JSIA-ReID
Framework pytorch
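
The instance-level alignment idea reduces to pulling together the features of an image and its generated cross-modality counterpart of the same identity. A minimal sketch of that loss term follows; the generation network and the set-level alignment are omitted, and all tensors are placeholders.

```python
# Minimal sketch of instance-level alignment: given features of a real image
# and of its generated cross-modality counterpart (same identity), pull the
# pair together in feature space. The generation network itself is omitted.
import torch
import torch.nn.functional as F

def instance_alignment_loss(feats_rgb, feats_ir_generated):
    """feats_*: (batch, dim); row i of both tensors depicts the same person."""
    feats_rgb = F.normalize(feats_rgb, dim=1)
    feats_ir_generated = F.normalize(feats_ir_generated, dim=1)
    # Mean squared distance over every aligned pair.
    return ((feats_rgb - feats_ir_generated) ** 2).sum(dim=1).mean()

# Usage: combine with the usual identity-classification loss on both modalities.
loss = instance_alignment_loss(torch.randn(8, 256), torch.randn(8, 256))
```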

Learning Diverse Features with Part-Level Resolution for Person Re-Identification

Title Learning Diverse Features with Part-Level Resolution for Person Re-Identification
Authors Ben Xie, Xiaofu Wu, Suofei Zhang, Shiliang Zhao, Ming Li
Abstract Learning diverse features is key to the success of person re-identification. Various part-based methods have been extensively proposed for learning local representations, which, however, are still inferior to the best-performing methods for person re-identification. This paper proposes to construct a strong lightweight network architecture, termed PLR-OSNet, based on the idea of Part-Level feature Resolution over the Omni-Scale Network (OSNet) for achieving feature diversity. The proposed PLR-OSNet has two branches, one branch for global feature representation and the other branch for local feature representation. The local branch employs a uniform partition strategy for part-level feature resolution but produces only a single identity-prediction loss, which is in sharp contrast to the existing part-based methods. Empirical evidence demonstrates that the proposed PLR-OSNet achieves state-of-the-art performance on popular person Re-ID datasets, including Market1501, DukeMTMC-reID and CUHK03, despite its small model size.
Tasks Person Re-Identification
Published 2020-01-21
URL https://arxiv.org/abs/2001.07442v1
PDF https://arxiv.org/pdf/2001.07442v1.pdf
PWC https://paperswithcode.com/paper/learning-diverse-features-with-part-level
Repo https://github.com/AI-NERC-NUPT/PLR-OSNet
Framework pytorch
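
The distinguishing design is the two-branch head: a global branch plus a local branch that uniformly partitions the feature map into parts yet produces a single identity-prediction loss. The sketch below illustrates that head on top of a generic convolutional feature map (not the actual OSNet backbone); dimensions and the number of identities are placeholders.

```python
# Sketch of the two-branch head idea: a global branch and a local branch that
# uniformly partitions the feature map into horizontal parts but feeds the
# concatenated part features to a single ID classifier.
import torch
import torch.nn as nn

class TwoBranchHead(nn.Module):
    def __init__(self, in_ch=512, num_parts=4, num_ids=751):
        super().__init__()
        self.global_pool = nn.AdaptiveAvgPool2d(1)
        self.part_pool = nn.AdaptiveAvgPool2d((num_parts, 1))  # uniform vertical partition
        self.global_fc = nn.Linear(in_ch, num_ids)
        self.local_fc = nn.Linear(in_ch * num_parts, num_ids)  # one loss for all parts

    def forward(self, feat_map):              # (B, C, H, W) from the backbone
        g = self.global_pool(feat_map).flatten(1)
        p = self.part_pool(feat_map).flatten(1)
        return self.global_fc(g), self.local_fc(p)

head = TwoBranchHead()
global_logits, local_logits = head(torch.randn(2, 512, 24, 8))
```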

SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation

Title SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation
Authors Zechen Liu, Zizhang Wu, Roland Tóth
Abstract Estimating the 3D orientation and translation of objects is essential for infrastructure-less autonomous navigation and driving. In the case of monocular vision, successful methods have been mainly based on two ingredients: (i) a network generating 2D region proposals, and (ii) an R-CNN structure predicting 3D object pose by utilizing the acquired regions of interest. We argue that the 2D detection network is redundant and introduces non-negligible noise for 3D detection. Hence, we propose a novel 3D object detection method, named SMOKE, that predicts a 3D bounding box for each detected object by combining a single keypoint estimate with regressed 3D variables. As a second contribution, we propose a multi-step disentangling approach for constructing the 3D bounding box, which significantly improves both training convergence and detection accuracy. In contrast to previous 3D detection techniques, our method does not require complicated pre/post-processing, extra data, or a refinement stage. Despite its structural simplicity, our proposed SMOKE network outperforms all existing monocular 3D detection methods on the KITTI dataset, giving the best state-of-the-art result on both 3D object detection and Bird’s eye view evaluation. The code will be made publicly available.
Tasks 3D Object Detection, Autonomous Navigation, Object Detection
Published 2020-02-24
URL https://arxiv.org/abs/2002.10111v1
PDF https://arxiv.org/pdf/2002.10111v1.pdf
PWC https://paperswithcode.com/paper/smoke-single-stage-monocular-3d-object
Repo https://github.com/lzccccc/SMOKE
Framework pytorch
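
The geometric core of combining a keypoint estimate with regressed 3D variables is back-projecting the detected 2D centre keypoint, together with a regressed depth, through the camera intrinsics to obtain the 3D object centre; regressed dimensions and yaw then complete the box. Below is a hedged sketch of that step only; the intrinsics are illustrative and this is not the paper's decoding code.

```python
# Sketch of the core geometric step: recover the 3D object centre from a
# detected 2D keypoint and a regressed depth using the camera intrinsics.
# Regressed dimensions and yaw (not shown) then complete the 3D box.
import numpy as np

def backproject_center(keypoint_uv, depth, K):
    """keypoint_uv: (u, v) pixel location of the projected 3D centre.
    depth: regressed metric depth z. K: 3x3 camera intrinsic matrix."""
    u, v = keypoint_uv
    uv1 = np.array([u, v, 1.0])
    xyz = depth * np.linalg.inv(K) @ uv1   # x = z * K^-1 [u, v, 1]^T
    return xyz                             # 3D centre in camera coordinates

K = np.array([[721.5, 0.0, 609.6],
              [0.0, 721.5, 172.9],
              [0.0, 0.0, 1.0]])            # KITTI-like intrinsics (illustrative)
print(backproject_center((620.0, 180.0), 15.0, K))
```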

Deep Deterministic Portfolio Optimization

Title Deep Deterministic Portfolio Optimization
Authors Ayman Chaouki, Stephen Hardiman, Christian Schmidt, Emmanuel Sérié, Joachim de Lataillade
Abstract Can deep reinforcement learning algorithms be exploited as solvers for optimal trading strategies? The aim of this work is to test reinforcement learning algorithms on conceptually simple, but mathematically non-trivial, trading environments. The environments are chosen such that an optimal or close-to-optimal trading strategy is known. We study the deep deterministic policy gradient algorithm and show that such a reinforcement learning agent can successfully recover the essential features of the optimal trading strategies and achieve close-to-optimal rewards.
Tasks Portfolio Optimization
Published 2020-03-13
URL https://arxiv.org/abs/2003.06497v1
PDF https://arxiv.org/pdf/2003.06497v1.pdf
PWC https://paperswithcode.com/paper/deep-deterministic-portfolio-optimization
Repo https://github.com/CFMTech/Deep-RL-for-Portfolio-Optimization
Framework pytorch
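
For readers unfamiliar with the algorithm, deep deterministic policy gradient trains a deterministic actor together with a Q-value critic. The sketch below shows a minimal actor/critic pair for a one-dimensional trading action (e.g. a target position); the replay buffer, target networks, and the paper's specific environments and rewards are omitted, and all sizes are assumptions.

```python
# Minimal deterministic actor-critic pair for a one-dimensional trading action,
# the function approximators DDPG trains. Replay buffer, target networks and
# the trading environments are omitted.
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, obs_dim=4, max_pos=1.0):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1), nn.Tanh())
        self.max_pos = max_pos

    def forward(self, obs):                  # deterministic policy a = mu(s)
        return self.max_pos * self.net(obs)

class Critic(nn.Module):
    def __init__(self, obs_dim=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + 1, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, obs, action):          # Q(s, a)
        return self.net(torch.cat([obs, action], dim=1))

obs = torch.randn(32, 4)
actor, critic = Actor(), Critic()
q_value = critic(obs, actor(obs))            # used in the DDPG policy gradient
```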

FADNet: A Fast and Accurate Network for Disparity Estimation

Title FADNet: A Fast and Accurate Network for Disparity Estimation
Authors Qiang Wang, Shaohuai Shi, Shizhen Zheng, Kaiyong Zhao, Xiaowen Chu
Abstract Deep neural networks (DNNs) have achieved great success in the area of computer vision. The disparity estimation problem is increasingly addressed by DNNs, which achieve much better prediction accuracy in stereo matching than traditional hand-crafted feature-based methods. On the one hand, however, these DNNs require significant memory and computation resources to accurately predict the disparity, especially those based on 3D convolutions, which makes them difficult to deploy in real-time applications. On the other hand, existing computation-efficient networks lack expressive capability on large-scale datasets, so they cannot make accurate predictions in many scenarios. To this end, we propose an efficient and accurate deep network for disparity estimation, named FADNet, with three main features: 1) it exploits efficient 2D-based correlation layers with stacked blocks to preserve fast computation; 2) it combines residual structures to make the deeper model easier to learn; 3) it contains multi-scale predictions so as to exploit a multi-scale weight-scheduling training technique to improve accuracy. We conduct experiments to demonstrate the effectiveness of FADNet on two popular datasets, Scene Flow and KITTI 2015. Experimental results show that FADNet achieves state-of-the-art prediction accuracy, and runs an order of magnitude faster than existing 3D models. The code for FADNet is available at https://github.com/HKBU-HPML/FADNet.
Tasks Disparity Estimation, Stereo Matching
Published 2020-03-24
URL https://arxiv.org/abs/2003.10758v1
PDF https://arxiv.org/pdf/2003.10758v1.pdf
PWC https://paperswithcode.com/paper/fadnet-a-fast-and-accurate-network-for
Repo https://github.com/HKBU-HPML/FADNet
Framework pytorch
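
The "2D-based correlation layers" referred to above compute a cost volume by correlating left-view features with horizontally shifted right-view features, avoiding memory-hungry 3D convolutions. Below is a hedged sketch of such a layer as a plain loop, not the paper's optimized implementation.

```python
# Sketch of a horizontal correlation layer: for each candidate disparity d,
# correlate left features with right features shifted by d, producing a 2D
# cost volume. This is the kind of 2D operation FADNet builds on instead of
# 3D convolutions.
import torch

def correlation_cost_volume(left_feat, right_feat, max_disp=40):
    """left_feat, right_feat: (B, C, H, W). Returns (B, max_disp, H, W)."""
    b, c, h, w = left_feat.shape
    volume = left_feat.new_zeros(b, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            volume[:, d] = (left_feat * right_feat).mean(dim=1)
        else:
            # Shift right-view features by d pixels before correlating.
            volume[:, d, :, d:] = (left_feat[..., d:] * right_feat[..., :-d]).mean(dim=1)
    return volume

cost = correlation_cost_volume(torch.randn(1, 32, 64, 128), torch.randn(1, 32, 64, 128))
```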

Video Saliency Prediction Using Enhanced Spatiotemporal Alignment Network

Title Video Saliency Prediction Using Enhanced Spatiotemporal Alignment Network
Authors Jin Chen, Huihui Song, Kaihua Zhang, Bo Liu, Qingshan Liu
Abstract Due to a variety of motions across different frames, it is highly challenging to learn an effective spatiotemporal representation for accurate video saliency prediction (VSP). To address this issue, we develop an effective spatiotemporal feature alignment network tailored to VSP, mainly including two key sub-networks: a multi-scale deformable convolutional alignment network (MDAN) and a bidirectional convolutional Long Short-Term Memory (Bi-ConvLSTM) network. The MDAN learns to align the features of the neighboring frames to the reference one in a coarse-to-fine manner, which can well handle various motions. Specifically, the MDAN owns a pyramidal feature hierarchy structure that first leverages deformable convolution (Dconv) to align the lower-resolution features across frames, and then aggregates the aligned features to align the higher-resolution features, progressively enhancing the features from top to bottom. The output of MDAN is then fed into the Bi-ConvLSTM for further enhancement, which captures the useful long-time temporal information along forward and backward timing directions to effectively guide attention orientation shift prediction under complex scene transformation. Finally, the enhanced features are decoded to generate the predicted saliency map. The proposed model is trained end-to-end without any intricate post processing. Extensive evaluations on four VSP benchmark datasets demonstrate that the proposed method achieves favorable performance against state-of-the-art methods. The source codes and all the results will be released.
Tasks Saliency Prediction
Published 2020-01-02
URL https://arxiv.org/abs/2001.00292v1
PDF https://arxiv.org/pdf/2001.00292v1.pdf
PWC https://paperswithcode.com/paper/video-saliency-prediction-using-enhanced
Repo https://github.com/cj4L/ESAN-VSP
Framework pytorch
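
Deformable-convolution alignment means predicting per-location sampling offsets from the reference and neighbouring frames and using them to warp the neighbour's features toward the reference. The sketch below shows one such alignment block built on torchvision's DeformConv2d; the paper's coarse-to-fine pyramid and Bi-ConvLSTM are omitted, and channel sizes are assumptions.

```python
# Sketch of deformable-convolution feature alignment: predict sampling offsets
# from the concatenated reference/neighbour features, then warp the neighbour
# features toward the reference with a deformable convolution.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class AlignBlock(nn.Module):
    def __init__(self, channels=64, k=3):
        super().__init__()
        # 2 * k * k offsets (x and y) per spatial location, single offset group.
        self.offset_pred = nn.Conv2d(2 * channels, 2 * k * k, kernel_size=3, padding=1)
        self.deform = DeformConv2d(channels, channels, kernel_size=k, padding=k // 2)

    def forward(self, ref_feat, nbr_feat):          # both (B, C, H, W)
        offsets = self.offset_pred(torch.cat([ref_feat, nbr_feat], dim=1))
        return self.deform(nbr_feat, offsets)       # neighbour aligned to reference

align = AlignBlock()
aligned = align(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```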

Can x2vec Save Lives? Integrating Graph and Language Embeddings for Automatic Mental Health Classification

Title Can x2vec Save Lives? Integrating Graph and Language Embeddings for Automatic Mental Health Classification
Authors Alexander Ruch
Abstract Graph and language embedding models are becoming commonplace in large scale analyses given their ability to represent complex sparse data densely in low-dimensional space. Integrating these models’ complementary relational and communicative data may be especially helpful if predicting rare events or classifying members of hidden populations - tasks requiring huge and sparse datasets for generalizable analyses. For example, due to social stigma and comorbidities, mental health support groups often form in amorphous online groups. Predicting suicidality among individuals in these settings using standard network analyses is prohibitive due to resource limits (e.g., memory), and adding auxiliary data like text to such models exacerbates complexity- and sparsity-related issues. Here, I show how merging graph and language embedding models (metapath2vec and doc2vec) avoids these limits and extracts unsupervised clustering data without domain expertise or feature engineering. Graph and language distances to a suicide support group have little correlation (ρ < 0.23), implying the two models are not embedding redundant information. When used separately to predict suicidality among individuals, graph and language data generate relatively accurate results (69% and 76%, respectively); however, when integrated, both data produce highly accurate predictions (90%, with 10% false-positives and 12% false-negatives). Visualizing graph embeddings annotated with predictions of potentially suicidal individuals shows the integrated model could classify such individuals even if they are positioned far from the support group. These results extend research on the importance of simultaneously analyzing behavior and language in massive networks and efforts to integrate embedding models for different kinds of data when predicting and classifying, particularly when they involve rare events.
Tasks Action Classification, Activity Prediction, Document Embedding, Feature Engineering, Graph Embedding
Published 2020-01-04
URL https://arxiv.org/abs/2001.01126v1
PDF https://arxiv.org/pdf/2001.01126v1.pdf
PWC https://paperswithcode.com/paper/can-x2vec-save-lives-integrating-graph-and
Repo https://github.com/AlexMRuch/Can-x2vec-Save-Lives
Framework none
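
The integration step itself is simple: concatenate each user's graph embedding and document embedding and fit a classifier on the combined vector. Below is a sketch with random placeholder arrays and scikit-learn; training the embeddings with metapath2vec/doc2vec is omitted.

```python
# Sketch of the integration step only: concatenate each user's precomputed
# graph embedding (metapath2vec) and language embedding (doc2vec) and fit a
# simple classifier. The arrays here are random placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

n_users = 1000
graph_emb = np.random.randn(n_users, 128)    # placeholder metapath2vec vectors
text_emb = np.random.randn(n_users, 300)     # placeholder doc2vec vectors
labels = np.random.randint(0, 2, n_users)    # placeholder binary labels

X = np.hstack([graph_emb, text_emb])         # integrated representation
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```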

Unifying and generalizing models of neural dynamics during decision-making

Title Unifying and generalizing models of neural dynamics during decision-making
Authors David M. Zoltowski, Jonathan W. Pillow, Scott W. Linderman
Abstract An open question in systems and computational neuroscience is how neural circuits accumulate evidence towards a decision. Fitting models of decision-making theory to neural activity helps answer this question, but current approaches limit the number of these models that we can fit to neural data. Here we propose a unifying framework for modeling neural activity during decision-making tasks. The framework includes the canonical drift-diffusion model and enables extensions such as multi-dimensional accumulators, variable and collapsing boundaries, and discrete jumps. Our framework is based on constraining the parameters of recurrent state-space models, for which we introduce a scalable variational Laplace-EM inference algorithm. We applied the modeling approach to spiking responses recorded from monkey parietal cortex during two decision-making tasks. We found that a two-dimensional accumulator better captured the trial-averaged responses of a set of parietal neurons than a single accumulator model. Next, we identified a variable lower boundary in the responses of an LIP neuron during a random dot motion task.
Tasks Decision Making
Published 2020-01-13
URL https://arxiv.org/abs/2001.04571v1
PDF https://arxiv.org/pdf/2001.04571v1.pdf
PWC https://paperswithcode.com/paper/unifying-and-generalizing-models-of-neural
Repo https://github.com/davidzoltowski/ssmdm
Framework none
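
The canonical drift-diffusion model nested by this framework is a one-dimensional accumulator that integrates noisy evidence until it hits a decision bound. Below is a small simulation sketch; the parameters are illustrative, not fit to neural data.

```python
# Sketch of the canonical one-dimensional drift-diffusion accumulator with an
# absorbing decision bound, the simplest special case the framework nests.
import numpy as np

def simulate_ddm(drift=0.5, noise=1.0, bound=1.0, dt=0.01, max_t=5.0, seed=0):
    rng = np.random.default_rng(seed)
    x, t = 0.0, 0.0
    while abs(x) < bound and t < max_t:
        # Euler-Maruyama step: dx = drift*dt + noise*sqrt(dt)*eps
        x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
        t += dt
    choice = int(x >= bound)        # 1 = upper bound, 0 = lower bound / timeout
    return choice, t

choices = [simulate_ddm(seed=s) for s in range(5)]
print(choices)
```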

NeurIPS 2019 Disentanglement Challenge: Improved Disentanglement through Aggregated Convolutional Feature Maps

Title NeurIPS 2019 Disentanglement Challenge: Improved Disentanglement through Aggregated Convolutional Feature Maps
Authors Maximilian Seitzer
Abstract This report, accompanying our stage 1 submission to the NeurIPS 2019 disentanglement challenge, presents a simple image preprocessing method for training VAEs leading to improved disentanglement compared to directly using the images. In particular, we propose to use regionally aggregated feature maps extracted from CNNs pretrained on ImageNet. Our method achieved the 2nd place in stage 1 of the challenge. Code is available at https://github.com/mseitzer/neurips2019-disentanglement-challenge.
Tasks
Published 2020-02-23
URL https://arxiv.org/abs/2002.10003v1
PDF https://arxiv.org/pdf/2002.10003v1.pdf
PWC https://paperswithcode.com/paper/neurips-2019-disentanglement-challenge
Repo https://github.com/mseitzer/neurips2019-disentanglement-challenge
Framework pytorch
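
The preprocessing amounts to passing images through an ImageNet-pretrained CNN and regionally aggregating the resulting feature maps before VAE training. Below is a hedged sketch with a ResNet-18 backbone and a 2x2 pooling grid; both are assumptions, not necessarily the submission's exact backbone or grid size.

```python
# Sketch of the described preprocessing: extract feature maps from an
# ImageNet-pretrained CNN and regionally aggregate them with adaptive average
# pooling to a small grid, forming the input vector for VAE training.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
feature_extractor = nn.Sequential(*list(backbone.children())[:-2]).eval()  # drop pool+fc

@torch.no_grad()
def aggregated_features(images, grid=2):
    """images: (B, 3, H, W), normalized like ImageNet. Returns (B, 512*grid*grid)."""
    fmap = feature_extractor(images)                  # (B, 512, h, w)
    pooled = nn.functional.adaptive_avg_pool2d(fmap, grid)
    return pooled.flatten(1)                          # regionally aggregated vector

vae_input = aggregated_features(torch.randn(4, 3, 128, 128))
print(vae_input.shape)   # torch.Size([4, 2048])
```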

CrossWOZ: A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset

Title CrossWOZ: A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset
Authors Qi Zhu, Kaili Huang, Zheng Zhang, Xiaoyan Zhu, Minlie Huang
Abstract To advance multi-domain (cross-domain) dialogue modeling as well as alleviate the shortage of Chinese task-oriented datasets, we propose CrossWOZ, the first large-scale Chinese Cross-Domain Wizard-of-Oz task-oriented dataset. It contains 6K dialogue sessions and 102K utterances for 5 domains, including hotel, restaurant, attraction, metro, and taxi. Moreover, the corpus contains rich annotation of dialogue states and dialogue acts at both user and system sides. About 60% of the dialogues have cross-domain user goals that favor inter-domain dependency and encourage natural transition across domains in conversation. We also provide a user simulator and several benchmark models for pipelined task-oriented dialogue systems, which will make it easier for researchers to compare and evaluate their models on this corpus. The large size and rich annotation of CrossWOZ make it suitable to investigate a variety of tasks in cross-domain dialogue modeling, such as dialogue state tracking, policy learning, user simulation, etc.
Tasks Dialogue State Tracking, Task-Oriented Dialogue Systems
Published 2020-02-27
URL https://arxiv.org/abs/2002.11893v2
PDF https://arxiv.org/pdf/2002.11893v2.pdf
PWC https://paperswithcode.com/paper/crosswoz-a-large-scale-chinese-cross-domain
Repo https://github.com/sz128/NLU_datasets_for_task_oriented_dialogue
Framework pytorch

End-to-end semantic segmentation of personalized deep brain structures for non-invasive brain stimulation

Title End-to-end semantic segmentation of personalized deep brain structures for non-invasive brain stimulation
Authors Essam A. Rashed, Jose Gomez-Tames, Akimasa Hirata
Abstract Electro-stimulation or modulation of deep brain regions is commonly used in clinical procedures for the treatment of several nervous system disorders. In particular, transcranial direct current stimulation (tDCS) is widely used as an affordable clinical application that is applied through electrodes attached to the scalp. However, it is difficult to determine the amount and distribution of the electric field (EF) in the different brain regions due to anatomical complexity and high inter-subject variability. Personalized tDCS is an emerging clinical procedure that is used to tailor the electrode montage for accurate targeting. This procedure is guided by computational head models generated from anatomical images such as MRI. The distribution of the EF in segmented head models can be calculated through simulation studies. Therefore, fast, accurate, and feasible segmentation of different brain structures would lead to a better adjustment for customized tDCS studies. In this study, a single-encoder multi-decoder convolutional neural network is proposed for deep brain segmentation. The proposed architecture is trained to segment seven deep brain structures using T1-weighted MRI. The network-generated models are compared with a reference model constructed using a semi-automatic method, and show high agreement, especially for the Thalamus (Dice Coefficient (DC) = 94.70%), Caudate (DC = 91.98%) and Putamen (DC = 90.31%) structures. The electric field distributions during tDCS in the generated and reference models matched each other well, suggesting the approach's potential usefulness in clinical practice.
Tasks Brain Segmentation, Semantic Segmentation
Published 2020-02-13
URL https://arxiv.org/abs/2002.05487v1
PDF https://arxiv.org/pdf/2002.05487v1.pdf
PWC https://paperswithcode.com/paper/end-to-end-semantic-segmentation-of
Repo https://github.com/erashed/SubForkNet
Framework none
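
A single-encoder multi-decoder network shares one feature encoder across several decoding heads, one per structure (or group of structures). The sketch below is a tiny 2D toy illustrating that topology only, not the paper's architecture or training setup.

```python
# Sketch of a single-encoder, multi-decoder segmentation network: one shared
# encoder feeds several decoder heads (toy 2D version; the paper segments
# seven deep-brain structures from T1-weighted MRI).
import torch
import torch.nn as nn

class EncoderMultiDecoder(nn.Module):
    def __init__(self, n_decoders=3, base=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, base, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(base, 2 * base, 3, padding=1), nn.ReLU())
        self.decoders = nn.ModuleList([
            nn.Sequential(
                nn.ConvTranspose2d(2 * base, base, 2, stride=2), nn.ReLU(),
                nn.Conv2d(base, 2, 1))           # per-head binary mask logits
            for _ in range(n_decoders)])

    def forward(self, x):                         # x: (B, 1, H, W) MRI slice
        z = self.encoder(x)
        return [dec(z) for dec in self.decoders]  # one output per decoder head

outs = EncoderMultiDecoder()(torch.randn(1, 1, 64, 64))
print([o.shape for o in outs])
```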

Unsupervised Bidirectional Cross-Modality Adaptation via Deeply Synergistic Image and Feature Alignment for Medical Image Segmentation

Title Unsupervised Bidirectional Cross-Modality Adaptation via Deeply Synergistic Image and Feature Alignment for Medical Image Segmentation
Authors Cheng Chen, Qi Dou, Hao Chen, Jing Qin, Pheng Ann Heng
Abstract Unsupervised domain adaptation has increasingly gained interest in medical image computing, aiming to tackle the performance degradation of deep neural networks when being deployed to unseen data with heterogeneous characteristics. In this work, we present a novel unsupervised domain adaptation framework, named as Synergistic Image and Feature Alignment (SIFA), to effectively adapt a segmentation network to an unlabeled target domain. Our proposed SIFA conducts synergistic alignment of domains from both image and feature perspectives. In particular, we simultaneously transform the appearance of images across domains and enhance domain-invariance of the extracted features by leveraging adversarial learning in multiple aspects and with a deeply supervised mechanism. The feature encoder is shared between both adaptive perspectives to leverage their mutual benefits via end-to-end learning. We have extensively evaluated our method with cardiac substructure segmentation and abdominal multi-organ segmentation for bidirectional cross-modality adaptation between MRI and CT images. Experimental results on two different tasks demonstrate that our SIFA method is effective in improving segmentation performance on unlabeled target images, and outperforms the state-of-the-art domain adaptation approaches by a large margin.
Tasks Domain Adaptation, Medical Image Segmentation, Semantic Segmentation, Unsupervised Domain Adaptation
Published 2020-02-06
URL https://arxiv.org/abs/2002.02255v1
PDF https://arxiv.org/pdf/2002.02255v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-bidirectional-cross-modality
Repo https://github.com/cchen-cc/SIFA
Framework tf
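
One of the adversarial ingredients is feature-level alignment: a discriminator tries to tell source-domain from target-domain feature maps while the segmenter is trained to fool it. The sketch below shows that loss pair with a least-squares objective; SIFA's image-translation half and deep supervision are omitted, and although the official repo is TensorFlow, the sketch is written in PyTorch to match the other examples on this page.

```python
# Sketch of the feature-level adversarial alignment ingredient only: a small
# patch discriminator on feature maps trained with a least-squares objective,
# while the segmenter is trained to fool it.
import torch
import torch.nn as nn

disc = nn.Sequential(
    nn.Conv2d(64, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 1, 4, stride=2, padding=1))    # patch-level domain scores

mse = nn.MSELoss()

def discriminator_loss(src_feat, tgt_feat):
    # Discriminator: source features labelled 1, target features labelled 0.
    src_pred = disc(src_feat.detach())
    tgt_pred = disc(tgt_feat.detach())
    return mse(src_pred, torch.ones_like(src_pred)) + mse(tgt_pred, torch.zeros_like(tgt_pred))

def segmenter_alignment_loss(tgt_feat):
    # Segmenter tries to make target-domain features look source-like.
    pred = disc(tgt_feat)
    return mse(pred, torch.ones_like(pred))

src, tgt = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
print(discriminator_loss(src, tgt).item(), segmenter_alignment_loss(tgt).item())
```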

OPFython: A Python-Inspired Optimum-Path Forest Classifier

Title OPFython: A Python-Inspired Optimum-Path Forest Classifier
Authors Gustavo Henrique de Rosa, João Paulo Papa, Alexandre Xavier Falcão
Abstract Machine learning techniques have been paramount throughout the last years, being applied in a wide range of tasks, such as classification, object recognition, person identification, and image segmentation, among others. Nevertheless, conventional classification algorithms, e.g., Logistic Regression, Decision Trees, and Bayesian classifiers, might lack complexity and diversity, not being suitable when dealing with real-world data. A recent graph-inspired classifier, known as the Optimum-Path Forest, has proven to be a state-of-the-art technique, comparable to Support Vector Machines and even surpassing them in some tasks. In this paper, we propose a Python-based Optimum-Path Forest framework, denoted OPFython, where all of its functions and classes are based upon the original C language implementation. Additionally, as OPFython is a Python-based library, it provides a friendlier environment and a faster prototyping workspace than the C language.
Tasks Object Recognition, Person Identification, Semantic Segmentation
Published 2020-01-28
URL https://arxiv.org/abs/2001.10420v1
PDF https://arxiv.org/pdf/2001.10420v1.pdf
PWC https://paperswithcode.com/paper/opfython-a-python-inspired-optimum-path
Repo https://github.com/gugarosa/opfython
Framework none
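
A hedged usage sketch of the library follows; the module path and the 1-indexed label convention are assumptions based on the repository's documented style, so consult the repo if they differ.

```python
# Hedged usage sketch (module path and label convention assumed from the
# repository; check its docs if they differ): fit a supervised Optimum-Path
# Forest classifier and predict on held-out samples.
import numpy as np
from opfython.models import SupervisedOPF

X = np.random.rand(200, 8)                 # placeholder features
y = np.random.randint(1, 3, 200)           # OPF labels are conventionally 1-indexed
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

opf = SupervisedOPF()                      # builds the optimum-path forest on fit
opf.fit(X_train, y_train)
preds = opf.predict(X_test)
print("accuracy:", np.mean(np.array(preds) == y_test))
```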