Paper Group ANR 673
Cross-Modal Attentional Context Learning for RGB-D Object Detection
Title | Cross-Modal Attentional Context Learning for RGB-D Object Detection |
Authors | Guanbin Li, Yukang Gan, Hejun Wu, Nong Xiao, Liang Lin |
Abstract | Recognizing objects from simultaneously sensed photometric (RGB) and depth channels is a fundamental yet practical problem in many machine vision applications such as robot grasping and autonomous driving. In this paper, we address this problem by developing a Cross-Modal Attentional Context (CMAC) learning framework, which enables the full exploitation of the context information from both RGB and depth data. Compared to existing RGB-D object detection frameworks, our approach has several appealing properties. First, it consists of an attention-based global context model for exploiting adaptive contextual information and incorporating this information into a region-based CNN (e.g., Fast RCNN) framework to achieve improved object detection performance. Second, our CMAC framework further contains a fine-grained object part attention module to harness multiple discriminative object parts inside each possible object region for superior local feature representation. While greatly improving the accuracy of RGB-D object detection, the effective cross-modal information fusion as well as attentional context modeling in our proposed model provide an interpretable visualization scheme. Experimental results demonstrate that the proposed method significantly improves upon the state of the art on all public benchmarks. |
Tasks | Autonomous Driving, Object Detection |
Published | 2018-10-30 |
URL | http://arxiv.org/abs/1810.12829v1 |
http://arxiv.org/pdf/1810.12829v1.pdf | |
PWC | https://paperswithcode.com/paper/cross-modal-attentional-context-learning-for |
Repo | |
Framework | |
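The core idea of CMAC — attention that decides, per location, how much to trust each modality before pooling a global context vector — can be sketched in a few lines. The sketch below is a simplified stand-in with hypothetical layer sizes and wiring, not the authors' implementation:

```python
# A minimal sketch of cross-modal attentive fusion in the spirit of CMAC
# (hypothetical shapes and layers; not the paper's code).
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        # Score each spatial location of each modality from the joint features.
        self.score = nn.Linear(2 * dim, 2)

    def forward(self, rgb_feat, depth_feat):
        # rgb_feat, depth_feat: (batch, locations, dim) region features.
        joint = torch.cat([rgb_feat, depth_feat], dim=-1)
        weights = torch.softmax(self.score(joint), dim=-1)  # per-location modality weights
        fused = weights[..., :1] * rgb_feat + weights[..., 1:] * depth_feat
        # Illustrative global context: attention-pool the fused map over locations.
        loc_w = torch.softmax(fused.mean(-1), dim=-1).unsqueeze(-1)
        return (loc_w * fused).sum(dim=1)                   # (batch, dim)

ctx = CrossModalAttention()(torch.randn(2, 49, 512), torch.randn(2, 49, 512))
```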
A Many-Objective Evolutionary Algorithm With Two Interacting Processes: Cascade Clustering and Reference Point Incremental Learning
Title | A Many-Objective Evolutionary Algorithm With Two Interacting Processes: Cascade Clustering and Reference Point Incremental Learning |
Authors | Hongwei Ge, Mingde Zhao, Liang Sun, Zhen Wang, Guozhen Tan, Qiang Zhang, C. L. Philip Chen |
Abstract | Research has shown the difficulty of obtaining proximity while maintaining diversity in many-objective optimization problems. The complexity of the true Pareto front poses challenges for reference vector-based algorithms, whose adaptability to diverse problem characteristics is insufficient without a priori knowledge. This paper proposes a many-objective optimization algorithm with two interacting processes: cascade clustering and reference point incremental learning (CLIA). In the population selection process based on cascade clustering (CC), using the reference vectors provided by the incremental learning process, the nondominated and the dominated individuals are clustered and sorted in different manners in a cascade style and are selected by round-robin for better proximity and diversity. In the reference vector adaptation process based on reference point incremental learning, using feedback from the CC process, a proper distribution of reference points is gradually obtained by incremental learning. Experimental studies on several benchmark problems show that CLIA is competitive with state-of-the-art algorithms and has impressive efficiency and versatility, using only the interactions between the two processes without incurring extra evaluations. |
Tasks | |
Published | 2018-03-03 |
URL | https://arxiv.org/abs/1803.01097v4 |
https://arxiv.org/pdf/1803.01097v4.pdf | |
PWC | https://paperswithcode.com/paper/a-many-objective-evolutionary-algorithm-with |
Repo | |
Framework | |
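CLIA's selection step draws individuals from clusters in a round-robin fashion to balance proximity and diversity. Below is a minimal sketch of that round-robin draw, assuming cluster assignments and per-cluster rankings are already computed (the cascade clustering itself is omitted):

```python
# Illustrative round-robin selection from pre-sorted clusters, one ingredient
# of CLIA's selection process (a sketch, not the paper's implementation).
def round_robin_select(clusters, k):
    """Pick k individuals by cycling over clusters, best-ranked first.

    clusters: list of lists, each sorted by a proximity score (best first).
    """
    selected = []
    depth = max((len(c) for c in clusters), default=0)
    for i in range(depth):
        for c in clusters:
            if i < len(c):
                selected.append(c[i])
                if len(selected) == k:
                    return selected
    return selected

pops = [[('a', 0.1), ('b', 0.4)], [('c', 0.2)], [('d', 0.3), ('e', 0.9)]]
print(round_robin_select(pops, 4))  # cycles the clusters: a, c, d, then b
```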
DLHub: Model and Data Serving for Science
Title | DLHub: Model and Data Serving for Science |
Authors | Ryan Chard, Zhuozhao Li, Kyle Chard, Logan Ward, Yadu Babuji, Anna Woodard, Steve Tuecke, Ben Blaiszik, Michael J. Franklin, Ian Foster |
Abstract | While the Machine Learning (ML) landscape is evolving rapidly, there has been a relative lag in the development of the “learning systems” needed to enable broad adoption. Furthermore, few such systems are designed to support the specialized requirements of scientific ML. Here we present the Data and Learning Hub for science (DLHub), a multi-tenant system that provides both model repository and serving capabilities with a focus on science applications. DLHub addresses two significant shortcomings in current systems. First, its self-service model repository allows users to share, publish, verify, reproduce, and reuse models, and addresses concerns related to model reproducibility by packaging and distributing models and all constituent components. Second, it implements scalable and low-latency serving capabilities that can leverage parallel and distributed computing resources to democratize access to published models through a simple web interface. Unlike other model serving frameworks, DLHub can store and serve any Python 3-compatible model or processing function, plus multiple-function pipelines. We show that relative to other model serving systems including TensorFlow Serving, SageMaker, and Clipper, DLHub provides greater capabilities, comparable performance without memoization and batching, and significantly better performance when the latter two techniques can be employed. We also describe early uses of DLHub for scientific applications. |
Tasks | |
Published | 2018-11-27 |
URL | http://arxiv.org/abs/1811.11213v1 |
http://arxiv.org/pdf/1811.11213v1.pdf | |
PWC | https://paperswithcode.com/paper/dlhub-model-and-data-serving-for-science |
Repo | |
Framework | |
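DLHub exposes published models behind a simple web interface. The snippet below is a generic sketch of how a client might invoke a served model over HTTP; the route, payload shape, and servable name are hypothetical placeholders, not the actual DLHub API:

```python
# Generic sketch of calling a served model over HTTP. The endpoint path,
# request body, and servable name are hypothetical, NOT the DLHub API.
import json
import urllib.request

def run_servable(base_url, servable, inputs):
    req = urllib.request.Request(
        f"{base_url}/servables/{servable}/run",      # hypothetical route
        data=json.dumps({"inputs": inputs}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# result = run_servable("https://example.org/api", "user/my_model", [[1.0, 2.0]])
```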
Canonical Correlation Analysis for Misaligned Satellite Image Change Detection
Title | Canonical Correlation Analysis for Misaligned Satellite Image Change Detection |
Authors | Hichem Sahbi |
Abstract | Canonical correlation analysis (CCA) is a statistical learning method that seeks to build view-independent latent representations from multi-view data. This method has been successfully applied to several pattern analysis tasks such as image-to-text mapping and view-invariant object/action recognition. However, this success is highly dependent on the quality of data pairing (i.e., alignments) and mispairing adversely affects the generalization ability of the learned CCA representations. In this paper, we address the issue of alignment errors using a new variant of canonical correlation analysis referred to as alignment-agnostic (AA) CCA. Starting from erroneously paired data taken from different views, this CCA finds transformation matrices by optimizing a constrained maximization problem that mixes a data correlation term with context regularization; the particular design of these two terms mitigates the effect of alignment errors when learning the CCA transformations. Experiments conducted on multi-view tasks, including multi-temporal satellite image change detection, show that our AA CCA method is highly effective and resilient to mispairing errors. |
Tasks | Temporal Action Localization |
Published | 2018-12-21 |
URL | http://arxiv.org/abs/1812.09280v1 |
http://arxiv.org/pdf/1812.09280v1.pdf | |
PWC | https://paperswithcode.com/paper/canonical-correlation-analysis-for-misaligned |
Repo | |
Framework | |
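For reference, plain CCA finds pairs of directions that maximize the correlation between two views; AA CCA augments the correlation term with a context regularizer to tolerate mispaired rows, which the sketch below omits. This is textbook CCA via whitening and an SVD, not the paper's code:

```python
# Plain CCA as a baseline sketch (AA CCA's context regularizer is omitted).
import numpy as np

def cca(X, Y, k=2, reg=1e-6):
    """Return the top-k pairs of canonical directions for X, Y (rows = samples)."""
    X = X - X.mean(0); Y = Y - Y.mean(0)
    Cxx = X.T @ X / len(X) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / len(Y) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / len(X)
    # Whiten each view, then take the SVD of the whitened cross-covariance.
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx)).T
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy)).T
    U, s, Vt = np.linalg.svd(Wx.T @ Cxy @ Wy)
    return Wx @ U[:, :k], Wy @ Vt.T[:, :k], s[:k]  # directions + correlations

A, B, corrs = cca(np.random.randn(100, 5), np.random.randn(100, 4))
```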
Coronary Calcium Detection using 3D Attention Identical Dual Deep Network Based on Weakly Supervised Learning
Title | Coronary Calcium Detection using 3D Attention Identical Dual Deep Network Based on Weakly Supervised Learning |
Authors | Yuankai Huo, James G. Terry, Jiachen Wang, Vishwesh Nath, Camilo Bermudez, Shunxing Bao, Prasanna Parvathaneni, J. Jeffery Carr, Bennett A. Landman |
Abstract | Coronary artery calcium (CAC) is a biomarker of advanced subclinical coronary artery disease and predicts myocardial infarction and death prior to age 60 years. Slice-wise manual delineation has been regarded as the gold standard of coronary calcium detection. However, manual efforts are time- and resource-consuming and even impracticable for large-scale cohorts. In this paper, we propose the attention identical dual network (AID-Net) to perform CAC detection using scan-rescan longitudinal non-contrast CT scans with weakly supervised attention, using only per-scan-level labels. To improve performance, 3D attention mechanisms were integrated into the AID-Net to provide complementary information for classification tasks. Moreover, 3D Gradient-weighted Class Activation Mapping (Grad-CAM) was also proposed at the testing stage to interpret the behavior of the deep neural network. 5075 non-contrast chest CT scans were used as training, validation, and testing datasets. Baseline performance was assessed on the same cohort. The proposed AID-Net achieved superior classification accuracy (0.9272) and AUC (0.9627). |
Tasks | |
Published | 2018-11-10 |
URL | http://arxiv.org/abs/1811.04289v1 |
http://arxiv.org/pdf/1811.04289v1.pdf | |
PWC | https://paperswithcode.com/paper/coronary-calcium-detection-using-3d-attention |
Repo | |
Framework | |
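The interpretability component here is Grad-CAM extended to 3D volumes. A minimal generic recipe (not the AID-Net code) weights the last convolutional feature volume by the spatially averaged gradients of the class score:

```python
# Generic 3D Grad-CAM sketch: weight the conv feature volume by the mean
# gradient of the class score, ReLU, and upsample to the input resolution.
import torch
import torch.nn.functional as F

def grad_cam_3d(model, conv_layer, volume, class_idx):
    feats, grads = {}, {}
    h1 = conv_layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
    h2 = conv_layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))
    score = model(volume)[0, class_idx]
    model.zero_grad(); score.backward()
    h1.remove(); h2.remove()
    w = grads['a'].mean(dim=(2, 3, 4), keepdim=True)         # (1, C, 1, 1, 1)
    cam = F.relu((w * feats['a']).sum(dim=1, keepdim=True))  # (1, 1, D, H, W)
    return F.interpolate(cam, size=volume.shape[2:], mode='trilinear',
                         align_corners=False)
```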
StainGAN: Stain Style Transfer for Digital Histological Images
Title | StainGAN: Stain Style Transfer for Digital Histological Images |
Authors | M Tarek Shaban, Christoph Baur, Nassir Navab, Shadi Albarqouni |
Abstract | Digitized histological diagnosis is in increasing demand. However, color variations due to various factors impose obstacles on the diagnosis process. The problem of stain color variation is well defined, with many proposed solutions, most of which depend heavily on a reference template slide. We propose a deep-learning solution inspired by CycleGANs that is trained end-to-end, eliminating the need for an expert to pick a representative reference slide. Our approach showed superior results, quantitatively and qualitatively, against state-of-the-art methods (a 10% visual improvement measured by SSIM). We further validated our method on a clinical use case, namely breast cancer tumor classification, showing a 12% increase in AUC. The code will be made publicly available. |
Tasks | Style Transfer |
Published | 2018-04-04 |
URL | http://arxiv.org/abs/1804.01601v1 |
http://arxiv.org/pdf/1804.01601v1.pdf | |
PWC | https://paperswithcode.com/paper/staingan-stain-style-transfer-for-digital |
Repo | |
Framework | |
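CycleGAN-inspired training hinges on a cycle-consistency term: mapping a slide to the target stain and back should recover the input. A minimal sketch of that term, with `G` and `Fb` standing in for the two generators (the adversarial terms are omitted):

```python
# Cycle-consistency loss of CycleGAN-style stain transfer (sketch only).
# G: source stain -> target stain, Fb: target stain -> source stain.
import torch

def cycle_loss(G, Fb, x_src, x_tgt, lam=10.0):
    l1 = torch.nn.functional.l1_loss
    # Round trips in both directions should reconstruct the originals.
    return lam * (l1(Fb(G(x_src)), x_src) + l1(G(Fb(x_tgt)), x_tgt))
```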
Active Semi-supervised Transfer Learning (ASTL) for Offline BCI Calibration
Title | Active Semi-supervised Transfer Learning (ASTL) for Offline BCI Calibration |
Authors | Dongrui Wu |
Abstract | Single-trial classification of event-related potentials in electroencephalogram (EEG) signals is a very important paradigm of brain-computer interface (BCI). Because of individual differences, usually some subject-specific calibration data are required to tailor the classifier for each subject. Transfer learning has been extensively used to reduce such calibration data requirement, by making use of auxiliary data from similar/relevant subjects/tasks. However, all previous research assumes that all auxiliary data have been labeled. This paper considers a more general scenario, in which part of the auxiliary data could be unlabeled. We propose active semi-supervised transfer learning (ASTL) for offline BCI calibration, which integrates active learning, semi-supervised learning, and transfer learning. Using a visual evoked potential oddball task and three different EEG headsets, we demonstrate that ASTL can achieve consistently good performance across subjects and headsets, and it outperforms some state-of-the-art approaches in the literature. |
Tasks | Active Learning, Calibration, EEG, Transfer Learning |
Published | 2018-05-12 |
URL | http://arxiv.org/abs/1805.05781v1 |
http://arxiv.org/pdf/1805.05781v1.pdf | |
PWC | https://paperswithcode.com/paper/active-semi-supervised-transfer-learning-astl |
Repo | |
Framework | |
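One of ASTL's three ingredients, active learning, can be illustrated by plain uncertainty sampling: query labels for the trials the current classifier is least sure about. A toy sketch (the semi-supervised and transfer-learning components are omitted):

```python
# Least-confident uncertainty sampling, a toy stand-in for ASTL's active
# learning step (not the paper's full procedure).
import numpy as np

def least_confident(probs, n_query):
    """probs: (n_samples, n_classes) predicted class probabilities."""
    uncertainty = 1.0 - probs.max(axis=1)
    return np.argsort(-uncertainty)[:n_query]  # indices to send for labeling

idx = least_confident(np.array([[0.9, 0.1], [0.55, 0.45], [0.7, 0.3]]), 1)
print(idx)  # -> [1], the most ambiguous trial
```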
Less Is More: Picking Informative Frames for Video Captioning
Title | Less Is More: Picking Informative Frames for Video Captioning |
Authors | Yangyu Chen, Shuhui Wang, Weigang Zhang, Qingming Huang |
Abstract | In the video captioning task, the best results have been achieved by attention-based models that associate salient visual components with sentences in the video. However, existing studies follow a common procedure that includes frame-level appearance and motion modeling on equal-interval frame sampling, which can introduce redundant visual information, sensitivity to content noise, and unnecessary computational cost. We propose a plug-and-play PickNet to perform informative frame picking in video captioning. Based on a standard Encoder-Decoder framework, we develop a reinforcement-learning-based procedure to train the network sequentially, where the reward of each frame-picking action is designed to maximize visual diversity and minimize textual discrepancy. If a candidate is rewarded, it is selected and the corresponding latent representation of the Encoder-Decoder is updated for future trials. This procedure continues until the end of the video sequence. Consequently, a compact frame subset can be selected to represent the visual information and perform video captioning without performance degradation. Experimental results show that our model can use 6-8 frames to achieve competitive performance across popular benchmarks. |
Tasks | Video Captioning |
Published | 2018-03-05 |
URL | http://arxiv.org/abs/1803.01457v1 |
http://arxiv.org/pdf/1803.01457v1.pdf | |
PWC | https://paperswithcode.com/paper/less-is-more-picking-informative-frames-for |
Repo | |
Framework | |
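The reward described in the abstract trades off visual diversity of the picked frames against textual discrepancy of the generated caption. Below is a toy stand-in for that reward, using cosine similarity for diversity and a given caption loss for discrepancy (both are simplifications of the paper's learned measures):

```python
# Toy PickNet-style reward: favor frames that add visual diversity while
# keeping the caption loss low (simplified stand-ins, not the paper's reward).
import numpy as np

def frame_reward(new_feat, picked_feats, caption_loss, alpha=1.0, beta=1.0):
    if picked_feats:
        sims = [np.dot(new_feat, f) / (np.linalg.norm(new_feat) * np.linalg.norm(f))
                for f in picked_feats]
        diversity = 1.0 - max(sims)  # distance to the closest already-picked frame
    else:
        diversity = 1.0              # the first pick is maximally novel
    return alpha * diversity - beta * caption_loss
```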
Learnable PINs: Cross-Modal Embeddings for Person Identity
Title | Learnable PINs: Cross-Modal Embeddings for Person Identity |
Authors | Arsha Nagrani, Samuel Albanie, Andrew Zisserman |
Abstract | We propose and investigate an identity sensitive joint embedding of face and voice. Such an embedding enables cross-modal retrieval from voice to face and from face to voice. We make the following four contributions: first, we show that the embedding can be learnt from videos of talking faces, without requiring any identity labels, using a form of cross-modal self-supervision; second, we develop a curriculum learning schedule for hard negative mining targeted to this task, that is essential for learning to proceed successfully; third, we demonstrate and evaluate cross-modal retrieval for identities unseen and unheard during training over a number of scenarios and establish a benchmark for this novel task; finally, we show an application of using the joint embedding for automatically retrieving and labelling characters in TV dramas. |
Tasks | Cross-Modal Retrieval |
Published | 2018-05-02 |
URL | http://arxiv.org/abs/1805.00833v2 |
http://arxiv.org/pdf/1805.00833v2.pdf | |
PWC | https://paperswithcode.com/paper/learnable-pins-cross-modal-embeddings-for |
Repo | |
Framework | |
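A joint face-voice embedding of this kind is typically trained with a contrastive objective over matched and mismatched pairs. The sketch below uses the hardest in-batch negative as a crude stand-in for the paper's curriculum-based hard negative mining:

```python
# Contrastive objective for a joint face-voice embedding (sketch). The
# hardest in-batch negative approximates the paper's mining curriculum.
import torch
import torch.nn.functional as F

def contrastive_loss(face_emb, voice_emb, margin=0.6):
    face = F.normalize(face_emb, dim=1)
    voice = F.normalize(voice_emb, dim=1)
    sim = face @ voice.t()                  # (B, B); the diagonal holds positives
    pos = sim.diag()
    neg = sim - 2 * torch.eye(len(sim), device=sim.device)  # mask out positives
    hardest_neg = neg.max(dim=1).values
    return F.relu(margin - pos + hardest_neg).mean()
```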
Total Variation with Overlapping Group Sparsity and Lp Quasinorm for Infrared Image Deblurring under Salt-and-Pepper Noise
Title | Total Variation with Overlapping Group Sparsity and Lp Quasinorm for Infrared Image Deblurring under Salt-and-Pepper Noise |
Authors | Xingguo Liu, Yinping Chen, Zhenming Peng, Juan Wu |
Abstract | Because of the limitations of the infrared imaging principle and the properties of infrared imaging systems, infrared images have some drawbacks, including a lack of detail, indistinct edges, and a large amount of salt-and-pepper noise. To improve the sparse characteristics of the image while maintaining the image edges and weakening staircase artifacts, this paper proposes a method that uses the Lp quasinorm instead of the L1 norm for infrared image deblurring with an overlapping group sparse total variation method. The Lp quasinorm introduces another degree of freedom, better describes image sparsity characteristics, and improves image restoration. Furthermore, we adopt the accelerated alternating direction method of multipliers and fast Fourier transform theory in the proposed method to improve the efficiency and robustness of our algorithm. Experiments show that under different conditions of blur and salt-and-pepper noise, the proposed method delivers excellent performance in terms of both objective evaluation and subjective visual results. |
Tasks | Deblurring, Image Restoration |
Published | 2018-12-31 |
URL | http://arxiv.org/abs/1812.11725v2 |
http://arxiv.org/pdf/1812.11725v2.pdf | |
PWC | https://paperswithcode.com/paper/total-variation-with-overlapping-group-1 |
Repo | |
Framework | |
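Inside ADMM-based TV deblurring, the sparsity subproblem reduces to a shrinkage (proximal) step. With the L1 norm this is the classic soft threshold shown below; the paper's Lp quasinorm (p < 1) replaces it with a generalized shrinkage that has no simple closed form and is usually solved iteratively:

```python
# The L1 soft-thresholding step that appears inside ADMM-based TV deblurring.
# The paper swaps this for an Lp (p < 1) generalized shrinkage.
import numpy as np

def soft_threshold(x, tau):
    """prox of tau*||.||_1: sign(x) * max(|x| - tau, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

print(soft_threshold(np.array([-1.5, 0.2, 0.9]), 0.5))  # [-1.   0.   0.4]
```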
Soft-Autoencoder and Its Wavelet Shrinkage Interpretation
Title | Soft-Autoencoder and Its Wavelet Shrinkage Interpretation |
Authors | Fenglei Fan, Mengzhou Li, Yueyang Teng, Ge Wang |
Abstract | Recently, deep learning has become the main focus of machine learning research and has greatly impacted many fields. However, deep learning is criticized for its lack of interpretability. As a successful unsupervised model in deep learning, the autoencoder embraces a wide spectrum of applications, yet it suffers from model opaqueness as well. In this paper, we propose a new type of convolutional autoencoder, termed Soft-Autoencoder (Soft-AE), in which the activation functions of the encoding layers are implemented with adaptable soft-thresholding units while the decoding layers are realized with linear units. Consequently, Soft-AE can be naturally interpreted as a learned cascaded wavelet shrinkage system. Our denoising experiments demonstrate that Soft-AE is not only interpretable but also offers competitive performance relative to its counterparts. Furthermore, we propose a generalized linear unit (GeLU) and its truncated variant (tGeLU) to allow the autoencoder to handle a wider range of tasks, from denoising to deblurring. |
Tasks | Deblurring, Denoising |
Published | 2018-12-31 |
URL | https://arxiv.org/abs/1812.11675v2 |
https://arxiv.org/pdf/1812.11675v2.pdf | |
PWC | https://paperswithcode.com/paper/soft-autoencoder-and-its-wavelet-shrinkage |
Repo | |
Framework | |
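The adaptable soft-thresholding unit at the heart of Soft-AE can be written as a small learnable activation that zeroes small responses, mirroring wavelet shrinkage. A sketch with an illustrative per-channel threshold (the initialization and wiring are assumptions):

```python
# Learnable soft-thresholding activation in the spirit of Soft-AE's encoding
# layers (per-channel threshold and its init value are illustrative choices).
import torch
import torch.nn as nn

class SoftThreshold(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.bias = nn.Parameter(torch.full((1, channels, 1, 1), 0.1))

    def forward(self, x):
        # sign(x) * max(|x| - b, 0): suppresses small responses, like shrinkage.
        return torch.sign(x) * torch.relu(torch.abs(x) - self.bias)
```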
irbasis: Open-source database and software for intermediate-representation basis functions of imaginary-time Green’s function
Title | irbasis: Open-source database and software for intermediate-representation basis functions of imaginary-time Green’s function |
Authors | Naoya Chikano, Kazuyoshi Yoshimi, Junya Otsuki, Hiroshi Shinaoka |
Abstract | The open-source library irbasis provides easy-to-use tools for two sets of orthogonal functions named the intermediate representation (IR). The IR basis enables a compact representation of the Matsubara Green’s function and efficient calculations of quantum models. The IR basis functions are defined as the solution of an integral equation for which no analytical solution is currently available. The library consists of a database of pre-computed high-precision numerical solutions and computational code for evaluating the functions from the database. This paper describes technical details and demonstrates how to use the library. |
Tasks | |
Published | 2018-07-13 |
URL | http://arxiv.org/abs/1807.05237v2 |
http://arxiv.org/pdf/1807.05237v2.pdf | |
PWC | https://paperswithcode.com/paper/irbasis-open-source-database-and-software-for |
Repo | |
Framework | |
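A minimal usage sketch of the irbasis package, following the workflow the paper describes (load a pre-computed basis from the database, then evaluate its functions); the exact method names should be checked against the installed version:

```python
# Usage sketch for the irbasis package; method names follow the paper's
# described interface and should be verified against the installed release.
import irbasis

basis = irbasis.load('F', 1000.0)  # fermionic ('F') basis for Lambda = 10^3
print(basis.dim())                 # number of stored basis functions
print(basis.sl(0))                 # leading singular value s_0
print(basis.ulx(0, 0.5))           # u_0(x) evaluated at x = 0.5
```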
Adaptive Recurrent Neural Network Based on Mixture Layer
Title | Adaptive Recurrent Neural Network Based on Mixture Layer |
Authors | Kui Zhao, Yuechuan Li, Chi Zhang, Cheng Yang, Huan Xu |
Abstract | Although the Recurrent Neural Network (RNN) has been a powerful tool for modeling sequential data, its performance is inadequate when processing sequences with multiple patterns. In this paper, we address this challenge by introducing a novel mixture layer and constructing an adaptive RNN. The mixture-layer-augmented RNN (termed M-RNN) partitions patterns in training sequences into several clusters and stores the principal patterns as prototype vectors of components in a mixture model. By leveraging the mixture layer, the proposed method can adaptively update states according to the similarities between encoded inputs and prototype vectors, leading to a stronger capacity for assimilating sequences with multiple patterns. Moreover, our approach can be further extended by taking advantage of prior knowledge about the data. Experiments on both synthetic and real datasets demonstrate the effectiveness of the proposed method. |
Tasks | |
Published | 2018-01-24 |
URL | http://arxiv.org/abs/1801.08094v4 |
http://arxiv.org/pdf/1801.08094v4.pdf | |
PWC | https://paperswithcode.com/paper/adaptive-recurrent-neural-network-based-on |
Repo | |
Framework | |
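The mixture layer in miniature: compare the encoded input against stored prototype vectors and read out a similarity-weighted mixture that can then modulate the recurrent state. The sketch below is a simplified stand-in for the paper's update rule:

```python
# Simplified mixture layer: softmax similarities to learned prototypes give
# a weighted read-out (a stand-in for M-RNN's state update, not the paper's).
import torch
import torch.nn as nn

class MixtureLayer(nn.Module):
    def __init__(self, dim, n_prototypes=8):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(n_prototypes, dim))

    def forward(self, h):
        # h: (batch, dim) encoded input or hidden state.
        sims = torch.softmax(h @ self.prototypes.t(), dim=-1)  # (batch, K)
        return sims @ self.prototypes                          # (batch, dim)
```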
An octree cells occupancy geometric dimensionality descriptor for massive on-server point cloud visualisation and classification
Title | An octree cells occupancy geometric dimensionality descriptor for massive on-server point cloud visualisation and classification |
Authors | Remi Cura, Julien Perret, Nicolas Paparoditis |
Abstract | Lidar datasets are becoming more and more common. They are appreciated for their precise 3D nature and have a wide range of applications, such as surface reconstruction, object detection, and visualisation. For all these applications, having additional semantic information per point has the potential to increase the quality and the efficiency of the application. In the last decade, the use of Machine Learning, and more specifically classification methods, has proved successful at creating this semantic information. In this paradigm, the goal is to classify points into a set of given classes (for instance tree, building, ground, other). Some of these methods use descriptors (also called features) of a point to learn and predict its class. Designing the descriptors is then the heart of these methods. Descriptors can be based on point geometry and attributes, use contextual information, etc. Furthermore, descriptors can be used by humans for easier visual understanding and sometimes filtering. In this work we propose a new, simple geometric descriptor that gives information about the implicit local dimensionality of the point cloud at various scales. For instance, a tree seen from afar is volumetric in nature (3D), yet locally each leaf is rather planar (2D). To do so, we build an octree centred on the point under consideration and compare the variation of cell occupancy across the levels of the octree. We compare this descriptor with the state-of-the-art dimensionality descriptor and show its interest. We further test the descriptor for classification within the Point Cloud Server, and demonstrate efficiency and correctness results. |
Tasks | Object Detection |
Published | 2018-01-15 |
URL | http://arxiv.org/abs/1801.05038v1 |
http://arxiv.org/pdf/1801.05038v1.pdf | |
PWC | https://paperswithcode.com/paper/an-octree-cells-occupancy-geometric |
Repo | |
Framework | |
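The descriptor compares cell occupancy across octree levels around a query point: a 1D edge, a 2D plane, and a 3D volume fill cells at characteristically different rates as the subdivision deepens. A simplified occupancy profile, assuming uniform cubic cells and none of the server-side machinery:

```python
# Occupancy fraction per octree level around a point -- a simplified take on
# the descriptor (uniform cubic cells; not the Point Cloud Server code).
import numpy as np

def occupancy_profile(points, center, half_size, levels=4):
    """Fraction of occupied cells at each subdivision level inside a cube."""
    local = points - center
    local = local[np.all(np.abs(local) <= half_size, axis=1)]
    profile = []
    for lvl in range(1, levels + 1):
        n = 2 ** lvl                                    # cells per axis
        idx = np.floor((local + half_size) / (2 * half_size) * n).astype(int)
        idx = np.clip(idx, 0, n - 1)
        occupied = len({tuple(i) for i in idx})
        profile.append(occupied / n ** 3)
    return profile

pts = np.random.rand(1000, 3)                           # a volumetric blob
print(occupancy_profile(pts, np.array([0.5, 0.5, 0.5]), 0.5))
```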
Transform-Based Multilinear Dynamical System for Tensor Time Series Analysis
Title | Transform-Based Multilinear Dynamical System for Tensor Time Series Analysis |
Authors | Weijun Lu, Xiao-Yang Liu, Qingwei Wu, Yue Sun, Anwar Walid |
Abstract | We propose a novel multilinear dynamical system (MLDS) in a transform domain, named $\mathcal{L}$-MLDS, to model tensor time series. With transformations applied to the tensor data, the latent multidimensional correlations among the frontal slices are captured, resulting in computational independence in the transform domain. This allows the exact separation of the multi-dimensional problem into multiple smaller LDS problems. To estimate the system parameters, we utilize the expectation-maximization (EM) algorithm to determine the parameters of each LDS. Further, $\mathcal{L}$-MLDS significantly reduces the number of model parameters and allows parallel processing. Our general $\mathcal{L}$-MLDS model is implemented with different transforms: the discrete Fourier transform, the discrete cosine transform, and the discrete wavelet transform. Due to the nonlinearity of these transformations, $\mathcal{L}$-MLDS is able to capture nonlinear correlations within the data, unlike MLDS \cite{rogers2013multilinear}, which assumes multi-way linear correlations. Using four real datasets, the proposed $\mathcal{L}$-MLDS is shown to achieve much higher prediction accuracy than the state-of-the-art MLDS and LDS with an equal number of parameters under different noise models. In particular, the relative errors are reduced by $50\% \sim 99\%$. Simultaneously, $\mathcal{L}$-MLDS achieves an exponential improvement in training time over MLDS. |
Tasks | Time Series, Time Series Analysis |
Published | 2018-11-18 |
URL | http://arxiv.org/abs/1811.07342v1 |
http://arxiv.org/pdf/1811.07342v1.pdf | |
PWC | https://paperswithcode.com/paper/transform-based-multilinear-dynamical-system |
Repo | |
Framework | |
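The key mechanism is the transform along the third mode: after a DFT, each frontal slice in the transform domain decouples and can be fitted by its own small LDS. A sketch of that transform step (the per-slice EM fitting is omitted):

```python
# The DFT variant of the L-MLDS transform step: slices decouple along mode 3.
import numpy as np

def to_transform_domain(T):
    """T: (n1, n2, n3) tensor observation -> DFT along mode 3."""
    return np.fft.fft(T, axis=2)

def from_transform_domain(T_hat):
    return np.real(np.fft.ifft(T_hat, axis=2))

T = np.random.randn(4, 5, 6)
T_hat = to_transform_domain(T)   # each slice T_hat[:, :, k] gets its own LDS
assert np.allclose(from_transform_domain(T_hat), T)  # the transform inverts
```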