January 31, 2020

Paper Group ANR 20

A GLCM Embedded CNN Strategy for Computer-aided Diagnosis in Intracerebral Hemorrhage. Multilingual Dialogue Generation with Shared-Private Memory. Progressive Cluster Purification for Transductive Few-shot Learning. On the Generalization Properties of Minimum-norm Solutions for Over-parameterized Neural Network Models. Learning Hybrid Representati …

A GLCM Embedded CNN Strategy for Computer-aided Diagnosis in Intracerebral Hemorrhage


Title	A GLCM Embedded CNN Strategy for Computer-aided Diagnosis in Intracerebral Hemorrhage
Authors	Yifan Hu, Yefeng Zheng
Abstract	Computer-aided diagnosis (CADx) systems have been shown to assist radiologists by providing classifications of many kinds of medical images, such as computed tomography (CT) and magnetic resonance (MR). Currently, convolutional neural networks (CNNs) play an important role in CADx. However, since a CNN model should have a square-like input, it is usually difficult to apply CNN algorithms directly to the irregular segmented regions of interest (ROIs) that radiologists are interested in. In this paper, we propose a new approach that constructs the model by extracting the information of the irregular region, converting it into a fixed-size Gray-Level Co-Occurrence Matrix (GLCM), and then using the GLCM as one input of our CNN model. In this way, as a useful complement to the original CNN, a couple of GLCM-based features are also extracted by the CNN. Meanwhile, the network pays more attention to the important lesion area and achieves a higher classification accuracy. Experiments are performed on three classification databases, Hemorrhage, BraTS18 and Cervix, to validate the universality of our model. In conclusion, the proposed framework outperforms the corresponding state-of-the-art algorithms on each database, with both test loss and classification accuracy as the evaluation criteria.
Tasks	Computed Tomography (CT)
Published	2019-06-05
URL	https://arxiv.org/abs/1906.02040v1
PDF	https://arxiv.org/pdf/1906.02040v1.pdf
PWC	https://paperswithcode.com/paper/a-glcm-embedded-cnn-strategy-for-computer
Repo
Framework
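
The abstract's central idea, turning an irregular ROI into a fixed-size GLCM, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `masked_glcm`, the toy image, the mask, the number of gray levels, and the single horizontal offset are all assumptions made for illustration. The point is that the output is always `levels x levels`, whatever the shape of the ROI.

```python
def masked_glcm(image, mask, levels, offset=(0, 1)):
    """Count gray-level co-occurrences for pixel pairs that both lie inside the ROI mask."""
    d_row, d_col = offset
    rows, cols = len(image), len(image[0])
    glcm = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if not (0 <= r2 < rows and 0 <= c2 < cols):
                continue  # neighbour falls outside the image
            if mask[r][c] and mask[r2][c2]:
                glcm[image[r][c]][image[r2][c2]] += 1
    return glcm


# Hypothetical 4x4 image quantized to 3 gray levels, with a non-rectangular ROI.
image = [
    [0, 0, 1, 1],
    [0, 1, 1, 2],
    [2, 2, 1, 0],
    [2, 1, 0, 0],
]
mask = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
glcm = masked_glcm(image, mask, levels=3)
for row in glcm:
    print(row)
```

In practice several offsets (distances and directions) are usually accumulated and the matrix is normalized before it is fed to a network; both steps are left out here for brevity.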

Multilingual Dialogue Generation with Shared-Private Memory


Title	Multilingual Dialogue Generation with Shared-Private Memory
Authors	Chen Chen, Lisong Qiu, Zhenxin Fu, Dongyan Zhao, Junfei Liu, Rui Yan
Abstract	Existing dialog systems are all monolingual, where features shared among different languages are rarely explored. In this paper, we introduce a novel multilingual dialogue system. Specifically, we augment the sequence to sequence framework with improved shared-private memory. The shared memory learns common features among different languages and facilitates a cross-lingual transfer to boost dialogue systems, while the private memory is owned by each separate language to capture its unique feature. Experiments conducted on Chinese and English conversation corpora of different scales show that our proposed architecture outperforms the individually learned model with the help of the other language, where the improvement is particularly distinct when the training data is limited.
Tasks	Cross-Lingual Transfer, Dialogue Generation
Published	2019-10-06
URL	https://arxiv.org/abs/1910.02365v1
PDF	https://arxiv.org/pdf/1910.02365v1.pdf
PWC	https://paperswithcode.com/paper/multilingual-dialogue-generation-with-shared
Repo
Framework

Progressive Cluster Purification for Transductive Few-shot Learning


Title	Progressive Cluster Purification for Transductive Few-shot Learning
Authors	Chenyang Si, Wentao Chen, Wei Wang, Liang Wang, Tieniu Tan
Abstract	Few-shot learning aims to learn to generalize a classifier to novel classes with limited labeled data. Transductive inference, which utilizes the unlabeled test set to deal with the low-data problem, has been employed for few-shot learning in recent literature. Yet, these methods do not explicitly exploit the manifold structures of semantic clusters, which is inefficient for transductive inference. In this paper, we propose a novel Progressive Cluster Purification (PCP) method for transductive few-shot learning. The PCP can progressively purify the cluster by exploring the semantic interdependency in the individual cluster space. Specifically, the PCP consists of two-level operations: inter-class classification and intra-class transduction. The inter-class classification partitions all the test samples into several clusters by comparing the test samples with the prototypes. The intra-class transduction effectively explores trustworthy test samples for each cluster by modeling data relations within a cluster as well as among different clusters. Then, it refines the prototypes to better represent the real distribution of semantic clusters. The refined prototypes are used to remeasure all the test instances and purify each cluster. Furthermore, the inter-class classification and the intra-class transduction can be flexibly repeated several times to progressively purify the clusters. Experimental results are provided on two datasets: miniImageNet and tieredImageNet. The comparison results demonstrate the effectiveness of our approach and show that it outperforms the state-of-the-art methods on both datasets.
Tasks	Few-Shot Learning
Published	2019-06-10
URL	https://arxiv.org/abs/1906.03847v1
PDF	https://arxiv.org/pdf/1906.03847v1.pdf
PWC	https://paperswithcode.com/paper/progressive-cluster-purification-for
Repo
Framework
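
The alternation described in the abstract (assign test samples to prototypes, pick trustworthy samples per cluster, refine the prototypes, repeat) can be sketched roughly as follows. This is a heavily simplified stand-in, not the PCP method itself: "trustworthy" is approximated here by "closest to the current prototype", whereas the paper models relations within and among clusters. The names `purify`, `keep`, and `rounds`, and the toy 2-D data, are assumptions.

```python
import math


def nearest(prototypes, x):
    """Return (index, distance) of the prototype closest to x."""
    dists = [math.dist(p, x) for p in prototypes]
    k = min(range(len(prototypes)), key=dists.__getitem__)
    return k, dists[k]


def purify(support_protos, queries, rounds=3, keep=2):
    """Alternate between cluster assignment and prototype refinement."""
    protos = [list(p) for p in support_protos]
    for _ in range(rounds):
        # Inter-class step: partition the queries by nearest prototype.
        clusters = [[] for _ in protos]
        for x in queries:
            k, dist = nearest(protos, x)
            clusters[k].append((dist, x))
        # Intra-class step (simplified): trust the `keep` closest queries
        # and average them with the original support prototype.
        for k, members in enumerate(clusters):
            trusted = [x for _, x in sorted(members, key=lambda m: m[0])[:keep]]
            if trusted:
                points = trusted + [list(support_protos[k])]
                protos[k] = [sum(v) / len(points) for v in zip(*points)]
    labels = [nearest(protos, x)[0] for x in queries]
    return protos, labels


# Hypothetical support prototypes that sit slightly off the true cluster centres.
support = [[0.0, 0.0], [4.0, 4.0]]
queries = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.1], [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]]
refined, labels = purify(support, queries)
print(labels)
print(refined)
```

The refined prototypes move toward the unlabeled query clusters, which is the effect the transductive setting is meant to exploit.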

On the Generalization Properties of Minimum-norm Solutions for Over-parameterized Neural Network Models


Title	On the Generalization Properties of Minimum-norm Solutions for Over-parameterized Neural Network Models
Authors	Weinan E, Chao Ma, Lei Wu
Abstract	We study the generalization properties of minimum-norm solutions for three over-parametrized machine learning models: the random feature model, the two-layer neural network model and the residual network model. We prove that for all three models, the generalization error for the minimum-norm solution is comparable to the Monte Carlo rate, up to some logarithmic terms, as long as the models are sufficiently over-parametrized.
Tasks
Published	2019-12-15
URL	https://arxiv.org/abs/1912.06987v1
PDF	https://arxiv.org/pdf/1912.06987v1.pdf
PWC	https://paperswithcode.com/paper/on-the-generalization-properties-of-minimum
Repo
Framework
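
For intuition about what a minimum-norm solution is in the simplest of the three models, here is a small sketch for a random feature model with more features than samples. It assumes the Euclidean norm on the output weights, which the abstract does not state, and every concrete choice (cosine features, the target function, `n`, `m`, the seed) is illustrative only. With feature matrix Phi of shape n x m and m > n, the minimum-norm interpolant is a = Phi^T (Phi Phi^T)^(-1) y.

```python
import math
import random


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][c] * z[c] for c in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z


random.seed(0)
n, m = 5, 40  # n samples, m random features (m > n: over-parametrized)
xs = [-0.8, -0.4, 0.0, 0.4, 0.8]
ys = [math.sin(3 * x) for x in xs]

# Random (fixed, untrained) cosine features.
w = [random.gauss(0, 2) for _ in range(m)]
b = [random.uniform(0, 2 * math.pi) for _ in range(m)]
Phi = [[math.cos(w[j] * x + b[j]) / math.sqrt(m) for j in range(m)] for x in xs]

# Minimum Euclidean-norm interpolant: a = Phi^T (Phi Phi^T)^{-1} y.
gram = [[sum(p * q for p, q in zip(Phi[i], Phi[k])) for k in range(n)] for i in range(n)]
alpha = solve(gram, ys)
a_min = [sum(alpha[i] * Phi[i][j] for i in range(n)) for j in range(m)]

preds = [sum(Phi[i][j] * a_min[j] for j in range(m)) for i in range(n)]
print(max(abs(p - y) for p, y in zip(preds, ys)))  # training residual
```

Among all weight vectors that fit the training data exactly, `a_min` is the one of smallest Euclidean norm because it lies in the row space of Phi; the paper's result concerns how well such solutions generalize, which this sketch does not measure.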

Learning Hybrid Representation by Robust Dictionary Learning in Factorized Compressed Space


Title	Learning Hybrid Representation by Robust Dictionary Learning in Factorized Compressed Space
Authors	Jiahuan Ren, Zhao Zhang, Sheng Li, Yang Wang, Guangcan Liu, Shuicheng Yan, Meng Wang
Abstract	In this paper, we investigate robust dictionary learning (DL) to discover the hybrid salient low-rank and sparse representation in a factorized compressed space. A Joint Robust Factorization and Projective Dictionary Learning (J-RFDL) model is presented. The setting of J-RFDL aims at improving the data representations by enhancing the robustness to outliers and noise in data, encoding the reconstruction error more accurately and obtaining hybrid salient coefficients with accurate reconstruction ability. Specifically, J-RFDL performs the robust representation by DL in a factorized compressed space to eliminate the negative effects of noise and outliers on the results, which can also make the DL process efficient. To make the encoding process robust to noise in data, J-RFDL clearly uses the sparse L2,1-norm, which can potentially minimize the factorization and reconstruction errors jointly by forcing rows of the reconstruction errors to be zeros. To deliver salient coefficients with good structures to reconstruct given data well, J-RFDL imposes the joint low-rank and sparse constraints on the embedded coefficients with a synthesis dictionary. Based on the hybrid salient coefficients, we also extend J-RFDL for joint classification and propose a discriminative J-RFDL model, which can improve the discriminating abilities of the learnt coefficients by minimizing the classification error jointly. Extensive experiments on public datasets demonstrate that our formulations can deliver superior performance over other state-of-the-art methods.
Tasks	Dictionary Learning
Published	2019-12-26
URL	https://arxiv.org/abs/1912.11785v1
PDF	https://arxiv.org/pdf/1912.11785v1.pdf
PWC	https://paperswithcode.com/paper/learning-hybrid-representation-by-robust
Repo
Framework
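
The L2,1-norm mentioned in the abstract sums the Euclidean norms of the rows of a matrix, so minimizing it tends to push entire rows to zero, which matches the abstract's remark about "forcing rows of the reconstruction errors to be zeros". A minimal sketch, assuming the row-wise convention (some papers define it column-wise) and a made-up error matrix `E`:

```python
import math


def l21_norm(matrix):
    """Sum of the Euclidean (L2) norms of the rows."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)


def frobenius_norm(matrix):
    """Square root of the sum of all squared entries, for comparison."""
    return math.sqrt(sum(v * v for row in matrix for v in row))


# Hypothetical reconstruction-error matrix with one all-zero row.
E = [
    [3.0, 4.0],
    [0.0, 0.0],
    [0.0, 1.0],
]
print(l21_norm(E), frobenius_norm(E))
```

A zero row contributes nothing to the L2,1-norm, while each nonzero row is charged its full length, which is why the penalty acts like a sparsity penalty over rows.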

QubitHD: A Stochastic Acceleration Method for HD Computing-Based Machine Learning


Title	QubitHD: A Stochastic Acceleration Method for HD Computing-Based Machine Learning
Authors	Samuel Bosch, Alexander Sanchez de la Cerda, Mohsen Imani, Tajana Simunic Rosing, Giovanni De Micheli
Abstract	Machine Learning algorithms based on Brain-inspired Hyperdimensional (HD) computing imitate cognition by exploiting statistical properties of high-dimensional vector spaces. It is a promising solution for achieving high energy-efficiency in different machine learning tasks, such as classification, semi-supervised learning and clustering. A weakness of existing HD computing-based ML algorithms is the fact that they have to be binarized for achieving very high energy-efficiency. At the same time, binarized models reach lower classification accuracies. To solve the problem of the trade-off between energy-efficiency and classification accuracy, we propose the QubitHD algorithm. It stochastically binarizes HD-based algorithms, while maintaining comparable classification accuracies to their non-binarized counterparts. The FPGA implementation of QubitHD provides a 65% improvement in terms of energy-efficiency, and a 95% improvement in terms of the training time, as compared to state-of-the-art HD-based ML algorithms. It also outperforms state-of-the-art low-cost classifiers (like Binarized Neural Networks) in terms of speed and energy-efficiency by an order of magnitude during training and inference.
Tasks
Published	2019-11-27
URL	https://arxiv.org/abs/1911.12446v2
PDF	https://arxiv.org/pdf/1911.12446v2.pdf
PWC	https://paperswithcode.com/paper/qubithd-a-stochastic-acceleration-method-for
Repo
Framework
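
The abstract does not spell out how QubitHD binarizes a model stochastically, so the following is only a generic sketch of unbiased stochastic binarization, not the paper's algorithm. Each component x in [-1, 1] is mapped to +1 with probability (1 + x) / 2 and to -1 otherwise, so the expected binarized value equals x. The vector `model` and the sample count are assumptions.

```python
import random


def stochastic_binarize(vec, rng):
    """Map each x in [-1, 1] to +1 with probability (1 + x) / 2, else -1."""
    return [1 if rng.random() < (1 + x) / 2 else -1 for x in vec]


rng = random.Random(0)
# Hypothetical non-binarized class hypervector, already scaled to [-1, 1].
model = [0.9, -0.5, 0.0, 0.3]

samples = [stochastic_binarize(model, rng) for _ in range(20000)]
means = [sum(col) / len(samples) for col in zip(*samples)]
print(means)  # each mean should be close to the matching entry of `model`
```

Compared with deterministic sign binarization, which would map 0.3 and 0.9 to the same value, the random rule keeps magnitude information on average while still storing only one bit per component.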

hf0: A hybrid pitch extraction method for multimodal voice


Title	hf0: A hybrid pitch extraction method for multimodal voice
Authors	Pradeep Rengaswamy, Gurunath Reddy M, Krothapalli Sreenivasa Rao
Abstract	Pitch or fundamental frequency (f0) extraction is a fundamental problem studied extensively for its potential applications in speech and clinical domains. In the literature, explicit mode-specific (modal speech, singing voice, emotional/expressive speech, or noisy speech) signal processing and deep learning f0 extraction methods have been developed that exploit the quasi-periodic nature of the signal in time, its harmonic property in the spectrum, or a combination of both to extract the pitch. Hence, there is no single unified method which can reliably extract the pitch from various modes of the acoustic signal. In this work, we propose a hybrid f0 extraction method which seamlessly extracts the pitch across modes of speech production with the very high accuracy required for many applications. The proposed hybrid model exploits the advantages of deep learning and signal processing methods to minimize the pitch detection error and adapts to various modes of the acoustic signal. Specifically, we propose an ordinal regression convolutional neural network that maps the periodicity-rich input representation to nominal pitch classes, which drastically reduces the number of classes required for pitch detection, unlike other deep learning approaches. Further, the accurate f0 is estimated from the nominal pitch class labels by filtering and autocorrelation. We show that the proposed method generalizes to unseen modes of voice production and various noises for large-scale datasets. Also, the proposed hybrid model significantly reduces the number of learning parameters required to train the deep model compared to other methods. Furthermore, the evaluation measures show that the proposed method is significantly better than the state-of-the-art signal processing and deep learning approaches.
Tasks
Published	2019-04-22
URL	http://arxiv.org/abs/1904.09765v1
PDF	http://arxiv.org/pdf/1904.09765v1.pdf
PWC	https://paperswithcode.com/paper/hf0-a-hybrid-pitch-extraction-method-for
Repo
Framework
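
The second stage described in the abstract, refining f0 inside a coarse pitch class by autocorrelation, can be sketched as below. The coarse class is represented here by a frequency band `[f_lo, f_hi]` that a classifier would supply; the band edges, the function name `refine_f0`, and the synthetic test tone are assumptions, and the filtering step mentioned in the abstract is omitted.

```python
import math


def refine_f0(signal, sample_rate, f_lo, f_hi):
    """Return the f0 in [f_lo, f_hi] whose lag maximizes the autocorrelation."""
    lag_min = int(sample_rate / f_hi)
    lag_max = int(sample_rate / f_lo)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        score = sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


sample_rate = 8000
# Synthetic 200 Hz tone, 0.1 s long, standing in for a voiced frame.
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]
f0 = refine_f0(tone, sample_rate, f_lo=150.0, f_hi=300.0)
print(f0)
```

Restricting the lag search to the band given by the coarse class is what keeps the autocorrelation from locking onto an octave error such as half or double the true pitch. The resolution of this sketch is limited to integer lags; interpolation around the peak would be needed for finer estimates.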

Persistent Intersection Homology for the Analysis of Discrete Data


Title	Persistent Intersection Homology for the Analysis of Discrete Data
Authors	Bastian Rieck, Markus Banagl, Filip Sadlo, Heike Leitte
Abstract	Topological data analysis is becoming increasingly relevant to support the analysis of unstructured data sets. A common assumption in data analysis is that the data set is a sample—not necessarily a uniform one—of some high-dimensional manifold. In such cases, persistent homology can be successfully employed to extract features, remove noise, and compare data sets. The underlying problems in some application domains, however, turn out to represent multiple manifolds with different dimensions. Algebraic topology typically analyzes such problems using intersection homology, an extension of homology that is capable of handling configurations with singularities. In this paper, we describe how the persistent variant of intersection homology can be used to assist data analysis in visualization. We point out potential pitfalls in approximating data sets with singularities and give strategies for resolving them.
Tasks	Topological Data Analysis
Published	2019-07-31
URL	https://arxiv.org/abs/1907.13485v1
PDF	https://arxiv.org/pdf/1907.13485v1.pdf
PWC	https://paperswithcode.com/paper/persistent-intersection-homology-for-the
Repo
Framework

DeepIST: Deep Image-based Spatio-Temporal Network for Travel Time Estimation


Title	DeepIST: Deep Image-based Spatio-Temporal Network for Travel Time Estimation
Authors	Tao-yang Fu, Wang-Chien Lee
Abstract	Estimating the travel time for a given path is a fundamental problem in many urban transportation systems. However, prior works fail to capture the moving behaviors embedded in paths well and thus do not estimate the travel time accurately. To fill this gap, in this work, we propose a novel neural network framework, namely Deep Image-based Spatio-Temporal network (DeepIST), for travel time estimation of a given path. The novelty of DeepIST lies in the following aspects: 1) we propose to plot a path as a sequence of “generalized images” which include sub-paths along with additional information, such as traffic conditions, road network and traffic signals, in order to harness the power of the convolutional neural network (CNN) model on image processing; 2) we design a novel two-dimensional CNN, namely PathCNN, to extract spatial patterns for lines in images by regularization and adopting multiple pooling methods; and 3) we apply a one-dimensional CNN to capture temporal patterns among the spatial patterns along the paths for the estimation. Empirical results show that DeepIST soundly outperforms the state-of-the-art travel time estimation models by 24.37% to 25.64% of mean absolute error (MAE) in multiple large-scale real-world datasets.
Tasks
Published	2019-09-05
URL	https://arxiv.org/abs/1909.05637v1
PDF	https://arxiv.org/pdf/1909.05637v1.pdf
PWC	https://paperswithcode.com/paper/deepist-deep-image-based-spatio-temporal
Repo
Framework

Document Structure Extraction for Forms using Very High Resolution Semantic Segmentation


Title	Document Structure Extraction for Forms using Very High Resolution Semantic Segmentation
Authors	Mausoom Sarkar, Milan Aggarwal, Arneh Jain, Hiresh Gupta, Balaji Krishnamurthy
Abstract	In this work, we look at the problem of structure extraction from document images with a specific focus on forms. Forms as a document class have not received much attention, even though they comprise a significant fraction of documents and enable several applications. Forms possess a rich, complex, hierarchical, and high-density semantic structure that poses several challenges to semantic segmentation methods. We propose a prior-based deep CNN-RNN hierarchical network architecture that enables document structure extraction using very high resolution (1800 x 1000) images. We divide the document image into overlapping horizontal strips such that the network segments a strip and uses its prediction mask as a prior while predicting the segmentation for the subsequent strip. We perform experiments establishing the effectiveness of our strip-based network architecture through ablation methods and comparison with low-resolution variations. We introduce our new rich human-annotated forms dataset, and we show that our method significantly outperforms other segmentation baselines in extracting several hierarchical structures on this dataset. We also outperform other baselines in the table detection task on the Marmot dataset. Our method is currently being used in a world-leading customer experience management software suite for automated conversion of paper and PDF forms to modern HTML-based forms.
Tasks	Semantic Segmentation, Table Detection
Published	2019-11-27
URL	https://arxiv.org/abs/1911.12170v1
PDF	https://arxiv.org/pdf/1911.12170v1.pdf
PWC	https://paperswithcode.com/paper/document-structure-extraction-for-forms-using
Repo
Framework
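
The strip decomposition described in the abstract can be sketched as a small helper that returns the row ranges of overlapping horizontal strips. The abstract gives only the image resolution, so the strip height and overlap used below are hypothetical, as is the assumption that 1800 is the image height and the function name `strip_bounds`.

```python
def strip_bounds(height, strip_height, overlap):
    """Return (top, bottom) row ranges of overlapping horizontal strips covering the page."""
    if not 0 <= overlap < strip_height:
        raise ValueError("overlap must be non-negative and smaller than strip_height")
    step = strip_height - overlap
    bounds, top = [], 0
    while True:
        bottom = min(top + strip_height, height)
        bounds.append((top, bottom))
        if bottom == height:
            return bounds
        top += step


# Hypothetical strip size and overlap for an 1800-row page image.
strips = strip_bounds(height=1800, strip_height=600, overlap=200)
for top, bottom in strips:
    print(top, bottom)
```

In the setup the abstract describes, the strips would be processed top to bottom, with the prediction mask for the rows shared between two consecutive strips passed along as the prior for the next strip.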

Inferring 3D Shapes of Unknown Rigid Objects in Clutter through Inverse Physics Reasoning


Title	Inferring 3D Shapes of Unknown Rigid Objects in Clutter through Inverse Physics Reasoning
Authors	Changkyu Song, Abdeslam Boularias
Abstract	We present a probabilistic approach for building, on the fly, 3-D models of unknown objects while being manipulated by a robot. We specifically consider manipulation tasks in piles of clutter that contain previously unseen objects. Most manipulation algorithms for performing such tasks require known geometric models of the objects in order to grasp or rearrange them robustly. One of the novel aspects of this work is the utilization of a physics engine for verifying hypothesized geometries in simulation. The evidence provided by physics simulations is used in a probabilistic framework that accounts for the fact that mechanical properties of the objects are uncertain. We present an efficient algorithm for inferring occluded parts of objects based on their observed motions and mutual interactions. Experiments using a robot show that this approach is efficient for constructing physically realistic 3-D models, which can be useful for manipulation planning. Experiments also show that the proposed approach significantly outperforms alternative approaches in terms of shape accuracy.
Tasks
Published	2019-03-13
URL	http://arxiv.org/abs/1903.05749v1
PDF	http://arxiv.org/pdf/1903.05749v1.pdf
PWC	https://paperswithcode.com/paper/inferring-3d-shapes-of-unknown-rigid-objects
Repo
Framework

Communal Domain Learning for Registration in Drifted Image Spaces


Title	Communal Domain Learning for Registration in Drifted Image Spaces
Authors	Awais Mansoor, Marius George Linguraru
Abstract	Designing a registration framework for images that do not share the same probability distribution is a major challenge in modern image analytics, yet a trivial task for the human visual system (HVS). Discrepancies in probability distributions, also known as drifts, can occur for various reasons including, but not limited to, differences in sequences and modalities (e.g., MRI T1-T2 and MRI-CT registration), or acquisition settings (e.g., multisite, inter-subject, or intra-subject registrations). The popular assumption about the working of the HVS is that it exploits a communal feature subspace that exists between the registering images or fields-of-view and encompasses key drift-invariant features. Mimicking the approach that is potentially adopted by the HVS, herein, we present a representation learning technique for this invariant communal subspace that is shared by registering domains. The proposed communal domain learning (CDL) framework uses a set of hierarchical nonlinear transforms to learn the communal subspace that minimizes the probability differences and maximizes the amount of shared information between the registering domains. Similarity metric and parameter optimization calculations for registration are subsequently performed in the drift-minimized learned communal subspace. This generic registration framework is applied to register multisequence (MR: T1, T2) and multimodal (MR, CT) images. Results demonstrated generic applicability, consistent performance, and statistically significant improvement for both multi-sequence and multi-modal data using the proposed approach (p-value < 0.001; Wilcoxon rank sum test) over baseline methods.
Tasks	Representation Learning
Published	2019-08-20
URL	https://arxiv.org/abs/1908.07646v1
PDF	https://arxiv.org/pdf/1908.07646v1.pdf
PWC	https://paperswithcode.com/paper/190807646
Repo
Framework

Recent Advances in Imaging Around Corners


Title	Recent Advances in Imaging Around Corners
Authors	Tomohiro Maeda, Guy Satat, Tristan Swedish, Lagnojita Sinha, Ramesh Raskar
Abstract	Seeing around corners, also known as non-line-of-sight (NLOS) imaging, is a computational method to resolve or recover objects hidden around corners. Recent advances in imaging around corners have gained significant interest. This paper reviews different types of existing NLOS imaging techniques and discusses the challenges that need to be addressed, especially for their applications outside of a constrained laboratory environment. Our goal is to introduce this topic to broader research communities as well as provide insights that would lead to further developments in this research area.
Tasks
Published	2019-10-12
URL	https://arxiv.org/abs/1910.05613v1
PDF	https://arxiv.org/pdf/1910.05613v1.pdf
PWC	https://paperswithcode.com/paper/recent-advances-in-imaging-around-corners
Repo
Framework

Lookup Table-Based Consensus Algorithm for Real-Time Longitudinal Motion Control of Connected and Automated Vehicles


Title	Lookup Table-Based Consensus Algorithm for Real-Time Longitudinal Motion Control of Connected and Automated Vehicles
Authors	Ziran Wang, Kyuntae Han, BaekGyu Kim, Guoyuan Wu, Matthew J. Barth
Abstract	Connected and automated vehicle (CAV) technology is one of the promising solutions for addressing the safety, mobility and sustainability issues of our current transportation systems. Specifically, the control algorithm plays an important role in a CAV system, since it executes the commands generated by former steps, such as communication, perception, and planning. In this study, we propose a consensus algorithm to control the longitudinal motion of CAVs in real time. Different from previous studies in this field, where control gains of the consensus algorithm are pre-determined and fixed, we develop algorithms to build up a lookup table, searching for the ideal control gains with respect to different initial conditions of CAVs in real time. Numerical simulation shows that the proposed lookup table-based consensus algorithm outperforms the authors’ previous work, as well as van Arem’s linear feedback-based longitudinal motion control algorithm, in all four different scenarios with various initial conditions of CAVs, in terms of convergence time and maximum jerk of the simulation run.
Tasks
Published	2019-02-20
URL	https://arxiv.org/abs/1902.07747v4
PDF	https://arxiv.org/pdf/1902.07747v4.pdf
PWC	https://paperswithcode.com/paper/lookup-table-based-consensus-algorithm-for
Repo
Framework

From Few to More: Large-scale Dynamic Multiagent Curriculum Learning


Title	From Few to More: Large-scale Dynamic Multiagent Curriculum Learning
Authors	Weixun Wang, Tianpei Yang, Yong Liu, Jianye Hao, Xiaotian Hao, Yujing Hu, Yingfeng Chen, Changjie Fan, Yang Gao
Abstract	A lot of effort has been devoted to investigating how agents can learn effectively and achieve coordination in multiagent systems. However, it is still challenging in large-scale multiagent settings due to the complex dynamics between the environment and agents and the explosion of the state-action space. In this paper, we design a novel Dynamic Multiagent Curriculum Learning (DyMA-CL) to solve large-scale problems by starting from learning on a multiagent scenario with a small size and progressively increasing the number of agents. We propose three transfer mechanisms across curricula to accelerate the learning process. Moreover, because the state dimension varies across curricula, existing network structures cannot be applied in such a transfer setting, since their network input sizes are fixed. Therefore, we design a novel network structure called Dynamic Agent-number Network (DyAN) to handle the dynamic size of the network input. Experimental results show that DyMA-CL using DyAN greatly improves the performance of large-scale multiagent learning compared with state-of-the-art deep reinforcement learning approaches. We also investigate the influence of three transfer mechanisms across curricula through extensive simulations.
Tasks
Published	2019-09-06
URL	https://arxiv.org/abs/1909.02790v2
PDF	https://arxiv.org/pdf/1909.02790v2.pdf
PWC	https://paperswithcode.com/paper/from-few-to-more-large-scale-dynamic
Repo
Framework