October 16, 2019

3222 words 16 mins read

Paper Group ANR 1067


Deep Learning Based Speech Beamforming. Dynamic Representations Toward Efficient Inference on Deep Neural Networks by Decision Gates. Adaptive specular reflection detection and inpainting in colonoscopy video frames. Deep Siamese Networks with Bayesian non-Parametrics for Video Object Tracking. 3DCapsule: Extending the Capsule Architecture to Class …

Deep Learning Based Speech Beamforming

Title Deep Learning Based Speech Beamforming
Authors Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Dinei Florencio, Mark Hasegawa-Johnson
Abstract Multi-channel speech enhancement with ad-hoc sensors has been a challenging task. Speech model guided beamforming algorithms are able to recover natural sounding speech, but the speech models tend to be oversimplified or the inference would otherwise be too complicated. On the other hand, deep learning based enhancement approaches are able to learn complicated speech distributions and perform efficient inference, but they are unable to deal with a variable number of input channels. Also, deep learning approaches introduce many errors, particularly in the presence of unseen noise types and settings. We therefore propose an enhancement framework called DEEPBEAM, which combines the two complementary classes of algorithms. DEEPBEAM introduces a beamforming filter to produce natural sounding speech, but the filter coefficients are determined with the help of a monaural speech enhancement neural network. Experiments on synthetic and real-world data show that DEEPBEAM is able to produce clean, dry and natural sounding speech, and is robust against unseen noise.
Tasks Speech Enhancement
Published 2018-02-15
URL http://arxiv.org/abs/1802.05383v1
PDF http://arxiv.org/pdf/1802.05383v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-based-speech-beamforming
Repo
Framework
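To make the filter-and-sum idea concrete, here is a minimal least-squares sketch, assuming NumPy. It fits one FIR filter per microphone so that the summed, filtered channels match a reference signal. In DEEPBEAM that reference role is played by the monaural enhancement network, which this sketch does not implement, so `target` is simply an assumed input; `fit_filter_and_sum` and `apply_filter_and_sum` are hypothetical names, and the paper's actual coefficient estimation may differ.

```python
import numpy as np

def fit_filter_and_sum(mics: np.ndarray, target: np.ndarray, taps: int) -> np.ndarray:
    """Least-squares FIR filters h[c, k] so that sum_c (h[c] * mics[c]) approximates target.

    mics:   (channels, samples) recording; any number of channels works.
    target: (samples,) reference estimate (assumed: an enhanced single-channel signal).
    """
    channels, n = mics.shape
    cols = []
    for c in range(channels):
        for k in range(taps):
            # Channel c delayed by k samples, zero-padded at the start.
            shifted = np.zeros(n)
            shifted[k:] = mics[c, : n - k]
            cols.append(shifted)
    X = np.stack(cols, axis=1)                      # (samples, channels * taps)
    h, *_ = np.linalg.lstsq(X, target, rcond=None)  # minimize ||X h - target||^2
    return h.reshape(channels, taps)

def apply_filter_and_sum(mics: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Filter each channel with its FIR filter (causal, truncated) and sum the results."""
    n = mics.shape[1]
    return sum(np.convolve(mics[c], h[c])[:n] for c in range(mics.shape[0]))
```

Because the design matrix simply gains `taps` columns per microphone, the same code handles any number of channels, which is the property the abstract says purely neural approaches lack.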

Dynamic Representations Toward Efficient Inference on Deep Neural Networks by Decision Gates

Title Dynamic Representations Toward Efficient Inference on Deep Neural Networks by Decision Gates
Authors Mohammad Saeed Shafiee, Mohammad Javad Shafiee, Alexander Wong
Abstract While deep neural networks extract rich features from the input data, the current trade-off between depth and computational cost makes it difficult to adopt deep neural networks for many industrial applications, especially when computing power is limited. Here, we are inspired by the idea that, while deeper embeddings are needed to discriminate difficult samples (i.e., fine-grained discrimination), a large number of samples can be well discriminated via much shallower embeddings (i.e., coarse-grained discrimination). In this study, we introduce the simple yet effective concept of decision gates (d-gate), modules trained to decide whether a sample needs to be projected into a deeper embedding or if an early prediction can be made at the d-gate, thus enabling the computation of dynamic representations at different depths. The proposed d-gate modules can be integrated with any deep neural network and reduce the average computational cost of the deep neural networks while maintaining modeling accuracy. The proposed d-gate framework is examined via different network architectures and datasets, with experimental results showing that leveraging the proposed d-gate modules led to a ~43% speed-up and 44% FLOPs reduction on ResNet-101 and a 55% speed-up and 39% FLOPs reduction on DenseNet-201 trained on the CIFAR10 dataset, with only a ~2% drop in accuracy. Furthermore, experiments where d-gate modules are integrated into ResNet-101 trained on the ImageNet dataset demonstrate that it is possible to reduce the computational cost of the network by 1.5 GFLOPs without any drop in the modeling accuracy.
Tasks
Published 2018-11-05
URL https://arxiv.org/abs/1811.01476v4
PDF https://arxiv.org/pdf/1811.01476v4.pdf
PWC https://paperswithcode.com/paper/efficient-inference-on-deep-neural-networks
Repo
Framework
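A rough sketch of how a decision gate could short-circuit inference, assuming NumPy and assuming each gate is a small classifier whose softmax confidence is compared against a fixed threshold. The paper trains its d-gates, and its exact decision rule may differ; `stages`, `gate_heads`, `final_head` and `threshold` are hypothetical names.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def gated_inference(x, stages, gate_heads, final_head, threshold=0.9):
    """Run stages in order; after stage i, gate i may stop early.

    Returns (predicted_label, number_of_stages_executed).
    """
    h = x
    for depth, stage in enumerate(stages):
        h = stage(h)
        if depth < len(gate_heads):
            probs = softmax(gate_heads[depth](h))
            if probs.max() >= threshold:
                return int(probs.argmax()), depth + 1  # confident: early exit
    probs = softmax(final_head(h))                     # fell through to the full network
    return int(probs.argmax()), len(stages)

# Toy usage with hand-written "layers" (placeholders for real network blocks).
stages = [lambda h: h * 2.0, lambda h: h + 1.0]
gate_heads = [lambda h: np.array([h.sum(), 0.0])]
final_head = lambda h: np.array([0.0, h.sum()])
print(gated_inference(np.array([5.0]), stages, gate_heads, final_head))
print(gated_inference(np.array([0.0]), stages, gate_heads, final_head))
```

The returned stage count is what drives the savings: every sample that exits after the first stage skips all later stages, so the average cost falls as more samples are confidently classified early.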

Adaptive specular reflection detection and inpainting in colonoscopy video frames

Title Adaptive specular reflection detection and inpainting in colonoscopy video frames
Authors Mojtaba Akbari, Majid Mohrekesh, S. M. Reza Soroushmehr, Nader Karimi, Shadrokh Samavi, Kayvan Najarian
Abstract Colonoscopy video frames might be contaminated by bright spots with unsaturated values known as specular reflection. Detection and removal of such reflections could enhance the quality of colonoscopy images and facilitate the diagnosis procedure. In this paper, we propose a novel two-phase method for this purpose, consisting of detection and removal phases. In the detection phase, we employ both HSV and RGB color space information for segmentation of specular reflections. We first train a non-linear SVM to select a color space based on statistical image features extracted from each channel of the color spaces. Then, a cost function for detection of specular reflections is introduced. In the removal phase, we propose a two-step inpainting method which consists of appropriate replacement patch selection and removal of blockiness effects. The proposed method is evaluated on an available colonoscopy image database, where an accuracy of 99.68% and a Dice score of 71.79% are achieved.
Tasks
Published 2018-02-23
URL http://arxiv.org/abs/1802.08402v1
PDF http://arxiv.org/pdf/1802.08402v1.pdf
PWC https://paperswithcode.com/paper/adaptive-specular-reflection-detection-and
Repo
Framework
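The abstract reports both pixel accuracy and Dice score. A short NumPy sketch of the two metrics on binary masks shows why they can diverge so much: when reflections cover only a small fraction of a frame, accuracy is dominated by the many correctly labelled background pixels, while Dice counts only the reflection pixels. This is a generic illustration of the metrics, not the authors' evaluation code, and `dice_and_accuracy` is a hypothetical name.

```python
import numpy as np

def dice_and_accuracy(pred, truth):
    """Dice score and pixel accuracy for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    intersection = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    dice = 2.0 * intersection / denom if denom else 1.0  # both empty: perfect agreement
    accuracy = (pred == truth).mean()
    return float(dice), float(accuracy)

# 10x10 frame with 4 reflection pixels; the detector finds 2 of them and adds 2 false positives.
truth = np.zeros((10, 10), dtype=bool)
truth[0, :4] = True
pred = np.zeros((10, 10), dtype=bool)
pred[0, :2] = True
pred[1, :2] = True
print(dice_and_accuracy(pred, truth))  # (0.5, 0.96)
```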

Deep Siamese Networks with Bayesian non-Parametrics for Video Object Tracking

Title Deep Siamese Networks with Bayesian non-Parametrics for Video Object Tracking
Authors Anthony D. Rhodes, Manan Goel
Abstract We present a novel algorithm utilizing a deep Siamese neural network as a general object similarity function in combination with a Bayesian optimization (BO) framework to encode spatio-temporal information for efficient object tracking in video. In particular, we treat the video tracking problem as a dynamic (i.e., temporally evolving) optimization problem. Using Gaussian Process priors, we model a dynamic objective function representing the location of a tracked object in each frame. By exploiting temporal correlations, the proposed method queries the search space in a statistically principled and efficient way, offering several benefits over current state-of-the-art video tracking methods.
Tasks Object Tracking, Video Object Tracking
Published 2018-11-18
URL http://arxiv.org/abs/1811.07386v1
PDF http://arxiv.org/pdf/1811.07386v1.pdf
PWC https://paperswithcode.com/paper/deep-siamese-networks-with-bayesian-non
Repo
Framework
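The Bayesian-optimization loop can be illustrated with a 1-D Gaussian-process posterior and an upper-confidence-bound (UCB) acquisition, assuming NumPy. Here `y_obs` stands in for the similarity scores a Siamese network would return at queried positions; the RBF kernel, the UCB rule and all function names are assumptions for illustration, not the authors' choices, and a real tracker would search over 2-D image positions.

```python
import numpy as np

def rbf(a, b, length=1.0):
    """Squared-exponential kernel with unit variance between 1-D arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_obs, y_obs, x_query, length=1.0, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at x_query."""
    K = rbf(x_obs, x_obs, length) + noise * np.eye(len(x_obs))
    Ks = rbf(x_query, x_obs, length)           # (n_query, n_obs)
    mean = Ks @ np.linalg.solve(K, y_obs)
    v = np.linalg.solve(K, Ks.T)               # (n_obs, n_query)
    var = 1.0 - np.sum(Ks * v.T, axis=1)       # prior variance is 1
    return mean, np.maximum(var, 0.0)

def next_query_ucb(x_obs, y_obs, candidates, beta=2.0, length=1.0):
    """Pick the candidate position maximizing mean + beta * std."""
    mean, var = gp_posterior(x_obs, y_obs, candidates, length)
    return candidates[np.argmax(mean + beta * np.sqrt(var))]

# Toy usage: similarity scores peaked at position 2.
x_obs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_obs = np.exp(-(x_obs - 2.0) ** 2)
print(next_query_ucb(x_obs, y_obs, np.linspace(0, 4, 41), beta=0.0))
```

With `beta = 0` the rule exploits the current best estimate; larger `beta` favours positions with high posterior uncertainty, which is how BO trades a few exploratory queries against exhaustive search.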

3DCapsule: Extending the Capsule Architecture to Classify 3D Point Clouds

Title 3DCapsule: Extending the Capsule Architecture to Classify 3D Point Clouds
Authors Ali Cheraghian, Lars Petersson
Abstract This paper introduces the 3DCapsule, which is a 3D extension of the recently introduced Capsule concept that makes it applicable to unordered point sets. The original Capsule relies on the existence of a spatial relationship between the elements in the feature map it is presented with, whereas in point permutation invariant formulations of 3D point set classification methods, such relationships are typically lost. Here, a new layer called ComposeCaps is introduced that, in lieu of a spatially relevant feature mapping, learns a new mapping that can be exploited by the 3DCapsule. Previous works in the 3D point set classification domain have focused on other parts of the architecture, whereas the 3DCapsule is a drop-in replacement for the commonly used fully connected classifier. An ablation study demonstrates that when the 3DCapsule is applied to recent 3D point set classification architectures, it consistently shows an improvement, in particular when subjected to noisy data. Similarly, the ComposeCaps layer is evaluated and demonstrates an improvement over the baseline. In an apples-to-apples comparison against state-of-the-art methods, the 3DCapsule again demonstrates better performance.
Tasks Classify 3D Point Clouds
Published 2018-11-06
URL http://arxiv.org/abs/1811.02191v1
PDF http://arxiv.org/pdf/1811.02191v1.pdf
PWC https://paperswithcode.com/paper/3dcapsule-extending-the-capsule-architecture
Repo
Framework
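For readers new to capsules, the two operations used in the original Capsule networks are the squash nonlinearity and routing-by-agreement. This NumPy sketch implements both for generic prediction vectors `u_hat`; it does not reproduce the ComposeCaps layer, whose learned mapping would produce such inputs for point clouds, and the function names are hypothetical.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Shrink vector length into [0, 1) while keeping its direction."""
    norm_sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)

def dynamic_routing(u_hat, iterations=3):
    """Routing-by-agreement.

    u_hat: (n_in, n_out, dim) predictions of every input capsule for every output capsule.
    Returns v: (n_out, dim) output capsule vectors.
    """
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                    # routing logits
    for _ in range(iterations):
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)          # coupling coefficients, softmax over outputs
        s = np.einsum("io,iod->od", c, u_hat)      # weighted sum of predictions per output
        v = squash(s)
        b = b + np.einsum("iod,od->io", u_hat, v)  # raise logits where predictions agree with v
    return v
```

The length of each output vector acts as the class score, which is why a capsule layer can stand in for a fully connected classifier head.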

Bi-directional Graph Structure Information Model for Multi-Person Pose Estimation

Title Bi-directional Graph Structure Information Model for Multi-Person Pose Estimation
Authors Jing Wang, Ze Peng, Pei Lv, Junyi Sun, Bing Zhou, Mingliang Xu
Abstract In this paper, we propose a novel multi-stage network architecture with two branches in each stage to estimate multi-person poses in images. The first branch predicts the confidence maps of joints and uses a geometrical transform kernel to propagate information between neighboring joints at the confidence level. The second branch proposes a bi-directional graph structure information model (BGSIM) to encode rich contextual information and to infer the occlusion relationship among different joints. We dynamically determine the joint point with the highest response in the confidence maps as the base point for message passing in BGSIM. Based on the proposed network structure, we achieve an average precision of 62.9 on the COCO Keypoint Challenge dataset and 77.6 on the MPII (multi-person) dataset. Compared with other state-of-the-art methods, our method achieves highly promising results on our selected multi-person dataset without extra training.
Tasks Multi-Person Pose Estimation, Pose Estimation
Published 2018-05-02
URL http://arxiv.org/abs/1805.00603v2
PDF http://arxiv.org/pdf/1805.00603v2.pdf
PWC https://paperswithcode.com/paper/bi-directional-graph-structure-information
Repo
Framework
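The abstract picks the joint with the highest confidence response as the base point for message passing. Assuming a tree-shaped skeleton, this NumPy sketch selects that root per image and derives a bi-directional schedule in which messages flow leaves-to-root and then root-to-leaves. BGSIM itself is a learned network module; the tree assumption, the two-pass schedule and `message_schedule` are illustrative choices, not the paper's implementation.

```python
import numpy as np
from collections import deque

def message_schedule(confidence_maps, edges):
    """confidence_maps: (joints, H, W); edges: skeleton links (a, b) forming a tree.

    Returns (root, upward, downward), where each pass is a list of (sender, receiver).
    """
    n = confidence_maps.shape[0]
    root = int(np.argmax(confidence_maps.reshape(n, -1).max(axis=1)))  # strongest joint
    adj = {j: [] for j in range(n)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    order, parent = [root], {root: None}
    queue = deque([root])
    while queue:  # breadth-first traversal from the dynamically chosen root
        j = queue.popleft()
        for k in adj[j]:
            if k not in parent:
                parent[k] = j
                order.append(k)
                queue.append(k)
    downward = [(parent[k], k) for k in order[1:]]       # root -> leaves
    upward = [(k, p) for p, k in reversed(downward)]     # leaves -> root
    return root, upward, downward
```

Because the root is recomputed from each image's confidence maps, a heavily occluded joint is never forced to be the anchor of the schedule.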

Composite Functional Gradient Learning of Generative Adversarial Models

Title Composite Functional Gradient Learning of Generative Adversarial Models
Authors Rie Johnson, Tong Zhang
Abstract This paper first presents a theory for generative adversarial methods that does not rely on the traditional minimax formulation. It shows that with a strong discriminator, a good generator can be learned so that the KL divergence between the distributions of real data and generated data improves after each functional gradient step until it converges to zero. Based on the theory, we propose a new stable generative adversarial method. A theoretical insight into the original GAN from this new viewpoint is also provided. The experiments on image generation show the effectiveness of our new method.
Tasks Image Generation
Published 2018-01-19
URL http://arxiv.org/abs/1801.06309v2
PDF http://arxiv.org/pdf/1801.06309v2.pdf
PWC https://paperswithcode.com/paper/composite-functional-gradient-learning-of
Repo
Framework
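To see the functional-gradient view in the simplest possible setting, this NumPy sketch moves generated 1-D samples along the gradient of log(p_real / p_fake). It assumes both densities are Gaussian and known in closed form, standing in for the strong discriminator the theory requires; the paper instead learns the discriminator and trains a generator network, so this only illustrates the update direction. `kl_gauss` and `functional_gradient_step` are hypothetical helpers.

```python
import numpy as np

def kl_gauss(mu_q, sd_q, mu_p, sd_p):
    """KL(q || p) for 1-D Gaussians q = N(mu_q, sd_q^2) and p = N(mu_p, sd_p^2)."""
    return np.log(sd_p / sd_q) + (sd_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sd_p ** 2) - 0.5

def functional_gradient_step(x, mu_real, sd_real, eta=0.1):
    """Move each generated sample along grad_x [log p_real(x) - log p_fake(x)]."""
    mu_fake, sd_fake = x.mean(), x.std()  # Gaussian fit to the current fake samples
    grad = -(x - mu_real) / sd_real ** 2 + (x - mu_fake) / sd_fake ** 2
    return x + eta * grad

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)  # generated samples, roughly N(0, 1)
for _ in range(200):
    x = functional_gradient_step(x, mu_real=3.0, sd_real=2.0)
print(kl_gauss(x.mean(), x.std(), 3.0, 2.0))
```

With a small step size, the mean moves toward `mu_real` and the spread toward `sd_real`, so the KL divergence shrinks toward zero in this toy case.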

DeepLink: A Novel Link Prediction Framework based on Deep Learning

Title DeepLink: A Novel Link Prediction Framework based on Deep Learning
Authors Mohammad Mehdi Keikha, Maseud Rahgozar, Masoud Asadpour
Abstract Recently, link prediction has attracted increasing attention from various disciplines such as computer science, bioinformatics and economics. In this problem, unknown links between nodes are discovered based on various kinds of information, such as network topology, profile information and user-generated content. Most previous research has focused on the structural features of networks, while recent studies indicate that contextual information can change the network topology. Although a number of valuable studies combine structural and content information, they face scalability issues due to feature engineering, because the majority of the extracted features are obtained by supervised or semi-supervised algorithms. Moreover, the existing features are not general enough to perform well on different networks with heterogeneous structures. Besides, most previous studies address undirected and unweighted networks. In this paper, a novel link prediction framework called “DeepLink” is presented based on deep learning techniques. In contrast to previous studies, which fail to automatically extract the best features for link prediction, deep learning reduces manual feature engineering. In this framework, both the structural and content information of the nodes are employed. The framework can use different structural feature vectors, which are prepared by various link prediction methods. It considers all proximity orders present in a network during structural feature learning. We have evaluated the performance of DeepLink on two real social network datasets, Telegram and irBlogs. On both datasets, the proposed framework outperforms several structural and hybrid approaches to the link prediction problem.
Tasks Feature Engineering, Link Prediction
Published 2018-07-27
URL http://arxiv.org/abs/1807.10494v1
PDF http://arxiv.org/pdf/1807.10494v1.pdf
PWC https://paperswithcode.com/paper/deeplink-a-novel-link-prediction-framework
Repo
Framework
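One simple way to read "all proximity orders" is through powers of the adjacency matrix: entry (u, v) of A^k counts walks of length k from u to v, so A encodes direct links and, in an undirected unweighted graph, A^2 counts common neighbours. This NumPy sketch stacks those counts as hand-made structural features for a node pair. It is an assumption made for illustration; DeepLink learns its structural features with deep models, and `proximity_features` is a hypothetical helper.

```python
import numpy as np

def proximity_features(adj, max_order=3):
    """Return an (n, n, max_order) array whose [u, v, k-1] entry is (A^k)[u, v]."""
    feats, power = [], np.eye(adj.shape[0])
    for _ in range(max_order):
        power = power @ adj  # next power of the adjacency matrix
        feats.append(power)
    return np.stack(feats, axis=-1)

# Path graph 0 - 1 - 2: nodes 0 and 2 are not linked but share neighbour 1.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
features = proximity_features(A)
print(features[0, 2])  # feature vector for the candidate link (0, 2)
```

Because it only uses matrix products, the helper also accepts weighted or directed adjacency matrices; entries then become weighted walk sums that follow edge directions.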

Gradient Coding via the Stochastic Block Model

Title Gradient Coding via the Stochastic Block Model
Authors Zachary Charles, Dimitris Papailiopoulos
Abstract Gradient descent and its many variants, including mini-batch stochastic gradient descent, form the algorithmic foundation of modern large-scale machine learning. Due to the size and scale of modern data, gradient computations are often distributed across multiple compute nodes. Unfortunately, such distributed implementations can face significant delays caused by straggler nodes, i.e., nodes that are much slower than average. Gradient coding is a new technique for mitigating the effect of stragglers via algorithmic redundancy. While effective, previously proposed gradient codes can be computationally expensive to construct, inaccurate, or susceptible to adversarial stragglers. In this work, we present the stochastic block code (SBC), a gradient code based on the stochastic block model. We show that SBCs are efficient, accurate, and that under certain settings, adversarial straggler selection becomes as hard as detecting a community structure in the multiple community, block stochastic graph model.
Tasks
Published 2018-05-25
URL http://arxiv.org/abs/1805.10378v1
PDF http://arxiv.org/pdf/1805.10378v1.pdf
PWC https://paperswithcode.com/paper/gradient-coding-via-the-stochastic-block
Repo
Framework
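Gradient coding is easiest to see with the simpler fractional repetition scheme from earlier gradient-coding work, sketched here in NumPy instead of the paper's stochastic block code: workers are split into groups of s + 1, every worker in a group returns the same partial sum, and the master needs just one reply per group, so any s stragglers can be tolerated. `encode` and `decode` are hypothetical names, and the requirement that s + 1 divide the number of workers belongs to this scheme, not to SBCs.

```python
import numpy as np

def encode(partial_grads, s):
    """partial_grads: (n, d) gradients of n data partitions, one worker per partition.

    Worker w returns the sum of the partial gradients in its group's block of s + 1 partitions.
    """
    n = len(partial_grads)
    if n % (s + 1):
        raise ValueError("number of workers must be divisible by s + 1")
    outputs = {}
    for w in range(n):
        g = w // (s + 1)
        outputs[w] = partial_grads[g * (s + 1):(g + 1) * (s + 1)].sum(axis=0)
    return outputs

def decode(received, n, s):
    """Sum one reply per group; fails only if a whole group straggled."""
    total, seen = None, set()
    for w, value in received.items():
        g = w // (s + 1)
        if g not in seen:
            seen.add(g)
            total = value if total is None else total + value
    if len(seen) < n // (s + 1):
        raise ValueError("an entire group straggled; cannot decode")
    return total
```

Each worker does s + 1 times the work of an uncoded worker, which is the computational price of the redundancy.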

Explicit Contextual Semantics for Text Comprehension

Title Explicit Contextual Semantics for Text Comprehension
Authors Zhuosheng Zhang, Yuwei Wu, Zuchao Li, Hai Zhao
Abstract Who did what to whom is a major focus in natural language understanding, which is precisely the aim of the semantic role labeling (SRL) task. Despite the two tasks sharing many processing characteristics and even their purpose, surprisingly, jointly considering these two related tasks has never been formally reported in previous work. Thus, this paper makes the first attempt to let SRL enhance text comprehension and inference by specifying verbal predicates and their corresponding semantic roles. In terms of deep learning models, our embeddings are enhanced by explicit contextual semantic role labels for more fine-grained semantics. We show that the salient labels can be conveniently added to existing models and significantly improve deep learning models in challenging text comprehension tasks. Extensive experiments on benchmark machine reading comprehension and inference datasets verify that the proposed semantic learning helps our system reach a new state of the art over strong baselines that have been enhanced by well-pretrained language models from the latest progress.
Tasks Machine Reading Comprehension, Reading Comprehension, Semantic Role Labeling
Published 2018-09-08
URL https://arxiv.org/abs/1809.02794v3
PDF https://arxiv.org/pdf/1809.02794v3.pdf
PWC https://paperswithcode.com/paper/i-know-what-you-want-semantic-learning-for
Repo
Framework
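A minimal NumPy sketch of the embedding enhancement: each token's word vector is concatenated with a learned vector for its SRL tag. The vocabularies, dimensions and tags below are toy assumptions; the paper's models use pretrained embeddings and an SRL tagger, and concatenation is one plausible fusion choice, not necessarily the authors'.

```python
import numpy as np

rng = np.random.default_rng(0)
WORD_DIM, SRL_DIM = 8, 4

# Hypothetical toy vocabularies and randomly initialized embedding tables.
word_vocab = {"the": 0, "cat": 1, "chased": 2, "mouse": 3}
srl_vocab = {"O": 0, "ARG0": 1, "V": 2, "ARG1": 3}
word_emb = rng.standard_normal((len(word_vocab), WORD_DIM))
srl_emb = rng.standard_normal((len(srl_vocab), SRL_DIM))

def enhanced_embeddings(tokens, srl_tags):
    """Concatenate word embeddings with SRL label embeddings, token by token."""
    w = word_emb[[word_vocab[t] for t in tokens]]
    r = srl_emb[[srl_vocab[t] for t in srl_tags]]
    return np.concatenate([w, r], axis=-1)  # (len(tokens), WORD_DIM + SRL_DIM)

tokens = ["the", "cat", "chased", "the", "mouse"]
tags = ["ARG0", "ARG0", "V", "ARG1", "ARG1"]  # who (ARG0) did what (V) to whom (ARG1)
print(enhanced_embeddings(tokens, tags).shape)
```

The same word "the" now receives different input vectors depending on whether it sits inside the agent or the patient span, which is the extra signal the downstream comprehension model can exploit.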

Multi-Spectral Imaging via Computed Tomography (MUSIC) - Comparing Unsupervised Spectral Segmentations for Material Differentiation

Title Multi-Spectral Imaging via Computed Tomography (MUSIC) - Comparing Unsupervised Spectral Segmentations for Material Differentiation
Authors Christian Kehl, Wail Mustafa, Jan Kehres, Anders Bjorholm Dahl, Ulrik Lund Olsen
Abstract Multi-spectral computed tomography is an emerging technology for the non-destructive identification of object materials and the study of their physical properties. Applications of this technology can be found in various scientific and industrial contexts, such as luggage scanning at airports. Material distinction and identification are challenging, even with spectral x-ray information, due to acquisition noise, tomographic reconstruction artefacts and scanning setup application constraints. We present MUSIC, an open-access multi-spectral CT dataset in 2D and 3D, to promote further research in the area of material identification. We demonstrate the value of this dataset on the image analysis challenge of object segmentation purely based on the spectral response of its composing materials. In this context, we compare the segmentation accuracy of fast adaptive mean shift (FAMS) and unconstrained graph cuts on both datasets. We further discuss the impact of reconstruction artefacts and segmentation controls on the achievable results. The dataset, related software packages and further documentation are made available to the imaging community in an open-access manner to promote further data-driven research on the subject.
Tasks Semantic Segmentation
Published 2018-10-28
URL http://arxiv.org/abs/1810.11823v1
PDF http://arxiv.org/pdf/1810.11823v1.pdf
PWC https://paperswithcode.com/paper/multi-spectral-imaging-via-computed
Repo
Framework
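Mean shift, the basis of the FAMS method compared in the abstract, can be sketched in a few lines of NumPy for per-voxel spectral vectors: every point climbs to a mode of a Gaussian kernel density estimate, and points sharing a mode form one material class. This is plain fixed-bandwidth mean shift, not the fast adaptive variant; treating each voxel's spectral response as a feature vector is an assumption, and `mean_shift` is a hypothetical helper.

```python
import numpy as np

def mean_shift(points, bandwidth, iters=50, tol=1e-6):
    """Cluster rows of `points` by the kernel-density mode each one converges to."""
    modes = points.astype(float).copy()
    for _ in range(iters):
        # Gaussian weights between every current mode estimate and every original point.
        d2 = ((modes[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
        w = np.exp(-d2 / (2 * bandwidth ** 2))
        new = (w @ points) / w.sum(axis=1, keepdims=True)  # weighted mean = mean-shift step
        shift = np.abs(new - modes).max()
        modes = new
        if shift < tol:
            break
    # Points whose modes lie within half a bandwidth of each other share a label.
    labels, centers = np.empty(len(points), dtype=int), []
    for i, m in enumerate(modes):
        for k, c in enumerate(centers):
            if np.linalg.norm(m - c) < bandwidth / 2:
                labels[i] = k
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)
```

The `bandwidth` is the main control here: too small a value fragments one material into several classes, too large a value merges distinct materials.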

Multiple-Instance Learning by Boosting Infinitely Many Shapelet-based Classifiers

Title Multiple-Instance Learning by Boosting Infinitely Many Shapelet-based Classifiers
Authors Daiki Suehiro, Kohei Hatano, Eiji Takimoto, Shuji Yamamoto, Kenichi Bannai, Akiko Takeda
Abstract We propose a new formulation of Multiple-Instance Learning (MIL). In typical MIL settings, a unit of data is given as a set of instances called a bag, and the goal is to find a good classifier of bags based on similarity to a single or finitely many “shapelets” (or patterns), where the similarity of a bag to a shapelet is the maximum similarity of the instances in the bag. Classifiers based on a single shapelet are not sufficiently strong for certain applications. Additionally, previous work with multiple shapelets has heuristically chosen some of the instances as shapelets, with no theoretical guarantee of their generalization ability. Our formulation provides a richer class of final classifiers based on infinitely many shapelets. We provide an efficient algorithm for the new formulation, along with a generalization bound. Our empirical study demonstrates that our approach is effective not only for MIL tasks but also for shapelet learning for time-series classification.
Tasks Multiple Instance Learning, Time Series, Time Series Classification
Published 2018-11-20
URL http://arxiv.org/abs/1811.08084v2
PDF http://arxiv.org/pdf/1811.08084v2.pdf
PWC https://paperswithcode.com/paper/multiple-instance-learning-by-boosting
Repo
Framework
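The bag-to-shapelet similarity described above (the maximum over the bag's instances) and a weighted vote over several shapelets can be sketched in NumPy as follows. The Gaussian similarity, the finite list of shapelets and the names `bag_similarity` and `classify_bag` are assumptions; the paper's contribution is precisely a formulation over infinitely many shapelets, which this finite sketch does not reproduce.

```python
import numpy as np

def bag_similarity(bag, shapelet, gamma=1.0):
    """Max over the bag's instances of a Gaussian similarity to the shapelet."""
    d2 = ((bag - shapelet) ** 2).sum(axis=1)
    return np.exp(-gamma * d2).max()

def classify_bag(bag, shapelets, weights, bias=0.0, gamma=1.0):
    """Sign of a weighted combination of bag-to-shapelet similarities."""
    score = sum(w * bag_similarity(bag, s, gamma) for w, s in zip(weights, shapelets))
    return 1 if score + bias >= 0 else -1

shapelets = [np.zeros(2)]
near = np.array([[5.0, 5.0], [0.0, 0.0]])  # one instance matches the shapelet exactly
far = np.array([[5.0, 5.0]])
print(classify_bag(near, shapelets, [1.0], bias=-0.5), classify_bag(far, shapelets, [1.0], bias=-0.5))
```

The max over instances is what makes a single matching instance enough to flag the whole bag, mirroring the usual MIL assumption.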

Image Co-segmentation via Multi-scale Local Shape Transfer

Title Image Co-segmentation via Multi-scale Local Shape Transfer
Authors Wei Teng, Yu Zhang, Xiaowu Chen, Jia Li, Zhiqiang He
Abstract Image co-segmentation is a challenging task in computer vision that aims to segment all pixels of the objects from a predefined semantic category. In real-world cases, however, common foreground objects often vary greatly in appearance, making their global shapes highly inconsistent across images and difficult to segment. To address this problem, this paper proposes a novel co-segmentation approach that transfers patch-level local object shapes, which appear more consistent across different images. In our framework, a multi-scale patch neighbourhood system is first generated using proposal flow on arbitrary image pairs and is further refined by Locally Linear Embedding. Based on the patch relationships, we propose an efficient algorithm to jointly segment the objects in each image while transferring their local shapes across different images. Extensive experiments demonstrate that the proposed method can robustly and effectively segment common objects from an image set. On the iCoseg, MSRC and Coseg-Rep datasets, the proposed approach performs comparably to or better than the state of the art, while on the more challenging Fashionista benchmark, our method achieves significant improvements.
Tasks
Published 2018-05-15
URL http://arxiv.org/abs/1805.05610v1
PDF http://arxiv.org/pdf/1805.05610v1.pdf
PWC https://paperswithcode.com/paper/image-co-segmentation-via-multi-scale-local
Repo
Framework
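The abstract refines the patch neighbourhood with Locally Linear Embedding. The core LLE step computes weights that reconstruct a point from its neighbours under a sum-to-one constraint; this NumPy sketch shows that step for patch descriptors assumed to be plain vectors. How the paper turns these weights into a refined neighbourhood is not reproduced here, and `lle_weights` is a hypothetical helper.

```python
import numpy as np

def lle_weights(x, neighbors, reg=1e-3):
    """Weights w with sum(w) == 1 minimizing ||x - sum_j w[j] * neighbors[j]||^2.

    x: (d,) descriptor of the query patch; neighbors: (k, d) descriptors of its neighbours.
    """
    z = neighbors - x                    # move the query patch to the origin
    gram = z @ z.T                       # local Gram matrix (k x k)
    trace = np.trace(gram)
    gram = gram + reg * (trace if trace > 0 else 1.0) * np.eye(len(neighbors))
    w = np.linalg.solve(gram, np.ones(len(neighbors)))
    return w / w.sum()                   # enforce the sum-to-one constraint

x = np.array([0.5, 0.5])
neighbors = np.array([[0.0, 0.0], [1.0, 1.0]])
print(lle_weights(x, neighbors))
```

The small `reg` term keeps the local Gram matrix invertible when there are more neighbours than descriptor dimensions, a standard regularization in LLE implementations.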

Weakly supervised segment annotation via expectation kernel density estimation

Title Weakly supervised segment annotation via expectation kernel density estimation
Authors Liantao Wang, Qingwu Li, Jianfeng Lu
Abstract Since the labelling of positive images/videos is ambiguous in weakly supervised segment annotation, negative-mining-based methods that use only intra-class information have emerged. In these methods, negative instances are utilized to penalize unknown instances to rank their likelihood of being an object, which can be considered as voting in terms of similarity. However, these methods 1) ignore the information contained in positive bags, and 2) only rank the likelihood but cannot generate an explicit decision function. In this paper, we propose a voting scheme involving not only the definite negative instances but also the ambiguous positive instances to make use of the extra useful information in the weakly labelled positive bags. In the scheme, each instance votes for its label with a magnitude arising from the similarity, and the ambiguous positive instances are assigned soft labels that are iteratively updated during the voting. This overcomes the limitations of voting using only the negative bags. We also propose an expectation kernel density estimation (eKDE) algorithm to gain further insight into the voting mechanism. Experimental results demonstrate the superiority of our scheme over the baselines.
Tasks Density Estimation
Published 2018-12-15
URL http://arxiv.org/abs/1812.06228v1
PDF http://arxiv.org/pdf/1812.06228v1.pdf
PWC https://paperswithcode.com/paper/weakly-supervised-segment-annotation-via
Repo
Framework
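A simplified version of the voting idea, in NumPy: negative-bag instances vote with a fixed label of -1, positive-bag instances carry soft labels that are re-estimated from kernel-weighted votes, and the process repeats. This is an illustration inspired by the description, not the paper's eKDE algorithm; the Gaussian kernel, the initialization to +1 and `soft_label_voting` itself are assumptions.

```python
import numpy as np

def soft_label_voting(neg, pos, bandwidth=1.0, iters=10):
    """Kernel-weighted voting with fixed negative labels and updated soft positive labels.

    neg: (m, d) instances from negative bags, label fixed at -1.
    pos: (n, d) instances from positive bags, soft labels in [-1, 1].
    Returns the soft labels of the positive-bag instances.
    """
    m, n = len(neg), len(pos)
    voters = np.vstack([neg, pos])
    d2 = ((pos[:, None, :] - voters[None, :, :]) ** 2).sum(axis=-1)
    k = np.exp(-d2 / (2.0 * bandwidth ** 2))      # vote magnitude from similarity
    k[np.arange(n), m + np.arange(n)] = 0.0        # an instance does not vote for itself
    soft = np.ones(n)                              # start by trusting the bag label
    for _ in range(iters):
        labels = np.concatenate([-np.ones(m), soft])
        soft = (k @ labels) / (k.sum(axis=1) + 1e-12)  # weighted average stays in [-1, 1]
    return soft
```

The sign of each returned soft label gives an explicit decision for that instance, which is the capability the abstract says ranking-only negative mining lacks.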

On Geometric Alignment in Low Doubling Dimension

Title On Geometric Alignment in Low Doubling Dimension
Authors Hu Ding, Mingquan Ye
Abstract In the real world, many problems can be formulated as the alignment between two geometric patterns. Previously, a great deal of research has focused on the alignment of 2D or 3D patterns, especially in the field of computer vision. Recently, the alignment of geometric patterns in high dimension has found several novel applications and attracted increasing attention. However, research is still rather limited in terms of algorithms. To the best of our knowledge, most existing approaches for high-dimensional alignment are just simple extensions of their counterparts for the 2D and 3D cases, and often suffer from issues such as high complexity. In this paper, we propose an effective framework to compress high-dimensional geometric patterns while approximately preserving the alignment quality. As a consequence, existing alignment approaches can be applied to the compressed geometric patterns, and the time complexity is significantly reduced. Our idea is inspired by the observation that high-dimensional data often has a low intrinsic dimension. We adopt the widely used notion of “doubling dimension” to measure the extent of our compression and the resulting approximation. Finally, we test our method on both random and real datasets; the experimental results reveal that running the alignment algorithm on compressed patterns achieves similar quality compared with the results on the original patterns, while the running times (including the time cost of compression) are substantially lower.
Tasks
Published 2018-11-19
URL http://arxiv.org/abs/1811.07455v1
PDF http://arxiv.org/pdf/1811.07455v1.pdf
PWC https://paperswithcode.com/paper/on-geometric-alignment-in-low-doubling
Repo
Framework
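Low doubling dimension is usually exploited through nets. A greedy r-net keeps a point as a new center only if it is farther than r from every existing center, so centers are pairwise more than r apart and every point lies within r of one; when the doubling dimension is low, the number of centers tends to stay small. This NumPy sketch shows that common building block for compression, not necessarily the paper's framework, and `greedy_r_net` is a hypothetical helper.

```python
import numpy as np

def greedy_r_net(points, r):
    """Return (center_indices, assign): assign[i] indexes the center covering point i."""
    centers = []
    assign = np.full(len(points), -1)
    for i, p in enumerate(points):
        if centers:
            d = np.linalg.norm(points[centers] - p, axis=1)
            j = int(np.argmin(d))
            if d[j] <= r:          # already covered by an existing center
                assign[i] = j
                continue
        centers.append(i)          # farther than r from all centers: becomes a new center
        assign[i] = len(centers) - 1
    return np.array(centers), assign

# 200 points in 50-D that mostly vary along 2 coordinates (low intrinsic dimension).
rng = np.random.default_rng(0)
pts = 0.01 * rng.standard_normal((200, 50))
pts[:, :2] += rng.uniform(0, 10, (200, 2))
centers, assign = greedy_r_net(pts, r=1.0)
print(len(centers), "centers represent", len(pts), "points")
```

`np.bincount(assign)` gives how many original points each center represents, which an alignment routine could use as weights on the compressed pattern.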