Paper Group ANR 349
LE-HGR: A Lightweight and Efficient RGB-based Online Gesture Recognition Network for Embedded AR Devices
Title | LE-HGR: A Lightweight and Efficient RGB-based Online Gesture Recognition Network for Embedded AR Devices |
Authors | Hongwei Xie, Jiafang Wang, Baitao Shao, Jian Gu, Mingyang Li |
Abstract | Online hand gesture recognition (HGR) techniques are essential in augmented reality (AR) applications for enabling natural human-to-computer interaction and communication. In recent years, the consumer market for low-cost AR devices has been rapidly growing, while the technology maturity in this domain is still limited. Such devices typically have low prices, limited memory, and resource-constrained computational units, which makes online HGR a challenging problem. To tackle this problem, we propose a lightweight and computationally efficient HGR framework, namely LE-HGR, to enable real-time gesture recognition on embedded devices with low computing power. We also show that the proposed method is highly accurate and robust, reaching high-end performance in a variety of complicated interaction environments. To achieve our goal, we first propose a cascaded multi-task convolutional neural network (CNN) to simultaneously predict probabilities of hand detection and regress hand keypoint locations online. We show that, with the proposed cascaded architecture design, false-positive estimates can be largely eliminated. Additionally, an associated mapping approach is introduced to track the hand trace via the predicted locations, which addresses the interference of multi-handedness. Subsequently, we propose a trace sequence neural network (TraceSeqNN) to recognize the hand gesture by exploiting the motion features of the tracked trace. Finally, we provide a variety of experimental results to show that the proposed framework is able to achieve state-of-the-art accuracy with significantly reduced computational cost, which are the key properties for enabling real-time applications in low-cost commercial devices such as mobile devices and AR/VR headsets. (An illustrative sketch of the two-stage pipeline follows this entry.) |
Tasks | 3D Part Segmentation, Gesture Recognition, Hand Gesture Recognition, Hand-Gesture Recognition |
Published | 2020-01-16 |
URL | https://arxiv.org/abs/2001.05654v1 |
https://arxiv.org/pdf/2001.05654v1.pdf | |
PWC | https://paperswithcode.com/paper/le-hgr-a-lightweight-and-efficient-rgb-based |
Repo | |
Framework | |
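As referenced in the abstract above, the pipeline couples a cascaded multi-task CNN (hand-presence probability plus keypoint regression) with a trace-sequence classifier. The PyTorch sketch below illustrates that split only; the layer sizes, the 21-keypoint layout, and the GRU head are placeholders of my own, not the published LE-HGR architecture.

```python
# Hypothetical sketch (PyTorch): layer sizes, keypoint count and the GRU head are
# assumptions for illustration, not the architecture published in the paper.
import torch
import torch.nn as nn

class MultiTaskHandCNN(nn.Module):
    """Predicts a hand-presence probability and 2D keypoint offsets per crop."""
    def __init__(self, num_keypoints: int = 21):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.cls_head = nn.Linear(32, 1)                   # hand / no-hand logit
        self.kpt_head = nn.Linear(32, num_keypoints * 2)   # (x, y) per keypoint

    def forward(self, crops):
        feat = self.backbone(crops)
        return torch.sigmoid(self.cls_head(feat)), self.kpt_head(feat)

class TraceSeqNet(nn.Module):
    """Classifies a gesture from a tracked sequence of keypoint vectors."""
    def __init__(self, num_keypoints: int = 21, num_gestures: int = 10):
        super().__init__()
        self.rnn = nn.GRU(num_keypoints * 2, 64, batch_first=True)
        self.out = nn.Linear(64, num_gestures)

    def forward(self, traces):           # traces: (batch, time, keypoints*2)
        _, h = self.rnn(traces)
        return self.out(h[-1])

detector, classifier = MultiTaskHandCNN(), TraceSeqNet()
prob, kpts = detector(torch.randn(4, 3, 96, 96))
logits = classifier(torch.randn(4, 30, 42))   # 30 frames of tracked keypoints
print(prob.shape, kpts.shape, logits.shape)
```

In the full system described in the abstract, the detector would run per frame, an association step would link detections into per-hand traces, and the trace classifier would consume those traces online.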
Res3ATN – Deep 3D Residual Attention Network for Hand Gesture Recognition in Videos
Title | Res3ATN – Deep 3D Residual Attention Network for Hand Gesture Recognition in Videos |
Authors | Naina Dhingra, Andreas Kunz |
Abstract | Hand gesture recognition is a strenuous task to solve in videos. In this paper, we use a 3D residual attention network which is trained end to end for hand gesture recognition. By stacking multiple attention blocks, we build a 3D network which generates different features at each attention block. Our 3D attention-based residual network (Res3ATN) can be built and extended to very deep layers. Using this network, an extensive analysis is performed against other 3D networks on three publicly available datasets. The Res3ATN network performance is compared to the C3D, ResNet-10, and ResNext-101 networks. We also study and evaluate our baseline network with different numbers of attention blocks. The comparison shows that the 3D residual attention network with three attention blocks is robust in attention learning and is able to classify the gestures with better accuracy, thus outperforming existing networks. (An illustrative sketch of a 3D attention block follows this entry.) |
Tasks | Gesture Recognition, Hand Gesture Recognition, Hand-Gesture Recognition |
Published | 2020-01-04 |
URL | https://arxiv.org/abs/2001.01083v1 |
https://arxiv.org/pdf/2001.01083v1.pdf | |
PWC | https://paperswithcode.com/paper/res3atn-deep-3d-residual-attention-network |
Repo | |
Framework | |
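The abstract above builds the network from stacked 3D attention blocks. Below is a minimal sketch of one such block, assuming the common residual-attention form output = x + (1 + M(x)) · T(x); the channel counts and the mask branch are illustrative guesses, not the published Res3ATN design.

```python
# Minimal sketch of a 3D residual attention block; assumes the (1 + mask) * trunk
# formulation of residual attention networks. Sizes are illustrative only.
import torch
import torch.nn as nn

class ResAttention3DBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv3d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv3d(channels, channels, 3, padding=1),
        )
        # Mask branch: downsample, process, upsample, squash to [0, 1].
        self.mask = nn.Sequential(
            nn.MaxPool3d(2),
            nn.Conv3d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2, mode="trilinear", align_corners=False),
            nn.Conv3d(channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (batch, C, T, H, W)
        t = self.trunk(x)
        m = self.mask(x)
        return x + (1.0 + m) * t               # residual + attention-modulated trunk

block = ResAttention3DBlock(16)
clip = torch.randn(2, 16, 8, 32, 32)           # 8-frame feature clip
print(block(clip).shape)
```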
Image Speckle Noise Denoising by a Multi-Layer Fusion Enhancement Method based on Block Matching and 3D Filtering
Title | Image Speckle Noise Denoising by a Multi-Layer Fusion Enhancement Method based on Block Matching and 3D Filtering |
Authors | Huang Shuo, Zhou Ping, Shi Hao, Sun Yu, Wan Suiren |
Abstract | In order to improve speckle noise denoising with the block matching and 3D filtering (BM3D) method, an image frequency-domain multi-layer fusion enhancement method (MLFE-BM3D) based on the nonsubsampled contourlet transform (NSCT) has been proposed. The method designs an NSCT hard-threshold denoising enhancement to preprocess the image, then uses fusion enhancement in the NSCT domain to fuse the preliminary estimates of the images before and after the NSCT hard-threshold denoising; finally, BM3D denoising is carried out on the fused image to obtain the final denoising result. Experiments on natural images and medical ultrasound images show that the MLFE-BM3D method achieves better visual results than the BM3D method, with the peak signal-to-noise ratio (PSNR) of the denoised image increased by 0.5 dB. The MLFE-BM3D method improves the denoising of speckle noise in texture regions while still maintaining a good denoising effect in the smooth regions of the image. (An illustrative sketch of the fusion pipeline follows this entry.) |
Tasks | Denoising |
Published | 2020-01-04 |
URL | https://arxiv.org/abs/2001.01055v1 |
https://arxiv.org/pdf/2001.01055v1.pdf | |
PWC | https://paperswithcode.com/paper/image-speckle-noise-denoising-by-a-multi |
Repo | |
Framework | |
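The abstract above describes a three-step pipeline: transform-domain hard-threshold pre-denoising, fusion of the pre- and post-threshold images in the transform domain, and a final BM3D pass. The sketch below mimics that flow with a separable wavelet from PyWavelets standing in for the NSCT (which has no widely available Python implementation); the threshold value and the simple averaging fusion rule are assumptions, not the paper's method.

```python
# Sketch of the multi-layer fusion idea, with a wavelet standing in for the NSCT;
# the threshold and the averaging fusion rule are assumptions for illustration.
import numpy as np
import pywt

def hard_threshold_denoise(image: np.ndarray, wavelet="db2", level=2, thr=0.1):
    """Hard-threshold detail coefficients in the transform domain."""
    coeffs = pywt.wavedec2(image, wavelet, level=level)
    approx, details = coeffs[0], coeffs[1:]
    details = [tuple(pywt.threshold(d, thr, mode="hard") for d in band) for band in details]
    return pywt.waverec2([approx] + details, wavelet)

def fuse_in_transform_domain(img_a, img_b, wavelet="db2", level=2):
    """Fuse two estimates by averaging their transform coefficients."""
    ca = pywt.wavedec2(img_a, wavelet, level=level)
    cb = pywt.wavedec2(img_b, wavelet, level=level)
    fused = [(ca[0] + cb[0]) / 2] + [
        tuple((da + db) / 2 for da, db in zip(ba, bb)) for ba, bb in zip(ca[1:], cb[1:])
    ]
    return pywt.waverec2(fused, wavelet)

rng = np.random.default_rng(0)
clean = np.ones((64, 64))
noisy = clean * rng.normal(1.0, 0.2, clean.shape)       # multiplicative speckle
pre = hard_threshold_denoise(noisy)
fused = fuse_in_transform_domain(noisy, pre)
# A BM3D implementation would now be run on `fused` to produce the final estimate.
print(fused.shape)
```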
Predicting Memory Compiler Performance Outputs using Feed-Forward Neural Networks
Title | Predicting Memory Compiler Performance Outputs using Feed-Forward Neural Networks |
Authors | Felix Last, Max Haeberlein, Ulf Schlichtmann |
Abstract | Typical semiconductor chips include thousands of mostly small memories. As memories contribute an estimated 25% to 40% to the overall power, performance, and area (PPA) of a chip, memories must be designed carefully to meet the system’s requirements. Memory arrays are highly uniform and can be described by approximately 10 parameters depending mostly on the complexity of the periphery. Thus, to improve PPA utilization, memories are typically generated by memory compilers. A key task in the design flow of a chip is to find optimal memory compiler parametrizations which on the one hand fulfill system requirements while on the other hand optimize PPA. Although most compiler vendors also provide optimizers for this task, these are often slow or inaccurate. To enable efficient optimization in spite of long compiler run times, we propose training fully connected feed-forward neural networks to predict PPA outputs given a memory compiler parametrization. Using an exhaustive search-based optimizer framework which obtains neural network predictions, PPA-optimal parametrizations are found within seconds after chip designers have specified their requirements. Average model prediction errors of less than 3%, a decision reliability of over 99% and productive usage of the optimizer for successful, large volume chip design projects illustrate the effectiveness of the approach. (An illustrative sketch of the predictor and search loop follows this entry.) |
Tasks | |
Published | 2020-03-05 |
URL | https://arxiv.org/abs/2003.03269v1 |
https://arxiv.org/pdf/2003.03269v1.pdf | |
PWC | https://paperswithcode.com/paper/predicting-memory-compiler-performance |
Repo | |
Framework | |
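The approach in the abstract above is easy to picture: a fully connected network maps roughly ten compiler parameters to predicted PPA values, and an exhaustive search over candidate parametrizations queries that network to pick the best feasible point. The sketch below shows the shape of such a loop; the parameter encoding, layer widths, and requirement check are assumptions for illustration, not the paper's setup.

```python
# Illustrative sketch (PyTorch): parameter encoding, layer widths and the requirement
# check are assumptions; the paper's actual model and optimizer details may differ.
import itertools
import torch
import torch.nn as nn

N_PARAMS = 10                      # ~10 memory-compiler parameters (assumed encoding)
model = nn.Sequential(             # fully connected feed-forward PPA predictor
    nn.Linear(N_PARAMS, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 3),              # outputs: power, performance (delay), area
)

def predict_ppa(params: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():
        return model(params)

# Exhaustive search over a small discretized parameter grid (toy example).
grid = list(itertools.product([0.0, 0.5, 1.0], repeat=3))          # 3 free parameters
candidates = torch.tensor([list(g) + [0.5] * (N_PARAMS - 3) for g in grid])
ppa = predict_ppa(candidates)                                       # (27, 3)

max_delay = 1.0                                                     # designer requirement
feasible = ppa[:, 1] <= max_delay
cost = ppa[:, 0] + ppa[:, 2]                                        # minimize power + area
cost[~feasible] = float("inf")
best = int(torch.argmin(cost))
print("best parametrization:", candidates[best].tolist())
```

In practice the network would first be trained on compiler runs; here it is untrained, so the selected point is meaningless and the snippet only demonstrates the predict-then-search pattern.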
FGN: Fully Guided Network for Few-Shot Instance Segmentation
Title | FGN: Fully Guided Network for Few-Shot Instance Segmentation |
Authors | Zhibo Fan, Jin-Gang Yu, Zhihao Liang, Jiarong Ou, Changxin Gao, Gui-Song Xia, Yuanqing Li |
Abstract | Few-shot instance segmentation (FSIS) conjoins the few-shot learning paradigm with general instance segmentation, offering a possible way of tackling instance segmentation in the absence of abundant labeled training data. This paper presents a Fully Guided Network (FGN) for few-shot instance segmentation. FGN perceives FSIS as a guided model where a so-called support set is encoded and utilized to guide the predictions of a base instance segmentation network (i.e., Mask R-CNN), critical to which is the guidance mechanism. In this view, FGN introduces different guidance mechanisms into the various key components of Mask R-CNN, including an Attention-Guided RPN, a Relation-Guided Detector, and an Attention-Guided FCN, in order to make full use of the guidance effect from the support set and better adapt to inter-class generalization. Experiments on public datasets demonstrate that our proposed FGN outperforms state-of-the-art methods. (An illustrative sketch of support-guided attention follows this entry.) |
Tasks | Few-Shot Learning, Instance Segmentation, Semantic Segmentation |
Published | 2020-03-31 |
URL | https://arxiv.org/abs/2003.13954v1 |
https://arxiv.org/pdf/2003.13954v1.pdf | |
PWC | https://paperswithcode.com/paper/fgn-fully-guided-network-for-few-shot |
Repo | |
Framework | |
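FGN injects guidance from the support set into several Mask R-CNN components. The sketch below shows only the generic idea of support-guided attention, i.e., pooling support features into a prototype and using it to gate query feature channels; the pooling and gating choices here are my assumptions and not the FGN implementation.

```python
# Minimal sketch of support-guided feature modulation (the general idea behind
# attention-guided components); not the FGN implementation.
import torch
import torch.nn as nn

class SupportGuidedAttention(nn.Module):
    """Re-weights query feature channels using a pooled support-set embedding."""
    def __init__(self, channels: int):
        super().__init__()
        self.fc = nn.Linear(channels, channels)

    def forward(self, query_feat, support_feats):
        # query_feat: (B, C, H, W); support_feats: (K, C, H, W) for K support shots
        support_vec = support_feats.mean(dim=(0, 2, 3))          # (C,) class prototype
        weights = torch.sigmoid(self.fc(support_vec))            # per-channel gate
        return query_feat * weights.view(1, -1, 1, 1)

guide = SupportGuidedAttention(channels=256)
q = torch.randn(2, 256, 32, 32)
s = torch.randn(5, 256, 32, 32)   # 5-shot support features from the same encoder
print(guide(q, s).shape)
```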
Spatial Attention Pyramid Network for Unsupervised Domain Adaptation
Title | Spatial Attention Pyramid Network for Unsupervised Domain Adaptation |
Authors | Congcong Li, Dawei Du, Libo Zhang, Longyin Wen, Tiejian Luo, Yanjun Wu, Pengfei Zhu |
Abstract | Unsupervised domain adaptation is critical in various computer vision tasks, such as object detection, instance segmentation, and semantic segmentation, and aims to alleviate performance degradation caused by domain shift. Most previous methods rely on a single-mode distribution of the source and target domains to align them with adversarial learning, leading to inferior results in various scenarios. To that end, in this paper, we design a new spatial attention pyramid network for unsupervised domain adaptation. Specifically, we first build a spatial pyramid representation to capture context information of objects at different scales. Guided by task-specific information, we effectively combine the dense global structure representation and local texture patterns at each spatial location using a spatial attention mechanism. In this way, the network is forced to focus on discriminative regions with context information for domain adaptation. We conduct extensive experiments on various challenging datasets for unsupervised domain adaptation on object detection, instance segmentation, and semantic segmentation, demonstrating that our method performs favorably against state-of-the-art methods by a large margin. Our source code is available at code_path. (An illustrative sketch of the spatial attention pyramid follows this entry.) |
Tasks | Domain Adaptation, Instance Segmentation, Object Detection, Semantic Segmentation, Unsupervised Domain Adaptation |
Published | 2020-03-29 |
URL | https://arxiv.org/abs/2003.12979v1 |
https://arxiv.org/pdf/2003.12979v1.pdf | |
PWC | https://paperswithcode.com/paper/spatial-attention-pyramid-network-for |
Repo | |
Framework | |
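The abstract above combines multi-scale context with a spatial attention mechanism. The sketch below illustrates one plausible reading: pool features at several pyramid scales, upsample them back, and fuse them with a learned per-scale spatial attention map. The scales and the fusion rule are assumptions, not the paper's exact module.

```python
# Sketch of a spatial attention pyramid: pool at several scales and fuse with a
# learned spatial attention map. Scales and fusion rule are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttentionPyramid(nn.Module):
    def __init__(self, channels: int, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.attn = nn.Conv2d(channels * len(scales), len(scales), kernel_size=1)

    def forward(self, x):                                  # x: (B, C, H, W)
        h, w = x.shape[-2:]
        pyramid = [
            F.interpolate(F.adaptive_avg_pool2d(x, s), size=(h, w),
                          mode="bilinear", align_corners=False)
            for s in self.scales
        ]
        stacked = torch.cat(pyramid, dim=1)                # (B, C*S, H, W)
        weights = torch.softmax(self.attn(stacked), dim=1) # per-scale spatial weights
        fused = sum(w.unsqueeze(1) * p for w, p in zip(weights.unbind(dim=1), pyramid))
        return fused

sap = SpatialAttentionPyramid(channels=64)
print(sap(torch.randn(2, 64, 32, 32)).shape)
```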
Stochastic Recursive Gradient Descent Ascent for Stochastic Nonconvex-Strongly-Concave Minimax Problems
Title | Stochastic Recursive Gradient Descent Ascent for Stochastic Nonconvex-Strongly-Concave Minimax Problems |
Authors | Luo Luo, Haishan Ye, Tong Zhang |
Abstract | We consider nonconvex-concave minimax problems of the form $\min_{\bf x}\max_{\bf y} f({\bf x},{\bf y})$, where $f$ is strongly-concave in $\bf y$ but possibly nonconvex in $\bf x$. We focus on the stochastic setting, where we can only access an unbiased stochastic gradient estimate of $f$ at each iteration. This formulation includes many machine learning applications as special cases, such as adversarial training and certifying robustness in deep learning. We are interested in finding an ${\mathcal O}(\varepsilon)$-stationary point of the function $\Phi(\cdot)=\max_{\bf y} f(\cdot, {\bf y})$. The most popular algorithm to solve this problem is stochastic gradient descent ascent, which requires $\mathcal O(\kappa^3\varepsilon^{-4})$ stochastic gradient evaluations, where $\kappa$ is the condition number. In this paper, we propose a novel method called Stochastic Recursive gradiEnt Descent Ascent (SREDA), which estimates gradients more efficiently using variance reduction. This method achieves the best known stochastic gradient complexity of ${\mathcal O}(\kappa^3\varepsilon^{-3})$, and its dependency on $\varepsilon$ is optimal for this problem. (An illustrative sketch of the recursive estimator follows this entry.) |
Tasks | |
Published | 2020-01-11 |
URL | https://arxiv.org/abs/2001.03724v1 |
https://arxiv.org/pdf/2001.03724v1.pdf | |
PWC | https://paperswithcode.com/paper/stochastic-recursive-gradient-descent-ascent |
Repo | |
Framework | |
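SREDA's key ingredient is a recursive, variance-reduced gradient estimator driving simultaneous descent in x and ascent in y. The toy sketch below runs such a loop on a synthetic objective that is strongly concave in y; the step sizes, the single-sample minibatch, and the absence of SREDA's periodic large-batch refreshes make it a simplified illustration rather than the algorithm from the paper.

```python
# Highly simplified sketch of gradient descent ascent with a recursive (SARAH-style)
# variance-reduced estimator on a toy objective; all constants are placeholders,
# not the SREDA parameters from the paper.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))                     # toy data defining f(x, y)

def stoch_grads(x, y, idx):
    a = A[idx]                                   # f_i(x, y) = (a@x)(a@y) - 0.5*||y||^2
    gx = a * (a @ y)
    gy = a * (a @ x) - y                         # strongly concave in y
    return gx, gy

x, y = np.zeros(3), np.zeros(3)
eta_x, eta_y = 0.01, 0.05
vx, vy = stoch_grads(x, y, rng.integers(50))     # initial estimator

for t in range(200):
    x_new = x - eta_x * vx                       # descent step on x
    y_new = y + eta_y * vy                       # ascent step on y
    i = rng.integers(50)
    gx_new, gy_new = stoch_grads(x_new, y_new, i)
    gx_old, gy_old = stoch_grads(x, y, i)
    vx = gx_new - gx_old + vx                    # recursive (SARAH-style) update
    vy = gy_new - gy_old + vy
    x, y = x_new, y_new

print(np.linalg.norm(vx), np.linalg.norm(vy))
```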
Multi-Plateau Ensemble for Endoscopic Artefact Segmentation and Detection
Title | Multi-Plateau Ensemble for Endoscopic Artefact Segmentation and Detection |
Authors | Suyog Jadhav, Udbhav Bamba, Arnav Chavan, Rishabh Tiwari, Aryan Raj |
Abstract | The endoscopic artefact detection challenge consists of 1) artefact detection, 2) semantic segmentation, and 3) out-of-sample generalisation. For the semantic segmentation task, we propose a multi-plateau ensemble of FPN (Feature Pyramid Network) models with EfficientNet as the feature extractor/encoder. For the object detection task, we use a three-model ensemble of RetinaNet with a ResNet50 backbone and Faster R-CNN (FPN + DC5) with a ResNeXt101 backbone. A PyTorch implementation of our approach is available at https://github.com/ubamba98/EAD2020. (An illustrative sketch of mask-level ensembling follows this entry.) |
Tasks | Object Detection, Semantic Segmentation |
Published | 2020-03-23 |
URL | https://arxiv.org/abs/2003.10129v1 |
https://arxiv.org/pdf/2003.10129v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-plateau-ensemble-for-endoscopic |
Repo | |
Framework | |
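For the segmentation ensemble described above, the core operation is simply averaging per-pixel class probabilities across member models before taking the argmax. The sketch below shows that step with tiny stand-in CNNs in place of the EfficientNet-FPN members used in the paper.

```python
# Simple sketch of mask-level ensembling: average per-pixel class probabilities of
# several models. The members here are stand-in CNNs, not EfficientNet-FPN models.
import torch
import torch.nn as nn

def tiny_segmenter(num_classes=5):
    return nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, num_classes, 1))

ensemble = [tiny_segmenter() for _ in range(3)]

@torch.no_grad()
def ensemble_predict(image):
    probs = [m(image).softmax(dim=1) for m in ensemble]   # per-model class probabilities
    return torch.stack(probs).mean(dim=0).argmax(dim=1)   # average, then hard labels

mask = ensemble_predict(torch.randn(1, 3, 128, 128))
print(mask.shape)                                          # (1, 128, 128)
```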
Re-purposing Heterogeneous Generative Ensembles with Evolutionary Computation
Title | Re-purposing Heterogeneous Generative Ensembles with Evolutionary Computation |
Authors | Jamal Toutouh, Erik Hemberg, Una-May O’Reilly |
Abstract | Generative Adversarial Networks (GANs) are popular tools for generative modeling. The dynamics of their adversarial learning give rise to convergence pathologies during training, such as mode and discriminator collapse. In machine learning, ensembles of predictors demonstrate better results than a single predictor for many tasks. In this study, we apply two evolutionary algorithms (EAs) to create ensembles that re-purpose generative models, i.e., given a set of heterogeneous generators that were optimized for one objective (e.g., minimizing the Frechet Inception Distance), create ensembles of them that optimize a different objective (e.g., maximizing the diversity of the generated samples). The first method fixes the exact size of the ensemble, while the second only restricts the upper bound of the ensemble size. Experimental analysis on the MNIST image benchmark demonstrates that both EA-based ensemble creation methods can re-purpose the models without reducing their original functionality. The EA-based methods demonstrate significantly better performance than other heuristic-based methods. When comparing the two evolutionary methods, the one that only bounds the ensemble size from above performs best. (An illustrative sketch of the evolutionary search follows this entry.) |
Tasks | |
Published | 2020-03-30 |
URL | https://arxiv.org/abs/2003.13532v1 |
https://arxiv.org/pdf/2003.13532v1.pdf | |
PWC | https://paperswithcode.com/paper/re-purposing-heterogeneous-generative |
Repo | |
Framework | |
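The re-purposing idea above searches over subsets of already-trained generators to optimize a new objective such as sample diversity. The toy sketch below evolves binary membership masks with a simple (mu + lambda) loop and an upper bound on ensemble size; the stand-in generators, the diversity proxy, and the EA settings are all assumptions, not the paper's two algorithms.

```python
# Toy sketch of re-purposing via evolutionary search: evolve generator subsets to
# maximize a diversity score. Generators, fitness and EA settings are illustrative.
import numpy as np

rng = np.random.default_rng(0)
# Stand-in "generators": each produces samples around its own mode.
modes = rng.normal(size=(8, 2))
def sample(gen_idx, n=64):
    return modes[gen_idx] + 0.1 * rng.normal(size=(n, 2))

def diversity(mask):
    """Diversity proxy: mean pairwise distance between the ensemble's samples."""
    idx = np.flatnonzero(mask)
    if len(idx) == 0:
        return -np.inf
    pts = np.concatenate([sample(i) for i in idx])
    return np.mean(np.linalg.norm(pts[:, None] - pts[None, :], axis=-1))

# (mu + lambda) evolution over binary membership masks with an upper size bound.
pop = [rng.integers(0, 2, size=8) for _ in range(10)]
for gen in range(30):
    children = []
    for p in pop:
        c = p.copy()
        c[rng.integers(8)] ^= 1                       # bit-flip mutation
        if c.sum() <= 4:                              # upper bound on ensemble size
            children.append(c)
    pop = sorted(pop + children, key=diversity, reverse=True)[:10]

print("best ensemble members:", np.flatnonzero(pop[0]))
```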
MONSTOR: An Inductive Approach for Estimating and Maximizing Influence over Unseen Social Networks
Title | MONSTOR: An Inductive Approach for Estimating and Maximizing Influence over Unseen Social Networks |
Authors | Jihoon Ko, Kyuhan Lee, Kijung Shin, Noseong Park |
Abstract | Influence maximization (IM) is one of the most important problems in social network analysis. Its objective is to find a given number of seed nodes that maximize the spread of information through a social network. Since it is an NP-hard problem, many approximate/heuristic methods have been developed, and a number of them repeat Monte Carlo (MC) simulations over and over, specifically tens of thousands of times or more per potential seed set, to reliably estimate the influence. In this work, we present an inductive machine learning method, called Monte Carlo Simulator (MONSTOR), to predict the results of MC simulations on networks unseen during training. MONSTOR can greatly accelerate existing IM methods by replacing repeated MC simulations. In our experiments, MONSTOR achieves near-perfect accuracy on unseen real social networks, with little sacrifice of accuracy in IM use cases. (An illustrative sketch of the MC simulation it replaces follows this entry.) |
Tasks | |
Published | 2020-01-24 |
URL | https://arxiv.org/abs/2001.08853v1 |
https://arxiv.org/pdf/2001.08853v1.pdf | |
PWC | https://paperswithcode.com/paper/monstor-an-inductive-approach-for-estimating |
Repo | |
Framework | |
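To make concrete what such a predictor learns to replace, the sketch below estimates the influence of a seed set by repeated Monte Carlo simulation of the independent cascade model (assumed here as the diffusion model) on a random networkx graph; the edge probability and simulation count are illustrative.

```python
# Sketch of Monte Carlo influence estimation under the independent cascade model
# (assumed diffusion model) on a toy random graph; constants are illustrative.
import random
import networkx as nx

def ic_spread(graph, seeds, prob=0.1, runs=1000):
    """Average number of activated nodes over `runs` independent-cascade simulations."""
    total = 0
    for _ in range(runs):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            nxt = []
            for u in frontier:
                for v in graph.successors(u):
                    if v not in active and random.random() < prob:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / runs

g = nx.gnp_random_graph(200, 0.03, seed=1, directed=True)
print("estimated influence of the seed set:", ic_spread(g, {0, 1, 2}))
```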
European Language Grid: An Overview
Title | European Language Grid: An Overview |
Authors | Georg Rehm, Maria Berger, Ela Elsholz, Stefanie Hegele, Florian Kintzel, Katrin Marheinecke, Stelios Piperidis, Miltos Deligiannis, Dimitris Galanis, Katerina Gkirtzou, Penny Labropoulou, Kalina Bontcheva, David Jones, Ian Roberts, Jan Hajic, Jana Hamrlová, Lukáš Kačena, Khalid Choukri, Victoria Arranz, Andrejs Vasiļjevs, Orians Anvari, Andis Lagzdiņš, Jūlija Meļņika, Gerhard Backfried, Erinç Dikici, Miroslav Janosik, Katja Prinz, Christoph Prinz, Severin Stampler, Dorothea Thomas-Aniola, José Manuel Gómez Pérez, Andres Garcia Silva, Christian Berrío, Ulrich Germann, Steve Renals, Ondrej Klejch |
Abstract | With 24 official EU and many additional languages, multilingualism in Europe and an inclusive Digital Single Market can only be enabled through Language Technologies (LTs). European LT business is dominated by hundreds of SMEs and a few large players. Many are world-class, with technologies that outperform the global players. However, European LT business is also fragmented, by nation states, languages, verticals and sectors, significantly holding back its impact. The European Language Grid (ELG) project addresses this fragmentation by establishing the ELG as the primary platform for LT in Europe. The ELG is a scalable cloud platform, providing, in an easy-to-integrate way, access to hundreds of commercial and non-commercial LTs for all European languages, including running tools and services as well as data sets and resources. Once fully operational, it will enable the commercial and non-commercial European LT community to deposit and upload their technologies and data sets into the ELG, to deploy them through the grid, and to connect with other resources. The ELG will boost the Multilingual Digital Single Market towards a thriving European LT community, creating new jobs and opportunities. Furthermore, the ELG project organises two open calls for up to 20 pilot projects. It also sets up 32 National Competence Centres (NCCs) and the European LT Council (LTC) for outreach and coordination purposes. |
Tasks | |
Published | 2020-03-30 |
URL | https://arxiv.org/abs/2003.13551v1 |
https://arxiv.org/pdf/2003.13551v1.pdf | |
PWC | https://paperswithcode.com/paper/european-language-grid-an-overview |
Repo | |
Framework | |
A Corpus of Controlled Opinionated and Knowledgeable Movie Discussions for Training Neural Conversation Models
Title | A Corpus of Controlled Opinionated and Knowledgeable Movie Discussions for Training Neural Conversation Models |
Authors | Fabian Galetzka, Chukwuemeka U. Eneh, David Schlangen |
Abstract | Fully data-driven chatbots for non-goal-oriented dialogues are known to suffer from inconsistent behaviour across their turns, stemming from a general difficulty in controlling parameters such as their assumed background personality and knowledge of facts. One reason for this is the relative lack of labeled data from which personality consistency and fact usage could be learned together with dialogue behaviour. To address this, we introduce a new labeled dialogue dataset in the domain of movie discussions, where every dialogue is based on pre-specified facts and opinions. We thoroughly validate the collected dialogues for the participants’ adherence to their given fact and opinion profiles, and find that the general quality in this respect is high. This process also gives us an additional layer of annotation that is potentially useful for training models. As a baseline, we introduce an end-to-end trained self-attention decoder model trained on this data and show that it is able to generate opinionated responses that are judged to be natural, knowledgeable, and attentive. (An illustrative sketch of such a decoder baseline follows this entry.) |
Tasks | |
Published | 2020-03-30 |
URL | https://arxiv.org/abs/2003.13342v1 |
https://arxiv.org/pdf/2003.13342v1.pdf | |
PWC | https://paperswithcode.com/paper/a-corpus-of-controlled-opinionated-and |
Repo | |
Framework | |
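The baseline mentioned in the abstract above is an end-to-end trained self-attention decoder. The sketch below is a generic decoder-only stand-in that conditions on the pre-specified facts and opinions simply by prepending them to the dialogue history as tokens; the vocabulary, model sizes, and conditioning scheme are assumptions rather than the paper's model.

```python
# Minimal sketch of a decoder-only self-attention baseline conditioned on facts and
# opinions by prepending them to the dialogue history; sizes are assumptions.
import torch
import torch.nn as nn

class TinySelfAttentionDecoder(nn.Module):
    def __init__(self, vocab=1000, d_model=128, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=256,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab)

    def forward(self, tokens):                              # tokens: (B, T)
        T = tokens.size(1)
        causal = nn.Transformer.generate_square_subsequent_mask(T)
        h = self.blocks(self.embed(tokens), mask=causal)
        return self.lm_head(h)                              # next-token logits

model = TinySelfAttentionDecoder()
facts_and_history = torch.randint(0, 1000, (2, 40))         # [facts; opinions; dialogue]
print(model(facts_and_history).shape)                        # (2, 40, 1000)
```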
Non-asymptotic Superlinear Convergence of Standard Quasi-Newton Methods
Title | Non-asymptotic Superlinear Convergence of Standard Quasi-Newton Methods |
Authors | Qiujiang Jin, Aryan Mokhtari |
Abstract | In this paper, we study the non-asymptotic superlinear convergence rate of DFP and BFGS, which are two well-known quasi-Newton methods. The asymptotic superlinear convergence rate of these quasi-Newton methods has been extensively studied, but their explicit finite-time local convergence rate has not been established yet. In this paper, we provide a finite-time (non-asymptotic) convergence analysis for the BFGS and DFP methods under the assumptions that the objective function is strongly convex, its gradient is Lipschitz continuous, and its Hessian is Lipschitz continuous only in the direction of the optimal solution. We show that in a local neighborhood of the optimal solution, the iterates generated by both DFP and BFGS converge to the optimal solution at a superlinear rate of $\mathcal{O}((\frac{1}{k})^{k/2})$, where $k$ is the number of iterations. In particular, for a specific choice of the local neighborhood, both DFP and BFGS converge to the optimal solution at the rate of $(\frac{0.85}{k})^{k/2}$. Our theoretical guarantee is one of the first results that provide a non-asymptotic superlinear convergence rate for the DFP and BFGS quasi-Newton methods. (The standard update rules to which this result applies are restated after this entry.) |
Tasks | |
Published | 2020-03-30 |
URL | https://arxiv.org/abs/2003.13607v1 |
https://arxiv.org/pdf/2003.13607v1.pdf | |
PWC | https://paperswithcode.com/paper/non-asymptotic-superlinear-convergence-of |
Repo | |
Framework | |
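For reference, the analysis above concerns the standard DFP and BFGS schemes; the textbook form of the iterate and Hessian-approximation updates is restated below (standard material, not a result from the paper).

```latex
% Standard DFP and BFGS updates (textbook form); B_k approximates the Hessian,
% s_k = x_{k+1} - x_k and y_k = \nabla f(x_{k+1}) - \nabla f(x_k).
\begin{align*}
  x_{k+1} &= x_k - B_k^{-1} \nabla f(x_k), \\[2pt]
  B_{k+1}^{\mathrm{BFGS}} &= B_k
      - \frac{B_k s_k s_k^{\top} B_k}{s_k^{\top} B_k s_k}
      + \frac{y_k y_k^{\top}}{y_k^{\top} s_k}, \\[2pt]
  B_{k+1}^{\mathrm{DFP}} &= \Bigl(I - \frac{y_k s_k^{\top}}{y_k^{\top} s_k}\Bigr) B_k
      \Bigl(I - \frac{s_k y_k^{\top}}{y_k^{\top} s_k}\Bigr)
      + \frac{y_k y_k^{\top}}{y_k^{\top} s_k}.
\end{align*}
```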
Simulated annealing based heuristic for multiple agile satellites scheduling under cloud coverage uncertainty
Title | Simulated annealing based heuristic for multiple agile satellites scheduling under cloud coverage uncertainty |
Authors | Chao Han, Yi Gu, Guohua Wu, Xinwei Wang |
Abstract | Agile satellites are the new generation of Earth observation satellites (EOSs) with stronger attitude maneuvering capability. Since the optical remote sensing instruments on satellites cannot see through clouds, cloud coverage has a significant influence on satellite observation missions. We are the first to address the multiple agile EOS scheduling problem under cloud coverage uncertainty, where the objective is to maximize the total observation profit. A chance-constrained programming model is adopted to describe the uncertainty initially, and the observation profit under cloud coverage uncertainty is then calculated via a sample approximation method. Subsequently, an improved simulated annealing based heuristic combining a fast insertion strategy is proposed for large-scale observation missions. The experimental results show that the improved simulated annealing heuristic outperforms other algorithms for the multiple agile EOS scheduling problem under cloud coverage uncertainty, which verifies the efficiency and effectiveness of the proposed algorithm. (An illustrative sketch of the annealing loop follows this entry.) |
Tasks | |
Published | 2020-03-14 |
URL | https://arxiv.org/abs/2003.08363v1 |
https://arxiv.org/pdf/2003.08363v1.pdf | |
PWC | https://paperswithcode.com/paper/simulated-annealing-based-heuristic-for |
Repo | |
Framework | |
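The heuristic above combines simulated annealing with insertion moves and a sample-approximated profit under cloud uncertainty. The toy sketch below mirrors that structure; the task profits, clear-sky probabilities, capacity constraint, and cooling schedule are placeholders, not the paper's scheduling model.

```python
# Toy sketch of simulated annealing with insertion/removal moves under sampled cloud
# cover; all constants and the capacity constraint are illustrative placeholders.
import math
import random

random.seed(0)
N_TASKS = 30
profit = [random.uniform(1, 10) for _ in range(N_TASKS)]
p_clear = [random.uniform(0.3, 0.9) for _ in range(N_TASKS)]   # P(no cloud at task)
CAPACITY = 12                                                   # max tasks per schedule

def expected_profit(schedule, samples=200):
    """Sample-approximate the profit: a task pays off only if its sky is clear."""
    total = 0.0
    for _ in range(samples):
        total += sum(profit[t] for t in schedule if random.random() < p_clear[t])
    return total / samples

current = set(random.sample(range(N_TASKS), 5))
best, best_val = set(current), expected_profit(current)
temp = 5.0
while temp > 0.05:
    neighbor = set(current)
    t = random.randrange(N_TASKS)
    if t in neighbor:
        neighbor.remove(t)                                      # removal move
    elif len(neighbor) < CAPACITY:
        neighbor.add(t)                                         # insertion move
    delta = expected_profit(neighbor) - expected_profit(current)
    if delta > 0 or random.random() < math.exp(delta / temp):
        current = neighbor
        val = expected_profit(current)
        if val > best_val:
            best, best_val = set(current), val
    temp *= 0.95                                                # geometric cooling

print("best expected profit:", round(best_val, 2), "tasks:", sorted(best))
```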
In Defense of Graph Inference Algorithms for Weakly Supervised Object Localization
Title | In Defense of Graph Inference Algorithms for Weakly Supervised Object Localization |
Authors | Amir Rahimi, Amirreza Shaban, Thalaiyasingam Ajanthan, Richard Hartley, Byron Boots |
Abstract | Weakly Supervised Object Localization (WSOL) methods have become increasingly popular since they only require image-level labels, as opposed to the expensive bounding box annotations required by fully supervised algorithms. Typically, a WSOL model is first trained to predict class-generic objectness scores on an off-the-shelf fully supervised source dataset and is then progressively adapted to learn the objects in the weakly supervised target dataset. In this work, we argue that learning only an objectness function is a weak form of knowledge transfer and propose to also learn a class-wise pairwise similarity function that directly compares two input proposals. The combined localization model and the estimated object annotations are jointly learned in an alternating optimization paradigm, as is typically done in standard WSOL methods. In contrast to existing work that learns pairwise similarities, our proposed approach optimizes a unified objective with a convergence guarantee and is computationally efficient for large-scale applications. Experiments on the COCO and ILSVRC 2013 detection datasets show that the performance of the localization model improves significantly with the inclusion of the pairwise similarity function. For instance, on the ILSVRC dataset, the Correct Localization (CorLoc) performance improves from 72.7% to 78.2%, which is a new state of the art for the weakly supervised object localization task. (An illustrative sketch of the two-term objective follows this entry.) |
Tasks | Object Localization, Transfer Learning, Weakly-Supervised Object Localization |
Published | 2020-03-18 |
URL | https://arxiv.org/abs/2003.08375v1 |
https://arxiv.org/pdf/2003.08375v1.pdf | |
PWC | https://paperswithcode.com/paper/in-defense-of-graph-inference-algorithms-for |
Repo | |
Framework | |
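The proposal above augments a class-generic objectness score with a pairwise similarity between proposals across images, optimized in an alternating fashion. The sketch below caricatures that objective: it alternates between picking one proposal per image and rescoring candidates by objectness plus mean similarity to the current picks; the random features, the cosine similarity, and the simple sum are assumptions, not the paper's formulation.

```python
# Simplified sketch of alternating optimization over an objectness + pairwise
# similarity objective; features, scores and the two-term sum are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_images, n_props, dim = 5, 8, 16
feats = rng.normal(size=(n_images, n_props, dim))          # proposal features
objectness = rng.uniform(size=(n_images, n_props))         # class-generic scores

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

picks = objectness.argmax(axis=1)                          # init: most object-like box
for _ in range(5):                                         # alternating optimization
    for i in range(n_images):
        others = [feats[j, picks[j]] for j in range(n_images) if j != i]
        sim = np.array([np.mean([cosine(feats[i, p], o) for o in others])
                        for p in range(n_props)])
        picks[i] = int(np.argmax(objectness[i] + sim))     # unified two-term objective

print("selected proposal per image:", picks.tolist())
```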