Paper Group ANR 273
Towards Practical Indoor Positioning Based on Massive MIMO Systems. Scene Recognition with Prototype-agnostic Scene Layout. Deep learning with sentence embeddings pre-trained on biomedical corpora improves the performance of finding similar sentences in electronic medical records. Understanding and Improving Virtual Adversarial Training. Rare geome …
Towards Practical Indoor Positioning Based on Massive MIMO Systems
Title | Towards Practical Indoor Positioning Based on Massive MIMO Systems |
Authors | Mark Widmaier, Maximilian Arnold, Sebastian Dörner, Sebastian Cammerer, Stephan ten Brink |
Abstract | We showcase the practicability of an indoor positioning system (IPS) based solely on Neural Networks (NNs) and the channel state information (CSI) of a (Massive) multiple-input multiple-output (MIMO) communication system, i.e., built only on data that already exists in today's systems. As such, our IPS promises good accuracy without any additional protocol/signaling overhead for the user localization task. In particular, we propose a tailored NN structure with an additional phase branch as feature extractor and (compared to previous results) a significantly reduced number of trainable parameters, minimizing the amount of required training data. We provide actual measurements for indoor scenarios with up to 64 antennas covering a large area of 80 m². In the second part, several robustness investigations on real measurements are conducted, i.e., once trained, we analyze the recall accuracy over a time period of several days. Further, we analyze the impact of pedestrians walking in between the measurements and show that fine-tuning and pre-training of the NN help to mitigate effects of hardware drifts and alterations in the propagation environment over time. This reduces the number of required training samples at equal precision and thereby decreases the effort of the costly training-data acquisition. |
Tasks | |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11858v1 |
https://arxiv.org/pdf/1905.11858v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-practical-indoor-positioning-based-on |
Repo | |
Framework | |
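The CSI-fingerprinting idea above — separate magnitude and phase feature branches feeding a position regressor — can be sketched with a toy baseline. Everything here (dimensions, random complex fingerprints, and a nearest-neighbour lookup standing in for the trained NN) is an illustrative assumption, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy CSI fingerprints: n_train reference points, 64 antennas, complex channel.
n_train, n_antennas = 200, 64
positions = rng.uniform(0, 10, size=(n_train, 2))   # known 2-D locations
csi_train = (rng.standard_normal((n_train, n_antennas))
             + 1j * rng.standard_normal((n_train, n_antennas)))

def features(csi):
    """Split complex CSI into a magnitude branch and a phase branch,
    loosely mimicking the paper's two-branch feature extractor."""
    return np.concatenate([np.abs(csi), np.angle(csi)], axis=-1)

def locate(csi_query, csi_ref, pos_ref):
    """1-nearest-neighbour lookup in feature space (stand-in for the NN regressor)."""
    f_ref, f_q = features(csi_ref), features(csi_query)
    idx = np.argmin(np.linalg.norm(f_ref - f_q, axis=1))
    return pos_ref[idx]

# Query with a lightly perturbed copy of a known fingerprint: recovers its position.
query = csi_train[17] + 0.01 * rng.standard_normal(n_antennas)
est = locate(query, csi_train, positions)
```

In a real system the lookup would be replaced by the trained NN, and fingerprints would come from measured CSI rather than random draws.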
Scene Recognition with Prototype-agnostic Scene Layout
Title | Scene Recognition with Prototype-agnostic Scene Layout |
Authors | Gongwei Chen, Xinhang Song, Haitao Zeng, Shuqiang Jiang |
Abstract | Exploiting the spatial structure in scene images is a key research direction for scene recognition. Due to the large intra-class structural diversity, building and modeling a flexible structural layout that adapts to various image characteristics is a challenge. Existing structural modeling methods in scene recognition either focus on predefined grids or rely on learned prototypes, both of which have limited representative ability. In this paper, we propose the Prototype-agnostic Scene Layout (PaSL) construction method to build the spatial structure for each image without conforming to any prototype. PaSL can flexibly capture the diverse spatial characteristics of scene images and has considerable generalization capability. Given a PaSL, we build a Layout Graph Network (LGN) where regions in the PaSL are defined as nodes and two kinds of independent relations between regions are encoded as edges. The LGN aims to incorporate two topological structures (formed along spatial and semantic similarity dimensions) into image representations through graph convolution. Extensive experiments show that our approach achieves state-of-the-art results on the widely recognized MIT67 and SUN397 datasets without multi-model or multi-scale fusion. Moreover, we also conduct experiments on one of the largest-scale datasets, Places365. The results demonstrate that the proposed method generalizes well and obtains competitive performance. |
Tasks | Scene Recognition, Semantic Similarity, Semantic Textual Similarity |
Published | 2019-09-07 |
URL | https://arxiv.org/abs/1909.03234v1 |
https://arxiv.org/pdf/1909.03234v1.pdf | |
PWC | https://paperswithcode.com/paper/scene-recognition-with-prototype-agnostic |
Repo | |
Framework | |
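The LGN's core operation — graph convolution over region nodes under two independent relation graphs — can be sketched in a few lines. The node count, feature sizes, adjacency matrices, and the averaging fusion are all hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical layout: 5 image regions (nodes) with 8-d appearance features.
n_nodes, d_in, d_out = 5, 8, 4
H = rng.standard_normal((n_nodes, d_in))

# Two independent relations, as in LGN: spatial adjacency and semantic similarity.
A_spatial = np.array([[0, 1, 0, 0, 1],
                      [1, 0, 1, 0, 0],
                      [0, 1, 0, 1, 0],
                      [0, 0, 1, 0, 1],
                      [1, 0, 0, 1, 0]], dtype=float)
A_semantic = (rng.uniform(size=(n_nodes, n_nodes)) > 0.5).astype(float)
A_semantic = np.maximum(A_semantic, A_semantic.T)   # symmetrise

def gcn_layer(A, H, W):
    """One graph-convolution step: add self-loops, symmetric normalisation,
    then ReLU(A_norm @ H @ W)."""
    A_hat = A + np.eye(len(A))
    d = A_hat.sum(1)
    A_norm = A_hat / np.sqrt(np.outer(d, d))         # D^{-1/2} A_hat D^{-1/2}
    return np.maximum(A_norm @ H @ W, 0.0)

W1 = rng.standard_normal((d_in, d_out))
W2 = rng.standard_normal((d_in, d_out))
# Fuse the two topological views by averaging their convolved representations.
H_out = 0.5 * (gcn_layer(A_spatial, H, W1) + gcn_layer(A_semantic, H, W2))
```

The paper's actual fusion of the spatial and semantic branches may differ; averaging is just one plausible combination.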
Deep learning with sentence embeddings pre-trained on biomedical corpora improves the performance of finding similar sentences in electronic medical records
Title | Deep learning with sentence embeddings pre-trained on biomedical corpora improves the performance of finding similar sentences in electronic medical records |
Authors | Qingyu Chen, Jingcheng Du, Sun Kim, W. John Wilbur, Zhiyong Lu |
Abstract | Capturing sentence semantics plays a vital role in a range of text-mining applications. Despite continuous efforts on the development of related datasets and models in the general domain, both datasets and models remain limited in the biomedical and clinical domains. The BioCreative/OHNLP organizers have made the first attempt to annotate 1,068 sentence pairs from clinical notes and have called for a community effort to tackle the Semantic Textual Similarity (BioCreative/OHNLP STS) challenge. We developed models using traditional machine learning and deep learning approaches. For the post-challenge phase, we focus on two models: the Random Forest and the Encoder Network. We applied sentence embeddings pre-trained on PubMed abstracts and MIMIC-III clinical notes and updated the Random Forest and the Encoder Network accordingly. The official results demonstrated that our best submission was an ensemble of eight models. It achieved a Pearson correlation coefficient of 0.8328, the highest performance among 13 submissions from 4 teams. In the post-challenge phase, the performance of both the Random Forest and the Encoder Network improved; in particular, the correlation of the Encoder Network improved by ~13%. During the challenge task, no end-to-end deep learning model performed better than machine learning models that take manually crafted features. In contrast, with the sentence embeddings pre-trained on biomedical corpora, the Encoder Network now achieves a correlation of ~0.84, which is higher than the original best model. The ensemble model taking the improved versions of the Random Forest and Encoder Network as inputs further increased performance to 0.8528. Deep learning models with sentence embeddings pre-trained on biomedical corpora achieve the highest performance on the test set. |
Tasks | Semantic Textual Similarity, Sentence Embeddings |
Published | 2019-09-06 |
URL | https://arxiv.org/abs/1909.03044v1 |
https://arxiv.org/pdf/1909.03044v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-with-sentence-embeddings-pre |
Repo | |
Framework | |
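A minimal STS baseline in the spirit of this entry: mean-pool pre-trained word vectors into a sentence embedding and score pairs by cosine similarity. The random vectors below merely stand in for embeddings pre-trained on PubMed/MIMIC-III; the vocabulary and pooling are illustrative assumptions, not the paper's models:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical pre-trained word vectors (stand-ins for biomedical embeddings).
vocab = {w: rng.standard_normal(16) for w in
         "patient denies chest pain reports severe headache no".split()}

def sentence_embedding(sentence):
    """Mean-pool word vectors; unknown words are skipped (a common simple baseline)."""
    vecs = [vocab[w] for w in sentence.lower().split() if w in vocab]
    return np.mean(vecs, axis=0)

def similarity(s1, s2):
    """Cosine similarity between sentence embeddings, used as an STS score."""
    a, b = sentence_embedding(s1), sentence_embedding(s2)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

same = similarity("patient denies chest pain", "patient denies chest pain")
```

In the paper, such embedding-derived features feed a Random Forest or an Encoder Network rather than being used as the score directly.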
Understanding and Improving Virtual Adversarial Training
Title | Understanding and Improving Virtual Adversarial Training |
Authors | Dongha Kim, Yongchan Choi, Yongdai Kim |
Abstract | In semi-supervised learning, the virtual adversarial training (VAT) approach is one of the most attractive methods due to its intuitive simplicity and strong performance. VAT finds a classifier that is robust to data perturbation in the adversarial direction. In this study, we provide a fundamental explanation of why VAT works well in the semi-supervised setting and propose new techniques, simple but powerful, to improve the VAT method. In particular, we employ the idea of the Bad GAN approach, which utilizes bad samples distributed on the complement of the support of the input data, without any additional deep generative architectures. We generate high-quality bad samples using the adversarial training employed in VAT, and we give theoretical explanations of why this adversarial training is well suited to generating bad samples. An advantage of our proposed method is that it achieves performance competitive with other recent studies at a much lower computational cost. We demonstrate the advantages of our method through various experiments on well-known benchmark image datasets. |
Tasks | |
Published | 2019-09-15 |
URL | https://arxiv.org/abs/1909.06737v1 |
https://arxiv.org/pdf/1909.06737v1.pdf | |
PWC | https://paperswithcode.com/paper/understanding-and-improving-virtual |
Repo | |
Framework | |
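The adversarial direction at the heart of VAT is usually found by one step of power iteration on the KL divergence between predictions at x and at x + r. The sketch below replaces autograd with finite differences and uses a fixed toy linear-softmax classifier, so it is only a conceptual illustration, not the method as implemented in practice:

```python
import numpy as np

rng = np.random.default_rng(3)

W = rng.standard_normal((3, 4))     # toy fixed linear classifier: 4 inputs, 3 classes
x = rng.standard_normal(4)

def predict(z):
    """Softmax predictions of the toy classifier."""
    logits = W @ z
    e = np.exp(logits - logits.max())
    return e / e.sum()

def kl(p, q):
    """KL divergence between two discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

def vat_direction(x, xi=1e-2, eps=0.1, n_iter=1, h=1e-5):
    """Approximate the virtual adversarial direction by power iteration,
    using central finite differences in place of autograd (sketch only)."""
    p = predict(x)
    d = rng.standard_normal(x.shape)
    d /= np.linalg.norm(d)
    for _ in range(n_iter):
        grad = np.zeros_like(x)
        for i in range(len(x)):
            e_i = np.zeros_like(x)
            e_i[i] = h
            grad[i] = (kl(p, predict(x + xi * d + e_i))
                       - kl(p, predict(x + xi * d - e_i))) / (2 * h)
        d = grad / np.linalg.norm(grad)
    return eps * d    # the adversarial perturbation r_adv with ||r_adv|| = eps

r_adv = vat_direction(x)
```

The VAT loss then penalises KL(predict(x) || predict(x + r_adv)); the paper's contribution builds bad samples on top of this machinery.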
Rare geometries: revealing rare categories via dimension-driven statistics
Title | Rare geometries: revealing rare categories via dimension-driven statistics |
Authors | Henry Kvinge, Elin Farnell, Jingya Li, Yujia Chen |
Abstract | In many situations, the classes of data points of primary interest also happen to be those that are least numerous. A well-known example is the detection of fraudulent transactions among the collection of all financial transactions, the vast majority of which are legitimate. These types of problems fall under the label of 'rare-category detection.' There are two challenging aspects of these problems. The first is a general lack of labeled examples of the rare class, and the second is the potential non-separability of the rare class from the majority (in terms of available features). Statistics related to the geometry of the rare class (such as its intrinsic dimension) can be significantly different from those of the majority class, reflecting the different dynamics driving variation in the different classes. In this paper we present a new supervised learning algorithm that uses a dimension-driven statistic, called the kappa-profile, to classify whether unlabeled points belong to a rare class. Our algorithm requires very few labeled examples and is invariant with respect to translation, so that it performs equivalently on both separable and non-separable classes. |
Tasks | |
Published | 2019-01-29 |
URL | https://arxiv.org/abs/1901.10585v2 |
https://arxiv.org/pdf/1901.10585v2.pdf | |
PWC | https://paperswithcode.com/paper/rare-geometries-revealing-rare-categories-via |
Repo | |
Framework | |
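The idea that a dimension-driven statistic can separate classes even when their features overlap can be sketched as follows. The "profile" here is simply a normalised singular-value spectrum — a hypothetical stand-in for the paper's kappa-profile, with synthetic classes of different intrinsic dimension:

```python
import numpy as np

rng = np.random.default_rng(4)

def kappa_profile(points):
    """Stand-in for the paper's kappa-profile: the normalised singular-value
    spectrum of a centred point cloud, summarising its intrinsic dimension."""
    X = points - points.mean(0)
    s = np.linalg.svd(X, compute_uv=False)
    return s / s.sum()

# Majority class fills 3-D space; the rare class lies near a 1-D curve.
majority = rng.standard_normal((300, 3))
t = rng.uniform(-2, 2, 150)
rare = np.stack([t, 0.05 * t**2, 0.05 * rng.standard_normal(150)], axis=1)

prof_major = kappa_profile(majority)
prof_rare = kappa_profile(rare)

def classify(neighbourhood):
    """Label a local neighbourhood 'rare' when its profile is closer to the rare prototype."""
    p = kappa_profile(neighbourhood)
    if np.linalg.norm(p - prof_rare) < np.linalg.norm(p - prof_major):
        return "rare"
    return "majority"

label = classify(rare[:50])
```

Because the profile is computed from centred data, it is translation-invariant by construction — mirroring the invariance claim in the abstract.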
Feudal Multi-Agent Hierarchies for Cooperative Reinforcement Learning
Title | Feudal Multi-Agent Hierarchies for Cooperative Reinforcement Learning |
Authors | Sanjeevan Ahilan, Peter Dayan |
Abstract | We investigate how reinforcement learning agents can learn to cooperate. Drawing inspiration from human societies, in which successful coordination of many individuals is often facilitated by hierarchical organisation, we introduce Feudal Multi-agent Hierarchies (FMH). In this framework, a 'manager' agent, which is tasked with maximising the environmentally determined reward function, learns to communicate subgoals to multiple, simultaneously operating 'worker' agents. Workers, which are rewarded for achieving managerial subgoals, take concurrent actions in the world. We outline the structure of FMH and demonstrate its potential for decentralised learning and control. We find that, given an adequate set of subgoals from which to choose, FMH performs, and in particular scales, substantially better than cooperative approaches that use a shared reward function. |
Tasks | |
Published | 2019-01-24 |
URL | http://arxiv.org/abs/1901.08492v1 |
http://arxiv.org/pdf/1901.08492v1.pdf | |
PWC | https://paperswithcode.com/paper/feudal-multi-agent-hierarchies-for |
Repo | |
Framework | |
Human Extraction and Scene Transition utilizing Mask R-CNN
Title | Human Extraction and Scene Transition utilizing Mask R-CNN |
Authors | Asati Minkesh, Kraittipong Worranitta, Miyachi Taizo |
Abstract | Object detection is a trendy branch of computer vision, especially for human recognition and pedestrian detection. Recognizing the complete body of a person has always been a difficult problem. Over the years, researchers have proposed various methods, and recently, Mask R-CNN has made a breakthrough in instance segmentation. Based on Faster R-CNN, Mask R-CNN is able to generate a segmentation mask for each instance. We propose an application that extracts multiple persons from images and videos of pleasant life scenes, grouping happy moments of people, such as family, friends, and community, for QOL (Quality of Life). We likewise propose a methodology to place extracted images of persons onto a new background. This enables a user to make a pleasant collection of the happy facial expressions and actions of his/her family and friends. Mask R-CNN detects all types of object masks in images. Our algorithm then considers only the target person and extracts that person without obstacles, such as dogs in front of the person; the user can also select multiple persons as desired. Our algorithm is effective for both images and videos, irrespective of length, and adds no overhead to Mask R-CNN, running at 5 fps. We show examples of a yoga practitioner in an image and a dancer in a dance-video frame. We hope our simple and effective approach will serve as a baseline for replacing image backgrounds and help ease future research. |
Tasks | Instance Segmentation, Object Detection, Pedestrian Detection, Semantic Segmentation |
Published | 2019-07-20 |
URL | https://arxiv.org/abs/1907.08884v2 |
https://arxiv.org/pdf/1907.08884v2.pdf | |
PWC | https://paperswithcode.com/paper/human-extraction-and-scene-transition |
Repo | |
Framework | |
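The background-replacement step described above reduces to compositing masked pixels onto a new image. In the sketch below the tiny frames and the hand-made binary mask are placeholders — in the actual pipeline the mask would come from Mask R-CNN's per-instance output:

```python
import numpy as np

# Tiny stand-ins: a 4x4 RGB frame, a new background, and a binary instance mask.
frame = np.full((4, 4, 3), 200, dtype=np.uint8)      # "person" pixels are bright
background = np.zeros((4, 4, 3), dtype=np.uint8)     # new background is black

mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True                                # the selected person's mask

def composite(frame, mask, background):
    """Copy masked (person) pixels from the frame onto the new background."""
    out = background.copy()
    out[mask] = frame[mask]
    return out

result = composite(frame, mask, background)
```

For video, the same compositing runs per frame with that frame's mask, which is why the method's cost is dominated by Mask R-CNN itself.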
Randomised Bayesian Least-Squares Policy Iteration
Title | Randomised Bayesian Least-Squares Policy Iteration |
Authors | Nikolaos Tziortziotis, Christos Dimitrakakis, Michalis Vazirgiannis |
Abstract | We introduce Bayesian least-squares policy iteration (BLSPI), an off-policy, model-free policy iteration algorithm that uses the Bayesian least-squares temporal-difference (BLSTD) learning algorithm to evaluate policies. We also propose an online variant of BLSPI, called randomised BLSPI (RBLSPI), which improves its policy based on an incomplete policy-evaluation step. In the online setting, the exploration-exploitation dilemma must be addressed, as we try to discover the optimal policy using samples we collect ourselves. RBLSPI exploits the ability of BLSTD to quantify our uncertainty about the value function. Inspired by Thompson sampling, RBLSPI first samples a value function from a posterior distribution over value functions, and then selects actions based on the sampled value function. The effectiveness and the exploration abilities of RBLSPI are demonstrated experimentally in several environments. |
Tasks | |
Published | 2019-04-06 |
URL | http://arxiv.org/abs/1904.03535v1 |
http://arxiv.org/pdf/1904.03535v1.pdf | |
PWC | https://paperswithcode.com/paper/randomised-bayesian-least-squares-policy |
Repo | |
Framework | |
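The Thompson-sampling step described in the abstract — sample one value function from the posterior, then act greedily under it — can be sketched for a linear value function. The Gaussian posterior below is made up for illustration; in RBLSPI it would be the posterior maintained by BLSTD:

```python
import numpy as np

rng = np.random.default_rng(5)

# Linear value function Q(s, a) = phi(s, a) @ w with a Gaussian posterior over w.
n_features, n_actions = 3, 4
w_mean = rng.standard_normal(n_features)
w_cov = 0.1 * np.eye(n_features)
phi = rng.standard_normal((n_actions, n_features))   # features of each action in state s

def thompson_action(phi, w_mean, w_cov, rng):
    """Sample one value function from the posterior, then act greedily w.r.t. it."""
    w = rng.multivariate_normal(w_mean, w_cov)
    return int(np.argmax(phi @ w))

a = thompson_action(phi, w_mean, w_cov, rng)
```

Because a fresh w is sampled at each decision, actions that are plausibly optimal under the posterior are tried occasionally — this is the exploration mechanism the abstract refers to.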
Scalable and Efficient Comparison-based Search without Features
Title | Scalable and Efficient Comparison-based Search without Features |
Authors | Daniyar Chumbalov, Lucas Maystre, Matthias Grossglauser |
Abstract | We consider the problem of finding a target object t using pairwise comparisons, by asking an oracle questions of the form "Which object from the pair (i, j) is more similar to t?". Objects live in a space of latent features, from which the oracle generates noisy answers. First, we consider the non-blind setting where these features are accessible. We propose a new Bayesian comparison-based search algorithm with noisy answers; it has low computational complexity yet is efficient in the number of queries. We provide theoretical guarantees, deriving the form of the optimal query and proving almost sure convergence to the target t. Second, we consider the blind setting, where the object features are hidden from the search algorithm. In this setting, we combine our search method and a new distributional triplet embedding algorithm into one scalable learning framework called Learn2Search. We show that the query complexity of our approach on two real-world datasets is on par with the non-blind setting, which is not achievable using any of the current state-of-the-art embedding methods. |
Tasks | |
Published | 2019-05-13 |
URL | https://arxiv.org/abs/1905.05049v2 |
https://arxiv.org/pdf/1905.05049v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-search-efficiently-using |
Repo | |
Framework | |
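The non-blind setting can be sketched as maintaining a posterior over "which object is the target" and updating it with the likelihood of each oracle answer. The random pair selection and the fixed 0.95/0.05 likelihood below are simplifying assumptions — the paper derives an optimal query rule rather than picking pairs at random:

```python
import numpy as np

rng = np.random.default_rng(6)

# Candidate objects in a latent feature space; we search for a hidden target.
objects = rng.standard_normal((50, 2))
target = 7

def oracle(i, j, noise=0.05):
    """Oracle answering which of (i, j) is closer to the target; flips with prob. noise."""
    d_i = np.linalg.norm(objects[i] - objects[target])
    d_j = np.linalg.norm(objects[j] - objects[target])
    truth = i if d_i <= d_j else j
    lie = j if truth == i else i
    return truth if rng.uniform() > noise else lie

# Bayesian search: posterior over candidates, updated per answer (noiseless here).
posterior = np.full(len(objects), 1 / len(objects))
for _ in range(40):
    i, j = rng.choice(len(objects), 2, replace=False)
    ans = oracle(i, j, noise=0.0)
    # Likelihood of the observed answer under each hypothetical target t.
    di = np.linalg.norm(objects - objects[i], axis=1)
    dj = np.linalg.norm(objects - objects[j], axis=1)
    lik = np.where((di <= dj) == (ans == i), 0.95, 0.05)
    posterior *= lik
    posterior /= posterior.sum()
```

With noiseless answers the true target's posterior is multiplied by the largest possible likelihood at every step, so it always remains a mode of the posterior.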
CDPM: Convolutional Deformable Part Models for Semantically Aligned Person Re-identification
Title | CDPM: Convolutional Deformable Part Models for Semantically Aligned Person Re-identification |
Authors | Kan Wang, Changxing Ding, Stephen J. Maybank, Dacheng Tao |
Abstract | Part-level representations are essential for robust person re-identification. However, common errors that arise during pedestrian detection frequently result in severe misalignment problems for body parts, which degrade the quality of part representations. Accordingly, to deal with this problem, we propose a novel model named Convolutional Deformable Part Models (CDPM). CDPM works by decoupling the complex part alignment procedure into two easier steps: first, a vertical alignment step detects each body part in the vertical direction, with the help of a multi-task learning model; second, a horizontal refinement step based on attention suppresses the background information around each detected body part. Since these two steps are performed orthogonally and sequentially, the difficulty of part alignment is significantly reduced. In the testing stage, CDPM is able to accurately align flexible body parts without any need for outside information. Extensive experimental results demonstrate the effectiveness of the proposed CDPM for part alignment. Most impressively, CDPM achieves state-of-the-art performance on three large-scale datasets: Market-1501, DukeMTMC-ReID, and CUHK03. |
Tasks | Multi-Task Learning, Pedestrian Detection, Person Re-Identification |
Published | 2019-06-12 |
URL | https://arxiv.org/abs/1906.04976v2 |
https://arxiv.org/pdf/1906.04976v2.pdf | |
PWC | https://paperswithcode.com/paper/cdpm-convolutional-deformable-part-models-for |
Repo | |
Framework | |
Cross-subject Decoding of Eye Movement Goals from Local Field Potentials
Title | Cross-subject Decoding of Eye Movement Goals from Local Field Potentials |
Authors | Marko Angjelichinoski, John Choi, Taposh Banerjee, Bijan Pesaran, Vahid Tarokh |
Abstract | Objective. We consider the cross-subject decoding problem from local field potential (LFP) signals, where training data collected from the prefrontal cortex (PFC) of a source subject is used to decode intended motor actions in a destination subject. Approach. We propose a novel supervised transfer learning technique, referred to as data centering, which is used to adapt the feature space of the source to the feature space of the destination. The key ingredients of data centering are the transfer functions used to model the deterministic component of the relationship between the source and destination feature spaces. We propose an efficient data-driven estimation approach for linear transfer functions that uses the first- and second-order moments of the class-conditional distributions. Main result. We apply our data-centering technique with linear transfer functions for cross-subject decoding of eye movement intentions in an experiment where two macaque monkeys perform memory-guided visual saccades to one of eight target locations. The results show peak cross-subject decoding performance of 80%, which marks a substantial improvement over a random-choice decoder. In addition, data centering also outperforms standard sampling-based methods in setups with imbalanced training data. Significance. The analyses presented herein demonstrate that the proposed data centering is a viable novel technique for reliable LFP-based cross-subject brain-computer interfacing and neural prostheses. |
Tasks | Transfer Learning |
Published | 2019-11-08 |
URL | https://arxiv.org/abs/1911.03540v3 |
https://arxiv.org/pdf/1911.03540v3.pdf | |
PWC | https://paperswithcode.com/paper/cross-subject-decoding-of-eye-movement-goals |
Repo | |
Framework | |
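A first-moment version of the data-centering idea — align source features to the destination feature space by matching class-conditional means — can be sketched as below. The paper's linear transfer functions use first and second moments; this sketch shows means only, on synthetic features with a made-up per-domain shift:

```python
import numpy as np

rng = np.random.default_rng(7)

# Source and destination feature spaces differ by an unknown shift
# (a hypothetical stand-in for cross-subject feature-space mismatch).
n_per_class, d = 100, 5
classes = [0, 1]
shift = 3.0 * rng.standard_normal(d)

src = {c: rng.standard_normal((n_per_class, d)) + 2.0 * c for c in classes}
dst = {c: rng.standard_normal((n_per_class, d)) + 2.0 * c + shift for c in classes}

def center_transfer(src_feats, dst_feats):
    """Align source to destination by matching class-conditional means."""
    return {c: src_feats[c] - src_feats[c].mean(0) + dst_feats[c].mean(0)
            for c in src_feats}

adapted = center_transfer(src, dst)
gap_before = np.linalg.norm(src[0].mean(0) - dst[0].mean(0))
gap_after = np.linalg.norm(adapted[0].mean(0) - dst[0].mean(0))
```

After centering, a decoder trained on the adapted source features sees class-conditional statistics matching the destination — the prerequisite for cross-subject decoding.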
3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera
Title | 3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera |
Authors | Iro Armeni, Zhi-Yang He, JunYoung Gwak, Amir R. Zamir, Martin Fischer, Jitendra Malik, Silvio Savarese |
Abstract | A comprehensive semantic understanding of a scene is important for many applications - but in what space should diverse semantic information (e.g., objects, scene categories, material types, texture, etc.) be grounded and what should be its structure? Aspiring to have one unified structure that hosts diverse types of semantics, we follow the Scene Graph paradigm in 3D, generating a 3D Scene Graph. Given a 3D mesh and registered panoramic images, we construct a graph that spans the entire building and includes semantics on objects (e.g., class, material, and other attributes), rooms (e.g., scene category, volume, etc.) and cameras (e.g., location, etc.), as well as the relationships among these entities. However, this process is prohibitively labor heavy if done manually. To alleviate this we devise a semi-automatic framework that employs existing detection methods and enhances them using two main constraints: I. framing of query images sampled on panoramas to maximize the performance of 2D detectors, and II. multi-view consistency enforcement across 2D detections that originate in different camera locations. |
Tasks | |
Published | 2019-10-06 |
URL | https://arxiv.org/abs/1910.02527v1 |
https://arxiv.org/pdf/1910.02527v1.pdf | |
PWC | https://paperswithcode.com/paper/3d-scene-graph-a-structure-for-unified |
Repo | |
Framework | |
Optimal Statistical Rates for Decentralised Non-Parametric Regression with Linear Speed-Up
Title | Optimal Statistical Rates for Decentralised Non-Parametric Regression with Linear Speed-Up |
Authors | Dominic Richards, Patrick Rebeschini |
Abstract | We analyse the learning performance of Distributed Gradient Descent in the context of multi-agent decentralised non-parametric regression with the square loss function when i.i.d. samples are assigned to agents. We show that if agents hold sufficiently many samples with respect to the network size, then Distributed Gradient Descent achieves optimal statistical rates with a number of iterations that scales, up to a threshold, with the inverse of the spectral gap of the gossip matrix divided by the number of samples owned by each agent raised to a problem-dependent power. The presence of the threshold comes from statistics. It encodes the existence of a “big data” regime where the number of required iterations does not depend on the network topology. In this regime, Distributed Gradient Descent achieves optimal statistical rates with the same order of iterations as gradient descent run with all the samples in the network. Provided the communication delay is sufficiently small, the distributed protocol yields a linear speed-up in runtime compared to the single-machine protocol. This is in contrast to decentralised optimisation algorithms that do not exploit statistics and only yield a linear speed-up in graphs where the spectral gap is bounded away from zero. Our results exploit the statistical concentration of quantities held by agents and shed new light on the interplay between statistics and communication in decentralised methods. Bounds are given in the standard non-parametric setting with source/capacity assumptions. |
Tasks | |
Published | 2019-05-08 |
URL | https://arxiv.org/abs/1905.03135v2 |
https://arxiv.org/pdf/1905.03135v2.pdf | |
PWC | https://paperswithcode.com/paper/optimal-statistical-rates-for-decentralised |
Repo | |
Framework | |
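The protocol analysed in this entry — each agent takes a local gradient step on its own samples and averages its iterate with neighbours via a gossip matrix — can be sketched on a tiny ring network. The problem size, step size, and ring topology are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(8)

# 4 agents on a ring, each holding i.i.d. samples of a least-squares problem.
n_agents, n_samples, d = 4, 50, 3
w_true = np.array([1.0, -2.0, 0.5])
X = [rng.standard_normal((n_samples, d)) for _ in range(n_agents)]
y = [Xi @ w_true + 0.01 * rng.standard_normal(n_samples) for Xi in X]

# Doubly stochastic gossip matrix for a 4-node ring.
P = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])

w = np.zeros((n_agents, d))
lr = 0.05
for _ in range(300):
    grads = np.stack([Xi.T @ (Xi @ wi - yi) / n_samples
                      for Xi, yi, wi in zip(X, y, w)])
    w = P @ w - lr * grads        # gossip-average, then local gradient step

err = np.linalg.norm(w.mean(0) - w_true)
```

The spectral gap of P governs how fast the agents reach consensus; the paper's result is that, with enough samples per agent, the iteration count needed for optimal statistical rates stops depending on that gap.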
Steadiface: Real-Time Face-Centric Stabilization on Mobile Phones
Title | Steadiface: Real-Time Face-Centric Stabilization on Mobile Phones |
Authors | Fuhao Shi, Sung-Fang Tsai, Youyou Wang, Chia-Kai Liang |
Abstract | We present Steadiface, a new real-time face-centric video stabilization method that simultaneously removes hand shake and keeps the subject's head stable. We use a CNN to estimate the face landmarks and use them to optimize a stabilized head center. We then formulate an optimization problem to find a virtual camera pose that locates the face at the stabilized head center while retaining smooth rotation and translation transitions across frames. We test the proposed method on field-test videos and show that it stabilizes both the head motion and the background. It is robust to large head poses, occlusion, facial appearance variations, and different kinds of camera motion. We show that our method advances the state of the art in selfie video stabilization by comparison against alternative methods. The whole process runs very efficiently on a modern mobile phone (8.1 ms/frame). |
Tasks | |
Published | 2019-05-03 |
URL | https://arxiv.org/abs/1905.01382v1 |
https://arxiv.org/pdf/1905.01382v1.pdf | |
PWC | https://paperswithcode.com/paper/steadiface-real-time-face-centric |
Repo | |
Framework | |
A Data-driven Storage Control Framework for Dynamic Pricing
Title | A Data-driven Storage Control Framework for Dynamic Pricing |
Authors | Jiaman Wu, Zhiqi Wang, Chenye Wu, Kui Wang, Yang Yu |
Abstract | Dynamic pricing is both an opportunity and a challenge for the demand side. It is an opportunity because it better reflects real-time market conditions and hence enables an active demand side. However, the demand side's active participation does not necessarily lead to benefits. The challenge conventionally comes from the limited flexible resources and limited intelligent devices on the demand side. The decreasing cost of storage systems and the widely deployed smart meters inspire us to design a data-driven storage control framework for dynamic prices. We first establish a stylized model by assuming knowledge of the structure of the dynamic price distributions, and we design the optimal storage control policy. Based on a Gaussian Mixture Model, we propose a practical data-driven control framework, which helps relax the assumptions in the stylized model. Numerical studies illustrate the remarkable performance of the proposed data-driven framework. |
Tasks | |
Published | 2019-12-01 |
URL | https://arxiv.org/abs/1912.01440v1 |
https://arxiv.org/pdf/1912.01440v1.pdf | |
PWC | https://paperswithcode.com/paper/a-data-driven-storage-control-framework-for |
Repo | |
Framework | |
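A data-driven storage policy of the kind described above can be sketched as a simple threshold rule: charge when the price falls in the cheap mode of the price distribution, discharge in the expensive mode. The empirical quantile thresholds below are a hypothetical stand-in for the paper's Gaussian-Mixture-based policy, and all numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(9)

# Hypothetical dynamic prices from a two-mode (mixture-like) distribution.
prices = np.concatenate([rng.normal(20, 2, 500), rng.normal(60, 5, 500)])

# Data-driven thresholds from empirical quantiles of observed prices.
charge_thr, discharge_thr = np.quantile(prices, [0.3, 0.7])

def control(price, soc, capacity=10.0, rate=1.0):
    """Return the storage action for one price signal (positive = charge)."""
    if price <= charge_thr and soc < capacity:
        return min(rate, capacity - soc)        # buy energy while cheap
    if price >= discharge_thr and soc > 0.0:
        return -min(rate, soc)                  # sell/use energy while expensive
    return 0.0                                  # hold in between

action_cheap = control(15.0, soc=0.0)
action_dear = control(70.0, soc=5.0)
```

Fitting a Gaussian Mixture Model to the price history (as the paper proposes) would replace the raw quantiles with mode-aware thresholds that adapt as the price distribution shifts.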