January 29, 2020

Paper Group ANR 571

Deep Reinforcement Learning for Scheduling in Cellular Networks


Title	Deep Reinforcement Learning for Scheduling in Cellular Networks
Authors	Jian Wang, Chen Xu, Yourui Huangfu, Rong Li, Yiqun Ge, Jun Wang
Abstract	Integrating artificial intelligence (AI) into wireless networks has drawn significant interest in both industry and academia. A common solution is to replace some or even all modules in conventional systems, which often lacks efficiency and robustness because it ignores expert knowledge. In this paper, we take deep reinforcement learning (DRL) based scheduling as an example to investigate how expert knowledge can help an AI module in cellular networks. A simulation platform, which considers link adaptation, feedback and other practical mechanisms, is developed to facilitate the investigation. Besides the traditional way of training a DRL agent, which is learning directly from the environment, we propose two novel methods, i.e., learning from a dual AI module and learning from the expert solution. The results show that, for the scheduling problem considered, the DRL training procedure can be improved in both performance and convergence speed by involving expert knowledge. Hence, instead of replacing the conventional scheduling module in the system, adding a newly introduced AI module, which is capable of interacting with the conventional module and provides more flexibility, is a more feasible solution.
Tasks
Published	2019-05-15
URL	https://arxiv.org/abs/1905.05914v2
PDF	https://arxiv.org/pdf/1905.05914v2.pdf
PWC	https://paperswithcode.com/paper/deep-reinforcement-learning-for-scheduling-in-1
Repo
Framework

Semantic Matching by Weakly Supervised 2D Point Set Registration


Title	Semantic Matching by Weakly Supervised 2D Point Set Registration
Authors	Zakaria Laskar, Hamed R. Tavakoli, Juho Kannala
Abstract	In this paper we address the problem of establishing correspondences between different instances of the same object. The problem is posed as finding the geometric transformation that aligns a given image pair. We use a convolutional neural network (CNN) to directly regress the parameters of the transformation model. The alignment problem is defined in the setting where an unordered set of semantic key-points per image is available, but without the correspondence information. To this end we propose a novel loss function based on cyclic consistency that solves this 2D point set registration problem by inferring the optimal geometric transformation model parameters. We train and test our approach on a standard benchmark dataset, Proposal-Flow (PF-PASCAL). The proposed approach achieves state-of-the-art results, demonstrating the effectiveness of the method. In addition, we show our approach further benefits from additional training samples in PF-PASCAL generated by using category level information.
Tasks
Published	2019-01-24
URL	http://arxiv.org/abs/1901.08341v1
PDF	http://arxiv.org/pdf/1901.08341v1.pdf
PWC	https://paperswithcode.com/paper/semantic-matching-by-weakly-supervised-2d
Repo
Framework

Recognizing American Sign Language Manual Signs from RGB-D Videos


Title	Recognizing American Sign Language Manual Signs from RGB-D Videos
Authors	Longlong Jing, Elahe Vahdani, Matt Huenerfauth, Yingli Tian
Abstract	In this paper, we propose a 3D Convolutional Neural Network (3DCNN) based multi-stream framework to recognize American Sign Language (ASL) manual signs (consisting of movements of the hands, as well as non-manual face movements in some cases) in real-time from RGB-D videos, by fusing multimodality features including hand gestures, facial expressions, and body poses from multi-channels (RGB, depth, motion, and skeleton joints). To learn the overall temporal dynamics in a video, a proxy video is generated by selecting a subset of frames for each video which are then used to train the proposed 3DCNN model. We collect a new ASL dataset, ASL-100-RGBD, which contains 42 RGB-D videos captured by a Microsoft Kinect V2 camera, each of 100 ASL manual signs, including RGB channel, depth maps, skeleton joints, face features, and HDface. The dataset is fully annotated for each semantic region (i.e. the time duration of each word that the human signer performs). Our proposed method achieves 92.88% accuracy for recognizing 100 ASL words in our newly collected ASL-100-RGBD dataset. The effectiveness of our framework for recognizing hand gestures from RGB-D videos is further demonstrated on the Chalearn IsoGD dataset, where it achieves 76% accuracy, which is 5.51% higher than the state-of-the-art work in terms of average fusion while using only 5 channels instead of the 12 channels in the previous work.
Tasks
Published	2019-06-07
URL	https://arxiv.org/abs/1906.02851v1
PDF	https://arxiv.org/pdf/1906.02851v1.pdf
PWC	https://paperswithcode.com/paper/recognizing-american-sign-language-manual
Repo
Framework

Shareable Representations for Search Query Understanding


Title	Shareable Representations for Search Query Understanding
Authors	Mukul Kumar, Youna Hu, Will Headden, Rahul Goutam, Heran Lin, Bing Yin
Abstract	Understanding search queries is critical for shopping search engines to deliver a satisfying customer experience. Popular shopping search engines receive billions of unique queries yearly, each of which can depict any of hundreds of user preferences or intents. In order to get the right results to customers, it must be known that queries like “inexpensive prom dresses” are intended to surface not only results of a certain product type but also products with a low price. Referred to as query intents, examples also include preferences for author, brand, age group, or simply a need for customer service. Recent works such as BERT have demonstrated the success of a large transformer encoder architecture with language model pre-training on a variety of NLP tasks. We adapt such an architecture to learn intents for search queries and describe methods to account for the noisiness and sparseness of search query data. We also describe cost-effective ways of hosting transformer encoder models in contexts with low-latency requirements. With the right domain-specific training we can build a shareable deep learning model whose internal representation can be reused for a variety of query understanding tasks including query intent identification. Model sharing means fewer large models need to be served at inference time and provides a platform to quickly build and roll out new search query classifiers.
Tasks	Language Modelling
Published	2019-12-20
URL	https://arxiv.org/abs/2001.04345v1
PDF	https://arxiv.org/pdf/2001.04345v1.pdf
PWC	https://paperswithcode.com/paper/shareable-representations-for-search-query
Repo
Framework

Fast Bayesian Restoration of Poisson Corrupted Images with INLA


Title	Fast Bayesian Restoration of Poisson Corrupted Images with INLA
Authors	Takahiro Kawashima, Hayaru Shouno
Abstract	Photon-limited images are often seen in fields such as medical imaging. Although the number of collected photons on an image sensor statistically follows a Poisson distribution, this type of noise is intractable, unlike Gaussian noise. In this study, we propose a Bayesian restoration method for Poisson-corrupted images using Integrated Nested Laplace Approximation (INLA), which is a computational method to evaluate marginalized posterior distributions of latent Gaussian models (LGMs). When the original image can reasonably be regarded as an intrinsic conditional auto-regressive (ICAR) model, our method performs much faster than well-known ones such as the loopy belief propagation-based method and Markov chain Monte Carlo (MCMC) without decreasing the accuracy.
Tasks
Published	2019-04-02
URL	http://arxiv.org/abs/1904.01357v1
PDF	http://arxiv.org/pdf/1904.01357v1.pdf
PWC	https://paperswithcode.com/paper/fast-bayesian-restoration-of-poisson
Repo
Framework

Three dimensional waveguide-interconnects for scalable integration of photonic neural networks


Title	Three dimensional waveguide-interconnects for scalable integration of photonic neural networks
Authors	Johnny Moughames, Xavier Porte, Michael Thiel, Gwenn Ulliac, Maxime Jacquot, Laurent Larger, Muamer Kadic, Daniel Brunner
Abstract	Photonic waveguides are prime candidates for integrated and parallel photonic interconnects. Such interconnects correspond to large-scale vector matrix products, which are at the heart of neural network computation. However, parallel interconnect circuits realized in two dimensions, for example by lithography, are strongly limited in size due to disadvantageous scaling. We use three dimensional (3D) printed photonic waveguides to overcome this limitation. 3D optical-couplers with fractal topology efficiently connect large numbers of input and output channels, and we show that the substrate’s footprint area scales linearly. Going beyond simple couplers, we introduce functional circuits for discrete spatial filters identical to those used in deep convolutional neural networks.
Tasks
Published	2019-12-17
URL	https://arxiv.org/abs/1912.08203v2
PDF	https://arxiv.org/pdf/1912.08203v2.pdf
PWC	https://paperswithcode.com/paper/three-dimensional-waveguide-interconnects-for
Repo
Framework

Simple Kinematic Feedback Enhances Autonomous Learning in Bio-Inspired Tendon-Driven Systems


Title	Simple Kinematic Feedback Enhances Autonomous Learning in Bio-Inspired Tendon-Driven Systems
Authors	Ali Marjaninejad, Darío Urbina-Meléndez, Francisco J. Valero-Cuevas
Abstract	Error feedback is known to improve performance by correcting control signals in response to perturbations. Here we show how adding simple error feedback can also accelerate and robustify autonomous learning in a tendon-driven robot. We implemented two versions of the General-to-Particular (G2P) autonomous learning algorithm to produce multiple movement tasks using a tendon-driven leg with two joints and three tendons: one with and one without kinematic feedback. As expected, feedback improved performance in simulation and hardware. However, we see these improvements even in the presence of sensory delays of up to 100 ms and when experiencing substantial contact collisions. Importantly, feedback accelerates learning and enhances G2P’s continual refinement of the initial inverse map by providing the system with more relevant data to train on. This allows the system to perform well even after only 60 seconds of initial motor babbling.
Tasks
Published	2019-07-10
URL	https://arxiv.org/abs/1907.04539v2
PDF	https://arxiv.org/pdf/1907.04539v2.pdf
PWC	https://paperswithcode.com/paper/simple-kinematic-feedback-enhances-autonomous
Repo
Framework

Baby steps towards few-shot learning with multiple semantics


Title	Baby steps towards few-shot learning with multiple semantics
Authors	Eli Schwartz, Leonid Karlinsky, Rogerio Feris, Raja Giryes, Alex M. Bronstein
Abstract	Learning from one or few visual examples is one of the key capabilities of humans since early infancy, but is still a significant challenge for modern AI systems. While considerable progress has been achieved in few-shot learning from a few image examples, much less attention has been given to the verbal descriptions that are usually provided to infants when they are presented with a new object. In this paper, we focus on the role of additional semantics that can significantly facilitate few-shot visual learning. Building upon recent advances in few-shot learning with additional semantic information, we demonstrate that further improvements are possible by combining multiple and richer semantics (category labels, attributes, and natural language descriptions). Using these ideas, we offer the community new results on the popular miniImageNet and CUB few-shot benchmarks, comparing favorably to the previous state-of-the-art results for both visual only and visual plus semantics-based approaches. We also performed an ablation study investigating the components and design choices of our approach.
Tasks	Few-Shot Image Classification, Few-Shot Learning
Published	2019-06-05
URL	https://arxiv.org/abs/1906.01905v2
PDF	https://arxiv.org/pdf/1906.01905v2.pdf
PWC	https://paperswithcode.com/paper/baby-steps-towards-few-shot-learning-with
Repo
Framework

Networked Control of Nonlinear Systems under Partial Observation Using Continuous Deep Q-Learning


Title	Networked Control of Nonlinear Systems under Partial Observation Using Continuous Deep Q-Learning
Authors	Junya Ikemoto, Toshimitsu Ushio
Abstract	In this paper, we propose a design of a model-free networked controller for a nonlinear plant whose mathematical model is unknown. In a networked control system, the controller and plant are located away from each other and exchange data over a network, which causes network delays that may fluctuate randomly due to network routing. So, in this paper, we assume that the current network delay is not known but the maximum value of the fluctuating network delays is known beforehand. Moreover, we also assume that the sensor cannot observe all state variables of the plant. Under these assumptions, we apply continuous deep Q-learning to the design of the networked controller. Then, we introduce an extended state consisting of a sequence of past control inputs and outputs as inputs to the deep neural network. By simulation, it is shown that, using the extended state, the controller can learn a control policy robust to the fluctuation of the network delays under the partial observation.
Tasks	Q-Learning
Published	2019-08-28
URL	https://arxiv.org/abs/1908.10722v2
PDF	https://arxiv.org/pdf/1908.10722v2.pdf
PWC	https://paperswithcode.com/paper/networked-control-of-nonlinear-systems-under
Repo
Framework
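
The extended state described in the abstract above (a sequence of past control inputs and outputs fed to the network) can be illustrated with a small sliding-window buffer. This is only a sketch under stated assumptions: the class name `ExtendedState`, the zero pre-fill, and the `history` window length are hypothetical choices for illustration, and the abstract does not say how the window length relates to the maximum network delay.

```python
from collections import deque


class ExtendedState:
    """Sliding window over the last `history` (control input, output) pairs.

    Hypothetical helper for illustration; not taken from the paper.
    """

    def __init__(self, history, input_dim, output_dim):
        # Pre-fill with zeros so the flattened vector has a fixed length
        # from the very first step (an assumption made for this sketch).
        self.inputs = deque(
            ([0.0] * input_dim for _ in range(history)), maxlen=history
        )
        self.outputs = deque(
            ([0.0] * output_dim for _ in range(history)), maxlen=history
        )

    def update(self, control_input, observed_output):
        # Appending to a full deque silently drops the oldest entry.
        self.inputs.append(list(control_input))
        self.outputs.append(list(observed_output))

    def vector(self):
        # Flatten oldest-to-newest as [u_1, y_1, u_2, y_2, ...].
        flat = []
        for u, y in zip(self.inputs, self.outputs):
            flat.extend(u)
            flat.extend(y)
        return flat
```

A controller would call `update` once per step and pass `vector()` to its Q-network in place of the unobservable full plant state.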

DyANE: Dynamics-aware node embedding for temporal networks


Title	DyANE: Dynamics-aware node embedding for temporal networks
Authors	Koya Sato, Mizuki Oka, Alain Barrat, Ciro Cattuto
Abstract	Low-dimensional vector representations of network nodes have proven successful to feed graph data to machine learning algorithms and to improve performance across diverse tasks. Most of the embedding techniques, however, have been developed with the goal of achieving dense, low-dimensional encoding of network structure and patterns. Here, we present a node embedding technique aimed at providing low-dimensional feature vectors that are informative of dynamical processes occurring over temporal networks - rather than of the network structure itself - with the goal of enabling prediction tasks related to the evolution and outcome of these processes. We achieve this by using a modified supra-adjacency representation of temporal networks and building on standard embedding techniques for static graphs based on random-walks. We show that the resulting embedding vectors are useful for prediction tasks related to paradigmatic dynamical processes, namely epidemic spreading over empirical temporal networks. In particular, we illustrate the performance of our approach for the prediction of nodes’ epidemic states in a single instance of the spreading process. We show how framing this task as a supervised multi-label classification task on the embedding vectors allows us to estimate the temporal evolution of the entire system from a partial sampling of nodes at random times, with potential impact for nowcasting infectious disease dynamics.
Tasks	Multi-Label Classification
Published	2019-09-12
URL	https://arxiv.org/abs/1909.05976v1
PDF	https://arxiv.org/pdf/1909.05976v1.pdf
PWC	https://paperswithcode.com/paper/dyane-dynamics-aware-node-embedding-for
Repo
Framework

Optimal Explanations of Linear Models


Title	Optimal Explanations of Linear Models
Authors	Dimitris Bertsimas, Arthur Delarue, Patrick Jaillet, Sebastien Martin
Abstract	When predictive models are used to support complex and important decisions, the ability to explain a model’s reasoning can increase trust, expose hidden biases, and reduce vulnerability to adversarial attacks. However, attempts at interpreting models are often ad hoc and application-specific, and the concept of interpretability itself is not well-defined. We propose a general optimization framework to create explanations for linear models. Our methodology decomposes a linear model into a sequence of models of increasing complexity using coordinate updates on the coefficients. Computing this decomposition optimally is a difficult optimization problem for which we propose exact algorithms and scalable heuristics. By solving this problem, we can derive a parametrized family of interpretability metrics for linear models that generalizes typical proxies, and study the tradeoff between interpretability and predictive accuracy.
Tasks
Published	2019-07-08
URL	https://arxiv.org/abs/1907.04669v1
PDF	https://arxiv.org/pdf/1907.04669v1.pdf
PWC	https://paperswithcode.com/paper/optimal-explanations-of-linear-models
Repo
Framework

Music Source Separation in the Waveform Domain


Title	Music Source Separation in the Waveform Domain
Authors	Alexandre Défossez, Nicolas Usunier, Léon Bottou, Francis Bach
Abstract	Source separation for music is the task of isolating contributions, or stems, from different instruments recorded individually and arranged together to form a song. Such components include voice, bass, drums and any other accompaniments. In contrast to many audio synthesis tasks where the best performances are achieved by models that directly generate the waveform, the state-of-the-art in source separation for music is to compute masks on the magnitude spectrum. In this paper, we first show that an adaptation of Conv-Tasnet (Luo & Mesgarani, 2019), a waveform-to-waveform model for source separation for speech, significantly beats the state-of-the-art on the MusDB dataset, the standard benchmark of multi-instrument source separation. Second, we observe that Conv-Tasnet follows a masking approach on the input signal, which has the potential drawback of removing parts of the relevant source without the capacity to reconstruct it. We propose Demucs, a new waveform-to-waveform model, which has an architecture closer to models for audio generation with more capacity on the decoder. Experiments on the MusDB dataset show that Demucs beats previously reported results in terms of signal to distortion ratio (SDR), but scores lower than Conv-Tasnet. Human evaluations show that Demucs has significantly higher quality (as assessed by mean opinion score) than Conv-Tasnet, but slightly more contamination from other sources, which explains the difference in SDR. Additional experiments with a larger dataset suggest that the gap in SDR between Demucs and Conv-Tasnet shrinks, showing that our approach is promising.
Tasks	Audio Generation, Music Source Separation
Published	2019-11-27
URL	https://arxiv.org/abs/1911.13254v1
PDF	https://arxiv.org/pdf/1911.13254v1.pdf
PWC	https://paperswithcode.com/paper/music-source-separation-in-the-waveform-1
Repo
Framework
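
The signal to distortion ratio (SDR) mentioned in the abstract above can be illustrated in its simplest textbook form, 10·log10(‖s‖² / ‖s − ŝ‖²), where s is the reference source and ŝ the estimate. This is a sketch of that simplified definition only: the function name `simple_sdr` and the `eps` guard are assumptions for illustration, and benchmark evaluations typically rely on more elaborate BSS Eval style metrics, so values from this function are not comparable with reported results.

```python
import math


def simple_sdr(reference, estimate, eps=1e-12):
    """Simplified SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    `eps` is a small constant (an assumption of this sketch) that avoids
    division by zero or log of zero for silent or perfect signals.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))
```

For example, an estimate that reproduces the reference at 90% amplitude has an error energy of 1% of the signal energy, which corresponds to about 20 dB.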

A Game Theoretical Framework for the Evaluation of Unmanned Aircraft Systems Airspace Integration Concepts


Title	A Game Theoretical Framework for the Evaluation of Unmanned Aircraft Systems Airspace Integration Concepts
Authors	Negin Musavi
Abstract	Predicting the outcomes of integrating Unmanned Aerial Systems (UAS) into the National Airspace System (NAS) is a complex problem that must be addressed by simulation studies before allowing the routine access of UAS into the NAS. This thesis focuses on providing 2D and 3D simulation frameworks using a game theoretical methodology to evaluate integration concepts in scenarios where manned and unmanned air vehicles co-exist. The fundamental gap in the literature is that the models of interaction between manned and unmanned vehicles are insufficient: a) they assume that pilot behavior is known a priori and b) they disregard decision making processes. The contribution of this work is to propose a modeling framework in which human pilot reactions are modeled using reinforcement learning and a game theoretical concept called level-k reasoning to fill this gap. The level-k reasoning concept is based on the assumption that humans have various levels of decision making. Reinforcement learning is a mathematical learning method that is rooted in human learning. In this work, a classical and an approximate reinforcement learning method (Neural Fitted Q Iteration) are used to model time-extended decisions of pilots with 2D and 3D maneuvers. An analysis of UAS integration is conducted using example scenarios in the presence of manned aircraft and fully autonomous UAS equipped with sense and avoid algorithms.
Tasks	Decision Making
Published	2019-04-17
URL	http://arxiv.org/abs/1904.08477v1
PDF	http://arxiv.org/pdf/1904.08477v1.pdf
PWC	https://paperswithcode.com/paper/a-game-theoretical-framework-for-the
Repo
Framework

Collecting and Analyzing Multidimensional Data with Local Differential Privacy


Title	Collecting and Analyzing Multidimensional Data with Local Differential Privacy
Authors	Ning Wang, Xiaokui Xiao, Yin Yang, Jun Zhao, Siu Cheung Hui, Hyejin Shin, Junbum Shin, Ge Yu
Abstract	Local differential privacy (LDP) is a recently proposed privacy standard for collecting and analyzing data, which has been used, e.g., in the Chrome browser, iOS and macOS. In LDP, each user perturbs her information locally, and only sends the randomized version to an aggregator who performs analyses, which protects both the users and the aggregator against private information leaks. Although LDP has attracted much research attention in recent years, the majority of existing work focuses on applying LDP to complex data and/or analysis tasks. In this paper, we point out that the fundamental problem of collecting multidimensional data under LDP has not been addressed sufficiently, and there remains much room for improvement even for basic tasks such as computing the mean value over a single numeric attribute under LDP. Motivated by this, we first propose novel LDP mechanisms for collecting a numeric attribute, whose accuracy is at least no worse (and usually better) than existing solutions in terms of worst-case noise variance. Then, we extend these mechanisms to multidimensional data that can contain both numeric and categorical attributes, where our mechanisms always outperform existing solutions regarding worst-case noise variance. As a case study, we apply our solutions to build an LDP-compliant stochastic gradient descent algorithm (SGD), which powers many important machine learning tasks. Experiments using real datasets confirm the effectiveness of our methods, and their advantages over existing solutions.
Tasks
Published	2019-06-28
URL	https://arxiv.org/abs/1907.00782v1
PDF	https://arxiv.org/pdf/1907.00782v1.pdf
PWC	https://paperswithcode.com/paper/collecting-and-analyzing-multidimensional
Repo
Framework
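
The basic task named in the abstract above, estimating the mean of a single numeric attribute under LDP, can be sketched with the classic Laplace mechanism. This is not one of the paper's proposed mechanisms; it is a well-known baseline shown only to make the setting concrete. The sketch assumes each value lies in [-1, 1] (so the sensitivity is 2), and the function names `perturb` and `estimate_mean` are hypothetical.

```python
import random


def perturb(value, epsilon, rng=random):
    """User side: add Laplace noise of scale 2/epsilon to a value in [-1, 1].

    The sensitivity is 2 because any two inputs differ by at most 2.
    """
    if not -1.0 <= value <= 1.0:
        raise ValueError("value must lie in [-1, 1]")
    scale = 2.0 / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return value + noise


def estimate_mean(reports):
    """Aggregator side: the noise has zero mean, so averaging is unbiased."""
    return sum(reports) / len(reports)
```

Each user calls `perturb` locally and sends only the noisy report, so the aggregator never sees a raw value. The noise averages out as the number of users grows, at the cost of a per-report variance of 8/ε² for this baseline.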


Filling Conversation Ellipsis for Better Social Dialog Understanding


Title	Filling Conversation Ellipsis for Better Social Dialog Understanding
Authors	Xiyuan Zhang, Chengxi Li, Dian Yu, Samuel Davidson, Zhou Yu
Abstract	The phenomenon of ellipsis is prevalent in social conversations. Ellipsis increases the difficulty of a series of downstream language understanding tasks, such as dialog act prediction and semantic role labeling. We propose to resolve ellipsis through automatic sentence completion to improve language understanding. However, automatic ellipsis completion can result in output which does not accurately reflect user intent. To address this issue, we propose a method which considers both the original utterance that has ellipsis and the automatically completed utterance in dialog act and semantic role labeling tasks. Specifically, we first complete user utterances to resolve ellipsis using an end-to-end pointer network model. We then train a prediction model using both utterances containing ellipsis and our automatically completed utterances. Finally, we combine the prediction results from these two utterances using a selection model that is guided by expert knowledge. Our approach improves dialog act prediction and semantic role labeling by 1.3% and 2.5% in F1 score respectively in social conversations. We also present an open-domain human-machine conversation dataset with manually completed user utterances and annotated semantic role labeling after manual completion.
Tasks	Semantic Role Labeling
Published	2019-11-25
URL	https://arxiv.org/abs/1911.10776v1
PDF	https://arxiv.org/pdf/1911.10776v1.pdf
PWC	https://paperswithcode.com/paper/filling-conversation-ellipsis-for-better
Repo
Framework