Paper Group ANR 1760
Discretizing Continuous Action Space for On-Policy Optimization
Title | Discretizing Continuous Action Space for On-Policy Optimization |
Authors | Yunhao Tang, Shipra Agrawal |
Abstract | In this work, we show that discretizing the action space for continuous control is a simple yet powerful technique for on-policy optimization. The explosion in the number of discrete actions can be efficiently addressed by a policy with a factorized distribution across action dimensions. We show that the discrete policy achieves significant performance gains with state-of-the-art on-policy optimization algorithms (PPO, TRPO, ACKTR), especially on high-dimensional tasks with complex dynamics. Additionally, we show that an ordinal parameterization of the discrete distribution can introduce the inductive bias that encodes the natural ordering between discrete actions. This ordinal architecture further significantly improves the performance of PPO/TRPO. |
Tasks | Continuous Control |
Published | 2019-01-29 |
URL | https://arxiv.org/abs/1901.10500v4 |
PDF | https://arxiv.org/pdf/1901.10500v4.pdf |
PWC | https://paperswithcode.com/paper/discretizing-continuous-action-space-for-on |
Repo | |
Framework | |
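The core mechanism of this paper (one categorical distribution per action dimension, combined as a product so the joint action space never has to be enumerated) is easy to sketch. Below is a minimal, hypothetical PyTorch version; the layer sizes, bin count, and action bounds are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class FactorizedDiscretePolicy(nn.Module):
    """Discretize each continuous action dimension into K bins and model
    the joint policy as a product of independent categoricals."""

    def __init__(self, obs_dim, act_dim, bins=11, low=-1.0, high=1.0):
        super().__init__()
        self.act_dim, self.bins = act_dim, bins
        # One logit per (action dimension, bin); network size is illustrative.
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(),
            nn.Linear(64, act_dim * bins),
        )
        # Bin centers map discrete choices back to continuous actions.
        self.register_buffer("centers", torch.linspace(low, high, bins))

    def forward(self, obs):
        logits = self.net(obs).view(-1, self.act_dim, self.bins)
        dist = Categorical(logits=logits)      # batch of per-dim categoricals
        idx = dist.sample()                    # (batch, act_dim) bin indices
        action = self.centers[idx]             # continuous action values
        log_prob = dist.log_prob(idx).sum(-1)  # factorized => sum of log-probs
        return action, log_prob

policy = FactorizedDiscretePolicy(obs_dim=8, act_dim=2)
a, logp = policy(torch.randn(4, 8))
```

The log-probability sum is what makes the factorization useful: with 11 bins and 2 dimensions there are 121 joint actions, but the policy only ever handles 22 logits.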
Entropy Regularization with Discounted Future State Distribution in Policy Gradient Methods
Title | Entropy Regularization with Discounted Future State Distribution in Policy Gradient Methods |
Authors | Riashat Islam, Raihan Seraj, Pierre-Luc Bacon, Doina Precup |
Abstract | The policy gradient theorem is defined based on an objective with respect to the initial distribution over states. In the discounted case, this results in policies that are optimal for one distribution over initial states but may not be uniformly optimal for other starting-state distributions. Furthermore, to obtain unbiased gradient estimates, the policy gradient estimator requires sampling states from a normalized discounted weighting of states. However, the difficulty of estimating the normalized discounted weighting of states, or the stationary state distribution, is well known. Additionally, the large sample complexity of policy gradient methods is often attributed to insufficient exploration, and to remedy this it is often assumed that the restart distribution provides sufficient exploration in these algorithms. In this work, we propose exploration in policy gradient methods based on maximizing the entropy of the discounted future state distribution. The key contribution of our work is a practically feasible algorithm to estimate the normalized discounted weighting of states, i.e., the *discounted future state distribution*. We propose that exploration can be achieved by entropy regularization with the discounted state distribution in policy gradients, where a metric for maximal coverage of the state space can be based on the entropy of the induced state distribution. The proposed approach can be considered a three-time-scale algorithm, and under some mild technical conditions we prove its convergence to a locally optimal policy. Experimentally, we demonstrate the usefulness of regularization with the discounted future state distribution in terms of increased state-space coverage and faster learning on a range of complex tasks. |
Tasks | Policy Gradient Methods |
Published | 2019-12-11 |
URL | https://arxiv.org/abs/1912.05104v1 |
PDF | https://arxiv.org/pdf/1912.05104v1.pdf |
PWC | https://paperswithcode.com/paper/entropy-regularization-with-discounted-future |
Repo | |
Framework | |
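A toy sketch of the objective this paper builds on, assuming a small discrete state space and a simple count-based estimate; the paper's contribution is a practical estimator of the discounted future state distribution, so everything below is an illustrative stand-in rather than the proposed algorithm.

```python
import numpy as np

def discounted_state_distribution(trajectories, gamma=0.99, n_states=100):
    """Empirical estimate of d_pi(s), proportional to sum_t gamma^t P(s_t = s),
    from a batch of trajectories over discrete states."""
    d = np.zeros(n_states)
    for traj in trajectories:
        for t, s in enumerate(traj):
            d[s] += gamma ** t
    return d / d.sum()

def entropy(d, eps=1e-8):
    """Entropy of the discounted state distribution; maximizing it
    encourages broader state-space coverage."""
    return -np.sum(d * np.log(d + eps))

def shaped_reward(r, d, s, beta=0.1, eps=1e-8):
    """One common way to push state-distribution entropy up is the intrinsic
    reward -log d(s); whether this matches the paper's exact scheme is an
    assumption of this sketch."""
    return r - beta * np.log(d[s] + eps)
```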
Evolution of Robust High Speed Optical-Flow-Based Landing for Autonomous MAVs
Title | Evolution of Robust High Speed Optical-Flow-Based Landing for Autonomous MAVs |
Authors | Kirk Y. W. Scheper, Guido C. H. E. de Croon |
Abstract | Automatic optimization of robotic behavior has been the long-standing goal of Evolutionary Robotics. Allowing the problem at hand to be solved by automation often leads to novel approaches and new insights. A common problem with this approach is that when the optimization occurs in a simulated environment, the optimized policies are subject to the reality gap when implemented in the real world. This often results in sub-optimal behavior, if it works at all. This paper investigates the automatic optimization of neurocontrollers to perform quick but safe landing maneuvers for a quadrotor micro air vehicle using the divergence of the optical flow field of a downward-looking camera. The optimized policies show that a piece-wise linear control scheme is more effective than the simple linear scheme commonly used, something not yet considered by human designers. Additionally, we show the utility of abstraction on the input and output of the controller as a tool to improve the robustness of the optimized policies to the reality gap, by testing policies optimized in simulation on real-world vehicles. We tested the neurocontrollers using two different methods to generate and process the visual input, one using a conventional CMOS camera and one a dynamic vision sensor, both of which perform significantly differently from the simulated sensor. The use of the abstracted input resulted in near-seamless transfer to the real world, with the controllers showing high robustness to a clear reality gap. |
Tasks | Optical Flow Estimation |
Published | 2019-12-16 |
URL | https://arxiv.org/abs/1912.07735v1 |
PDF | https://arxiv.org/pdf/1912.07735v1.pdf |
PWC | https://paperswithcode.com/paper/evolution-of-robust-high-speed-optical-flow |
Repo | |
Framework | |
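To make the "piece-wise linear control scheme" concrete, here is a hypothetical sketch of a divergence-tracking landing law with a different gain on each error segment; the setpoint, breakpoints, and gains are invented for illustration and are not the evolved values from the paper.

```python
import numpy as np

def piecewise_linear_controller(divergence, setpoint=0.3,
                                breakpoints=(-0.2, 0.0, 0.2),
                                gains=(0.5, 1.5, 0.5, 1.5)):
    """Map the divergence-tracking error to a thrust adjustment with a
    piece-wise linear law: a different slope applies on each error segment,
    unlike the single-gain linear scheme commonly used."""
    err = setpoint - divergence
    segment = np.searchsorted(breakpoints, err)  # which linear piece applies
    return gains[segment] * err

# Example: observed optical-flow divergence of 0.1 during descent.
thrust_cmd = piecewise_linear_controller(0.1)
```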
Hair Segmentation on Time-of-Flight RGBD Images
Title | Hair Segmentation on Time-of-Flight RGBD Images |
Authors | Yuanxi Ma, Cen Wang, Guli Zhang, Qilei Jiang, Shiying Li, Jingyi Yu |
Abstract | Robust segmentation of hair from portrait images remains challenging: hair does not conform to a uniform shape, style, or even color, and dark hair in particular lacks features. We present a novel computational imaging solution that tackles the problem from both the input and processing fronts. We explore using Time-of-Flight (ToF) RGBD sensors on recent mobile devices. We first conduct a comprehensive analysis showing that scattering and inter-reflection cause different noise patterns on hair vs. non-hair regions in ToF images, by changing the light path and/or combining multiple paths. We then develop a deep-network-based approach that employs both the ToF depth map and the RGB gradient maps to produce an initial hair segmentation with labeled hair components. We then refine the result by imposing the ToF noise prior within a conditional random field. We collect the first ToF RGBD hair dataset, with 20k+ head images captured from 30 human subjects with a variety of hairstyles at different view angles. Comprehensive experiments show that our approach outperforms RGB-based techniques in accuracy and robustness and can handle traditionally challenging cases such as dark hair, similar hair/background, similar hair/foreground, etc. |
Tasks | |
Published | 2019-03-07 |
URL | http://arxiv.org/abs/1903.02775v2 |
PDF | http://arxiv.org/pdf/1903.02775v2.pdf |
PWC | https://paperswithcode.com/paper/hair-segmentation-on-time-of-flight-rgbd |
Repo | |
Framework | |
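A small sketch of the input construction the abstract describes, stacking the ToF depth map with RGB gradient maps; the Sobel-based gradients and the channel layout are assumptions, not the paper's exact preprocessing.

```python
import numpy as np

def sobel(gray):
    """Simple Sobel gradient magnitude in pure NumPy, for illustration."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    pad = np.pad(gray, 1, mode="edge")
    h, w = gray.shape
    gx = sum(kx[i, j] * pad[i:i + h, j:j + w]
             for i in range(3) for j in range(3))
    gy = sum(ky[i, j] * pad[i:i + h, j:j + w]
             for i in range(3) for j in range(3))
    return np.hypot(gx, gy)

def build_input(rgb, depth):
    """Stack the ToF depth map with per-channel RGB gradient maps, since the
    paper's network consumes both cues (this exact layout is an assumption)."""
    grads = [sobel(rgb[..., c]) for c in range(3)]
    return np.stack([depth] + grads, axis=-1)  # H x W x 4 input tensor
```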
Bounds for the VC Dimension of 1NN Prototype Sets
Title | Bounds for the VC Dimension of 1NN Prototype Sets |
Authors | Iain A. D. Gunn, Ludmila I. Kuncheva |
Abstract | In Statistical Learning, the Vapnik-Chervonenkis (VC) dimension is an important combinatorial property of classifiers. To our knowledge, no theoretical results yet exist for the VC dimension of edited nearest-neighbour (1NN) classifiers with a reference set of fixed size. Related theoretical results are scattered across the literature and their implications have not been made explicit. We collect some relevant results and use them to provide explicit lower and upper bounds for the VC dimension of 1NN classifiers with a prototype set of fixed size. We discuss the implications of these bounds for the size of the training set needed to learn such a classifier to a given accuracy. Further, we provide a new lower bound for the two-dimensional case, based on a new geometrical argument. |
Tasks | |
Published | 2019-02-07 |
URL | http://arxiv.org/abs/1902.02660v1 |
PDF | http://arxiv.org/pdf/1902.02660v1.pdf |
PWC | https://paperswithcode.com/paper/bounds-for-the-vc-dimension-of-1nn-prototype |
Repo | |
Framework | |
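For context on the training-set-size discussion, the textbook PAC bound relating VC dimension to sample complexity (a standard result, not one of the paper's new bounds) reads:

```latex
% Realizable-case PAC bound: a class of VC dimension d_{VC} is learnable to
% error \epsilon with probability 1-\delta from a training set of size
m = O\!\left(\frac{d_{VC}\,\log(1/\epsilon) + \log(1/\delta)}{\epsilon}\right).
```

Plugging the paper's bounds on the VC dimension of fixed-size 1NN prototype sets into this expression is what yields the sample-size implications the abstract mentions.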
LSTM Language Models for LVCSR in First-Pass Decoding and Lattice-Rescoring
Title | LSTM Language Models for LVCSR in First-Pass Decoding and Lattice-Rescoring |
Authors | Eugen Beck, Wei Zhou, Ralf Schlüter, Hermann Ney |
Abstract | LSTM-based language models are an important part of modern LVCSR systems, as they significantly improve performance over traditional backoff language models. Incorporating them efficiently into decoding has been notoriously difficult. In this paper we present an approach based on a combination of one-pass decoding and lattice rescoring. We perform decoding with the LSTM-LM in the first pass but recombine hypotheses that share the last two words; afterwards we rescore the resulting lattice. We run our systems on GPGPU-equipped machines and are able to produce competitive results on the Hub5’00 and Librispeech evaluation corpora with a runtime better than real time. In addition we briefly investigate the possibility of carrying out the full sum over all state sequences belonging to a given word hypothesis during decoding, without recombination. |
Tasks | Large Vocabulary Continuous Speech Recognition |
Published | 2019-07-01 |
URL | https://arxiv.org/abs/1907.01030v1 |
PDF | https://arxiv.org/pdf/1907.01030v1.pdf |
PWC | https://paperswithcode.com/paper/lstm-language-models-for-lvcsr-in-first-pass |
Repo | |
Framework | |
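The recombination step is simple to illustrate: among partial hypotheses whose last two words coincide, only the best-scoring one needs to survive. A minimal sketch follows, where the (word list, score) layout is an assumption of this example.

```python
def recombine(hypotheses):
    """Keep only the best-scoring hypothesis among those sharing the same
    last two words, as in the paper's first-pass LSTM-LM decoding.
    Each hypothesis is (word_sequence, score); higher score is better."""
    best = {}
    for words, score in hypotheses:
        key = tuple(words[-2:])  # recombination context: last two words
        if key not in best or score > best[key][1]:
            best[key] = (words, score)
    return list(best.values())

hyps = [(["the", "cat", "sat"], -3.2),
        (["a", "cat", "sat"], -2.9),   # same last two words, better score
        (["the", "cat", "ran"], -3.5)]
print(recombine(hyps))  # keeps "a cat sat" and "the cat ran"
```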
Fast Approximate Time-Delay Estimation in Ultrasound Elastography Using Principal Component Analysis
Title | Fast Approximate Time-Delay Estimation in Ultrasound Elastography Using Principal Component Analysis |
Authors | Abdelrahman Zayed, Hassan Rivaz |
Abstract | Time delay estimation (TDE) is a critical and challenging step in all ultrasound elastography methods. A growing number of TDE techniques require an approximate but robust and fast method to initialize the solution of the TDE problem. Herein, we present a fast method for calculating an approximate TDE between two radio frequency (RF) frames of ultrasound. Although this approximate TDE can be useful for several algorithms, we focus on GLobal Ultrasound Elastography (GLUE), which currently relies on Dynamic Programming (DP) to provide this approximate TDE. We exploit Principal Component Analysis (PCA) to find the general modes of deformation in quasi-static elastography, and therefore call our method PCA-GLUE. PCA-GLUE is a data-driven approach that learns a set of TDE principal components from a training database of real experiments. In the test phase, TDE is approximated as a weighted sum of these principal components. Our algorithm robustly estimates the weights from sparse feature matches, then passes the resulting displacement field to GLUE as an initial estimate for a more accurate displacement estimation. PCA-GLUE is more than ten times faster than DP at estimating the initial displacement field and yields similar results. |
Tasks | |
Published | 2019-11-13 |
URL | https://arxiv.org/abs/1911.05242v1 |
PDF | https://arxiv.org/pdf/1911.05242v1.pdf |
PWC | https://paperswithcode.com/paper/fast-approximate-time-delay-estimation-in |
Repo | |
Framework | |
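The core of PCA-GLUE, representing the dense displacement field as a weighted sum of learned principal components and fitting the weights to sparse feature matches by least squares, can be sketched as follows; the shapes and the random stand-in data are illustrative, not the paper's.

```python
import numpy as np

def approximate_tde(components, sample_idx, sparse_displacements):
    """PCA-style approximate time-delay estimation: the dense displacement
    field is modeled as a weighted sum of learned principal components, with
    the weights fit by least squares to a few sparse feature matches.

    components: (n_pixels, n_components), learned offline from training data
    sample_idx: indices of pixels where sparse matches are available
    sparse_displacements: measured displacements at those pixels
    """
    A = components[sample_idx]  # component rows for the matched pixels
    w, *_ = np.linalg.lstsq(A, sparse_displacements, rcond=None)
    return components @ w       # dense initial TDE field, passed to GLUE

# Toy usage with random stand-ins for learned components and matches.
rng = np.random.default_rng(0)
C = rng.normal(size=(1000, 12))
idx = rng.choice(1000, size=40, replace=False)
dense_field = approximate_tde(C, idx, C[idx] @ rng.normal(size=12))
```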
Goal-oriented Object Importance Estimation in On-road Driving Videos
Title | Goal-oriented Object Importance Estimation in On-road Driving Videos |
Authors | Mingfei Gao, Ashish Tawari, Sujitha Martin |
Abstract | We formulate a new problem, Object Importance Estimation (OIE), in on-road driving videos, where road users are considered important objects if they influence the control decisions of the ego-vehicle’s driver. The importance of a road user depends on both its visual dynamics, e.g., appearance, motion, and location, in the driving scene and the driving goal, e.g., the planned path, of the ego vehicle. We propose a novel framework that incorporates both a visual model and a goal representation to conduct OIE. To evaluate our framework, we collect a real-world on-road driving dataset at traffic intersections and gather human annotations of the important objects. Experimental results show that our goal-oriented method outperforms the baselines, with especially large improvements in the left-turn and right-turn scenarios. Furthermore, we explore the possibility of using object importance for driving control prediction and demonstrate that binary brake prediction can be improved with the information of object importance. |
Tasks | |
Published | 2019-05-08 |
URL | https://arxiv.org/abs/1905.02848v1 |
PDF | https://arxiv.org/pdf/1905.02848v1.pdf |
PWC | https://paperswithcode.com/paper/goal-oriented-object-importance-estimation-in |
Repo | |
Framework | |
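As a rough sketch of the fusion the abstract describes, one could score each candidate object from its visual features concatenated with a goal encoding; the architecture and dimensions below are hypothetical, not the paper's model.

```python
import torch
import torch.nn as nn

class GoalOrientedOIE(nn.Module):
    """Hypothetical fusion of per-object visual features with a goal
    representation (e.g. an encoding of the planned path) to score object
    importance; all dimensions are illustrative assumptions."""

    def __init__(self, visual_dim=256, goal_dim=32):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(visual_dim + goal_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, object_feats, goal):
        # Broadcast the single goal vector across all candidate objects.
        goal = goal.expand(object_feats.size(0), -1)
        fused = torch.cat([object_feats, goal], dim=-1)
        return torch.sigmoid(self.scorer(fused))  # importance per object

scores = GoalOrientedOIE()(torch.randn(5, 256), torch.randn(1, 32))
```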
Grid-GCN for Fast and Scalable Point Cloud Learning
Title | Grid-GCN for Fast and Scalable Point Cloud Learning |
Authors | Qiangeng Xu, Xudong Sun, Cho-Ying Wu, Panqu Wang, Ulrich Neumann |
Abstract | Due to the sparsity and irregularity of point cloud data, methods that directly consume points have become popular. Among point-based models, graph convolutional networks (GCN) achieve notable performance by fully preserving the data granularity and exploiting point interrelation. However, point-based networks spend a significant amount of time on data structuring (e.g., Farthest Point Sampling (FPS) and neighbor-point querying), which limits speed and scalability. In this paper, we present a method, named Grid-GCN, for fast and scalable point cloud learning. Grid-GCN uses a novel data-structuring strategy, Coverage-Aware Grid Query (CAGQ). By leveraging the efficiency of grid space, CAGQ improves spatial coverage while reducing the theoretical time complexity. Compared with popular sampling methods such as Farthest Point Sampling (FPS) and Ball Query, CAGQ achieves up to a 50X speed-up. With a Grid Context Aggregation (GCA) module, Grid-GCN achieves state-of-the-art performance on major point cloud classification and segmentation benchmarks with significantly faster runtime than previous studies. Remarkably, Grid-GCN achieves an inference speed of 50 FPS on ScanNet using 81920 points per scene as input. |
Tasks | |
Published | 2019-12-06 |
URL | https://arxiv.org/abs/1912.02984v3 |
PDF | https://arxiv.org/pdf/1912.02984v3.pdf |
PWC | https://paperswithcode.com/paper/grid-gcn-for-fast-and-scalable-point-cloud |
Repo | |
Framework | |
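A toy version of the coverage-aware idea behind CAGQ: bucket points into voxels and keep one representative per occupied voxel, so sampled group centers cover space instead of clustering. The real CAGQ is considerably richer; this is only a sketch.

```python
import numpy as np

def coverage_aware_grid_query(points, voxel=0.1, n_groups=64):
    """Voxelize the point cloud, pick one representative point per occupied
    voxel, then subsample group centers. Grid hashing replaces the O(N)
    distance scans that make Farthest Point Sampling expensive."""
    keys = np.floor(points / voxel).astype(np.int64)
    _, first_idx = np.unique(keys, axis=0, return_index=True)
    reps = points[first_idx]  # one point per occupied voxel
    # Keep at most n_groups representatives as group centers.
    sel = np.random.choice(len(reps), size=min(n_groups, len(reps)),
                           replace=False)
    return reps[sel]

centers = coverage_aware_grid_query(np.random.rand(4096, 3))
```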
Enabling FDD Massive MIMO through Deep Learning-based Channel Prediction
Title | Enabling FDD Massive MIMO through Deep Learning-based Channel Prediction |
Authors | Maximilian Arnold, Sebastian Dörner, Sebastian Cammerer, Sarah Yan, Jakob Hoydis, Stephan ten Brink |
Abstract | A major obstacle to the widespread deployment of frequency division duplex (FDD)-based Massive multiple-input multiple-output (MIMO) communications is the large signaling overhead of reporting full downlink (DL) channel state information (CSI) back to the basestation (BS) in order to enable closed-loop precoding. We completely remove this overhead with a deep-learning-based channel extrapolation (or “prediction”) approach and demonstrate that a neural network (NN) at the BS can infer the DL CSI centered around a frequency $f_\text{DL}$ by solely observing uplink (UL) CSI on a different, yet adjacent, frequency band around $f_\text{UL}$; no more pilot/reporting overhead is needed than with a genuine time division duplex (TDD)-based system. The rationale is that the scatterers and the large-scale propagation environment are sufficiently similar to allow a NN to learn the physical connections and constraints between two neighboring frequency bands, and thus provide a well-operating system even where classic extrapolation methods, like the Wiener filter (used as a baseline for comparison throughout), fail. We study the scheme's performance for various state-of-the-art Massive MIMO channel models and, moreover, evaluate it using actual Massive MIMO channel measurements, showing it to be practically feasible at negligible loss in spectral efficiency compared to a genuine TDD-based system. |
Tasks | |
Published | 2019-01-08 |
URL | http://arxiv.org/abs/1901.03664v1 |
PDF | http://arxiv.org/pdf/1901.03664v1.pdf |
PWC | https://paperswithcode.com/paper/enabling-fdd-massive-mimo-through-deep |
Repo | |
Framework | |
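The extrapolation task itself reduces to a regression from UL CSI to DL CSI, which a small network can sketch; the placeholder MLP below is an assumption of this example, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ChannelExtrapolator(nn.Module):
    """Toy NN mapping uplink CSI (real/imaginary parts flattened) to downlink
    CSI on an adjacent band, standing in for the paper's trained predictor."""

    def __init__(self, n_subcarriers=64):
        super().__init__()
        d = 2 * n_subcarriers  # real + imaginary parts per subcarrier
        self.net = nn.Sequential(
            nn.Linear(d, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, d),
        )

    def forward(self, ul_csi):
        return self.net(ul_csi)  # predicted DL CSI, same layout as input

model = ChannelExtrapolator()
dl_pred = model(torch.randn(8, 128))  # batch of UL CSI vectors
```

Training such a model against measured UL/DL pairs (e.g. with an MSE loss) is what removes the DL pilot and reporting overhead: the BS never needs the user to feed back DL CSI.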
A robot’s sense-making of fallacies and rhetorical tropes. Creating ontologies of what humans try to say
Title | A robot’s sense-making of fallacies and rhetorical tropes. Creating ontologies of what humans try to say |
Authors | Johan F. Hoorn, Denice J. Tuinhof |
Abstract | In the design of user-friendly robots, human communication should be understood by the system beyond mere logic and literal meaning. Robot communication design has long ignored the importance of communication and politeness rules that are ‘forgiving’ and ‘suspending disbelief’, and it cannot handle the fundamentally metaphorical way humans design their utterances. Through analysis of the psychological causes of illogical and non-literal statements, signal detection, fundamental attribution errors, and anthropomorphism, we developed a fail-safe protocol for fallacies and tropes that makes use of Frege’s distinction between reference and sense, Beth’s tableau analytics, Grice’s maxim of quality, and epistemic considerations to have the robot politely make sense of a user’s sometimes unintelligible demands. Keywords: social robots, logical fallacies, metaphors, reference, sense, maxim of quality, tableau reasoning, epistemics of the virtual |
Tasks | |
Published | 2019-06-24 |
URL | https://arxiv.org/abs/1906.09689v1 |
PDF | https://arxiv.org/pdf/1906.09689v1.pdf |
PWC | https://paperswithcode.com/paper/a-robots-sense-making-of-fallacies-and |
Repo | |
Framework | |
Measuring Sentences Similarity: A Survey
Title | Measuring Sentences Similarity: A Survey |
Authors | Mamdouh Farouk |
Abstract | This study reviews the approaches used for measuring sentence similarity. Measuring similarity between natural language sentences is a crucial task for many Natural Language Processing applications, such as text classification, information retrieval, question answering, and plagiarism detection. This survey classifies approaches to calculating sentence similarity into three categories based on the adopted methodology: word-to-word-based, structure-based, and vector-based approaches are the most widely used. Each approach measures relatedness between short texts from a specific perspective. In addition, the datasets most often used as benchmarks for evaluating techniques in this field are introduced, to provide a complete view of the issue. Approaches that combine more than one perspective give better results. Moreover, structure-based similarity, which measures similarity between sentence structures, needs more investigation. |
Tasks | Information Retrieval, Question Answering, Text Classification |
Published | 2019-10-06 |
URL | https://arxiv.org/abs/1910.03940v1 |
PDF | https://arxiv.org/pdf/1910.03940v1.pdf |
PWC | https://paperswithcode.com/paper/measuring-sentences-similarity-a-survey |
Repo | |
Framework | |
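The vector-based category is the easiest to demonstrate: embed each sentence (here by naively averaging word vectors) and compare with cosine similarity. The toy random embeddings below stand in for pretrained vectors such as GloVe.

```python
import numpy as np

def sentence_vector(sentence, word_vectors):
    """Vector-based similarity in its simplest form: average the embeddings
    of the tokens found in the vocabulary."""
    vecs = [word_vectors[w] for w in sentence.lower().split()
            if w in word_vectors]
    return np.mean(vecs, axis=0)

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings; real systems would load pretrained vectors instead.
rng = np.random.default_rng(0)
wv = {w: rng.normal(size=50) for w in ["a", "cat", "sat", "dog", "ran"]}
s = cosine_similarity(sentence_vector("a cat sat", wv),
                      sentence_vector("a dog ran", wv))
```

Word-to-word and structure-based methods replace the averaging step with alignment over word pairs or comparison of parse structures, respectively.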
Proximal Policy Optimization for Improved Convergence in IRGAN
Title | Proximal Policy Optimization for Improved Convergence in IRGAN |
Authors | Moksh Jain, Sowmya Kamath S |
Abstract | IRGAN is an information retrieval (IR) modeling approach that uses a theoretical minimax game between a generative and a discriminative model to iteratively optimize both of them, hence unifying the generative and discriminative approaches. Despite significant performance improvements in several information retrieval tasks, IRGAN training is an unstable process, and the solution varies largely with the random parameter initialization. In this work, we present an improved training objective based on the proximal policy optimization (PPO) objective and Gumbel-Softmax-based sampling for the generator. We also propose a modified training algorithm that takes a single gradient update on both the generator and the discriminator at each iteration step. We present empirical evidence of the improved convergence of the proposed model over the original IRGAN, and a comparison on three different IR tasks on benchmark datasets is also discussed, emphasizing the proposed model’s superior performance. |
Tasks | Information Retrieval |
Published | 2019-10-01 |
URL | https://arxiv.org/abs/1910.00352v1 |
PDF | https://arxiv.org/pdf/1910.00352v1.pdf |
PWC | https://paperswithcode.com/paper/proximal-policy-optimization-for-improved |
Repo | |
Framework | |
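The PPO ingredient the paper substitutes into IRGAN's generator update is the standard clipped surrogate objective, sketched below; how advantages are computed for the IR setting is left out of this sketch.

```python
import torch

def ppo_clip_loss(log_probs, old_log_probs, advantages, eps=0.2):
    """Standard PPO clipped surrogate objective. Clipping the probability
    ratio keeps each generator update close to the sampling policy, which
    is the stabilizing effect the paper relies on."""
    ratio = torch.exp(log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Toy usage with random stand-ins for sampled log-probs and advantages.
loss = ppo_clip_loss(torch.randn(16), torch.randn(16), torch.randn(16))
```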
Significance-aware Information Bottleneck for Domain Adaptive Semantic Segmentation
Title | Significance-aware Information Bottleneck for Domain Adaptive Semantic Segmentation |
Authors | Yawei Luo, Ping Liu, Tao Guan, Junqing Yu, Yi Yang |
Abstract | For unsupervised domain adaptation problems, the strategy of aligning the two domains in latent feature space through adversarial learning has achieved much progress in image classification, but it usually fails in semantic segmentation tasks, in which the latent representations are overcomplex. In this work, we equip the adversarial network with a “significance-aware information bottleneck (SIB)” to address the above problem. The new network structure, called SIBAN, enables significance-aware feature purification before the adversarial adaptation, which eases the feature alignment and stabilizes the adversarial training course. On two domain adaptation tasks, i.e., GTA5 -> Cityscapes and SYNTHIA -> Cityscapes, we validate that the proposed method yields leading results compared with other feature-space alternatives. Moreover, SIBAN can even match the state-of-the-art output-space methods in segmentation accuracy, while the latter are often considered better choices for domain adaptive segmentation tasks. |
Tasks | Domain Adaptation, Image Classification, Semantic Segmentation, Unsupervised Domain Adaptation |
Published | 2019-04-01 |
URL | http://arxiv.org/abs/1904.00876v1 |
PDF | http://arxiv.org/pdf/1904.00876v1.pdf |
PWC | https://paperswithcode.com/paper/significance-aware-information-bottleneck-for |
Repo | |
Framework | |
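The generic information-bottleneck ingredient can be sketched as a KL penalty on a stochastic latent code; SIB's significance-aware weighting is not reproduced here, so treat this strictly as the vanilla bottleneck term rather than the paper's loss.

```python
import torch

def information_bottleneck_penalty(mu, logvar):
    """KL(q(z|x) || N(0, I)) for a Gaussian encoder with mean `mu` and
    log-variance `logvar`. Penalizing this term compresses the features
    before adversarial alignment; SIB additionally weights features by
    significance, which this sketch omits."""
    kl = 0.5 * torch.sum(mu.pow(2) + logvar.exp() - 1.0 - logvar, dim=-1)
    return kl.mean()

# Toy usage on a batch of 8 latent codes of dimension 32.
penalty = information_bottleneck_penalty(torch.randn(8, 32),
                                         torch.randn(8, 32))
```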
Optimality of the Subgradient Algorithm in the Stochastic Setting
Title | Optimality of the Subgradient Algorithm in the Stochastic Setting |
Authors | Daron Anderson, Douglas Leith |
Abstract | Recently, Jaouad Mourtada and Stéphane Gaïffas showed that the anytime Hedge algorithm has pseudo-regret $O(\log (d) / \Delta)$ if the cost vectors are generated by an i.i.d. sequence in the cube $[0,1]^d$. Here $d$ is the dimension and $\Delta$ the suboptimality gap. This is remarkable because the Hedge algorithm was designed for the antagonistic setting. We prove a similar result for the anytime subgradient algorithm on the simplex. Given i.i.d. cost vectors in the unit ball, our pseudo-regret bound is $O(1/\Delta)$ and does not depend on the dimension of the problem. |
Tasks | |
Published | 2019-09-10 |
URL | https://arxiv.org/abs/1909.05007v5 |
PDF | https://arxiv.org/pdf/1909.05007v5.pdf |
PWC | https://paperswithcode.com/paper/optimality-of-the-subgradient-algorithm-in |
Repo | |
Framework | |
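For concreteness, here is a sketch of the anytime subgradient algorithm on the simplex analyzed in the paper: online projected subgradient steps with step size proportional to $1/\sqrt{t}$. The Euclidean projection uses the standard sort-based routine; the step-size constant and toy data are illustrative assumptions.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex
    (the standard sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def anytime_subgradient(costs):
    """Online subgradient descent on the simplex with anytime step size
    eta_t proportional to 1/sqrt(t); the subgradient of the linear loss
    <c, x> is simply c. Returns the sequence of plays."""
    d = len(costs[0])
    x = np.full(d, 1.0 / d)
    plays = []
    for t, c in enumerate(costs, start=1):
        plays.append(x.copy())
        x = project_simplex(x - c / np.sqrt(t))
    return plays

# i.i.d. cost vectors in the unit ball, matching the paper's setting.
rng = np.random.default_rng(1)
plays = anytime_subgradient([0.1 * rng.normal(size=5) for _ in range(100)])
```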