Paper Group ANR 1514
Unsupervised Neural Sensor Models for Synthetic LiDAR Data Augmentation
Title | Unsupervised Neural Sensor Models for Synthetic LiDAR Data Augmentation |
Authors | Ahmad El Sallab, Ibrahim Sobh, Mohamed Zahran, Mohamed Shawky |
Abstract | Data scarcity is a bottleneck to machine learning-based perception modules, usually tackled by augmenting real data with synthetic data from simulators. Realistic models of the vehicle perception sensors are hard to formulate in closed form, and at the same time, they require the existence of paired data to be learned. In this work, we propose two unsupervised neural sensor models based on unpaired domain translations with CycleGANs and Neural Style Transfer techniques. We employ CARLA as the simulation environment to obtain simulated LiDAR point clouds, together with their annotations for data augmentation, and we use the KITTI dataset as the real LiDAR dataset from which we learn the realistic sensor model mapping. Moreover, we provide a framework for data augmentation and evaluation of the developed sensor models, through extrinsic object detection task evaluation using a YOLO network adapted to provide oriented bounding boxes for LiDAR Bird-eye-View projected point clouds. Evaluation is performed on unseen real LiDAR frames from the KITTI dataset, with different amounts of simulated data augmentation using the two proposed approaches, showing an improvement of 6% mAP on the object detection task in favor of augmenting with LiDAR point clouds adapted by the proposed neural sensor models over the raw simulated LiDAR. |
Tasks | Data Augmentation, Object Detection, Style Transfer |
Published | 2019-11-24 |
URL | https://arxiv.org/abs/1911.10575v1 |
https://arxiv.org/pdf/1911.10575v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-neural-sensor-models-for |
Repo | |
Framework | |
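The detector in this paper consumes Bird-eye-View (BEV) projections of LiDAR point clouds. A minimal sketch of such a projection is shown below; the grid resolution and range values are illustrative assumptions, not values from the paper.

```python
import numpy as np

def lidar_to_bev(points, x_range=(0.0, 60.0), y_range=(-30.0, 30.0), res=0.1):
    """Project an (N, 3) LiDAR point cloud to a top-down occupancy map.

    points  : array of (x, y, z) coordinates in the ego frame
    x_range : forward extent kept in the map, in metres (assumed value)
    y_range : lateral extent kept in the map, in metres (assumed value)
    res     : size of one BEV cell in metres (assumed value)
    """
    x, y = points[:, 0], points[:, 1]
    keep = (x >= x_range[0]) & (x < x_range[1]) & (y >= y_range[0]) & (y < y_range[1])
    x, y = x[keep], y[keep]

    h = int((x_range[1] - x_range[0]) / res)
    w = int((y_range[1] - y_range[0]) / res)
    rows = ((x - x_range[0]) / res).astype(int)
    cols = ((y - y_range[0]) / res).astype(int)

    bev = np.zeros((h, w), dtype=np.float32)
    bev[rows, cols] = 1.0   # occupancy channel; height/intensity channels could be added
    return bev
```

Real BEV encodings usually stack several channels (max height, intensity, density); the occupancy channel above is the simplest case.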
Differentiable Game Mechanics
Title | Differentiable Game Mechanics |
Authors | Alistair Letcher, David Balduzzi, Sebastien Racaniere, James Martens, Jakob Foerster, Karl Tuyls, Thore Graepel |
Abstract | Deep learning is built on the foundational guarantee that gradient descent on an objective function converges to local minima. Unfortunately, this guarantee fails in settings, such as generative adversarial nets, that exhibit multiple interacting losses. The behavior of gradient-based methods in games is not well understood – and is becoming increasingly important as adversarial and multi-objective architectures proliferate. In this paper, we develop new tools to understand and control the dynamics in n-player differentiable games. The key result is to decompose the game Jacobian into two components. The first, symmetric component, is related to potential games, which reduce to gradient descent on an implicit function. The second, antisymmetric component, relates to Hamiltonian games, a new class of games that obey a conservation law akin to conservation laws in classical mechanical systems. The decomposition motivates Symplectic Gradient Adjustment (SGA), a new algorithm for finding stable fixed points in differentiable games. Basic experiments show SGA is competitive with recently proposed algorithms for finding stable fixed points in GANs – while at the same time being applicable to, and having guarantees in, much more general cases. |
Tasks | |
Published | 2019-05-13 |
URL | https://arxiv.org/abs/1905.04926v1 |
https://arxiv.org/pdf/1905.04926v1.pdf | |
PWC | https://paperswithcode.com/paper/differentiable-game-mechanics |
Repo | |
Framework | |
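A toy sketch of the Symplectic Gradient Adjustment idea on the simplest Hamiltonian game: player 1 minimises f1(x, y) = x·y over x and player 2 minimises f2(x, y) = -x·y over y, a case where plain simultaneous gradient descent spirals outward. The step size and the adjustment coefficient lambda are arbitrary illustrative choices; the full algorithm also picks the sign of lambda by an alignment criterion not shown here.

```python
import numpy as np

def simultaneous_gradient(x, y):
    # (df1/dx, df2/dy) for f1 = x*y, f2 = -x*y
    return np.array([y, -x])

def sga_step(x, y, lr=0.05, lam=1.0):
    xi = simultaneous_gradient(x, y)
    # Jacobian of xi is [[0, 1], [-1, 0]]; its antisymmetric part A equals it here.
    A = np.array([[0.0, 1.0], [-1.0, 0.0]])
    adjusted = xi + lam * A.T @ xi          # symplectic adjustment
    x, y = np.array([x, y]) - lr * adjusted
    return x, y

x, y = 1.0, 1.0
for _ in range(500):
    x, y = sga_step(x, y)
print(x, y)   # spirals into the fixed point (0, 0); plain descent on xi cycles/diverges here
```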
The emotions that we perceive in music: the influence of language and lyrics comprehension on agreement
Title | The emotions that we perceive in music: the influence of language and lyrics comprehension on agreement |
Authors | Juan Sebastián Gómez Cañón, Perfecto Herrera, Emilia Gómez, Estefanía Cano |
Abstract | In the present study, we address the relationship between the emotions perceived in pop and rock music (mainly in Euro-American styles with English lyrics) and the language spoken by the listener. Our goal is to understand the influence of lyrics comprehension on the perception of emotions and use this information to improve Music Emotion Recognition (MER) models. Two main research questions are addressed: 1. Are there differences and similarities between the emotions perceived in pop/rock music by listeners raised with different mother tongues? 2. Do personal characteristics have an influence on the perceived emotions for listeners of a given language? Personal characteristics include the listeners’ general demographics, familiarity and preference for the fragments, and music sophistication. Our hypothesis is that inter-rater agreement (as defined by Krippendorff’s alpha coefficient) from subjects is directly influenced by the comprehension of lyrics. |
Tasks | Emotion Recognition, Music Emotion Recognition |
Published | 2019-09-12 |
URL | https://arxiv.org/abs/1909.05882v2 |
https://arxiv.org/pdf/1909.05882v2.pdf | |
PWC | https://paperswithcode.com/paper/the-emotions-that-we-perceive-in-music-the |
Repo | |
Framework | |
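The agreement measure named in the abstract, Krippendorff's alpha, is 1 - D_o/D_e (observed over expected disagreement). A small self-contained sketch for nominal ratings with no missing data follows; the study itself works with richer rating data, so this only illustrates the coefficient.

```python
import numpy as np

def krippendorff_alpha_nominal(ratings):
    """Krippendorff's alpha for nominal data.

    ratings: (n_units, n_raters) array; every unit rated by every rater.
    """
    ratings = np.asarray(ratings)
    cats = np.unique(ratings)
    index = {c: i for i, c in enumerate(cats)}
    k, m = len(cats), ratings.shape[1]

    coincidence = np.zeros((k, k))
    for unit in ratings:
        counts = np.zeros(k)
        for value in unit:
            counts[index[value]] += 1
        # every ordered pair of values inside a unit adds 1/(m - 1) coincidences
        coincidence += (np.outer(counts, counts) - np.diag(counts)) / (m - 1)

    n_c = coincidence.sum(axis=1)
    n = n_c.sum()
    observed = coincidence.sum() - np.trace(coincidence)
    expected = (np.outer(n_c, n_c).sum() - (n_c ** 2).sum()) / (n - 1)
    return 1.0 - observed / expected

# three raters, four musical excerpts, perceived-emotion labels (about 0.39: modest agreement)
print(krippendorff_alpha_nominal([["joy", "joy", "joy"],
                                  ["sad", "sad", "joy"],
                                  ["sad", "sad", "sad"],
                                  ["joy", "joy", "sad"]]))
```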
Local Embeddings for Relational Data Integration
Title | Local Embeddings for Relational Data Integration |
Authors | Riccardo Cappuzzo, Paolo Papotti, Saravanan Thirumuruganathan |
Abstract | Integrating information from heterogeneous data sources is one of the fundamental problems facing any enterprise. Recently, it has been shown that deep learning based techniques such as embeddings are a promising approach for data integration problems. Prior efforts directly use pre-trained embeddings or simplistically adapt techniques from natural language processing to obtain relational embeddings. In this work, we propose algorithms for obtaining local embeddings that are effective for data integration tasks on relational data. We make three major contributions. First, we describe a compact graph-based representation that allows the specification of a rich set of relationships inherent in the relational world. Second, we propose how to derive sentences from such a graph that effectively describe the similarity of elements (tokens, attributes, rows) across the two datasets. The embeddings are learned based on such sentences. Finally, we propose a diverse collection of criteria to evaluate relational embeddings and perform an extensive set of experiments validating them. Our experiments show that our system, EmbDI, produces meaningful results for data integration tasks and that our embeddings improve the result quality of existing state-of-the-art methods. |
Tasks | |
Published | 2019-09-03 |
URL | https://arxiv.org/abs/1909.01120v1 |
https://arxiv.org/pdf/1909.01120v1.pdf | |
PWC | https://paperswithcode.com/paper/local-embeddings-for-relational-data |
Repo | |
Framework | |
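A rough sketch of the pipeline the abstract describes: turn two tables into a small heterogeneous graph (row ids, attribute ids, cell tokens), emit random-walk "sentences" over it, and train embeddings on those sentences. The walk scheme, node naming, and hyperparameters here are placeholders, not EmbDI's actual construction.

```python
import random
from gensim.models import Word2Vec  # pip install gensim

def table_to_edges(rows, table_id):
    """rows: list of dicts mapping attribute name -> cell value."""
    edges = []
    for i, row in enumerate(rows):
        rid = f"idx__{table_id}_{i}"
        for attr, value in row.items():
            tok, cid = f"tok__{value}", f"cid__{attr}"
            edges += [(rid, tok), (tok, cid)]        # row -- token -- attribute
    return edges

def random_walks(edges, walks_per_node=20, walk_len=10, seed=0):
    random.seed(seed)
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    walks = []
    for start in adj:
        for _ in range(walks_per_node):
            walk, node = [start], start
            for _ in range(walk_len - 1):
                node = random.choice(adj[node])
                walk.append(node)
            walks.append(walk)
    return walks

edges = (table_to_edges([{"name": "acme corp", "city": "paris"}], "A")
         + table_to_edges([{"company": "acme corp", "location": "paris"}], "B"))
model = Word2Vec(random_walks(edges), vector_size=32, window=3, min_count=1, sg=1, epochs=20)
print(model.wv.most_similar("tok__acme corp"))   # neighbouring rows/attributes across both tables
```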
Face morphing detection in the presence of printing/scanning and heterogeneous image sources
Title | Face morphing detection in the presence of printing/scanning and heterogeneous image sources |
Authors | Matteo Ferrara, Annalisa Franco, Davide Maltoni |
Abstract | Face morphing nowadays represents a serious security threat in the context of electronic identity documents, as well as an interesting challenge for researchers in the field of face recognition. Despite the good performance obtained by state-of-the-art approaches on digital images, no satisfactory solutions have been identified so far to deal with cross-database testing and printed-scanned images (typically used in many countries for document issuing). In this work, novel approaches are proposed to train Deep Neural Networks for morphing detection: in particular, the generation of simulated printed-scanned images, together with other data augmentation strategies and pre-training on large face recognition datasets, allows us to reach state-of-the-art accuracy on challenging datasets from heterogeneous image sources. |
Tasks | Data Augmentation, Face Recognition |
Published | 2019-01-25 |
URL | http://arxiv.org/abs/1901.08811v1 |
http://arxiv.org/pdf/1901.08811v1.pdf | |
PWC | https://paperswithcode.com/paper/face-morphing-detection-in-the-presence-of |
Repo | |
Framework | |
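The paper's key ingredient is training on simulated printed-and-scanned images. The exact simulation pipeline is not reproduced here, but a generic stand-in augmentation (optical blur, sensor noise, re-compression) might look like the following; all parameter values are illustrative assumptions.

```python
import io
import numpy as np
from PIL import Image, ImageFilter

def simulate_print_scan(img, blur_radius=1.0, noise_sigma=4.0, jpeg_quality=70, seed=None):
    """Very rough print/scan stand-in: blur + sensor noise + JPEG re-compression."""
    rng = np.random.default_rng(seed)
    img = img.convert("RGB").filter(ImageFilter.GaussianBlur(blur_radius))
    arr = np.asarray(img).astype(np.float32)
    arr += rng.normal(0.0, noise_sigma, arr.shape)            # scanner noise
    arr = np.clip(arr, 0, 255).astype(np.uint8)
    buf = io.BytesIO()
    Image.fromarray(arr).save(buf, format="JPEG", quality=jpeg_quality)
    buf.seek(0)
    return Image.open(buf)

# augmented = simulate_print_scan(Image.open("face.png"))
```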
Deducing Kurdyka-Łojasiewicz exponent via inf-projection
Title | Deducing Kurdyka-Łojasiewicz exponent via inf-projection |
Authors | Peiran Yu, Guoyin Li, Ting Kei Pong |
Abstract | The Kurdyka-Łojasiewicz (KL) exponent plays an important role in estimating the convergence rate of many contemporary first-order methods. In particular, a KL exponent of $\frac12$ is related to local linear convergence. Nevertheless, the KL exponent is in general extremely hard to estimate. In this paper, we show under mild assumptions that the KL exponent is preserved via inf-projection. Inf-projection is a fundamental operation that is ubiquitous when reformulating optimization problems via the lift-and-project approach. By studying its operation on the KL exponent, we show that the KL exponent is $\frac12$ for several important convex optimization models, including some semidefinite-programming-representable functions and functions that involve $C^2$-cone reducible structures, under conditions such as strict complementarity. Our results are applicable to concrete optimization models such as group fused Lasso and overlapping group Lasso. In addition, for nonconvex models, we show that the KL exponent of many difference-of-convex functions can be derived from that of their natural majorant functions, and the KL exponent of the Bregman envelope of a function is the same as that of the function itself. Finally, we estimate the KL exponent of the sum of the least squares function and the indicator function of the set of matrices of rank at most $k$. |
Tasks | |
Published | 2019-02-10 |
URL | http://arxiv.org/abs/1902.03635v1 |
http://arxiv.org/pdf/1902.03635v1.pdf | |
PWC | https://paperswithcode.com/paper/deducing-kurdyka-ojasiewicz-exponent-via-inf |
Repo | |
Framework | |
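For context, the two objects the abstract refers to can be written out explicitly (standard definitions, paraphrased rather than quoted from the paper): the KL property of $f$ at $\bar{x}$ with exponent $\alpha$, and the inf-projection of $F$ onto the $x$ variables.

```latex
% KL property with exponent \alpha at \bar{x} (for some c, \nu > 0 and x near \bar{x}):
\operatorname{dist}\bigl(0, \partial f(x)\bigr) \;\ge\; c\,\bigl(f(x) - f(\bar{x})\bigr)^{\alpha}
\qquad \text{whenever } f(\bar{x}) < f(x) < f(\bar{x}) + \nu .

% Inf-projection of F(x, y) onto the x variables -- the operation shown to preserve the exponent:
f(x) \;=\; \inf_{y} F(x, y).
```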
Optimal Machine Intelligence Near the Edge of Chaos
Title | Optimal Machine Intelligence Near the Edge of Chaos |
Authors | Ling Feng, Choy Heng Lai |
Abstract | It has long been suggested that living systems, in particular the brain, may operate near some critical point. How about machines? Through dynamical stability analysis on various computer vision models, we find direct evidence that optimal deep neural network performance occurs near the transition point separating stable and chaotic attractors. In fact, modern neural network architectures push the model closer to this edge of chaos during the training process. Our dissection into their fully connected layers reveals that they achieve the stability transition through self-adjusting an oscillation-diffusion process embedded in the weights. A further analogy to the logistic map leads us to believe that the optimality near the edge of chaos is a consequence of the maximal diversity of stable states, which maximizes effective expressivity. |
Tasks | |
Published | 2019-09-11 |
URL | https://arxiv.org/abs/1909.05176v1 |
https://arxiv.org/pdf/1909.05176v1.pdf | |
PWC | https://paperswithcode.com/paper/optimal-machine-intelligence-near-the-edge-of |
Repo | |
Framework | |
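The "edge of chaos" analogy to the logistic map can be made concrete by estimating the map's Lyapunov exponent, which crosses zero at the transition to chaos. A small sketch (not taken from the paper):

```python
import numpy as np

def logistic_lyapunov(r, x0=0.2, n_burn=500, n_iter=5000):
    """Estimate the Lyapunov exponent of the map x -> r * x * (1 - x)."""
    x = x0
    for _ in range(n_burn):                    # discard the transient
        x = r * x * (1 - x)
    total = 0.0
    for _ in range(n_iter):
        x = r * x * (1 - x)
        total += np.log(abs(r * (1 - 2 * x)))  # log |f'(x)| along the orbit
    return total / n_iter

for r in (2.5, 3.5, 3.56995, 4.0):             # stable, periodic, near onset of chaos, chaotic
    print(r, round(logistic_lyapunov(r), 3))
```

Negative values indicate stable dynamics, values near zero sit at the edge of chaos, and positive values indicate chaos; the paper's claim is that trained networks cluster near the zero crossing.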
Towards Efficient Data Valuation Based on the Shapley Value
Title | Towards Efficient Data Valuation Based on the Shapley Value |
Authors | Ruoxi Jia, David Dao, Boxin Wang, Frances Ann Hubis, Nick Hynes, Nezihe Merve Gurel, Bo Li, Ce Zhang, Dawn Song, Costas Spanos |
Abstract | “How much is my data worth?” is an increasingly common question posed by organizations and individuals alike. An answer to this question could allow, for instance, fairly distributing profits among multiple data contributors and determining prospective compensation when data breaches happen. In this paper, we study the problem of data valuation by utilizing the Shapley value, a popular notion of value which originated in cooperative game theory. The Shapley value defines a unique payoff scheme that satisfies many desiderata for the notion of data value. However, the Shapley value often requires exponential time to compute. To meet this challenge, we propose a repertoire of efficient algorithms for approximating the Shapley value. We also demonstrate the value of each training instance for various benchmark datasets. |
Tasks | |
Published | 2019-02-27 |
URL | https://arxiv.org/abs/1902.10275v2 |
https://arxiv.org/pdf/1902.10275v2.pdf | |
PWC | https://paperswithcode.com/paper/towards-efficient-data-valuation-based-on-the |
Repo | |
Framework | |
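One of the standard baselines the paper improves on is permutation-based Monte Carlo estimation of the data Shapley value. A generic sketch follows; the `utility` callback is a stand-in for "validation performance of a model trained on the given subset" and is an assumption of this example.

```python
import numpy as np

def monte_carlo_shapley(n_points, utility, n_permutations=200, seed=0):
    """Estimate each training point's Shapley value.

    utility(indices) -> float : score of a model trained on `indices`
                                (e.g. validation accuracy); assumed callable.
    """
    rng = np.random.default_rng(seed)
    values = np.zeros(n_points)
    for _ in range(n_permutations):
        perm = rng.permutation(n_points)
        prev_utility = utility([])                      # utility of the empty set
        for k, idx in enumerate(perm):
            cur_utility = utility(perm[: k + 1])
            values[idx] += cur_utility - prev_utility   # marginal contribution
            prev_utility = cur_utility
    return values / n_permutations

# toy utility: each point contributes its own "quality" score, so the Shapley
# values recover the quality vector exactly
quality = np.array([0.1, 0.5, 0.2])
print(monte_carlo_shapley(3, lambda idx: float(quality[np.asarray(idx, dtype=int)].sum())))
```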
Probability Functional Descent: A Unifying Perspective on GANs, Variational Inference, and Reinforcement Learning
Title | Probability Functional Descent: A Unifying Perspective on GANs, Variational Inference, and Reinforcement Learning |
Authors | Casey Chu, Jose Blanchet, Peter Glynn |
Abstract | This paper provides a unifying view of a wide range of problems of interest in machine learning by framing them as the minimization of functionals defined on the space of probability measures. In particular, we show that generative adversarial networks, variational inference, and actor-critic methods in reinforcement learning can all be seen through the lens of our framework. We then discuss a generic optimization algorithm for our formulation, called probability functional descent (PFD), and show how this algorithm recovers existing methods developed independently in the settings mentioned earlier. |
Tasks | |
Published | 2019-01-30 |
URL | https://arxiv.org/abs/1901.10691v2 |
https://arxiv.org/pdf/1901.10691v2.pdf | |
PWC | https://paperswithcode.com/paper/probability-functional-descent-a-unifying |
Repo | |
Framework | |
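The core device, stated loosely in standard von Mises calculus notation rather than the paper's exact formulation: a functional J on probability measures is linearized through its influence function, and each descent step decreases that linearization.

```latex
% First-order (von Mises) expansion of a functional of a probability measure:
J(\nu) \;\approx\; J(\mu) \;+\; \int \Psi_{\mu}(x)\, \mathrm{d}(\nu - \mu)(x),
% so one descent step replaces \mu by a nearby \nu that lowers the linear term
% \int \Psi_{\mu}\, \mathrm{d}\nu, e.g. by moving samples down the gradient of \Psi_{\mu}.
```

Under this view, the discriminator in a GAN, the variational posterior update, and the critic in actor-critic methods all play the role of estimating the influence function $\Psi_{\mu}$.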
Global Hashing System for Fast Image Search
Title | Global Hashing System for Fast Image Search |
Authors | Dayong Tian, Dacheng Tao |
Abstract | Hashing methods have been widely investigated for fast approximate nearest neighbor searching in large data sets. Most existing methods use binary vectors in lower dimensional spaces to represent data points that are usually real vectors of higher dimensionality. We divide the hashing process into two steps. Data points are first embedded in a low-dimensional space, and the global positioning system method is subsequently introduced but modified for binary embedding. We devise data-independent and data-dependent methods to distribute the satellites at appropriate locations. Our methods are based on finding the tradeoff between the information losses in these two steps. Experiments show that our data-dependent method outperforms other methods on data sets of different sizes, from 100k to 10M. By incorporating the orthogonality of the code matrix, both our data-independent and data-dependent methods are particularly effective at longer code lengths. |
Tasks | Image Retrieval |
Published | 2019-04-18 |
URL | http://arxiv.org/abs/1904.08685v1 |
http://arxiv.org/pdf/1904.08685v1.pdf | |
PWC | https://paperswithcode.com/paper/global-hashing-system-for-fast-image-search |
Repo | |
Framework | |
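As a loose illustration of the GPS analogy only (not the paper's optimization of satellite positions): embed the data in a low-dimensional space, scatter a few "satellites", and take one code bit per satellite by thresholding the distance to it.

```python
import numpy as np

def gps_style_hash(X, n_bits=16, embed_dim=8, seed=0):
    """Toy GPS-style binary codes: one bit per randomly placed 'satellite'.

    X: (n_samples, n_features). Returns (n_samples, n_bits) codes in {0, 1}.
    Satellite placement is random here; the paper optimises it.
    """
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    Z = X @ vt[:embed_dim].T                           # cheap low-dimensional embedding
    satellites = rng.normal(scale=Z.std(), size=(n_bits, Z.shape[1]))
    dists = np.linalg.norm(Z[:, None, :] - satellites[None, :, :], axis=2)
    return (dists < np.median(dists, axis=0)).astype(np.uint8)   # balanced bits

codes = gps_style_hash(np.random.default_rng(1).normal(size=(100, 64)))
print(codes.shape, codes[:2])
```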
Sparse Deep Neural Network Graph Challenge
Title | Sparse Deep Neural Network Graph Challenge |
Authors | Jeremy Kepner, Simon Alford, Vijay Gadepally, Michael Jones, Lauren Milechin, Ryan Robinett, Sid Samsi |
Abstract | The MIT/IEEE/Amazon GraphChallenge.org encourages community approaches to developing new solutions for analyzing graphs and sparse data. Sparse AI analytics present unique scalability difficulties. The proposed Sparse Deep Neural Network (DNN) Challenge draws upon prior challenges from machine learning, high performance computing, and visual analytics to create a challenge that is reflective of emerging sparse AI systems. The Sparse DNN Challenge is based on a mathematically well-defined DNN inference computation and can be implemented in any programming environment. Sparse DNN inference is amenable to both vertex-centric implementations and array-based implementations (e.g., using the GraphBLAS.org standard). The computations are simple enough that performance predictions can be made based on simple computing hardware models. The input data sets are derived from the MNIST handwritten letters. The surrounding I/O and verification provide the context for each sparse DNN inference that allows rigorous definition of both the input and the output. Furthermore, since the proposed sparse DNN challenge is scalable in both problem size and hardware, it can be used to measure and quantitatively compare a wide range of present day and future systems. Reference implementations have been developed and their serial and parallel performance has been measured. Specifications, data, and software are publicly available at GraphChallenge.org. |
Tasks | |
Published | 2019-09-02 |
URL | https://arxiv.org/abs/1909.05631v1 |
https://arxiv.org/pdf/1909.05631v1.pdf | |
PWC | https://paperswithcode.com/paper/sparse-deep-neural-network-graph-challenge |
Repo | |
Framework | |
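The inference computation the challenge standardizes is a chain of sparse matrix products with a clipped ReLU. A compact SciPy sketch follows; the nonzero-only bias handling and the clipping threshold of 32 follow my reading of the reference implementations and should be checked against GraphChallenge.org.

```python
import numpy as np
import scipy.sparse as sp

def sparse_dnn_inference(Y0, weights, bias=-0.3, ymax=32.0):
    """Run sparse feed-forward inference: Y <- min(ymax, relu(Y @ W + bias)).

    Y0      : csr_matrix of input features (e.g. sparsified MNIST images)
    weights : list of csr_matrix layer weights
    """
    Y = Y0
    for W in weights:
        Z = Y @ W
        Z.data += bias                                        # bias on nonzero entries only
        Z.data = np.minimum(np.maximum(Z.data, 0.0), ymax)    # clipped ReLU
        Z.eliminate_zeros()                                   # keep the representation sparse
        Y = Z
    return Y

Y0 = sp.random(8, 16, density=0.2, random_state=0, format="csr")
Ws = [sp.random(16, 16, density=0.1, random_state=i, format="csr") for i in (1, 2, 3)]
print(sparse_dnn_inference(Y0, Ws, bias=0.0).nnz)             # surviving nonzeros after 3 layers
```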
A Fourier Disparity Layer representation for Light Fields
Title | A Fourier Disparity Layer representation for Light Fields |
Authors | Mikael Le Pendu, Christine Guillemot, Aljosa Smolic |
Abstract | In this paper, we present a new Light Field representation for efficient Light Field processing and rendering called Fourier Disparity Layers (FDL). The proposed FDL representation samples the Light Field in the depth (or equivalently the disparity) dimension by decomposing the scene as a discrete sum of layers. The layers can be constructed from various types of Light Field inputs, including a set of sub-aperture images, a focal stack, or even a combination of both. From our derivations in the Fourier domain, the layers are simply obtained by a regularized least squares regression performed independently at each spatial frequency, which is efficiently parallelized in a GPU implementation. Our model is also used to derive a gradient descent based calibration step that estimates the input view positions and an optimal set of disparity values required for the layer construction. Once the layers are known, they can be simply shifted and filtered to produce different viewpoints of the scene while controlling the focus and simulating a camera aperture of arbitrary shape and size. Our implementation in the Fourier domain allows real-time Light Field rendering. Finally, direct applications such as view interpolation or extrapolation and denoising are presented and evaluated. |
Tasks | Calibration, Denoising |
Published | 2019-01-21 |
URL | http://arxiv.org/abs/1901.06919v1 |
http://arxiv.org/pdf/1901.06919v1.pdf | |
PWC | https://paperswithcode.com/paper/a-fourier-disparity-layer-representation-for |
Repo | |
Framework | |
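Once the layers are known, rendering reduces to shifting each layer in proportion to its disparity and the desired view position, which is cheap in the Fourier domain. A minimal sketch using SciPy's Fourier-domain shift is given below; the layer construction itself (the regularized per-frequency regression) is not shown, and the shift convention is an assumption of this sketch.

```python
import numpy as np
from scipy.ndimage import fourier_shift

def render_view(layers, disparities, view_uv):
    """Render a viewpoint from disparity layers (greatly simplified FDL-style rendering).

    layers      : list of (H, W) arrays, one per disparity layer
    disparities : disparity value d_k of each layer
    view_uv     : (u, v) position of the virtual view in the camera plane
    """
    u, v = view_uv
    out = np.zeros_like(layers[0], dtype=np.complex128)
    for layer, d in zip(layers, disparities):
        spectrum = np.fft.fft2(layer)
        # shift the layer by d * (u, v) pixels via a Fourier-domain phase ramp
        out += fourier_shift(spectrum, shift=(d * v, d * u))
    return np.real(np.fft.ifft2(out))

# view = render_view([foreground, background], disparities=[2.0, 0.5], view_uv=(1.0, 0.0))
```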
MASTER: Multi-Aspect Non-local Network for Scene Text Recognition
Title | MASTER: Multi-Aspect Non-local Network for Scene Text Recognition |
Authors | Ning Lu, Wenwen Yu, Xianbiao Qi, Yihao Chen, Ping Gong, Rong Xiao |
Abstract | Attention-based scene text recognizers have gained huge success; they leverage more compact intermediate representations to learn 1d- or 2d- attention with an RNN-based encoder-decoder architecture. However, such methods suffer from the attention-drift problem because high similarity among encoded features leads to attention confusion under the RNN-based local attention mechanism. Moreover, RNN-based methods have low efficiency due to poor parallelization. To overcome these problems, we propose MASTER, a self-attention based scene text recognizer that (1) not only encodes the input-output attention but also learns self-attention, which encodes feature-feature and target-target relationships inside the encoder and decoder, (2) learns a more powerful intermediate representation that is robust to spatial distortion, and (3) has better training and evaluation efficiency. Extensive experiments on various benchmarks demonstrate the superior performance of MASTER on both regular and irregular scene text. |
Tasks | Scene Text Recognition |
Published | 2019-10-07 |
URL | https://arxiv.org/abs/1910.02562v1 |
https://arxiv.org/pdf/1910.02562v1.pdf | |
PWC | https://paperswithcode.com/paper/master-multi-aspect-non-local-network-for |
Repo | |
Framework | |
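The abstract's three attention flavours (feature-feature in the encoder, target-target and input-output in the decoder) map directly onto a standard transformer decoder layer. A minimal PyTorch sketch of that mapping follows; it is not the actual MASTER architecture, which also adds a multi-aspect global-context module in the CNN backbone, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

d_model, n_heads = 512, 8

# encoder self-attention: feature-feature relationships over flattened CNN features
encoder = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
# decoder: target-target self-attention + input-output (cross) attention to the memory
decoder = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)

features = torch.randn(2, 48 * 8, d_model)      # B x (H*W) x C from a CNN backbone
targets = torch.randn(2, 25, d_model)           # embedded character sequence decoded so far

memory = encoder(features)
causal_mask = torch.triu(torch.full((25, 25), float("-inf")), diagonal=1)  # no peeking ahead
out = decoder(targets, memory, tgt_mask=causal_mask)
print(out.shape)                                # torch.Size([2, 25, 512])
```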
SSFN – Self Size-estimating Feed-forward Network with Low Complexity, Limited Need for Human Intervention, and Consistent Behaviour across Trials
Title | SSFN – Self Size-estimating Feed-forward Network with Low Complexity, Limited Need for Human Intervention, and Consistent Behaviour across Trials |
Authors | Saikat Chatterjee, Alireza M. Javid, Mostafa Sadeghi, Shumpei Kikuta, Dong Liu, Partha P. Mitra, Mikael Skoglund |
Abstract | We design a self size-estimating feed-forward network (SSFN) using a joint optimization approach for estimating the number of layers, the number of nodes, and the weight matrices. The learning algorithm has low computational complexity, typically running within a few minutes on a laptop. In addition, the algorithm has a limited need for human intervention to tune parameters. SSFN grows from a small-size network to a large-size network, guaranteeing a monotonically non-increasing cost with the addition of nodes and layers. The learning approach uses a judicious combination of the `lossless flow property’ of some activation functions, convex optimization, and instances of random matrices. Consistent performance – low variation across Monte-Carlo trials – is found for inference performance (classification accuracy) and estimation of network size. |
Tasks | Image Classification |
Published | 2019-05-17 |
URL | https://arxiv.org/abs/1905.07111v2 |
https://arxiv.org/pdf/1905.07111v2.pdf | |
PWC | https://paperswithcode.com/paper/ssfn-self-size-estimating-feed-forward |
Repo | |
Framework | |
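The abstract's central claim is the growth loop itself: keep adding depth only while a held-out cost does not increase. The sketch below shows that loop with a generic random-projection ReLU layer and a ridge-regression readout; it is a deliberately simplified stand-in, not the paper's lossless-flow-property construction.

```python
import numpy as np

def ridge_readout(H, T, lam=1e-2):
    """Closed-form output layer: argmin_O ||H O - T||^2 + lam ||O||^2."""
    return np.linalg.solve(H.T @ H + lam * np.eye(H.shape[1]), H.T @ T)

def grow_network(X, T, X_val, T_val, max_layers=10, width=64, seed=0):
    rng = np.random.default_rng(seed)
    H, H_val, layers = X, X_val, []
    best = np.mean((H_val @ ridge_readout(H, T) - T_val) ** 2)
    for _ in range(max_layers):
        W = rng.normal(scale=1.0 / np.sqrt(H.shape[1]), size=(H.shape[1], width))
        H_new, H_val_new = np.maximum(H @ W, 0), np.maximum(H_val @ W, 0)   # candidate ReLU layer
        cost = np.mean((H_val_new @ ridge_readout(H_new, T) - T_val) ** 2)
        if cost > best:               # stop growing once the cost would increase
            break
        best, H, H_val = cost, H_new, H_val_new
        layers.append(W)
    return layers, best

# layers, val_mse = grow_network(X_train, T_train, X_val, T_val)
```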
FireNet: Real-time Segmentation of Fire Perimeter from Aerial Video
Title | FireNet: Real-time Segmentation of Fire Perimeter from Aerial Video |
Authors | Jigar Doshi, Dominic Garcia, Cliff Massey, Pablo Llueca, Nicolas Borensztein, Michael Baird, Matthew Cook, Devaki Raj |
Abstract | In this paper, we share our approach to real-time segmentation of fire perimeter from aerial full-motion infrared video. We start by describing the problem from a humanitarian aid and disaster response perspective. Specifically, we explain the importance of the problem, how it is currently addressed, and how our machine learning approach improves on it. To test our models, we annotate a large-scale dataset of 400,000 frames with guidance from domain experts. Finally, we share our approach, currently deployed in production, with an inference speed of 20 frames per second and an F1 score of 92. |
Tasks | |
Published | 2019-10-14 |
URL | https://arxiv.org/abs/1910.06407v1 |
https://arxiv.org/pdf/1910.06407v1.pdf | |
PWC | https://paperswithcode.com/paper/firenet-real-time-segmentation-of-fire |
Repo | |
Framework | |
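The reported F1 score of 92 is most naturally read as pixel-level F1 (equivalently the Dice coefficient) between predicted and annotated fire masks; the paper does not spell out the formula, so the metric below is an assumed but standard definition.

```python
import numpy as np

def f1_score(pred_mask, true_mask, eps=1e-8):
    """Pixel-level F1 / Dice between two binary segmentation masks."""
    pred, true = pred_mask.astype(bool), true_mask.astype(bool)
    tp = np.logical_and(pred, true).sum()
    precision = tp / (pred.sum() + eps)
    recall = tp / (true.sum() + eps)
    return 2 * precision * recall / (precision + recall + eps)

pred = np.zeros((64, 64), dtype=np.uint8); pred[10:40, 10:40] = 1
true = np.zeros((64, 64), dtype=np.uint8); true[15:45, 15:45] = 1
print(round(f1_score(pred, true), 3))   # F1 of two partially overlapping squares
```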