Paper Group ANR 707
Relational Action Forecasting. Cascaded Gaussian Processes for Data-efficient Robot Dynamics Learning. Coherent and Controllable Outfit Generation. Near-Convex Archetypal Analysis. Universal Pooling – A New Pooling Method for Convolutional Neural Networks. MASS-UMAP: Fast and accurate analog ensemble search in weather radar archive. A comparative study of general fuzzy min-max neural networks for pattern classification problems. Conclusion-Supplement Answer Generation for Non-Factoid Questions. Task-Based Learning. Learning Disentangled Representations of Timbre and Pitch for Musical Instrument Sounds Using Gaussian Mixture Variational Autoencoders. Testing Markov Chains without Hitting. Deep Radar Waveform Design for Efficient Automotive Radar Sensing. Assessing the Applicability of Authorship Verification Methods. A Dataset for measuring reading levels in India at scale. Transform the Set: Memory Attentive Generation of Guided and Unguided Image Collages.
Relational Action Forecasting
Title | Relational Action Forecasting |
Authors | Chen Sun, Abhinav Shrivastava, Carl Vondrick, Rahul Sukthankar, Kevin Murphy, Cordelia Schmid |
Abstract | This paper focuses on multi-person action forecasting in videos. More precisely, given a history of H previous frames, the goal is to detect actors and to predict their future actions for the next T frames. Our approach jointly models temporal and spatial interactions among different actors by constructing a recurrent graph, using actor proposals obtained with Faster R-CNN as nodes. Our method learns to select a subset of discriminative relations without requiring explicit supervision, thus enabling us to tackle challenging visual data. We refer to our model as Discriminative Relational Recurrent Network (DRRN). Evaluation of action prediction on AVA demonstrates the effectiveness of our proposed method compared to simpler baselines. Furthermore, we significantly improve performance on the task of early action classification on J-HMDB, from the previous SOTA of 48% to 60%. |
Tasks | Action Classification, Action Recognition In Videos |
Published | 2019-04-08 |
URL | http://arxiv.org/abs/1904.04231v1 |
http://arxiv.org/pdf/1904.04231v1.pdf | |
PWC | https://paperswithcode.com/paper/relational-action-forecasting |
Repo | |
Framework | |
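The abstract above describes a recurrent graph over actor proposals with learned, discriminatively selected relations. The snippet below is a minimal sketch of that idea, not the authors' DRRN implementation: pairwise relation scores between actor hidden states weight message passing before a GRU update per frame. All module names and dimensions are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of the core DRRN idea: actor proposal
# features are graph nodes; pairwise relation scores are learned and used to
# aggregate neighbour information before a recurrent update over time.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationalRecurrentCell(nn.Module):
    def __init__(self, feat_dim=256, hidden_dim=256):
        super().__init__()
        self.relation = nn.Linear(2 * hidden_dim, 1)   # scores a pair of actors
        self.message = nn.Linear(hidden_dim, hidden_dim)
        self.rnn = nn.GRUCell(feat_dim + hidden_dim, hidden_dim)

    def forward(self, actor_feats, hidden):
        # actor_feats: (num_actors, feat_dim); hidden: (num_actors, hidden_dim)
        n = hidden.size(0)
        # Pairwise relation logits; the softmax keeps only the most useful relations.
        pairs = torch.cat([hidden.unsqueeze(1).expand(n, n, -1),
                           hidden.unsqueeze(0).expand(n, n, -1)], dim=-1)
        weights = F.softmax(self.relation(pairs).squeeze(-1), dim=-1)   # (n, n)
        context = weights @ self.message(hidden)                        # aggregated messages
        return self.rnn(torch.cat([actor_feats, context], dim=-1), hidden)

# Usage: roll the cell over H observed frames, then keep predicting for T future steps.
cell = RelationalRecurrentCell()
hidden = torch.zeros(4, 256)                  # 4 actor proposals (e.g. from Faster R-CNN)
for t in range(8):                            # H = 8 observed frames
    hidden = cell(torch.randn(4, 256), hidden)
```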
Cascaded Gaussian Processes for Data-efficient Robot Dynamics Learning
Title | Cascaded Gaussian Processes for Data-efficient Robot Dynamics Learning |
Authors | Sahand Rezaei-Shoshtari, David Meger, Inna Sharf |
Abstract | Motivated by the recursive Newton-Euler formulation, we propose a novel cascaded Gaussian process learning framework for the inverse dynamics of robot manipulators. This approach leads to a significant dimensionality reduction which in turn results in better learning and data efficiency. We explore two formulations for the cascading: the inward and outward, both along the manipulator chain topology. The learned models are tested in conjunction with the classical inverse dynamics model (semi-parametric) and on their own (non-parametric) in the context of feed-forward control of the arm. Experimental results are obtained with Jaco 2 six-DOF and SARCOS seven-DOF manipulators for randomly defined sinusoidal motions of the joints in order to evaluate the performance of cascading against the standard GP learning. In addition, experiments are conducted using Jaco 2 on a task emulating a pouring maneuver. Results indicate a consistent improvement in learning speed with the inward cascaded GP model and an overall improvement in data efficiency and generalization. |
Tasks | Dimensionality Reduction, Gaussian Processes |
Published | 2019-10-05 |
URL | https://arxiv.org/abs/1910.02291v1 |
https://arxiv.org/pdf/1910.02291v1.pdf | |
PWC | https://paperswithcode.com/paper/cascaded-gaussian-processes-for-data |
Repo | |
Framework | |
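As a rough illustration of the cascading idea (not the paper's exact formulation, which follows the recursive Newton-Euler structure): rather than one large GP over the full robot state, each joint gets its own GP whose inputs include the prediction passed along from the previous link. Data shapes, the feature choice, and the use of scikit-learn below are illustrative assumptions.

```python
# Simplified sketch of cascaded GP inverse-dynamics learning.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def fit_cascaded_gps(joint_states, torques):
    """joint_states: (N, dof, 3) with q, dq, ddq per joint; torques: (N, dof)."""
    n_samples, dof, _ = joint_states.shape
    models, upstream = [], np.zeros((n_samples, 1))
    for j in range(dof):                       # e.g. inward: from the last link to the base
        X = np.hstack([joint_states[:, j, :], upstream])
        gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
        gp.fit(X, torques[:, j])
        models.append(gp)
        upstream = gp.predict(X).reshape(-1, 1)    # cascade the prediction along the chain
    return models

# Toy usage with random data standing in for recorded sinusoidal joint trajectories.
rng = np.random.default_rng(0)
states, taus = rng.normal(size=(200, 6, 3)), rng.normal(size=(200, 6))
gps = fit_cascaded_gps(states, taus)
```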
Coherent and Controllable Outfit Generation
Title | Coherent and Controllable Outfit Generation |
Authors | Kedan Li, Chen Liu, David Forsyth |
Abstract | When thinking about dressing oneself, people often have a theme in mind whether they’re going to a tropical getaway or wish to appear attractive at a cocktail party. A useful outfit generation system should come up with clothing items that are compatible while matching a theme specified by the user. Existing methods use item-wise compatibility between products but lack an effective way to enforce a global constraint (e.g., style, occasion). We introduce a method that generates outfits whose items match a theme described by a text query. Our method uses text and image embeddings to represent fashion items. We learn a multimodal embedding where the image representation for an item is close to its text representation, and use this embedding to measure item-query coherence. We then use a discriminator to compute compatibility between fashion items. This strategy yields a compatibility prediction method that meets or exceeds the state of the art. Our method combines item-item compatibility and item-query coherence to construct an outfit whose items are (a) close to the query and (b) compatible with one another. Quantitative evaluation shows that the items in our outfits are tightly clustered compared to standard outfits. Furthermore, outfits produced by similar queries are close to one another, and outfits produced by very different queries are far apart. Qualitative evaluation shows that our method responds well to queries. A user study suggests that people understand the match between the queries and the outfits produced by our method. |
Tasks | |
Published | 2019-06-17 |
URL | https://arxiv.org/abs/1906.07273v2 |
https://arxiv.org/pdf/1906.07273v2.pdf | |
PWC | https://paperswithcode.com/paper/using-discriminative-methods-to-learn-fashion |
Repo | |
Framework | |
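The selection strategy in the abstract combines item-query coherence with item-item compatibility. The sketch below shows one plausible greedy version of that combination; `item_embeddings` and `compatibility` stand in for the learned multimodal embedding and discriminator and are assumptions for illustration, not the authors' code.

```python
# Greedy outfit construction from coherence + compatibility scores (illustrative sketch).
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def build_outfit(query_vec, item_embeddings, compatibility, k=4, alpha=0.5):
    """item_embeddings: dict name -> vector; compatibility(a, b) -> score in [0, 1]."""
    outfit = []
    candidates = dict(item_embeddings)
    while len(outfit) < k and candidates:
        def score(name):
            coherence = cosine(query_vec, candidates[name])     # closeness to the query
            if not outfit:
                return coherence
            compat = np.mean([compatibility(name, other) for other in outfit])
            return alpha * coherence + (1 - alpha) * compat     # blend the two criteria
        best = max(candidates, key=score)
        outfit.append(best)
        candidates.pop(best)
    return outfit

# Toy usage with random embeddings and a constant compatibility function.
items = {"linen shirt": np.random.rand(8), "sandals": np.random.rand(8),
         "straw hat": np.random.rand(8), "wool coat": np.random.rand(8)}
print(build_outfit(np.random.rand(8), items, lambda a, b: 0.5, k=3))
```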
Near-Convex Archetypal Analysis
Title | Near-Convex Archetypal Analysis |
Authors | Pierre De Handschutter, Nicolas Gillis, Arnaud Vandaele, Xavier Siebert |
Abstract | Nonnegative matrix factorization (NMF) is a widely used linear dimensionality reduction technique for nonnegative data. NMF requires that each data point is approximated by a convex combination of basis elements. Archetypal analysis (AA), also referred to as convex NMF, is a well-known NMF variant imposing that the basis elements are themselves convex combinations of the data points. AA has the advantage of being more interpretable than NMF because the basis elements are directly constructed from the data points. However, it usually suffers from a high data fitting error because the basis elements are constrained to be contained in the convex cone of the data points. In this letter, we introduce near-convex archetypal analysis (NCAA) which combines the advantages of both AA and NMF. As for AA, the basis vectors are required to be linear combinations of the data points and hence are easily interpretable. As for NMF, the additional flexibility in choosing the basis elements allows NCAA to have a low data fitting error. We show that NCAA compares favorably with a state-of-the-art minimum-volume NMF method on synthetic datasets and on a real-world hyperspectral image. |
Tasks | Dimensionality Reduction |
Published | 2019-10-02 |
URL | https://arxiv.org/abs/1910.00821v1 |
https://arxiv.org/pdf/1910.00821v1.pdf | |
PWC | https://paperswithcode.com/paper/near-convex-archetypal-analysis |
Repo | |
Framework | |
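For readers comparing the factorization models mentioned in the abstract, the schematic below contrasts NMF, AA, and the near-convex relaxation. The exact NCAA constraint set is defined in the paper; the abstract only describes it qualitatively, so the last part is a hedged summary rather than the precise formulation.

```latex
% NMF: approximate nonnegative data X by a product of two nonnegative factors.
\min_{W \ge 0,\, H \ge 0} \; \|X - WH\|_F^2
% AA (convex NMF): additionally force the basis to be built from the data,
% W = XA with A nonnegative and column-stochastic, which makes W interpretable.
\min_{A \ge 0,\, H \ge 0} \; \|X - XAH\|_F^2
\quad \text{s.t.} \quad \mathbf{1}^{\top} A = \mathbf{1}^{\top}
% NCAA (as sketched in the abstract): keep W = XA so the basis stays interpretable,
% but relax the convexity constraint on A so that the basis may leave the convex
% hull of the data and the fitting error can decrease.
```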
Universal Pooling – A New Pooling Method for Convolutional Neural Networks
Title | Universal Pooling – A New Pooling Method for Convolutional Neural Networks |
Authors | Junhyuk Hyun, Hongje Seong, Euntai Kim |
Abstract | Pooling is one of the main elements in convolutional neural networks. Pooling reduces the size of the feature map, enabling training and testing with a limited amount of computation. This paper proposes a new pooling method named universal pooling. Unlike existing pooling methods such as average pooling, max pooling, and stride pooling, which use a fixed pooling function, universal pooling generates any pooling function, depending on the given problem and dataset. Universal pooling was inspired by attention methods and can be considered a channel-wise form of local spatial attention. Universal pooling is trained jointly with the main network, and it is shown to subsume the existing pooling methods. Finally, when applied to two benchmark problems, the proposed method outperformed the existing pooling methods and exhibited the expected diversity, adapting to the given problem. |
Tasks | |
Published | 2019-07-26 |
URL | https://arxiv.org/abs/1907.11440v1 |
https://arxiv.org/pdf/1907.11440v1.pdf | |
PWC | https://paperswithcode.com/paper/universal-pooling-a-new-pooling-method-for |
Repo | |
Framework | |
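The sketch below is one assumption-laden reading of the abstract (not the authors' code): pooling as channel-wise local spatial attention, where a small learned head scores every position in each k×k window and a softmax over the window produces an adaptive pooling function. With uniform weights this reduces to average pooling; with a near one-hot softmax it approaches max pooling.

```python
# Learnable pooling as channel-wise local spatial attention (illustrative sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class UniversalPool2d(nn.Module):
    def __init__(self, channels, k=2):
        super().__init__()
        self.k = k
        # One attention logit per channel and position, computed from the feature itself.
        self.score = nn.Conv2d(channels, channels, kernel_size=1, groups=channels)

    def forward(self, x):
        b, c, h, w = x.shape
        k = self.k
        logits = self.score(x)
        # Unfold both features and logits into k*k-sized windows.
        x_win = F.unfold(x, k, stride=k).view(b, c, k * k, -1)
        l_win = F.unfold(logits, k, stride=k).view(b, c, k * k, -1)
        attn = F.softmax(l_win, dim=2)                  # learned pooling function per window
        pooled = (attn * x_win).sum(dim=2)              # (b, c, num_windows)
        return pooled.view(b, c, h // k, w // k)

x = torch.randn(1, 16, 8, 8)
print(UniversalPool2d(16)(x).shape)                     # torch.Size([1, 16, 4, 4])
```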
MASS-UMAP: Fast and accurate analog ensemble search in weather radar archive
Title | MASS-UMAP: Fast and accurate analog ensemble search in weather radar archive |
Authors | Gabriele Franch, Giuseppe Jurman, Luca Coviello, Marta Pendesini, Cesare Furlanello |
Abstract | The use of analogs - similar weather patterns - for weather forecasting and analysis is an established method in meteorology. The most challenging aspect of using this approach in the context of operational radar applications is to be able to perform a fast and accurate search for similar spatiotemporal precipitation patterns in a large archive of historical records. In this context, sequential pairwise search is too slow and computationally expensive. Here we propose an architecture to significantly speed up spatiotemporal analog retrieval by combining nonlinear geometric dimensionality reduction (UMAP) with the fastest known Euclidean search algorithm for time series (MASS) to find radar analogs in constant time, independently of the desired temporal length to match and the number of extracted analogs. We compare UMAP with principal component analysis (PCA) and show that UMAP outperforms PCA for spatial MSE analog search with proper settings. Moreover, we show that MASS is 20 times faster than brute-force search on the UMAP embedding space. We test the architecture on a real dataset and show that it enables precise and fast operational analog ensemble search through more than 2 years of radar archive in less than 5 seconds on a single workstation. |
Tasks | Dimensionality Reduction, Time Series, Weather Forecasting |
Published | 2019-10-01 |
URL | https://arxiv.org/abs/1910.01211v1 |
https://arxiv.org/pdf/1910.01211v1.pdf | |
PWC | https://paperswithcode.com/paper/mass-umap-fast-and-accurate-analog-ensemble |
Repo | |
Framework | |
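A simplified sketch of the retrieval pipeline described above: embed flattened radar frames with UMAP, then slide a Euclidean distance over the embedded archive to rank analog windows. The real system uses the FFT-based MASS algorithm; here it is replaced by a plain sliding distance. The `umap-learn` package is assumed to be installed, and frame shapes and window lengths are illustrative.

```python
# Analog ensemble search over UMAP embeddings (MASS replaced by a naive sliding distance).
import numpy as np
import umap

def analog_search(archive_frames, query_frames, n_components=2, top_k=5):
    """archive_frames: (N, H*W) flattened radar frames; query_frames: (m, H*W)."""
    reducer = umap.UMAP(n_components=n_components)
    z_archive = reducer.fit_transform(archive_frames)     # (N, d) low-dimensional archive
    z_query = reducer.transform(query_frames)             # (m, d) embedded query sequence
    m, n = len(z_query), len(z_archive)
    # Distance profile over all length-m windows of the embedded archive.
    dists = np.array([np.linalg.norm(z_archive[i:i + m] - z_query)
                      for i in range(n - m + 1)])
    return np.argsort(dists)[:top_k]                       # start indices of the top analogs

rng = np.random.default_rng(0)
archive = rng.random((500, 64))            # 500 archived frames, 8x8 flattened
print(analog_search(archive, archive[100:106]))
```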
A comparative study of general fuzzy min-max neural networks for pattern classification problems
Title | A comparative study of general fuzzy min-max neural networks for pattern classification problems |
Authors | Thanh Tung Khuat, Bogdan Gabrys |
Abstract | The general fuzzy min-max (GFMM) neural network is a generalization of fuzzy neural networks formed by hyperbox fuzzy sets for classification and clustering problems. Two principal algorithms are deployed to train this type of neural network, i.e., incremental learning and agglomerative learning. This paper presents a comprehensive empirical study of the performance-influencing factors, advantages, and drawbacks of the general fuzzy min-max neural network on pattern classification problems. The subjects of this study include (1) the impact of the maximum hyperbox size, (2) the influence of the similarity threshold and measures on the agglomerative learning algorithm, (3) the effect of data presentation order, and (4) comparative performance evaluation of the GFMM with other types of fuzzy min-max neural networks and prevalent machine learning algorithms. The experimental results on benchmark datasets widely used in machine learning showed the overall strong and weak points of the GFMM classifier. These outcomes also informed potential research directions for this class of machine learning algorithms in the future. |
Tasks | |
Published | 2019-07-31 |
URL | https://arxiv.org/abs/1907.13308v2 |
https://arxiv.org/pdf/1907.13308v2.pdf | |
PWC | https://paperswithcode.com/paper/a-comparative-study-of-general-fuzzy-min-max |
Repo | |
Framework | |
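To make the hyperbox idea behind fuzzy min-max networks concrete, the snippet below computes a membership value for a point with respect to one hyperbox, using a common GFMM-style ramp penalty for falling outside the box in any dimension. The exact membership variant and training rules studied in the paper may differ, so treat this as a hedged sketch.

```python
# Hyperbox membership for a fuzzy min-max classifier (one common GFMM-style form).
import numpy as np

def ramp(x, gamma):
    return np.clip(x * gamma, 0.0, 1.0)

def membership(x, v, w, gamma=4.0):
    """x: (d,) input; v, w: (d,) min/max corners of a hyperbox; returns a value in [0, 1]."""
    upper = 1.0 - ramp(x - w, gamma)     # penalty for exceeding the max corner
    lower = 1.0 - ramp(v - x, gamma)     # penalty for undershooting the min corner
    return float(np.min(np.minimum(upper, lower)))

# Points inside the box get membership 1; membership decays with distance outside it.
v, w = np.array([0.2, 0.2]), np.array([0.6, 0.5])
print(membership(np.array([0.4, 0.3]), v, w))   # 1.0
print(membership(np.array([0.7, 0.3]), v, w))   # ~0.6, decaying outside the box
```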
Conclusion-Supplement Answer Generation for Non-Factoid Questions
Title | Conclusion-Supplement Answer Generation for Non-Factoid Questions |
Authors | Makoto Nakatsuji, Sohei Okui |
Abstract | This paper tackles the goal of conclusion-supplement answer generation for non-factoid questions, which is a critical issue in the field of Natural Language Processing (NLP) and Artificial Intelligence (AI), as users often require supplementary information before accepting a conclusion. The current encoder-decoder framework, however, has difficulty generating such answers, since it may become confused when it tries to learn several different long answers to the same non-factoid question. Our solution, called an ensemble network, goes beyond single short sentences and fuses logically connected conclusion statements and supplementary statements. It extracts the context from the conclusion decoder’s output sequence and uses it to create supplementary decoder states on the basis of an attention mechanism. It also assesses the closeness of the question encoder’s output sequence and the separate outputs of the conclusion and supplement decoders as well as their combination. As a result, it generates answers that match the questions and have natural-sounding supplementary sequences in line with the context expressed by the conclusion sequence. Evaluations conducted on datasets including “Love Advice” and “Arts & Humanities” categories indicate that our model outputs much more accurate results than the tested baseline models do. |
Tasks | |
Published | 2019-11-25 |
URL | https://arxiv.org/abs/1912.00864v1 |
https://arxiv.org/pdf/1912.00864v1.pdf | |
PWC | https://paperswithcode.com/paper/conclusion-supplement-answer-generation-for |
Repo | |
Framework | |
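A highly simplified sketch of the ensemble idea in the abstract (not the authors' model): a shared question encoder, a conclusion decoder, and a supplement decoder whose initial state is built by attending over the conclusion decoder's outputs, so the supplement stays within the context set by the conclusion. Dimensions and module choices are illustrative assumptions.

```python
# Conclusion-supplement answer generation, reduced to its structural skeleton.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConclusionSupplementSketch(nn.Module):
    def __init__(self, vocab=1000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.conclusion_dec = nn.GRU(dim, dim, batch_first=True)
        self.supplement_dec = nn.GRU(dim, dim, batch_first=True)
        self.attn = nn.Linear(dim, dim)
        self.out = nn.Linear(dim, vocab)

    def forward(self, question, conclusion_in, supplement_in):
        _, h_q = self.encoder(self.embed(question))
        conc_states, _ = self.conclusion_dec(self.embed(conclusion_in), h_q)
        # Attention over the conclusion decoder's outputs gives the supplement context.
        scores = torch.bmm(self.attn(conc_states), h_q.transpose(0, 1).transpose(1, 2))
        context = (F.softmax(scores, dim=1) * conc_states).sum(dim=1, keepdim=True)
        supp_states, _ = self.supplement_dec(self.embed(supplement_in),
                                             context.transpose(0, 1).contiguous())
        return self.out(conc_states), self.out(supp_states)

model = ConclusionSupplementSketch()
q = torch.randint(0, 1000, (2, 12))
logits_c, logits_s = model(q, torch.randint(0, 1000, (2, 8)), torch.randint(0, 1000, (2, 8)))
```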
Task-Based Learning
Title | Task-Based Learning |
Authors | Di Chen, Yada Zhu, Xiaodong Cui, Carla P. Gomes |
Abstract | This paper talks about task-based learning. |
Tasks | Decision Making |
Published | 2019-10-17 |
URL | https://arxiv.org/abs/1910.09357v3 |
https://arxiv.org/pdf/1910.09357v3.pdf | |
PWC | https://paperswithcode.com/paper/task-based-learning-via-task-oriented |
Repo | |
Framework | |
Learning Disentangled Representations of Timbre and Pitch for Musical Instrument Sounds Using Gaussian Mixture Variational Autoencoders
Title | Learning Disentangled Representations of Timbre and Pitch for Musical Instrument Sounds Using Gaussian Mixture Variational Autoencoders |
Authors | Yin-Jyun Luo, Kat Agres, Dorien Herremans |
Abstract | In this paper, we learn disentangled representations of timbre and pitch for musical instrument sounds. We adapt a framework based on variational autoencoders with Gaussian mixture latent distributions. Specifically, we use two separate encoders to learn distinct latent spaces for timbre and pitch, which form Gaussian mixture components representing instrument identity and pitch, respectively. For reconstruction, latent variables of timbre and pitch are sampled from the corresponding mixture components and concatenated as the input to a decoder. We show the model's efficacy by latent space visualization, and a quantitative analysis indicates the discriminability of these spaces, even with a limited number of instrument labels for training. The model allows for controllable synthesis of selected instrument sounds by sampling from the latent spaces. To evaluate this, we trained instrument and pitch classifiers using the original labeled data. These classifiers achieve high accuracy when tested on our synthesized sounds, which verifies the model's performance on controllable and realistic timbre and pitch synthesis. Our model also enables timbre transfer between multiple instruments with a single autoencoder architecture, which is evaluated by measuring the shift in the posterior of instrument classification. Our in-depth evaluation confirms the model's ability to successfully disentangle timbre and pitch. |
Tasks | |
Published | 2019-06-19 |
URL | https://arxiv.org/abs/1906.08152v2 |
https://arxiv.org/pdf/1906.08152v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-disentangled-representations-of-3 |
Repo | |
Framework | |
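The bare-bones sketch below mirrors the two-encoder setup described in the abstract, not the released model: separate encoders produce Gaussian posteriors for timbre and pitch, samples from both are concatenated and decoded. The Gaussian-mixture priors over the two latent spaces are omitted for brevity; sizes are illustrative assumptions.

```python
# Two-latent-space VAE skeleton for timbre/pitch disentanglement.
import torch
import torch.nn as nn

class TwoSpaceVAE(nn.Module):
    def __init__(self, in_dim=256, z_dim=16):
        super().__init__()
        self.enc_timbre = nn.Linear(in_dim, 2 * z_dim)   # mean and log-variance
        self.enc_pitch = nn.Linear(in_dim, 2 * z_dim)
        self.dec = nn.Sequential(nn.Linear(2 * z_dim, 128), nn.ReLU(),
                                 nn.Linear(128, in_dim))

    @staticmethod
    def sample(stats):
        mu, logvar = stats.chunk(2, dim=-1)
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization

    def forward(self, x):
        z_timbre = self.sample(self.enc_timbre(x))       # instrument-identity latent
        z_pitch = self.sample(self.enc_pitch(x))         # pitch latent
        return self.dec(torch.cat([z_timbre, z_pitch], dim=-1))

x = torch.randn(4, 256)                                  # e.g. flattened spectrogram frames
recon = TwoSpaceVAE()(x)                                 # timbre transfer would swap z_timbre
```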
Testing Markov Chains without Hitting
Title | Testing Markov Chains without Hitting |
Authors | Yeshwanth Cherapanamjeri, Peter L. Bartlett |
Abstract | We study the problem of identity testing of Markov chains. In this setting, we are given access to a single trajectory from a Markov chain with unknown transition matrix $Q$ and the goal is to determine whether $Q = P$ for some known matrix $P$ or $\text{Dist}(P, Q) \geq \epsilon$ where $\text{Dist}$ is suitably defined. In recent work by Daskalakis, Dikkala and Gravin (2018), it was shown that it is possible to distinguish between the two cases provided the length of the observed trajectory is at least super-linear in the hitting time of $P$, which may be arbitrarily large. In this paper, we propose an algorithm that avoids this dependence on the hitting time, thus enabling efficient testing of Markov chains even in cases where it is infeasible to observe every state in the chain. Our algorithm is based on combining classical ideas from approximation algorithms with techniques for the spectral analysis of Markov chains. |
Tasks | |
Published | 2019-02-06 |
URL | http://arxiv.org/abs/1902.01999v1 |
http://arxiv.org/pdf/1902.01999v1.pdf | |
PWC | https://paperswithcode.com/paper/testing-markov-chains-without-hitting |
Repo | |
Framework | |
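To make the problem setting concrete, the snippet below implements a naive baseline, explicitly not the paper's algorithm (which is designed to avoid the hitting-time dependence this baseline suffers from): observe one trajectory, form empirical transition frequencies for the visited states, and compare them to the known matrix $P$ row by row.

```python
# Naive identity test for Markov chains (baseline illustration only).
import numpy as np

def simulate(P, length, rng):
    states = [0]
    for _ in range(length - 1):
        states.append(rng.choice(len(P), p=P[states[-1]]))
    return states

def naive_identity_test(P, trajectory, eps=0.1):
    n = len(P)
    counts = np.zeros((n, n))
    for a, b in zip(trajectory, trajectory[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    visited = row_sums[:, 0] > 0
    Q_hat = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    # Largest total-variation distance between observed rows of Q_hat and rows of P.
    tv = 0.5 * np.abs(Q_hat[visited] - P[visited]).sum(axis=1).max()
    return "accept Q = P" if tv < eps / 2 else "reject"

rng = np.random.default_rng(0)
P = np.array([[0.9, 0.1], [0.2, 0.8]])
print(naive_identity_test(P, simulate(P, 5000, rng)))
```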
Deep Radar Waveform Design for Efficient Automotive Radar Sensing
Title | Deep Radar Waveform Design for Efficient Automotive Radar Sensing |
Authors | Shahin Khobahi, Arindam Bose, Mojtaba Soltanalian |
Abstract | In radar systems, unimodular (or constant-modulus) waveform design plays an important role in achieving better clutter/interference rejection, as well as a more accurate estimation of the target parameters. The design of such sequences has been studied widely in the last few decades, with most design algorithms requiring sophisticated a priori knowledge of environmental parameters which may be difficult to obtain in real-time scenarios. In this paper, we propose a novel hybrid model-driven and data-driven architecture that adapts to the ever-changing environment and allows for adaptive unimodular waveform design. In particular, the approach lays the groundwork for developing extremely low-cost waveform design and processing frameworks for radar systems deployed in autonomous vehicles. The proposed model-based deep architecture imitates a well-known unimodular signal design algorithm in its structure, and can quickly infer statistical information from the environment using the observed data. Our numerical experiments portray the advantages of using the proposed method for efficient radar waveform design in time-varying environments. |
Tasks | Autonomous Vehicles |
Published | 2019-12-17 |
URL | https://arxiv.org/abs/1912.08180v2 |
https://arxiv.org/pdf/1912.08180v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-radar-waveform-design-for-efficient |
Repo | |
Framework | |
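The snippet below shows the building block that unimodular (constant-modulus) design iterations typically share, and which a model-based "unrolled" deep architecture can imitate layer by layer: improve a quadratic objective, then project back onto the unit-modulus set by keeping only the phase. This is a generic power-method-style update for illustration, not the specific algorithm unrolled in the paper.

```python
# Generic constant-modulus design iteration (illustrative, not the paper's algorithm).
import numpy as np

def unimodular_iteration(R, n_iters=50, n=32, seed=0):
    """R: (n, n) Hermitian matrix encoding the design objective (assumed given)."""
    rng = np.random.default_rng(seed)
    x = np.exp(1j * rng.uniform(0, 2 * np.pi, n))       # random unimodular start
    for _ in range(n_iters):
        y = R @ x                                        # gradient-like step on x^H R x
        x = np.exp(1j * np.angle(y))                     # projection onto |x_k| = 1
    return x

R = np.eye(32) + 0.1 * np.ones((32, 32))
waveform = unimodular_iteration(R)
print(np.allclose(np.abs(waveform), 1.0))                # constant modulus preserved
```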
Assessing the Applicability of Authorship Verification Methods
Title | Assessing the Applicability of Authorship Verification Methods |
Authors | Oren Halvani, Christian Winter, Lukas Graner |
Abstract | Authorship verification (AV) is a research subject in the field of digital text forensics that concerns itself with the question of whether two documents have been written by the same person. During the past two decades, an increasing number of proposed AV approaches can be observed. However, a closer look at the respective studies reveals that the underlying characteristics of these methods are rarely addressed, which raises doubts regarding their applicability in real forensic settings. The objective of this paper is to fill this gap by proposing clear criteria and properties that aim to improve the characterization of existing and future AV approaches. Based on these properties, we conduct three experiments using 12 existing AV approaches, including the current state of the art. The examined methods were trained, optimized and evaluated on three self-compiled corpora, where each corpus focuses on a different aspect of applicability. Our results indicate that some of the methods are able to cope with very challenging verification cases such as informal chat conversations only 250 characters long (72.7% accuracy) or cases in which two scientific documents were written at different times, with an average difference of 15.6 years (> 75% accuracy). However, we also identified that all involved methods are prone to failure in cross-topic verification cases. |
Tasks | |
Published | 2019-06-24 |
URL | https://arxiv.org/abs/1906.10551v1 |
https://arxiv.org/pdf/1906.10551v1.pdf | |
PWC | https://paperswithcode.com/paper/assessing-the-applicability-of-authorship |
Repo | |
Framework | |
A Dataset for measuring reading levels in India at scale
Title | A Dataset for measuring reading levels in India at scale |
Authors | Dolly Agarwal, Jayant Gupchup, Nishant Baghel |
Abstract | One out of four children in India leaves grade eight without basic reading skills. Measuring reading levels in a vast country like India poses significant hurdles. Recent advances in machine learning open up the possibility of automating this task. However, datasets of children’s speech are not only rare but are primarily in English. To solve this assessment problem and advance deep learning research in regional Indian languages, we present the ASER dataset of children in the age group of 6-14. The dataset consists of 5,301 subjects generating 81,330 labeled audio clips in Hindi, Marathi and English. These labels represent expert opinions on the child’s ability to read at a specified level. Using this dataset, we built a simple ASR-based classifier. Early results indicate that we can achieve a prediction accuracy of 86% for the English language. Considering that the ASER survey spans half a million subjects, this dataset can grow to those scales. |
Tasks | |
Published | 2019-11-27 |
URL | https://arxiv.org/abs/1912.04381v2 |
https://arxiv.org/pdf/1912.04381v2.pdf | |
PWC | https://paperswithcode.com/paper/a-dataset-for-measuring-reading-levels-in |
Repo | |
Framework | |
Transform the Set: Memory Attentive Generation of Guided and Unguided Image Collages
Title | Transform the Set: Memory Attentive Generation of Guided and Unguided Image Collages |
Authors | Nikolay Jetchev, Urs Bergmann, Gökhan Yildirim |
Abstract | Cutting and pasting image segments feels intuitive: the choice of source templates gives artists flexibility in recombining existing source material. Formally, this process takes an image set as input and outputs a collage of the set elements. Such selection from sets of source templates does not fit easily in classical convolutional neural models requiring inputs of fixed size. Inspired by advances in attention and set-input machine learning, we present a novel architecture that can generate in one forward pass image collages of source templates using set-structured representations. This paper has the following contributions: (i) a novel framework for image generation called Memory Attentive Generation of Image Collages (MAGIC) which gives artists new ways to create digital collages; (ii) from the machine-learning perspective, we show a novel Generative Adversarial Networks (GAN) architecture that uses Set-Transformer layers and set-pooling to blend sets of random image samples - a hybrid non-parametric approach. |
Tasks | Image Generation |
Published | 2019-10-16 |
URL | https://arxiv.org/abs/1910.07236v2 |
https://arxiv.org/pdf/1910.07236v2.pdf | |
PWC | https://paperswithcode.com/paper/transform-the-set-memory-attentive-generation |
Repo | |
Framework | |
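The sketch below isolates the set-input ingredient mentioned above: attention-based pooling that maps a variable-sized set of source-template embeddings to a fixed-size code, which a generator could then condition on. This mirrors the set-pooling idea, not the full MAGIC architecture; layer sizes are illustrative assumptions.

```python
# Attention-based set pooling over a set of source-template embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionSetPool(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, dim))   # learned pooling query
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)

    def forward(self, set_feats):
        # set_feats: (batch, set_size, dim); set_size may vary between batches.
        scores = self.query @ self.key(set_feats).transpose(1, 2)   # (batch, 1, set_size)
        weights = F.softmax(scores / set_feats.size(-1) ** 0.5, dim=-1)
        return (weights @ self.value(set_feats)).squeeze(1)         # (batch, dim)

pool = AttentionSetPool()
templates = torch.randn(2, 7, 128)        # a set of 7 source-template embeddings
print(pool(templates).shape)              # torch.Size([2, 128])
```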