January 27, 2020

Paper Group ANR 1121

Front2Back: Single View 3D Shape Reconstruction via Front to Back Prediction. Crowd Counting on Images with Scale Variation and Isolated Clusters. Viability of machine learning to reduce workload in systematic review screenings in the health sciences: a working paper. SizeNet: Weakly Supervised Learning of Visual Size and Fit in Fashion Images. Aug …

Front2Back: Single View 3D Shape Reconstruction via Front to Back Prediction

Title Front2Back: Single View 3D Shape Reconstruction via Front to Back Prediction
Authors Yuan Yao, Nico Schertler, Enrique Rosales, Helge Rhodin, Leonid Sigal, Alla Sheffer
Abstract Reconstruction of a 3D shape from a single 2D image is a classical computer vision problem, whose difficulty stems from the inherent ambiguity of recovering occluded or only partially observed surfaces. Recent methods address this challenge through the use of largely unstructured neural networks that effectively distill conditional mapping and priors over 3D shape. In this work, we induce structure and geometric constraints by leveraging three core observations: (1) the surface of most everyday objects is often almost entirely exposed from pairs of typical opposite views; (2) everyday objects often exhibit global reflective symmetries which can be accurately predicted from single views; (3) opposite orthographic views of a 3D shape share consistent silhouettes. Following these observations, we first predict orthographic 2.5D visible surface maps (depth, normal and silhouette) from perspective 2D images, and detect global reflective symmetries in this data; second, we predict the back-facing depth and normal maps using as input the front maps and, when available, the symmetric reflections of these maps; and finally, we reconstruct a 3D mesh from the union of these maps using a surface reconstruction method best suited for this data. Our experiments demonstrate that our framework outperforms state-of-the-art approaches for 3D shape reconstruction from 2D and 2.5D data in terms of input fidelity and detail preservation. Specifically, we achieve 12% better performance on average on the ShapeNet benchmark dataset, and up to 19% for certain classes of objects (e.g., chairs and vessels).
Tasks
Published 2019-12-23
URL https://arxiv.org/abs/1912.10589v2
PDF https://arxiv.org/pdf/1912.10589v2.pdf
PWC https://paperswithcode.com/paper/front2back-single-view-3d-shape
Repo
Framework
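
Observation (3) in the Front2Back abstract, that opposite orthographic views share consistent silhouettes, can be illustrated with a small sketch. It assumes a front orthographic camera at z = 0 looking along +z and an opposite camera at z = depth_range looking along -z, sharing image rows; under that convention a back-view column u sees front column W - 1 - u, so the back silhouette is the left-right mirror of the front one and both depth maps merge into one point set. The function names, the coordinate convention and the use of None for background pixels are assumptions for illustration, not the paper's implementation.

```python
from typing import List, Optional, Tuple

# Assumed encoding: rows of depth values, None marks a background pixel.
DepthMap = List[List[Optional[float]]]


def mirror_silhouette(silhouette: List[List[bool]]) -> List[List[bool]]:
    """Silhouette seen by the opposite orthographic camera: each row flipped left-right."""
    return [row[::-1] for row in silhouette]


def depth_maps_to_points(front: DepthMap, back: DepthMap,
                         depth_range: float) -> List[Tuple[int, int, float]]:
    """Merge front and back orthographic depth maps into (x, y, z) points.

    Assumed convention (hypothetical): the front camera sits at z = 0 looking
    along +z, the back camera at z = depth_range looking along -z, and both
    maps have the same width and share image rows.
    """
    width = len(front[0])
    points = []
    for y, row in enumerate(front):
        for x, d in enumerate(row):
            if d is not None:
                points.append((x, y, d))
    for y, row in enumerate(back):
        for u, d in enumerate(row):
            if d is not None:
                # back-view column u looks at front column width - 1 - u
                points.append((width - 1 - u, y, depth_range - d))
    return points
```

For a one-row example, a front map [[None, 1.0, 2.0]] and a back map [[3.0, 4.0, None]] with depth_range=10.0 give points at x = 1 and x = 2 on both sides, and the back silhouette equals the mirrored front silhouette, as observation (3) predicts.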

Crowd Counting on Images with Scale Variation and Isolated Clusters

Title Crowd Counting on Images with Scale Variation and Isolated Clusters
Authors Haoyue Bai, Song Wen, S. -H. Gary Chan
Abstract Crowd counting is the task of estimating the number of objects (e.g., people or vehicles) in an image of an unconstrained, congested scene. Designing a general crowd counting algorithm applicable to a wide range of crowd images is challenging, mainly due to the possibly large variation in object scales and the presence of many isolated small clusters. Previous approaches based on convolution operations with multi-branch architectures are effective for only some narrow bands of scales and do not capture the long-range contextual relationships arising from isolated clustering. To address this, we propose SACANet, a novel scale-adaptive long-range context-aware network for crowd counting. SACANet consists of three major modules: a pyramid contextual module, which extracts long-range contextual information and enlarges the receptive field; a scale-adaptive self-attention multi-branch module, which attains high scale sensitivity and detection accuracy for isolated clusters; and a hierarchical fusion module, which fuses multi-level self-attention features. With group normalization, SACANet reaches a better optimum during training. We have conducted extensive experiments using the VisDrone2019 People dataset, the VisDrone2019 Vehicle dataset, and other challenging benchmarks. Compared with state-of-the-art methods, SACANet is shown to be effective, especially in extremely crowded conditions with diverse scales and scattered clusters, and achieves a much lower mean absolute error (MAE) than the baselines.
Tasks Crowd Counting
Published 2019-09-09
URL https://arxiv.org/abs/1909.03839v1
PDF https://arxiv.org/pdf/1909.03839v1.pdf
PWC https://paperswithcode.com/paper/crowd-counting-on-images-with-scale-variation
Repo
Framework
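
The MAE reported for SACANet is the mean, over test images, of the absolute difference between predicted and true counts. Many counting networks predict a density map whose sum is the count; the abstract does not say that SACANet works this way, so the sketch below shows that common convention as an assumption, with density maps as plain nested lists.

```python
from typing import List, Sequence


def count_from_density(density: List[List[float]]) -> float:
    """Estimated count = sum of a predicted density map (assumed convention)."""
    return sum(sum(row) for row in density)


def mean_absolute_error(predicted: Sequence[float], truth: Sequence[float]) -> float:
    """MAE over images: average of |predicted count - true count|."""
    if len(predicted) != len(truth) or not predicted:
        raise ValueError("need two equally long, non-empty sequences")
    return sum(abs(p - t) for p, t in zip(predicted, truth)) / len(predicted)
```

A lower MAE means the per-image counts are, on average, closer to the annotated ones; unlike the signed error, over- and under-counting cannot cancel out.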

Viability of machine learning to reduce workload in systematic review screenings in the health sciences: a working paper

Title Viability of machine learning to reduce workload in systematic review screenings in the health sciences: a working paper
Authors Muhammad Maaz
Abstract Systematic reviews, which summarize and synthesize all the current research on a specific topic, are a crucial component of academia. They are especially important in the biomedical and health sciences, where they synthesize the state of medical evidence and conclude the best course of action for various diseases, pathologies, and treatments. Given the immense amount of existing literature and the rate at which new research is published, reviewing abstracts can be a laborious process. Automation may be able to significantly reduce this workload. However, such classifications are not easily automated because of the peculiar nature of written language; machine learning may be able to help. This paper explores the viability and effectiveness of using machine learning models to classify abstracts according to specific inclusion/exclusion criteria, as would be done in the first stage of a systematic review. The specific task was deciding whether or not an abstract describes a randomized controlled trial (RCT), a very common classification in systematic reviews in the healthcare field. One thousand random training/testing splits of a dataset of n=2042 labelled abstracts were created, and a model was trained and tested on each split. A Bayes classifier and an SVM classifier were used and compared with simple, non-machine-learning approaches to text classification. The SVM classifier proved highly effective, yielding 90% accuracy, an F1 score of 0.84, and a potential workload reduction of 70%. This shows that machine learning has the potential to revolutionize the abstract screening process in healthcare systematic reviews.
Tasks
Published 2019-08-22
URL https://arxiv.org/abs/1908.08610v1
PDF https://arxiv.org/pdf/1908.08610v1.pdf
PWC https://paperswithcode.com/paper/viability-of-machine-learning-to-reduce
Repo
Framework
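
The evaluation protocol above (many random train/test splits, then accuracy and F1 per split) can be sketched with the standard library. The test fraction is not given in the abstract and is a parameter here, and the workload-reduction formula (share of abstracts the classifier excludes, which a reviewer would no longer read) is a hypothetical definition for illustration, not necessarily the paper's.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def random_splits(n: int, test_fraction: float, repeats: int,
                  seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield `repeats` independent random (train, test) index splits of range(n)."""
    rng = random.Random(seed)
    n_test = int(round(n * test_fraction))
    for _ in range(repeats):
        indices = list(range(n))
        rng.shuffle(indices)
        yield indices[n_test:], indices[:n_test]


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Accuracy, and F1 for the positive class (1 = RCT)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    # F1 = 2PR / (P + R), written in the equivalent count form 2TP / (2TP + FP + FN)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return accuracy, f1


def workload_reduction(y_pred: Sequence[int]) -> float:
    """Hypothetical definition: share of abstracts predicted negative (excluded)."""
    return sum(1 for p in y_pred if p == 0) / len(y_pred)
```

With the numbers from the abstract, random_splits(2042, test_fraction, 1000) would produce the 1000 splits; a classifier would be fitted on each train split and the metrics averaged over the test splits.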

SizeNet: Weakly Supervised Learning of Visual Size and Fit in Fashion Images

Title SizeNet: Weakly Supervised Learning of Visual Size and Fit in Fashion Images
Authors Nour Karessli, Romain Guigourès, Reza Shirvany
Abstract Finding clothes that fit is a hot topic in the e-commerce fashion industry. Most approaches addressing this problem are based on statistical methods relying on historical data of articles purchased and returned to the store. Such approaches suffer from the cold start problem for the thousands of articles appearing on the shopping platforms every day, for which no prior purchase history is available. We propose to employ visual data to infer size and fit characteristics of fashion articles. We introduce SizeNet, a weakly-supervised teacher-student training framework that leverages the power of statistical models combined with the rich visual information from article images to learn visual cues for size and fit characteristics, capable of tackling the challenging cold start problem. Detailed experiments are performed on thousands of textile garments, including dresses, trousers, knitwear, tops, etc. from hundreds of different brands.
Tasks
Published 2019-05-28
URL https://arxiv.org/abs/1905.11784v1
PDF https://arxiv.org/pdf/1905.11784v1.pdf
PWC https://paperswithcode.com/paper/sizenet-weakly-supervised-learning-of-visual
Repo
Framework
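
SizeNet is described as a teacher-student framework in which a statistical model supervises an image model. A common way to train a student from a teacher's soft labels is to minimize the cross-entropy between the teacher's class distribution and the student's softmax output; the sketch below shows that generic loss. The three-class setup and the values are hypothetical, and the abstract does not state that SizeNet uses exactly this loss.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max logit first)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_cross_entropy(teacher_probs: Sequence[float],
                              student_logits: Sequence[float]) -> float:
    """Cross-entropy H(teacher, student) = -sum_k t_k * log s_k.

    teacher_probs: soft labels from a (hypothetical) statistical teacher model.
    student_logits: raw scores from an image model for the same classes.
    """
    student_probs = softmax(student_logits)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs) if t > 0)
```

The loss is smallest when the student's distribution matches the teacher's, so the image model can learn size and fit cues for new articles that have no purchase history of their own.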

Augmentation Methods on Monophonic Audio for Instrument Classification in Polyphonic Music

Title Augmentation Methods on Monophonic Audio for Instrument Classification in Polyphonic Music
Authors Agelos Kratimenos, Kleanthis Avramidis, Christos Garoufis, Athanasia Zlatintsi, Petros Maragos
Abstract Instrument classification is one of the fields in Music Information Retrieval (MIR) that has attracted a lot of research interest. However, most of that research deals with monophonic music, while efforts on polyphonic material mainly focus on predominant instrument recognition. In this paper, we propose an approach for instrument classification in polyphonic music from purely monophonic data that involves performing data augmentation by mixing different audio segments. A variety of data augmentation techniques focusing on different sonic aspects, such as overlaying audio segments of the same genre, as well as pitch- and tempo-based synchronization, are explored. We utilize Convolutional Neural Networks for the classification task, comparing shallow to deep network architectures. We further investigate the use of a combination of the above classifiers, each trained on a single augmented dataset. An ensemble of VGG-like classifiers, trained on non-augmented, pitch-synchronized, tempo-synchronized and genre-similar excerpts, respectively, yields the best results, achieving slightly above 80% in terms of label ranking average precision (LRAP) on the IRMAS test set (… instruments in over 2300 testing tracks).
Tasks Data Augmentation, Information Retrieval, Music Information Retrieval
Published 2019-11-28
URL https://arxiv.org/abs/1911.12505v2
PDF https://arxiv.org/pdf/1911.12505v2.pdf
PWC https://paperswithcode.com/paper/augmentation-methods-on-monophonic-audio-for
Repo
Framework
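
The core augmentation idea, building polyphonic training examples by overlaying monophonic segments, can be sketched as sample-wise addition plus a union of the instrument labels. The equal-gain sum and the peak normalization are assumptions for illustration; the paper's genre-, pitch- and tempo-synchronized segment selection is not shown.

```python
from typing import List, Sequence, Set, Tuple


def mix_segments(a: Sequence[float], b: Sequence[float],
                 labels_a: Set[str], labels_b: Set[str],
                 peak: float = 1.0) -> Tuple[List[float], Set[str]]:
    """Overlay two equally long mono segments and merge their instrument labels.

    Assumed scheme (hypothetical): equal-gain sum, then rescaling so that the
    largest absolute sample equals `peak` only when it would otherwise exceed it.
    """
    if len(a) != len(b):
        raise ValueError("segments must have the same number of samples")
    mixture = [x + y for x, y in zip(a, b)]
    loudest = max((abs(s) for s in mixture), default=0.0)
    if loudest > peak:
        mixture = [s * peak / loudest for s in mixture]
    # the mixed excerpt is labelled with every instrument present in either source
    return mixture, labels_a | labels_b
```

Because each source segment is monophonic and labelled, the mixture's multi-label target is known exactly, which is what makes training a polyphonic classifier from monophonic data possible.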

Budget-aware Semi-Supervised Semantic and Instance Segmentation

Title Budget-aware Semi-Supervised Semantic and Instance Segmentation
Authors Miriam Bellver, Amaia Salvador, Jordi Torres, Xavier Giro-i-Nieto
Abstract Methods that move towards less supervised scenarios are key for image segmentation, as dense labels demand significant human intervention. Generally, the annotation burden is mitigated by labeling datasets with weaker forms of supervision, e.g. image-level labels or bounding boxes. Another option is semi-supervised settings, which commonly leverage a few strong annotations and a large amount of unlabeled/weakly-labeled data. In this paper, we revisit semi-supervised segmentation schemes and significantly narrow down the annotation budget (in terms of total labeling time of the training set) compared to previous approaches. With a very simple pipeline, we demonstrate that at low annotation budgets, semi-supervised methods outperform weakly-supervised ones by a wide margin for both semantic and instance segmentation. Our approach also outperforms previous semi-supervised works at a much reduced labeling cost. We present results for the Pascal VOC benchmark and unify weakly and semi-supervised approaches by considering the total annotation budget, thus allowing a fairer comparison between methods.
Tasks Instance Segmentation, Semantic Segmentation
Published 2019-05-14
URL https://arxiv.org/abs/1905.05880v2
PDF https://arxiv.org/pdf/1905.05880v2.pdf
PWC https://paperswithcode.com/paper/budget-aware-semi-supervised-semantic-and
Repo
Framework
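
Measuring supervision as total labeling time lets semi-supervised and weakly-supervised training sets be compared on one axis. The sketch below computes such a budget; the per-image costs in seconds are hypothetical placeholders, not the values used in the paper.

```python
from typing import Dict

# Hypothetical seconds-per-image annotation costs, for illustration only.
COST_SECONDS: Dict[str, float] = {
    "full_mask": 240.0,
    "bounding_boxes": 40.0,
    "image_labels": 20.0,
    "unlabeled": 0.0,
}


def annotation_budget(counts: Dict[str, int],
                      costs: Dict[str, float] = COST_SECONDS) -> float:
    """Total labeling time, in hours, of a training set mixing supervision types."""
    seconds = sum(costs[kind] * n for kind, n in counts.items())
    return seconds / 3600.0
```

Under these placeholder costs, 100 fully masked images plus 9000 unlabeled ones cost the same as 600 images with bounding boxes, so annotation_budget({"full_mask": 100, "unlabeled": 9000}) equals annotation_budget({"bounding_boxes": 600}) and the two settings can be compared at equal budget.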

Learning a Representation for Cover Song Identification Using Convolutional Neural Network

Title Learning a Representation for Cover Song Identification Using Convolutional Neural Network
Authors Zhesong Yu, Xiaoshuo Xu, Xiaoou Chen, Deshun Yang
Abstract Cover song identification represents a challenging task in the field of Music Information Retrieval (MIR) due to complex musical variations between query tracks and cover versions. Previous works typically utilize hand-crafted features and alignment algorithms for the task. More recently, further breakthroughs have been achieved with neural network approaches. In this paper, we propose a novel Convolutional Neural Network (CNN) architecture based on the characteristics of the cover song task. We first train the network through classification strategies; the network is then used to extract music representations for cover song identification. A scheme is designed to train models that are robust to tempo changes. Experimental results show that our approach outperforms state-of-the-art methods on all public datasets, especially on the large dataset.
Tasks Information Retrieval, Music Information Retrieval
Published 2019-11-01
URL https://arxiv.org/abs/1911.00334v1
PDF https://arxiv.org/pdf/1911.00334v1.pdf
PWC https://paperswithcode.com/paper/learning-a-representation-for-cover-song
Repo
Framework
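
The abstract mentions a scheme for robustness to tempo changes without detailing it. One common way to expose a model to tempo variation is to resample a frame-level feature sequence along the time axis; the sketch below does this with linear interpolation. It is an assumed stand-in, not the paper's scheme.

```python
from typing import List, Sequence

Frames = List[List[float]]  # time-major feature matrix: frames x bins


def time_stretch(features: Sequence[Sequence[float]], rate: float) -> Frames:
    """Resample a frame sequence along time by `rate` (> 1 = faster, fewer frames),
    linearly interpolating between neighbouring frames."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    n_in = len(features)
    n_out = max(1, int(round(n_in / rate)))
    out: Frames = []
    for j in range(n_out):
        pos = min(j * rate, n_in - 1)  # fractional source frame index
        i = int(pos)
        frac = pos - i
        nxt = features[min(i + 1, n_in - 1)]
        out.append([(1 - frac) * a + frac * b for a, b in zip(features[i], nxt)])
    return out
```

During training, each excerpt could be stretched by a randomly drawn rate (for example somewhere around 0.9 to 1.1, a hypothetical range) so that the learned representation depends less on the performance tempo.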

A Convolutional Approach to Melody Line Identification in Symbolic Scores

Title A Convolutional Approach to Melody Line Identification in Symbolic Scores
Authors Federico Simonetta, Carlos Cancino-Chacón, Stavros Ntalampiras, Gerhard Widmer
Abstract In many musical traditions, the melody line is of primary significance in a piece. Human listeners can readily distinguish melodies from accompaniment; however, making this distinction given only the written score – i.e. without listening to the music performed – can be a difficult task. Solving this task is of great importance for both Music Information Retrieval and musicological applications. In this paper, we propose an automated approach to identifying the most salient melody line in a symbolic score. The backbone of the method consists of a convolutional neural network (CNN) estimating the probability that each note in the score (more precisely: each pixel in a piano roll encoding of the score) belongs to the melody line. We train and evaluate the method on various datasets, using manual annotations where available and solo instrument parts where not. We also propose a method to inspect the CNN and to analyze the influence exerted by notes on the prediction of other notes; this method can be applied whenever the output of a neural network has the same size as the input.
Tasks Information Retrieval, Music Information Retrieval
Published 2019-06-24
URL https://arxiv.org/abs/1906.10547v1
PDF https://arxiv.org/pdf/1906.10547v1.pdf
PWC https://paperswithcode.com/paper/a-convolutional-approach-to-melody-line
Repo
Framework
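
The melody-line method works on a piano-roll encoding in which a CNN assigns each pixel a melody probability. A minimal sketch of the surrounding bookkeeping is shown below: building the piano roll from notes and turning per-pixel probabilities into per-note decisions. The pixel probabilities would come from the CNN; averaging over a note's cells and the 0.5 threshold are assumptions, not the paper's exact procedure.

```python
from typing import Dict, List, Tuple

Note = Tuple[int, int, int]  # (pitch, onset_step, offset_step), offset exclusive


def piano_roll(notes: List[Note], n_pitches: int = 128) -> List[List[int]]:
    """Binary piano roll: roll[pitch][step] = 1 while a note sounds."""
    n_steps = max((off for _, _, off in notes), default=0)
    roll = [[0] * n_steps for _ in range(n_pitches)]
    for pitch, on, off in notes:
        for t in range(on, off):
            roll[pitch][t] = 1
    return roll


def note_probabilities(notes: List[Note],
                       pixel_prob: List[List[float]]) -> Dict[Note, float]:
    """Average the per-pixel melody probabilities over the cells each note occupies."""
    return {
        (p, on, off): sum(pixel_prob[p][t] for t in range(on, off)) / (off - on)
        for p, on, off in notes
    }


def melody_notes(notes: List[Note], pixel_prob: List[List[float]],
                 threshold: float = 0.5) -> List[Note]:
    """Keep notes whose averaged probability reaches the (assumed) threshold."""
    probs = note_probabilities(notes, pixel_prob)
    return [n for n in notes if probs[n] >= threshold]
```

Because the network's output has the same size as the piano-roll input, each output pixel maps back to exactly one input cell, which is what makes this per-note aggregation straightforward.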

MANAS: Multi-Agent Neural Architecture Search

Title MANAS: Multi-Agent Neural Architecture Search
Authors Fabio Maria Carlucci, Pedro M Esperança, Marco Singh, Victor Gabillon, Antoine Yang, Hang Xu, Zewei Chen, Jun Wang
Abstract The Neural Architecture Search (NAS) problem is typically formulated as a graph search problem where the goal is to learn the optimal operations over edges in order to maximise a graph-level global objective. Due to the large architecture parameter space, efficiency is a key bottleneck preventing NAS from practical use. In this paper, we address the issue by framing NAS as a multi-agent problem where agents control a subset of the network and coordinate to reach optimal architectures. We provide two distinct lightweight implementations, with reduced memory requirements (1/8th of the state of the art) and performance above that of much more computationally expensive methods. Theoretically, we demonstrate vanishing regrets of the form O(√T), with T being the total number of rounds. Finally, aware that random search is an effective but often ignored baseline, we perform additional experiments on 3 alternative datasets and 2 network configurations, and achieve favourable results in comparison.
Tasks Neural Architecture Search
Published 2019-09-03
URL https://arxiv.org/abs/1909.01051v3
PDF https://arxiv.org/pdf/1909.01051v3.pdf
PWC https://paperswithcode.com/paper/manas-multi-agent-neural-architecture-search
Repo
Framework
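
To make the multi-agent framing concrete, the sketch below gives each edge its own bandit learner that picks one candidate operation per round, and updates all learners from the same global reward. It uses EXP3, a standard adversarial-bandit algorithm with regret O(√(T K log K)), as a generic stand-in; MANAS's actual agents and update rules are those described in the paper. The toy reward (fraction of edges that chose operation 0) is a hypothetical replacement for the validation accuracy of a sampled architecture.

```python
import math
import random
from typing import Callable, List, Sequence


class Exp3Agent:
    """One agent choosing among `n_ops` candidate operations for a single edge."""

    def __init__(self, n_ops: int, gamma: float, rng: random.Random):
        self.n_ops = n_ops
        self.gamma = gamma
        self.rng = rng
        self.log_weights = [0.0] * n_ops  # log-space avoids overflow on long runs
        self.last_probs: List[float] = []
        self.last_choice = 0

    def probabilities(self) -> List[float]:
        m = max(self.log_weights)
        w = [math.exp(lw - m) for lw in self.log_weights]
        total = sum(w)
        # exponential weights mixed with uniform exploration
        return [(1 - self.gamma) * wi / total + self.gamma / self.n_ops for wi in w]

    def choose(self) -> int:
        self.last_probs = self.probabilities()
        self.last_choice = self.rng.choices(range(self.n_ops), weights=self.last_probs)[0]
        return self.last_choice

    def update(self, reward: float) -> None:
        """Importance-weighted EXP3 update for a reward in [0, 1]."""
        estimate = reward / self.last_probs[self.last_choice]
        self.log_weights[self.last_choice] += self.gamma * estimate / self.n_ops


def search(n_agents: int, n_ops: int, rounds: int,
           reward_fn: Callable[[Sequence[int]], float],
           gamma: float = 0.1, seed: int = 0) -> List[Exp3Agent]:
    rng = random.Random(seed)
    agents = [Exp3Agent(n_ops, gamma, rng) for _ in range(n_agents)]
    for _ in range(rounds):
        architecture = [agent.choose() for agent in agents]  # one op per edge
        reward = reward_fn(architecture)                      # shared global reward
        for agent in agents:
            agent.update(reward)
    return agents
```

Each agent only stores one weight per candidate operation on its own edge, which hints at why splitting the search across agents keeps the per-agent state small.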

Predicting Responses to a Robot’s Future Motion using Generative Recurrent Neural Networks

Title Predicting Responses to a Robot’s Future Motion using Generative Recurrent Neural Networks
Authors Stuart Eiffert, Salah Sukkarieh
Abstract Robotic navigation through crowds or herds requires the ability to both predict the future motion of nearby individuals and understand how these predictions might change in response to a robot’s future action. State-of-the-art trajectory prediction models using Recurrent Neural Networks (RNNs) do not currently account for a planned future action of a robot, and so cannot predict how an individual will move in response to a robot’s planned path. We propose an approach that adapts RNNs to use a robot’s next planned action as an input alongside the current position of nearby individuals. This allows the model to learn the response of individuals with regard to a robot’s motion from real world observations. By linking a robot’s actions to the response of those around it in training, we show that we are able to not only improve prediction accuracy in close range interactions, but also to predict the likely response of surrounding individuals to simulated actions. This allows the use of the model to simulate state transitions, without requiring any assumptions on agent interaction. We apply this model to varied datasets, including crowds of pedestrians interacting with vehicles and bicycles, and livestock interacting with a robotic vehicle.
Tasks Trajectory Prediction
Published 2019-09-30
URL https://arxiv.org/abs/1909.13486v2
PDF https://arxiv.org/pdf/1909.13486v2.pdf
PWC https://paperswithcode.com/paper/predicting-responses-to-a-robots-future
Repo
Framework
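
The key change described above is that the robot's next planned action becomes part of the RNN input alongside the positions of nearby individuals. The sketch below builds such an input and runs one step of a plain Elman RNN cell. The feature layout (relative position plus planned displacement) is an assumption, and the Elman cell stands in for whatever recurrent cell and generative output the paper actually uses.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]


def step_input(person_xy: Tuple[float, float], robot_xy: Tuple[float, float],
               robot_next_action: Tuple[float, float]) -> Vec:
    """Assumed per-timestep input: the person's position relative to the robot,
    followed by the robot's next planned displacement (dx, dy)."""
    return [person_xy[0] - robot_xy[0], person_xy[1] - robot_xy[1],
            robot_next_action[0], robot_next_action[1]]


def rnn_step(x: Vec, h: Vec, w_xh: Sequence[Vec], w_hh: Sequence[Vec], b: Vec) -> Vec:
    """Elman RNN cell: h' = tanh(W_xh x + W_hh h + b), one row of weights per hidden unit."""
    return [
        math.tanh(sum(wx * xi for wx, xi in zip(w_xh[k], x))
                  + sum(wh * hi for wh, hi in zip(w_hh[k], h)) + b[k])
        for k in range(len(h))
    ]
```

Because the planned action is an input, feeding different candidate actions through the same trained model yields different predicted responses, which is what allows the model to simulate state transitions for actions the robot is only considering.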

Content Adaptive Optimization for Neural Image Compression

Title Content Adaptive Optimization for Neural Image Compression
Authors Joaquim Campos, Simon Meierhans, Abdelaziz Djelouah, Christopher Schroers
Abstract The field of neural image compression has witnessed exciting progress, as recently proposed architectures already surpass the established transform-coding-based approaches. While research has so far mainly focused on architecture and model improvements, in this work we explore content adaptive optimization. To this end, we introduce an iterative procedure which adapts the latent representation to the specific content we wish to compress while keeping the parameters of the network and the predictive model fixed. Our experiments show that this allows for an overall increase in rate-distortion performance, independently of the specific architecture used. Furthermore, we also evaluate this strategy in the context of adapting a pretrained network to other content that is different in visual appearance or resolution. Here, our experiments show that our adaptation strategy can largely close the gap to models specifically trained for the given content, with the benefit that no additional data in the form of model parameter updates has to be transmitted.
Tasks Image Compression
Published 2019-06-04
URL https://arxiv.org/abs/1906.01223v2
PDF https://arxiv.org/pdf/1906.01223v2.pdf
PWC https://paperswithcode.com/paper/content-adaptive-optimization-for-neural
Repo
Framework
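
The central idea, optimizing the latent for one specific image while the decoder stays frozen, can be shown on a toy scalar problem. Here the "decoder" is x_hat = a * y, distortion is (x - a*y)^2, and y^2 is a hypothetical stand-in for the rate term; a real codec would use its entropy model for the rate and would quantize the latent, both omitted here.

```python
def adapt_latent(x: float, y0: float, a: float, lam: float,
                 lr: float = 0.05, steps: int = 500) -> float:
    """Gradient descent on the latent y only; the toy decoder x_hat = a * y is frozen.

    Toy objective (assumed): (x - a*y)^2 + lam * y^2, where y^2 stands in for rate.
    """
    y = y0  # e.g. the encoder's initial latent for this image
    for _ in range(steps):
        grad = -2 * a * (x - a * y) + 2 * lam * y
        y -= lr * grad
    return y
```

For this quadratic objective the optimum is y* = a*x / (a^2 + lam), so with x = 3, a = 2 and lam = 1 the iteration converges to 1.2. Only the latent changes, so, as in the abstract, nothing beyond the (adapted) latent would need to be transmitted.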

Explicitly Conditioned Melody Generation: A Case Study with Interdependent RNNs

Title Explicitly Conditioned Melody Generation: A Case Study with Interdependent RNNs
Authors Benjamin Genchel, Ashis Pati, Alexander Lerch
Abstract Deep generative models for symbolic music are typically designed to model temporal dependencies in music so as to predict the next musical event given previous events. In many cases, such models are expected to learn abstract concepts such as harmony, meter, and rhythm from raw musical data without any additional information. In this study, we investigate the effects of explicitly conditioning deep generative models with musically relevant information. Specifically, we study the effects of four different conditioning inputs on the performance of a recurrent monophonic melody generation model. Several combinations of these conditioning inputs are used to train different model variants which are then evaluated using three objective evaluation paradigms across two genres of music. The results indicate musically relevant conditioning significantly improves learning and performance, and reveal how this information affects learning of musical features related to pitch and rhythm. An informal subjective evaluation suggests a corresponding improvement in the aesthetic quality of generations.
Tasks
Published 2019-07-10
URL https://arxiv.org/abs/1907.05208v1
PDF https://arxiv.org/pdf/1907.05208v1.pdf
PWC https://paperswithcode.com/paper/explicitly-conditioned-melody-generation-a
Repo
Framework
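
Explicit conditioning means feeding musically relevant information to the model alongside the previous musical event. A minimal sketch of such an input vector is shown below, concatenating a one-hot previous pitch with a one-hot chord root and a one-hot position within the bar. These particular conditioning features and sizes are hypothetical choices for illustration; the abstract does not list the four conditioning inputs the study uses.

```python
from typing import List, Optional


def one_hot(index: Optional[int], size: int) -> List[float]:
    """One-hot vector; index None gives all zeros (e.g. a rest or 'no chord')."""
    vec = [0.0] * size
    if index is not None:
        vec[index] = 1.0
    return vec


def conditioned_input(prev_pitch: Optional[int], chord_root: Optional[int],
                      step_in_bar: int, n_pitches: int = 128,
                      steps_per_bar: int = 16) -> List[float]:
    """Concatenate the previous event with explicit (hypothetical) conditioning features."""
    return (one_hot(prev_pitch, n_pitches)
            + one_hot(chord_root, 12)
            + one_hot(step_in_bar % steps_per_bar, steps_per_bar))
```

An unconditioned baseline would receive only the first 128 entries; comparing model variants trained with and without the extra entries is the kind of ablation the study describes.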

Structured Query Construction via Knowledge Graph Embedding

Title Structured Query Construction via Knowledge Graph Embedding
Authors Ruijie Wang, Meng Wang, Jun Liu, Michael Cochez, Stefan Decker
Abstract To facilitate general users' access to knowledge graphs, increasing effort is being devoted to constructing graph-structured queries from given natural language questions. At the core of this construction is deducing the structure of the target query and determining the vertices/edges that constitute it. Existing query construction methods rely on question understanding and conventional graph-based algorithms, which leads to inefficient and degraded performance on complex natural language questions over large-scale knowledge graphs. In this paper, we focus on this problem and propose a novel framework built on recent knowledge graph embedding techniques. Our framework first encodes the underlying knowledge graph into a low-dimensional embedding space by leveraging generalized local knowledge graphs. Given a natural language question, the learned embedding representations of the knowledge graph are utilized to compute the query structure and assemble vertices/edges into the target query. Extensive experiments were conducted on the benchmark dataset, and the results demonstrate that our framework outperforms state-of-the-art baseline models in both effectiveness and efficiency.
Tasks Graph Embedding, Knowledge Graph Embedding, Knowledge Graphs
Published 2019-09-06
URL https://arxiv.org/abs/1909.02930v1
PDF https://arxiv.org/pdf/1909.02930v1.pdf
PWC https://paperswithcode.com/paper/structured-query-construction-via-knowledge
Repo
Framework
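
To illustrate how learned embeddings can help decide which vertices and edges belong in a query, the sketch below scores candidate triples with TransE, a well-known embedding model in which a true triple (h, r, t) satisfies h + r ≈ t. TransE is used here only as a familiar example; the abstract does not say which embedding technique the framework uses, and the entity and relation vectors are hypothetical toy values.

```python
import math
from typing import Dict, List, Sequence


def transe_distance(h: Sequence[float], r: Sequence[float], t: Sequence[float]) -> float:
    """TransE plausibility: a smaller ||h + r - t|| means a more likely triple."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_tails(head: str, relation: str, entities: Dict[str, List[float]],
               relations: Dict[str, List[float]]) -> List[str]:
    """Rank candidate tail entities for (head, relation, ?) by TransE distance."""
    h, r = entities[head], relations[relation]
    candidates = [e for e in entities if e != head]
    return sorted(candidates, key=lambda e: transe_distance(h, r, entities[e]))
```

With toy embeddings where Berlin + capital_of lands exactly on Germany, rank_tails("Berlin", "capital_of", ...) puts Germany first; in a query builder, such rankings could shortlist candidate vertices instead of exhaustively searching the graph.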

Exploring Scholarly Data by Semantic Query on Knowledge Graph Embedding Space

Title Exploring Scholarly Data by Semantic Query on Knowledge Graph Embedding Space
Authors Hung Nghiep Tran, Atsuhiro Takasu
Abstract Open-science trends have enabled several open scholarly datasets that include millions of papers and authors. Managing, exploring, and utilizing such large and complicated datasets effectively is challenging. In recent years, the knowledge graph has emerged as a universal data format for representing knowledge about heterogeneous entities and their relationships. The knowledge graph can be modeled by knowledge graph embedding methods, which represent entities and relations as embedding vectors in semantic space, then model the interactions between these embedding vectors. However, the semantic structures in the knowledge graph embedding space are not well studied, so knowledge graph embedding methods are usually used only for knowledge graph completion, not for data representation and analysis. In this paper, we propose to analyze these semantic structures based on the well-studied word embedding space and use them to support data exploration. We also define semantic queries, which are algebraic operations between the embedding vectors in the knowledge graph embedding space, to solve queries such as similarity and analogy between the entities in the original datasets. We then design a general framework for data exploration by semantic queries and discuss solutions to some traditional scholarly data exploration tasks. We also propose some new interesting tasks that can be solved based on the uncanny semantic structures of the embedding space.
Tasks Graph Embedding, Knowledge Graph Completion, Knowledge Graph Embedding
Published 2019-09-17
URL https://arxiv.org/abs/1909.08191v1
PDF https://arxiv.org/pdf/1909.08191v1.pdf
PWC https://paperswithcode.com/paper/exploring-scholarly-data-by-semantic-query-on
Repo
Framework
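
The similarity and analogy queries described above can be sketched in the word-embedding style: similarity as cosine similarity, and "a is to b as c is to ?" as the entity nearest to b - a + c. The entity names and vectors below are hypothetical toy values, and the paper's exact query definitions may differ.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(query: Sequence[float], vectors: Dict[str, List[float]],
                 exclude: Sequence[str] = ()) -> str:
    """Entity whose embedding has the highest cosine similarity to `query`."""
    return max((name for name in vectors if name not in exclude),
               key=lambda name: cosine(query, vectors[name]))


def analogy(a: str, b: str, c: str, vectors: Dict[str, List[float]]) -> str:
    """Answer 'a is to b as c is to ?' using the vector b - a + c."""
    query = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    return most_similar(query, vectors, exclude=(a, b, c))
```

With toy vectors where each author's paper is the author's vector plus a shared "wrote" offset, analogy("author_X", "paper_X", "author_Y", ...) returns "paper_Y": the offset learned from one pair transfers to another, which is the kind of semantic structure the abstract proposes to exploit.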

Vision-Based Autonomous Vehicle Control using the Two-Point Visual Driver Control Model

Title Vision-Based Autonomous Vehicle Control using the Two-Point Visual Driver Control Model
Authors Justin Zheng, Kazuhide Okamoto, Panagiotis Tsiotras
Abstract This work proposes a new self-driving framework that uses a human driver control model, whose feature-input values are extracted from images using deep convolutional neural networks (CNNs). The development of image processing techniques using CNNs along with accelerated computing hardware has recently enabled real-time detection of these feature-input values. The use of human driver models can lead to more “natural” driving behavior of self-driving vehicles. Specifically, we use the well-known two-point visual driver control model as the controller, and we use a top-down lane cost map CNN and the YOLOv2 CNN to extract feature-input values. This framework relies exclusively on inputs from low-cost sensors like a monocular camera and wheel speed sensors. We experimentally validate the proposed framework on an outdoor track using a 1/5th-scale autonomous vehicle platform.
Tasks
Published 2019-09-29
URL https://arxiv.org/abs/1910.04862v1
PDF https://arxiv.org/pdf/1910.04862v1.pdf
PWC https://paperswithcode.com/paper/vision-based-autonomous-vehicle-control-using
Repo
Framework
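
The two-point visual driver control model (Salvucci and Gray) sets the steering rate from the visual angles of a near point and a far point on the road: d(steer)/dt = k_far * d(theta_far)/dt + k_near * d(theta_near)/dt + k_i * theta_near. The sketch below discretizes that law. In the framework above the angles would be derived from the lane cost map CNN and YOLOv2 outputs; here they are passed in directly, and the gains are hypothetical placeholders, not values from the paper.

```python
from dataclasses import dataclass


@dataclass
class TwoPointController:
    """Discrete-time two-point steering law.

    d(steer)/dt = k_far * d(theta_far)/dt + k_near * d(theta_near)/dt + k_i * theta_near
    Multiplying by dt gives the per-step increment used in update().
    Default gains are hypothetical placeholders.
    """
    k_far: float = 1.0
    k_near: float = 0.5
    k_i: float = 0.1
    steer: float = 0.0
    prev_far: float = 0.0
    prev_near: float = 0.0

    def update(self, theta_far: float, theta_near: float, dt: float) -> float:
        """Advance one step given the current far/near-point visual angles (rad)."""
        self.steer += (self.k_far * (theta_far - self.prev_far)
                       + self.k_near * (theta_near - self.prev_near)
                       + self.k_i * theta_near * dt)
        self.prev_far, self.prev_near = theta_far, theta_near
        return self.steer
```

The far-point term anticipates upcoming curvature, while the near-point terms keep the vehicle centred in the lane; the integral-like k_i term keeps correcting as long as a near-point offset persists.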