January 27, 2020


Paper Group ANR 1121


Front2Back: Single View 3D Shape Reconstruction via Front to Back Prediction


Title	Front2Back: Single View 3D Shape Reconstruction via Front to Back Prediction
Authors	Yuan Yao, Nico Schertler, Enrique Rosales, Helge Rhodin, Leonid Sigal, Alla Sheffer
Abstract	Reconstruction of a 3D shape from a single 2D image is a classical computer vision problem, whose difficulty stems from the inherent ambiguity of recovering occluded or only partially observed surfaces. Recent methods address this challenge through the use of largely unstructured neural networks that effectively distill conditional mapping and priors over 3D shape. In this work, we induce structure and geometric constraints by leveraging three core observations: (1) the surface of most everyday objects is often almost entirely exposed from pairs of typical opposite views; (2) everyday objects often exhibit global reflective symmetries which can be accurately predicted from single views; (3) opposite orthographic views of a 3D shape share consistent silhouettes. Following these observations, we first predict orthographic 2.5D visible surface maps (depth, normal and silhouette) from perspective 2D images, and detect global reflective symmetries in this data; second, we predict the back facing depth and normal maps using as input the front maps and, when available, the symmetric reflections of these maps; and finally, we reconstruct a 3D mesh from the union of these maps using a surface reconstruction method best suited for this data. Our experiments demonstrate that our framework outperforms state-of-the-art approaches for 3D shape reconstruction from 2D and 2.5D data in terms of input fidelity and detail preservation. Specifically, we achieve 12% better performance on average on the ShapeNet benchmark dataset, and up to 19% for certain classes of objects (e.g., chairs and vessels).
Tasks
Published	2019-12-23
URL	https://arxiv.org/abs/1912.10589v2
PDF	https://arxiv.org/pdf/1912.10589v2.pdf
PWC	https://paperswithcode.com/paper/front2back-single-view-3d-shape
Repo
Framework

Crowd Counting on Images with Scale Variation and Isolated Clusters


Title	Crowd Counting on Images with Scale Variation and Isolated Clusters
Authors	Haoyue Bai, Song Wen, S. -H. Gary Chan
Abstract	Crowd counting is to estimate the number of objects (e.g., people or vehicles) in an image of unconstrained congested scenes. Designing a general crowd counting algorithm applicable to a wide range of crowd images is challenging, mainly due to the possibly large variation in object scales and the presence of many isolated small clusters. Previous approaches based on convolution operations with multi-branch architecture are effective for only some narrow bands of scales and have not captured the long-range contextual relationship due to isolated clustering. To address that, we propose SACANet, a novel scale-adaptive long-range context-aware network for crowd counting. SACANet consists of three major modules: the pyramid contextual module which extracts long-range contextual information and enlarges the receptive field, a scale-adaptive self-attention multi-branch module to attain high scale sensitivity and detection accuracy of isolated clusters, and a hierarchical fusion module to fuse multi-level self-attention features. With group normalization, SACANet achieves better optimality in the training process. We have conducted extensive experiments using the VisDrone2019 People dataset, the VisDrone2019 Vehicle dataset, and some other challenging benchmarks. As compared with the state-of-the-art methods, SACANet is shown to be effective, especially for extremely crowded conditions with diverse scales and scattered clusters, and achieves much lower MAE as compared with baselines.
Tasks	Crowd Counting
Published	2019-09-09
URL	https://arxiv.org/abs/1909.03839v1
PDF	https://arxiv.org/pdf/1909.03839v1.pdf
PWC	https://paperswithcode.com/paper/crowd-counting-on-images-with-scale-variation
Repo
Framework

Viability of machine learning to reduce workload in systematic review screenings in the health sciences: a working paper


Title	Viability of machine learning to reduce workload in systematic review screenings in the health sciences: a working paper
Authors	Muhammad Maaz
Abstract	Systematic reviews, which summarize and synthesize all the current research in a specific topic, are a crucial component to academia. They are especially important in the biomedical and health sciences, where they synthesize the state of medical evidence and conclude the best course of action for various diseases, pathologies, and treatments. Due to the immense amount of literature that exists, as well as the output rate of research, reviewing abstracts can be a laborious process. Automation may be able to significantly reduce this workload. Of course, such classifications are not easily automated due to the peculiar nature of written language. Machine learning may be able to help. This paper explored the viability and effectiveness of using machine learning modelling to classify abstracts according to specific exclusion/inclusion criteria, as would be done in the first stage of a systematic review. The specific task was performing the classification of deciding whether an abstract is a randomized control trial (RCT) or not, a very common classification made in systematic reviews in the healthcare field. Random training/testing splits of an n=2042 dataset of labelled abstracts were repeatedly created (1000 times in total), with a model trained and tested on each of these instances. A Bayes classifier as well as an SVM classifier were used, and compared to non-machine learning, simplistic approaches to textual classification. An SVM classifier was seen to be highly effective, yielding a 90% accuracy, as well as an F1 score of 0.84, and yielded a potential workload reduction of 70%. This shows that machine learning has the potential to significantly revolutionize the abstract screening process in healthcare systematic reviews.
Tasks
Published	2019-08-22
URL	https://arxiv.org/abs/1908.08610v1
PDF	https://arxiv.org/pdf/1908.08610v1.pdf
PWC	https://paperswithcode.com/paper/viability-of-machine-learning-to-reduce
Repo
Framework
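
The figures quoted in the abstract above (accuracy, F1 score, workload reduction) can be computed from a screening run along the following lines. This is a minimal sketch, not the paper's code: the function name screening_metrics and the label encoding (1 = RCT, 0 = not an RCT) are hypothetical, and workload reduction is assumed here to mean the share of abstracts the classifier excludes, so that a human no longer has to read them. The paper may define it differently.

```python
def screening_metrics(y_true, y_pred):
    """Compute screening metrics from gold labels and predictions.

    Both arguments are equal-length lists of 1 (RCT) or 0 (not an RCT).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    n = len(pairs)

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # Assumed definition: fraction of abstracts predicted "exclude",
    # i.e. the ones a reviewer would not need to screen by hand.
    workload_reduction = (tn + fn) / n

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "workload_reduction": workload_reduction,
    }
```

Under this definition, workload reduction and recall pull against each other: excluding more abstracts saves more reading, but every false negative is a relevant trial the review would miss.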

SizeNet: Weakly Supervised Learning of Visual Size and Fit in Fashion Images


Title	SizeNet: Weakly Supervised Learning of Visual Size and Fit in Fashion Images
Authors	Nour Karessli, Romain Guigourès, Reza Shirvany
Abstract	Finding clothes that fit is a hot topic in the e-commerce fashion industry. Most approaches addressing this problem are based on statistical methods relying on historical data of articles purchased and returned to the store. Such approaches suffer from the cold start problem for the thousands of articles appearing on the shopping platforms every day, for which no prior purchase history is available. We propose to employ visual data to infer size and fit characteristics of fashion articles. We introduce SizeNet, a weakly-supervised teacher-student training framework that leverages the power of statistical models combined with the rich visual information from article images to learn visual cues for size and fit characteristics, capable of tackling the challenging cold start problem. Detailed experiments are performed on thousands of textile garments, including dresses, trousers, knitwear, tops, etc. from hundreds of different brands.
Tasks
Published	2019-05-28
URL	https://arxiv.org/abs/1905.11784v1
PDF	https://arxiv.org/pdf/1905.11784v1.pdf
PWC	https://paperswithcode.com/paper/sizenet-weakly-supervised-learning-of-visual
Repo
Framework

Augmentation Methods on Monophonic Audio for Instrument Classification in Polyphonic Music


Title	Augmentation Methods on Monophonic Audio for Instrument Classification in Polyphonic Music
Authors	Agelos Kratimenos, Kleanthis Avramidis, Christos Garoufis, Athanasia Zlatintsi, Petros Maragos
Abstract	Instrument classification is one of the fields in Music Information Retrieval (MIR) that has attracted a lot of research interest. However, the majority of that is dealing with monophonic music, while efforts on polyphonic material mainly focus on predominant instrument recognition. In this paper, we propose an approach for instrument classification in polyphonic music from purely monophonic data, that involves performing data augmentation by mixing different audio segments. A variety of data augmentation techniques focusing on different sonic aspects, such as overlaying audio segments of the same genre, as well as pitch and tempo-based synchronization, are explored. We utilize Convolutional Neural Networks for the classification task, comparing shallow to deep network architectures. We further investigate the usage of a combination of the above classifiers, each trained on a single augmented dataset. An ensemble of VGG-like classifiers, trained on non-augmented, pitch-synchronized, tempo-synchronized and genre-similar excerpts, respectively, yields the best results, achieving slightly above 80% in terms of label ranking average precision (LRAP) in the IRMAS test set.
Tasks	Data Augmentation, Information Retrieval, Music Information Retrieval
Published	2019-11-28
URL	https://arxiv.org/abs/1911.12505v2
PDF	https://arxiv.org/pdf/1911.12505v2.pdf
PWC	https://paperswithcode.com/paper/augmentation-methods-on-monophonic-audio-for
Repo
Framework
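
The abstract above reports results in label ranking average precision (LRAP). As a rough illustration of what that metric measures, the sketch below implements the standard definition in plain Python: for every true label of a sample, take the fraction of labels ranked at or above it that are also true, then average over true labels and over samples. The function name lrap is hypothetical, and scoring a sample with no true labels as 1.0 is a convention assumed here; this is not the authors' evaluation code.

```python
def lrap(y_true, y_score):
    """Label ranking average precision for multi-label predictions.

    y_true:  list of 0/1 indicator lists, one per sample.
    y_score: list of score lists with the same shape (higher = more likely).
    """
    total = 0.0
    for truth, scores in zip(y_true, y_score):
        relevant = [j for j, t in enumerate(truth) if t]
        if not relevant:
            # Assumed convention: nothing to rank, count the sample as perfect.
            total += 1.0
            continue
        sample_sum = 0.0
        for j in relevant:
            # Rank of label j: how many labels score at least as high.
            rank = sum(1 for s in scores if s >= scores[j])
            # How many of those are true labels (including j itself).
            hits = sum(1 for k in relevant if scores[k] >= scores[j])
            sample_sum += hits / rank
        total += sample_sum / len(relevant)
    return total / len(y_true)
```

A score of 1.0 means every true instrument is ranked above every false one in every excerpt. Because it evaluates the ranking rather than thresholded decisions, LRAP does not depend on the choice of a decision threshold.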

Budget-aware Semi-Supervised Semantic and Instance Segmentation


Title	Budget-aware Semi-Supervised Semantic and Instance Segmentation
Authors	Miriam Bellver, Amaia Salvador, Jordi Torres, Xavier Giro-i-Nieto
Abstract	Methods that move towards less supervised scenarios are key for image segmentation, as dense labels demand significant human intervention. Generally, the annotation burden is mitigated by labeling datasets with weaker forms of supervision, e.g. image-level labels or bounding boxes. Another option is semi-supervised settings, which commonly leverage a few strong annotations and a huge amount of unlabeled/weakly-labeled data. In this paper, we revisit semi-supervised segmentation schemes and significantly narrow down the annotation budget (in terms of total labeling time of the training set) compared to previous approaches. With a very simple pipeline, we demonstrate that at low annotation budgets, semi-supervised methods outperform weakly-supervised ones by a wide margin for both semantic and instance segmentation. Our approach also outperforms previous semi-supervised works at a much reduced labeling cost. We present results for the Pascal VOC benchmark and unify weakly and semi-supervised approaches by considering the total annotation budget, thus allowing a fairer comparison between methods.
Tasks	Instance Segmentation, Semantic Segmentation
Published	2019-05-14
URL	https://arxiv.org/abs/1905.05880v2
PDF	https://arxiv.org/pdf/1905.05880v2.pdf
PWC	https://paperswithcode.com/paper/budget-aware-semi-supervised-semantic-and
Repo
Framework

Learning a Representation for Cover Song Identification Using Convolutional Neural Network


Title	Learning a Representation for Cover Song Identification Using Convolutional Neural Network
Authors	Zhesong Yu, Xiaoshuo Xu, Xiaoou Chen, Deshun Yang
Abstract	Cover song identification represents a challenging task in the field of Music Information Retrieval (MIR) due to complex musical variations between query tracks and cover versions. Previous works typically utilize hand-crafted features and alignment algorithms for the task. More recently, further breakthroughs are achieved employing neural network approaches. In this paper, we propose a novel Convolutional Neural Network (CNN) architecture based on the characteristics of the cover song task. We first train the network through classification strategies; the network is then used to extract music representation for cover song identification. A scheme is designed to train robust models against tempo changes. Experimental results show that our approach outperforms state-of-the-art methods on all public datasets, improving the performance especially on the large dataset.
Tasks	Information Retrieval, Music Information Retrieval
Published	2019-11-01
URL	https://arxiv.org/abs/1911.00334v1
PDF	https://arxiv.org/pdf/1911.00334v1.pdf
PWC	https://paperswithcode.com/paper/learning-a-representation-for-cover-song
Repo
Framework

A Convolutional Approach to Melody Line Identification in Symbolic Scores


Title	A Convolutional Approach to Melody Line Identification in Symbolic Scores
Authors	Federico Simonetta, Carlos Cancino-Chacón, Stavros Ntalampiras, Gerhard Widmer
Abstract	In many musical traditions, the melody line is of primary significance in a piece. Human listeners can readily distinguish melodies from accompaniment; however, making this distinction given only the written score – i.e. without listening to the music performed – can be a difficult task. Solving this task is of great importance for both Music Information Retrieval and musicological applications. In this paper, we propose an automated approach to identifying the most salient melody line in a symbolic score. The backbone of the method consists of a convolutional neural network (CNN) estimating the probability that each note in the score (more precisely: each pixel in a piano roll encoding of the score) belongs to the melody line. We train and evaluate the method on various datasets, using manual annotations where available and solo instrument parts where not. We also propose a method to inspect the CNN and to analyze the influence exerted by notes on the prediction of other notes; this method can be applied whenever the output of a neural network has the same size as the input.
Tasks	Information Retrieval, Music Information Retrieval
Published	2019-06-24
URL	https://arxiv.org/abs/1906.10547v1
PDF	https://arxiv.org/pdf/1906.10547v1.pdf
PWC	https://paperswithcode.com/paper/a-convolutional-approach-to-melody-line
Repo
Framework

MANAS: Multi-Agent Neural Architecture Search


Title	MANAS: Multi-Agent Neural Architecture Search
Authors	Fabio Maria Carlucci, Pedro M Esperança, Marco Singh, Victor Gabillon, Antoine Yang, Hang Xu, Zewei Chen, Jun Wang
Abstract	The Neural Architecture Search (NAS) problem is typically formulated as a graph search problem where the goal is to learn the optimal operations over edges in order to maximise a graph-level global objective. Due to the large architecture parameter space, efficiency is a key bottleneck preventing NAS from its practical use. In this paper, we address the issue by framing NAS as a multi-agent problem where agents control a subset of the network and coordinate to reach optimal architectures. We provide two distinct lightweight implementations, with reduced memory requirements (1/8th of state-of-the-art), and performances above those of much more computationally expensive methods. Theoretically, we demonstrate vanishing regrets of the form O(sqrt(T)), with T being the total number of rounds. Finally, aware that random search is an often ignored yet effective baseline, we perform additional experiments on 3 alternative datasets and 2 network configurations, and achieve favourable results in comparison.
Tasks	Neural Architecture Search
Published	2019-09-03
URL	https://arxiv.org/abs/1909.01051v3
PDF	https://arxiv.org/pdf/1909.01051v3.pdf
PWC	https://paperswithcode.com/paper/manas-multi-agent-neural-architecture-search
Repo
Framework
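
The phrase "vanishing regrets of the form O(sqrt(T))" in the abstract above can be unpacked in one line. The symbols R_T (cumulative regret after T rounds) and C (a constant) are notation assumed here for illustration and are not taken from the paper. A sublinear bound on cumulative regret means the average per-round regret is bounded by a quantity that shrinks to zero:

```latex
R_T = O\!\left(\sqrt{T}\right)
\;\Longrightarrow\;
\exists\, C > 0:\; R_T \le C\sqrt{T} \ \text{for all sufficiently large } T
\;\Longrightarrow\;
\frac{R_T}{T} \le \frac{C}{\sqrt{T}} \xrightarrow[T \to \infty]{} 0 .
```

For example, quadrupling the number of rounds halves the upper bound on the average regret.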

Predicting Responses to a Robot’s Future Motion using Generative Recurrent Neural Networks


Title	Predicting Responses to a Robot’s Future Motion using Generative Recurrent Neural Networks
Authors	Stuart Eiffert, Salah Sukkarieh
Abstract	Robotic navigation through crowds or herds requires the ability to both predict the future motion of nearby individuals and understand how these predictions might change in response to a robot’s future action. State of the art trajectory prediction models using Recurrent Neural Networks (RNNs) do not currently account for a planned future action of a robot, and so cannot predict how an individual will move in response to a robot’s planned path. We propose an approach that adapts RNNs to use a robot’s next planned action as an input alongside the current position of nearby individuals. This allows the model to learn the response of individuals with regards to a robot’s motion from real world observations. By linking a robot’s actions to the response of those around it in training, we show that we are able to not only improve prediction accuracy in close range interactions, but also to predict the likely response of surrounding individuals to simulated actions. This allows the use of the model to simulate state transitions, without requiring any assumptions on agent interaction. We apply this model to varied datasets, including crowds of pedestrians interacting with vehicles and bicycles, and livestock interacting with a robotic vehicle.
Tasks	Trajectory Prediction
Published	2019-09-30
URL	https://arxiv.org/abs/1909.13486v2
PDF	https://arxiv.org/pdf/1909.13486v2.pdf
PWC	https://paperswithcode.com/paper/predicting-responses-to-a-robots-future
Repo
Framework

Content Adaptive Optimization for Neural Image Compression


Title	Content Adaptive Optimization for Neural Image Compression
Authors	Joaquim Campos, Simon Meierhans, Abdelaziz Djelouah, Christopher Schroers
Abstract	The field of neural image compression has witnessed exciting progress as recently proposed architectures already surpass the established transform coding based approaches. While, so far, research has mainly focused on architecture and model improvements, in this work we explore content adaptive optimization. To this end, we introduce an iterative procedure which adapts the latent representation to the specific content we wish to compress while keeping the parameters of the network and the predictive model fixed. Our experiments show that this allows for an overall increase in rate-distortion performance, independently of the specific architecture used. Furthermore, we also evaluate this strategy in the context of adapting a pretrained network to other content that is different in visual appearance or resolution. Here, our experiments show that our adaptation strategy can largely close the gap as compared to models specifically trained for the given content while having the benefit that no additional data in the form of model parameter updates has to be transmitted.
Tasks	Image Compression
Published	2019-06-04
URL	https://arxiv.org/abs/1906.01223v2
PDF	https://arxiv.org/pdf/1906.01223v2.pdf
PWC	https://paperswithcode.com/paper/content-adaptive-optimization-for-neural
Repo
Framework
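
The iterative procedure described in the abstract above, which adapts the latent representation to one image while the network stays fixed, can be illustrated with a toy problem. This is a sketch under heavy assumptions and not the paper's method: the "decoder" is a fixed matrix D rather than a neural network, the rate term is replaced by a simple squared-norm penalty standing in for an entropy model, and all names (decode, loss, refine_latent, lam, lr) are hypothetical.

```python
def decode(D, y):
    """Frozen toy decoder: reconstruct x_hat = D @ y."""
    return [sum(D[i][j] * y[j] for j in range(len(y))) for i in range(len(D))]


def loss(D, x, y, lam):
    """Rate-distortion style objective: squared error plus lam * rate proxy."""
    x_hat = decode(D, y)
    distortion = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    rate_proxy = sum(v * v for v in y)  # stand-in for a learned entropy model
    return distortion + lam * rate_proxy


def refine_latent(D, x, y0, lam=0.01, lr=0.05, steps=200):
    """Gradient descent on the latent only; the decoder D is never updated."""
    y = list(y0)
    for _ in range(steps):
        residual = [xh - xi for xh, xi in zip(decode(D, y), x)]
        grad = [
            2 * sum(D[i][j] * residual[i] for i in range(len(D))) + 2 * lam * y[j]
            for j in range(len(y))
        ]
        y = [yj - lr * gj for yj, gj in zip(y, grad)]
    return y
```

Only the latent y changes, so a receiver holding the same fixed decoder can reconstruct the image from y alone. This mirrors the abstract's point that no model parameter updates have to be transmitted.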

Explicitly Conditioned Melody Generation: A Case Study with Interdependent RNNs


Title	Explicitly Conditioned Melody Generation: A Case Study with Interdependent RNNs
Authors	Benjamin Genchel, Ashis Pati, Alexander Lerch
Abstract	Deep generative models for symbolic music are typically designed to model temporal dependencies in music so as to predict the next musical event given previous events. In many cases, such models are expected to learn abstract concepts such as harmony, meter, and rhythm from raw musical data without any additional information. In this study, we investigate the effects of explicitly conditioning deep generative models with musically relevant information. Specifically, we study the effects of four different conditioning inputs on the performance of a recurrent monophonic melody generation model. Several combinations of these conditioning inputs are used to train different model variants which are then evaluated using three objective evaluation paradigms across two genres of music. The results indicate musically relevant conditioning significantly improves learning and performance, and reveal how this information affects learning of musical features related to pitch and rhythm. An informal subjective evaluation suggests a corresponding improvement in the aesthetic quality of generations.
Tasks
Published	2019-07-10
URL	https://arxiv.org/abs/1907.05208v1
PDF	https://arxiv.org/pdf/1907.05208v1.pdf
PWC	https://paperswithcode.com/paper/explicitly-conditioned-melody-generation-a
Repo
Framework

Structured Query Construction via Knowledge Graph Embedding


Title	Structured Query Construction via Knowledge Graph Embedding
Authors	Ruijie Wang, Meng Wang, Jun Liu, Michael Cochez, Stefan Decker
Abstract	In order to facilitate the accesses of general users to knowledge graphs, an increasing effort is being exerted to construct graph-structured queries of given natural language questions. At the core of the construction is to deduce the structure of the target query and determine the vertices/edges which constitute the query. Existing query construction methods rely on question understanding and conventional graph-based algorithms which lead to inefficient and degraded performances facing complex natural language questions over knowledge graphs with large scales. In this paper, we focus on this problem and propose a novel framework standing on recent knowledge graph embedding techniques. Our framework first encodes the underlying knowledge graph into a low-dimensional embedding space by leveraging generalized local knowledge graphs. Given a natural language question, the learned embedding representations of the knowledge graph are utilized to compute the query structure and assemble vertices/edges into the target query. Extensive experiments were conducted on the benchmark dataset, and the results demonstrate that our framework outperforms state-of-the-art baseline models regarding effectiveness and efficiency.
Tasks	Graph Embedding, Knowledge Graph Embedding, Knowledge Graphs
Published	2019-09-06
URL	https://arxiv.org/abs/1909.02930v1
PDF	https://arxiv.org/pdf/1909.02930v1.pdf
PWC	https://paperswithcode.com/paper/structured-query-construction-via-knowledge
Repo
Framework

Exploring Scholarly Data by Semantic Query on Knowledge Graph Embedding Space


Title	Exploring Scholarly Data by Semantic Query on Knowledge Graph Embedding Space
Authors	Hung Nghiep Tran, Atsuhiro Takasu
Abstract	The trends of open science have enabled several open scholarly datasets which include millions of papers and authors. Managing, exploring, and utilizing such large and complicated datasets effectively are challenging. In recent years, the knowledge graph has emerged as a universal data format for representing knowledge about heterogeneous entities and their relationships. The knowledge graph can be modeled by knowledge graph embedding methods, which represent entities and relations as embedding vectors in semantic space, then model the interactions between these embedding vectors. However, the semantic structures in the knowledge graph embedding space are not well-studied, thus knowledge graph embedding methods are usually only used for knowledge graph completion but not data representation and analysis. In this paper, we propose to analyze these semantic structures based on the well-studied word embedding space and use them to support data exploration. We also define the semantic queries, which are algebraic operations between the embedding vectors in the knowledge graph embedding space, to solve queries such as similarity and analogy between the entities on the original datasets. We then design a general framework for data exploration by semantic queries and discuss the solution to some traditional scholarly data exploration tasks. We also propose some new interesting tasks that can be solved based on the uncanny semantic structures of the embedding space.
Tasks	Graph Embedding, Knowledge Graph Completion, Knowledge Graph Embedding
Published	2019-09-17
URL	https://arxiv.org/abs/1909.08191v1
PDF	https://arxiv.org/pdf/1909.08191v1.pdf
PWC	https://paperswithcode.com/paper/exploring-scholarly-data-by-semantic-query-on
Repo
Framework
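
The abstract above describes semantic queries as algebraic operations between embedding vectors, used for similarity and analogy queries. The sketch below shows the general idea with cosine similarity and vector offsets. It is illustrative only: the function names are hypothetical, and the vectors in toy_embeddings are hand-made values, not embeddings learned from any scholarly dataset or by the paper's models.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(query_vec, embeddings, exclude=()):
    """Similarity query: name of the entity closest to query_vec."""
    candidates = [
        (name, cosine(query_vec, vec))
        for name, vec in embeddings.items()
        if name not in exclude
    ]
    return max(candidates, key=lambda item: item[1])[0]


def analogy(a, b, c, embeddings):
    """Analogy query "a is to b as c is to ?", answered near b - a + c."""
    target = [
        vb - va + vc
        for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    return most_similar(target, embeddings, exclude={a, b, c})


# Hand-made toy vectors (assumption, for illustration only).
toy_embeddings = {
    "author_A": [1.0, 0.0, 0.2],
    "paper_A": [1.0, 1.0, 0.2],
    "author_B": [0.0, 0.1, 1.0],
    "paper_B": [0.0, 1.1, 1.0],
    "author_C": [0.9, 0.1, 0.3],
}
```

With these toy vectors, analogy("author_A", "paper_A", "author_B", toy_embeddings) answers the query "author_A is to paper_A as author_B is to which entity?" by adding the author-to-paper offset to author_B and looking up the nearest remaining entity.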

Vision-Based Autonomous Vehicle Control using the Two-Point Visual Driver Control Model


Title	Vision-Based Autonomous Vehicle Control using the Two-Point Visual Driver Control Model
Authors	Justin Zheng, Kazuhide Okamoto, Panagiotis Tsiotras
Abstract	This work proposes a new self-driving framework that uses a human driver control model, whose feature-input values are extracted from images using deep convolutional neural networks (CNNs). The development of image processing techniques using CNNs along with accelerated computing hardware has recently enabled real-time detection of these feature-input values. The use of human driver models can lead to more “natural” driving behavior of self-driving vehicles. Specifically, we use the well-known two-point visual driver control model as the controller, and we use a top-down lane cost map CNN and the YOLOv2 CNN to extract feature-input values. This framework relies exclusively on inputs from low-cost sensors like a monocular camera and wheel speed sensors. We experimentally validate the proposed framework on an outdoor track using a 1/5th-scale autonomous vehicle platform.
Tasks
Published	2019-09-29
URL	https://arxiv.org/abs/1910.04862v1
PDF	https://arxiv.org/pdf/1910.04862v1.pdf
PWC	https://paperswithcode.com/paper/vision-based-autonomous-vehicle-control-using
Repo
Framework