April 2, 2020

Paper Group ANR 281

TensorFlow Audio Models in Essentia


Title	TensorFlow Audio Models in Essentia
Authors	Pablo Alonso-Jiménez, Dmitry Bogdanov, Jordi Pons, Xavier Serra
Abstract	Essentia is a reference open-source C++/Python library for audio and music analysis. In this work, we present a set of algorithms that employ TensorFlow in Essentia, allow predictions with pre-trained deep learning models, and are designed to offer flexibility of use, easy extensibility, and real-time inference. To show the potential of this new interface with TensorFlow, we provide a number of pre-trained state-of-the-art music tagging and classification CNN models. We run an extensive evaluation of the developed models. In particular, we assess the generalization capabilities in a cross-collection evaluation utilizing both external tag datasets as well as manual annotations tailored to the taxonomies of our models.
Tasks
Published	2020-03-16
URL	https://arxiv.org/abs/2003.07393v1
PDF	https://arxiv.org/pdf/2003.07393v1.pdf
PWC	https://paperswithcode.com/paper/tensorflow-audio-models-in-essentia
Repo
Framework

GeoGraph: Learning graph-based multi-view object detection with geometric cues end-to-end


Title	GeoGraph: Learning graph-based multi-view object detection with geometric cues end-to-end
Authors	Ahmed Samy Nassar, Stefano D’Aronco, Sébastien Lefèvre, Jan D. Wegner
Abstract	In this paper we propose an end-to-end learnable approach that detects static urban objects from multiple views, re-identifies instances, and finally assigns a geographic position per object. Our method relies on a Graph Neural Network (GNN) to detect all objects and output their geographic positions given images and approximate camera poses as input. Our GNN simultaneously models relative pose and image evidence, and is further able to deal with an arbitrary number of input views. Our method is robust to occlusion, similar appearance of neighboring objects, and severe changes in viewpoint by jointly reasoning about visual image appearance and relative pose. Experimental evaluation on two challenging, large-scale datasets and comparison with state-of-the-art methods show significant and systematic improvements both in accuracy and efficiency, with 2-6% gain in detection and re-ID average precision as well as 8x reduction of training time.
Tasks	Object Detection
Published	2020-03-23
URL	https://arxiv.org/abs/2003.10151v2
PDF	https://arxiv.org/pdf/2003.10151v2.pdf
PWC	https://paperswithcode.com/paper/geograph-learning-graph-based-multi-view
Repo
Framework

Does syntax need to grow on trees? Sources of hierarchical inductive bias in sequence-to-sequence networks


Title	Does syntax need to grow on trees? Sources of hierarchical inductive bias in sequence-to-sequence networks
Authors	R. Thomas McCoy, Robert Frank, Tal Linzen
Abstract	Learners that are exposed to the same training data might generalize differently due to differing inductive biases. In neural network models, inductive biases could in theory arise from any aspect of the model architecture. We investigate which architectural factors affect the generalization behavior of neural sequence-to-sequence models trained on two syntactic tasks, English question formation and English tense reinflection. For both tasks, the training set is consistent with a generalization based on hierarchical structure and a generalization based on linear order. All architectural factors that we investigated qualitatively affected how models generalized, including factors with no clear connection to hierarchical structure. For example, LSTMs and GRUs displayed qualitatively different inductive biases. However, the only factor that consistently contributed a hierarchical bias across tasks was the use of a tree-structured model rather than a model with sequential recurrence, suggesting that human-like syntactic generalization requires architectural syntactic structure.
Tasks
Published	2020-01-10
URL	https://arxiv.org/abs/2001.03632v1
PDF	https://arxiv.org/pdf/2001.03632v1.pdf
PWC	https://paperswithcode.com/paper/does-syntax-need-to-grow-on-trees-sources-of
Repo
Framework

The Landscape of Matrix Factorization Revisited


Title	The Landscape of Matrix Factorization Revisited
Authors	Hossein Valavi, Sulin Liu, Peter J. Ramadge
Abstract	We revisit the landscape of the simple matrix factorization problem. For low-rank matrix factorization, prior work has shown that there exist infinitely many critical points all of which are either global minima or strict saddles. At a strict saddle the minimum eigenvalue of the Hessian is negative. Of interest is whether this minimum eigenvalue is uniformly bounded below zero over all strict saddles. To answer this we consider orbits of critical points under the general linear group. For each orbit we identify a representative point, called a canonical point. If a canonical point is a strict saddle, so is every point on its orbit. We derive an expression for the minimum eigenvalue of the Hessian at each canonical strict saddle and use this to show that the minimum eigenvalue of the Hessian over the set of strict saddles is not uniformly bounded below zero. We also show that a known invariance property of gradient flow ensures the solution of gradient flow only encounters critical points on an invariant manifold $\mathcal{M}_C$ determined by the initial condition. We show that, in contrast to the general situation, the minimum eigenvalue of strict saddles in $\mathcal{M}_{0}$ is uniformly bounded below zero. We obtain an expression for this bound in terms of the singular values of the matrix being factorized. This bound depends on the size of the nonzero singular values and on the separation between distinct nonzero singular values of the matrix.
Tasks
Published	2020-02-27
URL	https://arxiv.org/abs/2002.12795v1
PDF	https://arxiv.org/pdf/2002.12795v1.pdf
PWC	https://paperswithcode.com/paper/the-landscape-of-matrix-factorization
Repo
Framework

Deep Active Learning for Remote Sensing Object Detection


Title	Deep Active Learning for Remote Sensing Object Detection
Authors	Zhenshen Qu, Jingda Du, Yong Cao, Qiuyu Guan, Pengbo Zhao
Abstract	Recently, CNN object detectors have achieved high accuracy on remote sensing images but require large amounts of labor and time for annotation. In this paper, we propose a new uncertainty-based active learning method that selects the most informative images for annotation, so that the detector can still reach high performance with a fraction of the training images. Our method not only analyzes objects’ classification uncertainty to find the least confident objects but also considers their regression uncertainty to identify outliers. In addition, we introduce two extra weights to address two difficulties in remote sensing datasets: class imbalance and differences in the number of objects per image. We evaluate our active learning algorithm on the DOTA dataset with CenterNet as the object detector. We achieve performance on par with full supervision using only half of the images, and even surpass full supervision with 55% of the images and augmented weights on the least confident images.
Tasks	Active Learning, Object Detection
Published	2020-03-17
URL	https://arxiv.org/abs/2003.08793v1
PDF	https://arxiv.org/pdf/2003.08793v1.pdf
PWC	https://paperswithcode.com/paper/deep-active-learning-for-remote-sensing
Repo
Framework
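
The classification-uncertainty part of the selection described in this abstract can be sketched roughly as follows. This is a minimal illustration of least-confidence sampling only, not the authors' implementation: the function names, the per-image aggregation (mean of 1 minus the top class score), and the toy scores are all assumptions, and the regression-uncertainty and weighting terms from the abstract are omitted.

```python
from typing import Dict, List


def image_uncertainty(object_scores: List[float]) -> float:
    """Mean least-confidence (1 - top class score) over the detected objects.

    Assumption: an image with no detections gets uncertainty 0.0.
    """
    if not object_scores:
        return 0.0
    return sum(1.0 - s for s in object_scores) / len(object_scores)


def select_for_annotation(pool: Dict[str, List[float]], budget: int) -> List[str]:
    """Return the `budget` image ids with the highest uncertainty."""
    ranked = sorted(pool, key=lambda img: image_uncertainty(pool[img]), reverse=True)
    return ranked[:budget]


# Hypothetical unlabeled pool: image id -> top class score of each detected object.
pool = {
    "img_a": [0.95, 0.90],
    "img_b": [0.55, 0.60, 0.40],
    "img_c": [0.80],
}
selected = select_for_annotation(pool, budget=2)
```

Here `img_b` and `img_c` would be sent for annotation first, because their detections have the lowest top scores on average.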

Video-based Person Re-Identification using Gated Convolutional Recurrent Neural Networks


Title	Video-based Person Re-Identification using Gated Convolutional Recurrent Neural Networks
Authors	Yang Feng, Yu Wang, Jiebo Luo
Abstract	Deep neural networks have been successfully applied to solving the video-based person re-identification problem with impressive results reported. The existing networks for person re-id are designed to extract discriminative features that preserve the identity information. Usually, whole video frames are fed into the neural networks and all the regions in a frame are equally treated. This may be a suboptimal choice because many regions, e.g., background regions in the video, are not related to the person. Furthermore, the person of interest may be occluded by another person or something else. These unrelated regions may hinder person re-identification. In this paper, we introduce a novel gating mechanism to deep neural networks. Our gating mechanism will learn which regions are helpful for person re-identification and let these regions pass the gate. The unrelated background regions or occluding regions are filtered out by the gate. In each frame, the color channels and optical flow channels provide quite different information. To better leverage such information, we generate one gate using the color channels and another gate using the optical flow channels. These two gates are combined to provide a more reliable gate with a novel fusion method. Experimental results on two major datasets demonstrate the performance improvements due to the proposed gating mechanism.
Tasks	Optical Flow Estimation, Person Re-Identification, Video-Based Person Re-Identification
Published	2020-03-21
URL	https://arxiv.org/abs/2003.09717v1
PDF	https://arxiv.org/pdf/2003.09717v1.pdf
PWC	https://paperswithcode.com/paper/video-based-person-re-identification-using-1
Repo
Framework

Scalable Psychological Momentum Forecasting in Esports


Title	Scalable Psychological Momentum Forecasting in Esports
Authors	Alfonso White, Daniela M. Romano
Abstract	The world of competitive Esports and video gaming has seen and continues to experience steady growth in popularity and complexity. Correspondingly, more research on the topic is being published, ranging from social network analyses to the benchmarking of advanced artificial intelligence systems in playing against humans. In this paper, we present ongoing work on an intelligent agent recommendation engine that suggests actions to players in order to maximise success and enjoyment, both in the space of in-game choices, as well as decisions made around play session timing in the broader context. By leveraging temporal data and appropriate models, we show that a learned representation of player psychological momentum, and of tilt, can be used, in combination with player expertise, to achieve state-of-the-art performance in pre- and post-draft win prediction. Our progress toward fulfilling the potential for deriving optimal recommendations is documented.
Tasks
Published	2020-01-30
URL	https://arxiv.org/abs/2001.11274v2
PDF	https://arxiv.org/pdf/2001.11274v2.pdf
PWC	https://paperswithcode.com/paper/scalable-psychological-momentum-forecasting
Repo
Framework

Exploitation and Exploration Analysis of Elitist Evolutionary Algorithms: A Case Study


Title	Exploitation and Exploration Analysis of Elitist Evolutionary Algorithms: A Case Study
Authors	Yu Chen, Jun He
Abstract	Known as two cornerstones of problem solving by search, exploitation and exploration are extensively discussed for implementation and application of evolutionary algorithms (EAs). However, only a few studies focus on evaluation and theoretical estimation of exploitation and exploration. Considering that exploitation and exploration are two issues regarding global search and local search, this paper proposes to evaluate them via the success probability and the one-step improvement rate computed in different domains of integration. Then, case studies are performed by analyzing the performance of (1+1) random univariate search and (1+1) evolutionary programming on the sphere function and the cheating problem. By rigorous theoretical analysis, we demonstrate that both exploitation and exploration of the investigated elitist EAs degenerate exponentially with the problem dimension $n$. Meanwhile, it is also shown that maximization of exploitation and exploration can be achieved by setting an appropriate value for the standard deviation $\sigma$ of Gaussian mutation, which is positively related to the distance from the present solution to the center of the promising region.
Tasks
Published	2020-01-29
URL	https://arxiv.org/abs/2001.10932v1
PDF	https://arxiv.org/pdf/2001.10932v1.pdf
PWC	https://paperswithcode.com/paper/exploitation-and-exploration-analysis-of
Repo
Framework
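
To make the setting in this abstract concrete, here is a minimal sketch of a (1+1) elitist search with Gaussian mutation on the sphere function. It is an illustration under stated assumptions rather than the algorithms analyzed in the paper: the fixed `sigma`, the starting point, the step count, and all function names are hypothetical choices.

```python
import random
from typing import List, Tuple


def sphere(x: List[float]) -> float:
    """Sphere function: sum of squared coordinates, minimized at the origin."""
    return sum(v * v for v in x)


def one_plus_one(
    x0: List[float], sigma: float, steps: int, seed: int = 0
) -> Tuple[List[float], List[float]]:
    """(1+1) elitist search: one parent, one Gaussian-mutated child per step.

    The child replaces the parent only if it is at least as good, so the
    best fitness never gets worse. `sigma` is held fixed here (assumption).
    """
    rng = random.Random(seed)
    parent = list(x0)
    best = sphere(parent)
    history = [best]
    for _ in range(steps):
        child = [v + rng.gauss(0.0, sigma) for v in parent]
        fitness = sphere(child)
        if fitness <= best:  # elitist acceptance
            parent, best = child, fitness
        history.append(best)
    return parent, history


solution, history = one_plus_one([1.0] * 5, sigma=0.1, steps=200)
```

Repeating the run with different values of `sigma` and larger dimensions is one informal way to see the dependence on $\sigma$ and $n$ that the abstract analyzes theoretically.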

A level set representation method for N-dimensional convex shape and applications


Title	A level set representation method for N-dimensional convex shape and applications
Authors	Lingfeng Li, Shousheng Luo, Xue-Cheng Tai, Jiang Yang
Abstract	In this work, we present a new efficient method for convex shape representation, which is independent of the dimension of the objects concerned, using level-set approaches. A convexity prior is very useful for object completion in computer vision. It is a very challenging task to design an efficient method for representing high-dimensional convex objects. In this paper, we prove that the convexity of the considered object is equivalent to the convexity of the associated signed distance function. Then, the second order condition of convex functions is used to characterize the shape convexity equivalently. We apply this new method to two applications: object segmentation with convexity prior and the convex hull problem (especially with outliers). For both applications, the involved problems can be written as a general optimization problem with three constraints. An efficient algorithm based on the alternating direction method of multipliers is presented for the optimization problem. Numerical experiments are conducted to verify the effectiveness and efficiency of the proposed representation method and algorithm.
Tasks	Semantic Segmentation
Published	2020-03-21
URL	https://arxiv.org/abs/2003.09600v1
PDF	https://arxiv.org/pdf/2003.09600v1.pdf
PWC	https://paperswithcode.com/paper/a-level-set-representation-method-for-n
Repo
Framework

Handling Position Bias for Unbiased Learning to Rank in Hotels Search


Title	Handling Position Bias for Unbiased Learning to Rank in Hotels Search
Authors	Yinxiao Li
Abstract	Nowadays, search ranking and recommendation systems rely on a lot of data to train machine learning models such as Learning-to-Rank (LTR) models to rank results for a given query, and implicit user feedback (e.g. click data) has become the dominant source of data collection due to its abundance and low cost, especially for major Internet companies. However, a drawback of this data collection approach is that the data could be highly biased, and one of the most significant biases is the position bias, where users are biased towards clicking on higher ranked results. In this work, we investigate the marginal importance of properly handling the position bias in an online test environment in Tripadvisor Hotels search. We propose an empirically effective method of handling the position bias that fully leverages the user action data. We take advantage of the fact that when a user clicks a result, they have almost certainly observed all the results above it, and the propensities of the results below the clicked result are estimated by a simple but effective position bias model. The online A/B test results show that this method leads to an improved search ranking model.
Tasks	Learning-To-Rank, Recommendation Systems
Published	2020-02-28
URL	https://arxiv.org/abs/2002.12528v1
PDF	https://arxiv.org/pdf/2002.12528v1.pdf
PWC	https://paperswithcode.com/paper/handling-position-bias-for-unbiased-learning
Repo
Framework
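
The propensity idea in this abstract (results at or above a click are treated as observed, results below it get an estimated propensity) can be sketched as follows. This is a hedged illustration, not the paper's model: the power-law examination curve `position_bias`, its parameter `eta`, the ratio used below the click, and the single-click assumption are all hypothetical.

```python
from typing import List


def position_bias(rank: int, eta: float = 1.0) -> float:
    """Hypothetical examination model: chance of observing a 1-based rank."""
    return 1.0 / (rank ** eta)


def propensities(num_results: int, clicked_rank: int, eta: float = 1.0) -> List[float]:
    """Observation propensity for each rank, given one click at `clicked_rank`.

    Ranks at or above the click are assumed observed (propensity 1.0).
    Ranks below it use the model relative to the clicked rank (assumption).
    """
    props = []
    for rank in range(1, num_results + 1):
        if rank <= clicked_rank:
            props.append(1.0)
        else:
            props.append(position_bias(rank, eta) / position_bias(clicked_rank, eta))
    return props


props = propensities(num_results=5, clicked_rank=2)
```

In an inverse-propensity-weighted training setup, each example's loss would typically be divided by its propensity, so that rarely observed lower positions are not mistaken for irrelevant results.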

TwinBERT: Distilling Knowledge to Twin-Structured BERT Models for Efficient Retrieval


Title	TwinBERT: Distilling Knowledge to Twin-Structured BERT Models for Efficient Retrieval
Authors	Wenhao Lu, Jian Jiao, Ruofei Zhang
Abstract	Pre-trained language models like BERT have achieved great success in a wide variety of NLP tasks, while the superior performance comes with high demand in computational resources, which hinders the application in low-latency IR systems. We present the TwinBERT model for effective and efficient retrieval, which has twin-structured BERT-like encoders to represent query and document respectively and a crossing layer to combine the embeddings and produce a similarity score. Different from BERT, where the two input sentences are concatenated and encoded together, TwinBERT decouples them during encoding and produces the embeddings for query and document independently, which allows document embeddings to be pre-computed offline and cached in memory. The computation left for run-time is therefore only the query encoding and the query-document crossing. This single change can save a large amount of computation time and resources, and therefore significantly improve serving efficiency. Moreover, a few well-designed network layers and training strategies are proposed to further reduce computational cost while keeping performance comparable to that of the BERT model. Lastly, we develop two versions of TwinBERT for retrieval and relevance tasks correspondingly, and both of them achieve close or on-par performance to the BERT-Base model. The model was trained following the teacher-student framework and evaluated with data from one of the major search engines. Experimental results showed that the inference time was significantly reduced and, for the first time, kept at around 20ms on CPUs, while most of the performance gain from the fine-tuned BERT-Base model was retained. Integration of the models into production systems also demonstrated remarkable improvements on relevance metrics with negligible influence on latency.
Tasks
Published	2020-02-14
URL	https://arxiv.org/abs/2002.06275v1
PDF	https://arxiv.org/pdf/2002.06275v1.pdf
PWC	https://paperswithcode.com/paper/twinbert-distilling-knowledge-to-twin
Repo
Framework
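
The serving pattern described in this abstract (encode documents offline, cache the embeddings, and at query time run only the query encoder plus a cheap crossing step) can be sketched as below. This is a toy illustration, not TwinBERT itself: the hashed bag-of-words `encode` stands in for a BERT-like encoder, cosine similarity stands in for the crossing layer, and all names and sample texts are hypothetical.

```python
import math
import zlib
from typing import Dict, List

DIM = 16  # toy embedding size (assumption)


def encode(text: str) -> List[float]:
    """Stand-in encoder: L2-normalized hashed bag of words."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cross(query_vec: List[float], doc_vec: List[float]) -> float:
    """Stand-in crossing layer: cosine similarity of two unit vectors."""
    return sum(q * d for q, d in zip(query_vec, doc_vec))


# Offline step: embed every document once and keep the result in memory.
documents = {
    "d1": "cheap hotels in paris",
    "d2": "python matrix factorization tutorial",
}
doc_cache: Dict[str, List[float]] = {
    doc_id: encode(text) for doc_id, text in documents.items()
}


def score(query: str) -> Dict[str, float]:
    """Online step: encode only the query, then cross it with cached vectors."""
    q = encode(query)
    return {doc_id: cross(q, vec) for doc_id, vec in doc_cache.items()}


scores = score("cheap hotels in paris")
```

The point of the split is that `encode` runs once per document ahead of time, so the per-query cost is one encoder pass plus one inexpensive `cross` call per candidate.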

SlimConv: Reducing Channel Redundancy in Convolutional Neural Networks by Weights Flipping


Title	SlimConv: Reducing Channel Redundancy in Convolutional Neural Networks by Weights Flipping
Authors	Jiaxiong Qiu, Cai Chen, Shuaicheng Liu, Bing Zeng
Abstract	The channel redundancy in feature maps of convolutional neural networks (CNNs) results in large consumption of memory and computational resources. In this work, we design a novel Slim Convolution (SlimConv) module to boost the performance of CNNs by reducing channel redundancies. Our SlimConv consists of three main steps: Reconstruct, Transform and Fuse, through which the features are split and reorganized in a more efficient way, such that the learned weights can be compressed effectively. In particular, the core of our model is a weight flipping operation which can largely improve the feature diversity, contributing crucially to the performance. Our SlimConv is a plug-and-play architectural unit which can be used to replace convolutional layers in CNNs directly. We validate the effectiveness of SlimConv by conducting comprehensive experiments on ImageNet, MS COCO2014, Pascal VOC2012 segmentation, and Pascal VOC2007 detection datasets. The experiments show that SlimConv-equipped models consistently achieve better performance with lower consumption of memory and computation resources than their non-equipped counterparts. For example, the ResNet-101 fitted with SlimConv achieves 77.84% top-1 classification accuracy with 4.87 GFLOPs and 27.96M parameters on ImageNet, which shows almost 0.5% better performance with about 3 GFLOPs and 38% parameters reduced.
Tasks
Published	2020-03-16
URL	https://arxiv.org/abs/2003.07469v1
PDF	https://arxiv.org/pdf/2003.07469v1.pdf
PWC	https://paperswithcode.com/paper/slimconv-reducing-channel-redundancy-in
Repo
Framework

JPLink: On Linking Jobs to Vocational Interest Types


Title	JPLink: On Linking Jobs to Vocational Interest Types
Authors	Amila Silva, Pei-Chi Lo, Ee-Peng Lim
Abstract	Linking job seekers with relevant jobs requires matching based on not only skills, but also personality types. Although the Holland Code, also known as RIASEC, has frequently been used to group people by their suitability for six different categories of occupations, the RIASEC category labels of individual jobs are often not found in job posts. This is attributed to the significant manual effort required to assign RIASEC labels to job posts. To cope with assigning RIASEC labels to a massive number of jobs, we propose JPLink, a machine learning approach using the text content in job titles and job descriptions. JPLink exploits domain knowledge available in an occupation-specific knowledge base known as O*NET to improve feature representation of job posts. To incorporate the relative ranking of the RIASEC labels of each job, JPLink proposes a listwise loss function inspired by learning to rank. Both our quantitative and qualitative evaluations show that JPLink outperforms conventional baselines. We conduct an error analysis on JPLink’s predictions to show that it can uncover label errors in existing job posts.
Tasks	Learning-To-Rank
Published	2020-02-06
URL	https://arxiv.org/abs/2002.02557v1
PDF	https://arxiv.org/pdf/2002.02557v1.pdf
PWC	https://paperswithcode.com/paper/jplink-on-linking-jobs-to-vocational-interest
Repo
Framework
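
The abstract does not spell out JPLink's listwise loss, so the following is only a sketch of one well-known listwise objective from learning to rank (a ListNet-style top-one cross-entropy), swapped in for illustration. The six-element score lists, standing for the six RIASEC labels, and the function names are assumptions.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_loss(predicted: List[float], target: List[float]) -> float:
    """ListNet-style loss: cross-entropy between the top-one distributions
    induced by the target relevance scores and the predicted scores."""
    p_true = softmax(target)
    p_pred = softmax(predicted)
    return -sum(t * math.log(p) for t, p in zip(p_true, p_pred))


# Hypothetical relevance of the six RIASEC labels for one job post.
target = [3.0, 2.0, 1.0, 0.0, 0.0, 0.0]
loss_good = listwise_loss(target, target)                  # same ordering
loss_bad = listwise_loss(list(reversed(target)), target)   # reversed ordering
```

A prediction that ranks the labels in the same order as the target yields a lower loss than one that reverses them, which is the behavior a listwise objective is meant to reward.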

Rapid AI Development Cycle for the Coronavirus (COVID-19) Pandemic: Initial Results for Automated Detection & Patient Monitoring using Deep Learning CT Image Analysis


Title	Rapid AI Development Cycle for the Coronavirus (COVID-19) Pandemic: Initial Results for Automated Detection & Patient Monitoring using Deep Learning CT Image Analysis
Authors	Ophir Gozes, Maayan Frid-Adar, Hayit Greenspan, Patrick D. Browning, Huangqi Zhang, Wenbin Ji, Adam Bernheim, Eliot Siegel
Abstract	Purpose: Develop AI-based automated CT image analysis tools for detection, quantification, and tracking of Coronavirus; demonstrate they can differentiate coronavirus patients from non-patients. Materials and Methods: Multiple international datasets, including from Chinese disease-infected areas, were included. We present a system that utilizes robust 2D and 3D deep learning models, modifying and adapting existing AI models and combining them with clinical understanding. We conducted multiple retrospective experiments to analyze the performance of the system in the detection of suspected COVID-19 thoracic CT features and to evaluate evolution of the disease in each patient over time using a 3D volume review, generating a Corona score. The study includes a testing set of 157 international patients (China and U.S.). Results: Classification results for Coronavirus vs Non-coronavirus cases per thoracic CT studies were 0.996 AUC (95% CI: 0.989-1.00) on datasets of Chinese control and infected patients. Possible working point: 98.2% sensitivity, 92.2% specificity. For time analysis of Coronavirus patients, the system output enables quantitative measurements for smaller opacities (volume, diameter) and visualization of the larger opacities in a slice-based heat map or a 3D volume display. Our suggested Corona score measures the progression of disease over time. Conclusion: This initial study, which is currently being expanded to a larger population, demonstrated that rapidly developed AI-based image analysis can achieve high accuracy in detection of Coronavirus as well as quantification and tracking of disease burden.
Tasks
Published	2020-03-10
URL	https://arxiv.org/abs/2003.05037v3
PDF	https://arxiv.org/pdf/2003.05037v3.pdf
PWC	https://paperswithcode.com/paper/rapid-ai-development-cycle-for-the
Repo
Framework

A semi-supervised learning framework for quantitative structure-activity regression modelling


Title	A semi-supervised learning framework for quantitative structure-activity regression modelling
Authors	Oliver P Watson, Isidro Cortes-Ciriano, James A Watson
Abstract	Supervised learning models, also known as quantitative structure-activity regression (QSAR) models, are increasingly used in assisting the process of preclinical, small molecule drug discovery. The models are trained on data consisting of a finite dimensional representation of molecular structures and their corresponding target specific activities. These models can then be used to predict the activity of previously unmeasured novel compounds. In this work we address two problems related to this approach. The first is to estimate the extent to which the quality of the model predictions degrades for compounds very different from the compounds in the training data. The second is to adjust for the screening dependent selection bias inherent in many training data sets. In the most extreme cases, only compounds which pass an activity-dependent screening are reported. By using a semi-supervised learning framework, we show that it is possible to make predictions which take into account the similarity of the testing compounds to those in the training data and adjust for the reporting selection bias. We illustrate this approach using publicly available structure-activity data on a large set of compounds reported by GlaxoSmithKline (the Tres Cantos AntiMalarial Set) to inhibit in vitro P. falciparum growth.
Tasks	Drug Discovery
Published	2020-01-07
URL	https://arxiv.org/abs/2001.01924v1
PDF	https://arxiv.org/pdf/2001.01924v1.pdf
PWC	https://paperswithcode.com/paper/a-semi-supervised-learning-framework-for
Repo
Framework