February 1, 2020

Paper Group AWR 176

User Diverse Preference Modeling by Multimodal Attentive Metric Learning. Byte-Pair Encoding for Text-to-SQL Generation. MIST: A Novel Training Strategy for Low-latency Scalable Neural Net Decoders. Querying Knowledge via Multi-Hop English Questions. Robot Localization in Floor Plans Using a Room Layout Edge Extraction Network. Unsupervised Discover …

User Diverse Preference Modeling by Multimodal Attentive Metric Learning


Title	User Diverse Preference Modeling by Multimodal Attentive Metric Learning
Authors	Fan Liu, Zhiyong Cheng, Changchang Sun, Yinglong Wang, Liqiang Nie, Mohan Kankanhalli
Abstract	Most existing recommender systems represent a user’s preference with a feature vector, which is assumed to be fixed when predicting this user’s preferences for different items. However, the same vector cannot accurately capture a user’s varying preferences on all items, especially when considering the diverse characteristics of various items. To tackle this problem, in this paper, we propose a novel Multimodal Attentive Metric Learning (MAML) method to model user diverse preferences for various items. In particular, for each user-item pair, we propose an attention neural network, which exploits the item’s multimodal features to estimate the user’s special attention to different aspects of this item. The obtained attention is then integrated into a metric-based learning method to predict the user preference on this item. The advantage of metric learning is that it can naturally overcome the problem of dot product similarity, which is adopted by matrix factorization (MF) based recommendation models but does not satisfy the triangle inequality property. In addition, it is worth mentioning that the attention mechanism can not only help model a user’s diverse preferences towards different items, but also overcome the geometrically restrictive problem caused by collaborative metric learning. Extensive experiments on large-scale real-world datasets show that our model can substantially outperform the state-of-the-art baselines, demonstrating the potential of modeling user diverse preference for recommendation.
Tasks	Metric Learning, Recommendation Systems
Published	2019-08-21
URL	https://arxiv.org/abs/1908.07738v1
PDF	https://arxiv.org/pdf/1908.07738v1.pdf
PWC	https://paperswithcode.com/paper/190807738
Repo	https://github.com/liufancs/MAML
Framework	tf
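
The abstract describes an attention vector that reweights a metric-based user-item distance. The following is a minimal sketch of that idea only, not the authors' implementation: it assumes the attention scores for one user-item pair are already available, and the names `softmax` and `attentive_distance` are hypothetical. The paper's exact normalization and loss may differ.

```python
import math

def softmax(scores):
    """Turn raw attention scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attentive_distance(user_vec, item_vec, attention_scores):
    """Squared Euclidean distance with each dimension scaled by its attention weight.

    attention_scores is assumed to come from some attention network that
    looks at the item's multimodal features (hypothetical input here).
    """
    weights = softmax(attention_scores)
    return sum(w * (u - v) ** 2 for w, u, v in zip(weights, user_vec, item_vec))
```

A smaller distance would be read as a stronger predicted preference. Because the weights depend on the user-item pair, the same user vector can sit "close" to items that differ along dimensions the user does not attend to.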

Byte-Pair Encoding for Text-to-SQL Generation


Title	Byte-Pair Encoding for Text-to-SQL Generation
Authors	Samuel Müller, Andreas Vlachos
Abstract	Neural sequence-to-sequence models provide a competitive approach to the task of mapping a question in natural language to an SQL query, also referred to as text-to-SQL generation. The Byte-Pair Encoding algorithm (BPE) has previously been used to improve machine translation (MT) between natural languages. In this work, we adapt BPE for text-to-SQL generation. As the datasets for this task are rather small compared to MT, we present a novel stopping criterion that prevents overfitting the BPE encoding to the training set. Additionally, we present AST BPE, which is a version of BPE that uses the Abstract Syntax Tree (AST) of the SQL statement to guide BPE merges and therefore produce BPE encodings that generalize better. We improved the accuracy of a strong attentive seq2seq baseline on five out of six English text-to-SQL tasks while reducing training time by more than 50% on four of them due to the shortened targets. Finally, on two of these tasks we exceeded previously reported accuracies.
Tasks	Machine Translation, Text-To-Sql
Published	2019-10-20
URL	https://arxiv.org/abs/1910.08962v2
PDF	https://arxiv.org/pdf/1910.08962v2.pdf
PWC	https://paperswithcode.com/paper/byte-pair-encoding-for-text-to-sql-generation
Repo	https://github.com/SamuelGabriel/sqlbpe
Framework	pytorch
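
To make the BPE idea concrete, here is a minimal sketch of plain BPE merges applied to SQL token sequences. It is an illustration under assumptions: merges operate on whole SQL tokens, a fixed `num_merges` stands in for the paper's stopping criterion, and the AST-guided variant is not modeled. The function names are hypothetical.

```python
from collections import Counter

def most_frequent_pair(sequences):
    """Return the most frequent adjacent token pair, or None if there are no pairs."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    return counts.most_common(1)[0][0] if counts else None

def merge_pair(seq, pair):
    """Replace every occurrence of `pair` in `seq` with one merged token."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + " " + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged

def learn_bpe(sequences, num_merges):
    """Greedily apply up to `num_merges` merges; return the merges and encoded sequences."""
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(sequences)
        if pair is None:
            break
        merges.append(pair)
        sequences = [merge_pair(seq, pair) for seq in sequences]
    return merges, sequences
```

Each merge shortens the target sequences, which is consistent with the abstract's point that shortened targets reduce training time.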

MIST: A Novel Training Strategy for Low-latency Scalable Neural Net Decoders


Title	MIST: A Novel Training Strategy for Low-latency Scalable Neural Net Decoders
Authors	Kumar Yashashwi, Deepak Anand, Sibi Raj B Pillai, Prasanna Chaporkar, K Ganesh
Abstract	In this paper, we propose a low latency, robust and scalable neural net based decoder for convolutional and low-density parity-check (LDPC) coding schemes. The proposed decoders are demonstrated to have bit error rate (BER) and block error rate (BLER) performances at par with the state-of-the-art neural net based decoders while achieving more than 8 times higher decoding speed. The enhanced decoding speed is due to the use of convolutional neural network (CNN) as opposed to recurrent neural network (RNN) used in the best known neural net based decoders. This contradicts existing doctrine that only RNN based decoders can provide a performance close to the optimal ones. The key ingredient to our approach is a novel Mixed-SNR Independent Samples based Training (MIST), which allows for training of CNN with only 1% of possible datawords, even for block length as high as 1000. The proposed decoder is robust as, once trained, the same decoder can be used for a wide range of SNR values. Finally, in the presence of channel outages, the proposed decoders outperform the best known decoders, viz. unquantized Viterbi decoder for convolutional code, and belief propagation for LDPC. This gives the CNN decoder a significant advantage in 5G millimeter wave systems, where channel outages are prevalent.
Tasks
Published	2019-05-22
URL	https://arxiv.org/abs/1905.08990v1
PDF	https://arxiv.org/pdf/1905.08990v1.pdf
PWC	https://paperswithcode.com/paper/mist-a-novel-training-strategy-for-low
Repo	https://github.com/kryashashwi/MIST_CNN_Decoder
Framework	tf

Querying Knowledge via Multi-Hop English Questions


Title	Querying Knowledge via Multi-Hop English Questions
Authors	Tiantian Gao, Paul Fodor, Michael Kifer
Abstract	The inherent difficulty of knowledge specification and the lack of trained specialists are some of the key obstacles on the way to making intelligent systems based on the knowledge representation and reasoning (KRR) paradigm commonplace. Knowledge and query authoring using natural language, especially controlled natural language (CNL), is one of the promising approaches that could enable domain experts, who are not trained logicians, to both create formal knowledge and query it. In previous work, we introduced the KALM system (Knowledge Authoring Logic Machine) that supports knowledge authoring (and simple querying) with very high accuracy that at present is unachievable via machine learning approaches. The present paper expands on the question answering aspect of KALM and introduces KALM-QA (KALM for Question Answering) that is capable of answering much more complex English questions. We show that KALM-QA achieves 100% accuracy on an extensive suite of movie-related questions, called MetaQA, which contains almost 29,000 test questions and over 260,000 training questions. We contrast this with a published machine learning approach, which falls far short of this high mark.
Tasks	Question Answering
Published	2019-07-18
URL	https://arxiv.org/abs/1907.08176v1
PDF	https://arxiv.org/pdf/1907.08176v1.pdf
PWC	https://paperswithcode.com/paper/querying-knowledge-via-multi-hop-english
Repo	https://github.com/tiantiangao7/kalm-qa
Framework	none

Robot Localization in Floor Plans Using a Room Layout Edge Extraction Network


Title	Robot Localization in Floor Plans Using a Room Layout Edge Extraction Network
Authors	Federico Boniardi, Abhinav Valada, Rohit Mohan, Tim Caselitz, Wolfram Burgard
Abstract	Indoor localization is one of the crucial enablers for deployment of service robots. Although several successful techniques for indoor localization have been proposed, the majority of them relies on maps generated from data gathered with the same sensor modality used for localization. Typically, tedious labor by experts is needed to acquire this data, thus limiting the readiness of the system as well as its ease of installation for inexperienced operators. In this paper, we propose a memory and computationally efficient monocular camera-based localization system that allows a robot to estimate its pose given an architectural floor plan. Our method employs a convolutional neural network to predict room layout edges from a single camera image and estimates the robot pose using a particle filter that matches the extracted edges to the given floor plan. We evaluate our localization system using multiple real-world experiments and demonstrate that it has the robustness and accuracy required for reliable indoor navigation.
Tasks
Published	2019-03-05
URL	https://arxiv.org/abs/1903.01804v2
PDF	https://arxiv.org/pdf/1903.01804v2.pdf
PWC	https://paperswithcode.com/paper/robot-localization-in-floor-plans-using-a
Repo	https://github.com/ayusefi/Localization-Papers
Framework	none

Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents


Title	Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents
Authors	Jack Hessel, Lillian Lee, David Mimno
Abstract	Images and text co-occur constantly on the web, but explicit links between images and sentences (or other intra-document textual units) are often not present. We present algorithms that discover image-sentence relationships without relying on explicit multimodal annotation in training. We experiment on seven datasets of varying difficulty, ranging from documents consisting of groups of images captioned post hoc by crowdworkers to naturally-occurring user-generated multimodal documents. We find that a structured training objective based on identifying whether collections of images and sentences co-occur in documents can suffice to predict links between specific sentences and specific images within the same document at test time.
Tasks
Published	2019-04-16
URL	https://arxiv.org/abs/1904.07826v2
PDF	https://arxiv.org/pdf/1904.07826v2.pdf
PWC	https://paperswithcode.com/paper/unsupervised-discovery-of-multimodal-links-in
Repo	https://github.com/jmhessel/multi-retrieval
Framework	tf

Why Having 10,000 Parameters in Your Camera Model is Better Than Twelve


Title	Why Having 10,000 Parameters in Your Camera Model is Better Than Twelve
Authors	Thomas Schöps, Viktor Larsson, Marc Pollefeys, Torsten Sattler
Abstract	Camera calibration is an essential first step in setting up 3D Computer Vision systems. Commonly used parametric camera models are limited to a few degrees of freedom and thus often do not optimally fit to complex real lens distortion. In contrast, generic camera models allow for very accurate calibration due to their flexibility. Despite this, they have seen little use in practice. In this paper, we argue that this should change. We propose a calibration pipeline for generic models that is fully automated, easy to use, and can act as a drop-in replacement for parametric calibration, with a focus on accuracy. We compare our results to parametric calibrations. Considering stereo depth estimation and camera pose estimation as examples, we show that the calibration error acts as a bias on the results. We thus argue that in contrast to current common practice, generic models should be preferred over parametric ones whenever possible. To facilitate this, we released our calibration pipeline at https://github.com/puzzlepaint/camera_calibration, making both easy-to-use and accurate camera calibration available to everyone.
Tasks	Calibration, Depth Estimation, Pose Estimation, Stereo Depth Estimation
Published	2019-12-05
URL	https://arxiv.org/abs/1912.02908v2
PDF	https://arxiv.org/pdf/1912.02908v2.pdf
PWC	https://paperswithcode.com/paper/why-having-10000-parameters-in-your-camera
Repo	https://github.com/puzzlepaint/camera_calibration
Framework	none

PFLD: A Practical Facial Landmark Detector


Title	PFLD: A Practical Facial Landmark Detector
Authors	Xiaojie Guo, Siyuan Li, Jinke Yu, Jiawan Zhang, Jiayi Ma, Lin Ma, Wei Liu, Haibin Ling
Abstract	Being accurate, efficient, and compact is essential to a facial landmark detector for practical use. To simultaneously consider the three concerns, this paper investigates a neat model with promising detection accuracy under wild environments (e.g., unconstrained pose, expression, lighting, and occlusion conditions) and super real-time speed on a mobile device. More concretely, we customize an end-to-end single stage network associated with acceleration techniques. During the training phase, for each sample, rotation information is estimated for geometrically regularizing landmark localization, which is then NOT involved in the testing phase. A novel loss is designed to, besides considering the geometrical regularization, mitigate the issue of data imbalance by adjusting weights of samples to different states, such as large pose, extreme lighting, and occlusion, in the training set. Extensive experiments are conducted to demonstrate the efficacy of our design and reveal its superior performance over state-of-the-art alternatives on widely-adopted challenging benchmarks, i.e., 300W (including iBUG, LFPW, AFW, HELEN, and XM2VTS) and AFLW. Our model can be merely 2.1Mb of size and reach over 140 fps per face on a mobile phone (Qualcomm ARM 845 processor) with high precision, making it attractive for large-scale or real-time applications. We have made our practical system based on PFLD 0.25X model publicly available at http://sites.google.com/view/xjguo/fld for encouraging comparisons and improvements from the community.
Tasks
Published	2019-02-28
URL	http://arxiv.org/abs/1902.10859v2
PDF	http://arxiv.org/pdf/1902.10859v2.pdf
PWC	https://paperswithcode.com/paper/pfld-a-practical-facial-landmark-detector
Repo	https://github.com/polarisZhao/PFLD-pytorch
Framework	pytorch

SEPT: Improving Scientific Named Entity Recognition with Span Representation


Title	SEPT: Improving Scientific Named Entity Recognition with Span Representation
Authors	Tan Yan, Heyan Huang, Xian-Ling Mao
Abstract	We introduce a new scientific named entity recognizer called SEPT, which stands for Span Extractor with Pre-trained Transformers. In recent papers, span extractors have been demonstrated to be a powerful model compared with sequence labeling models. However, we discover that with the development of pre-trained language models, the performance of span extractors appears to become similar to sequence labeling models. To keep the advantages of span representation, we modified the model by under-sampling to balance the positive and negative samples and reduce the search space. Furthermore, we simplify the original network architecture to combine the span extractor with BERT. Experiments demonstrate that even the simplified architecture achieves the same performance and SEPT achieves a new state of the art result in scientific named entity recognition even without relation information involved.
Tasks	Named Entity Recognition
Published	2019-11-08
URL	https://arxiv.org/abs/1911.03353v1
PDF	https://arxiv.org/pdf/1911.03353v1.pdf
PWC	https://paperswithcode.com/paper/sept-improving-scientific-named-entity
Repo	https://github.com/Ethan-yt/sept
Framework	pytorch

What Object Should I Use? - Task Driven Object Detection


Title	What Object Should I Use? - Task Driven Object Detection
Authors	Johann Sawatzky, Yaser Souri, Christian Grund, Juergen Gall
Abstract	When humans have to solve everyday tasks, they simply pick the objects that are most suitable. While the question which object should one use for a specific task sounds trivial for humans, it is very difficult to answer for robots or other autonomous systems. This issue, however, is not addressed by current benchmarks for object detection that focus on detecting object categories. We therefore introduce the COCO-Tasks dataset which comprises about 40,000 images where the most suitable objects for 14 tasks have been annotated. We furthermore propose an approach that detects the most suitable objects for a given task. The approach builds on a Gated Graph Neural Network to exploit the appearance of each object as well as the global context of all present objects in the scene. In our experiments, we show that the proposed approach outperforms other approaches that are evaluated on the dataset like classification or ranking approaches.
Tasks	Object Detection
Published	2019-04-05
URL	http://arxiv.org/abs/1904.03000v1
PDF	http://arxiv.org/pdf/1904.03000v1.pdf
PWC	https://paperswithcode.com/paper/what-object-should-i-use-task-driven-object
Repo	https://github.com/yassersouri/task-driven-object-detection
Framework	pytorch

GeNet: Deep Representations for Metagenomics


Title	GeNet: Deep Representations for Metagenomics
Authors	Mateo Rojas-Carulla, Ilya Tolstikhin, Guillermo Luque, Nicholas Youngblut, Ruth Ley, Bernhard Schölkopf
Abstract	We introduce GeNet, a method for shotgun metagenomic classification from raw DNA sequences that exploits the known hierarchical structure between labels for training. We provide a comparison with state-of-the-art methods Kraken and Centrifuge on datasets obtained from several sequencing technologies, in which dataset shift occurs. We show that GeNet obtains competitive precision and good recall, with orders of magnitude less memory requirements. Moreover, we show that a linear model trained on top of representations learned by GeNet achieves recall comparable to state-of-the-art methods on the aforementioned datasets, and achieves over 90% accuracy in a challenging pathogen detection problem. This provides evidence of the usefulness of the representations learned by GeNet for downstream biological tasks.
Tasks
Published	2019-01-30
URL	http://arxiv.org/abs/1901.11015v1
PDF	http://arxiv.org/pdf/1901.11015v1.pdf
PWC	https://paperswithcode.com/paper/genet-deep-representations-for-metagenomics
Repo	https://github.com/mrojascarulla/GeNet
Framework	tf

Discovering Reliable Correlations in Categorical Data


Title	Discovering Reliable Correlations in Categorical Data
Authors	Panagiotis Mandros, Mario Boley, Jilles Vreeken
Abstract	In many scientific tasks we are interested in discovering whether there exist any correlations in our data. This raises many questions, such as how to reliably and interpretably measure correlation between a multivariate set of attributes, how to do so without having to make assumptions on distribution of the data or the type of correlation, and, how to efficiently discover the top-most reliably correlated attribute sets from data. In this paper we answer these questions for discovery tasks in categorical data. In particular, we propose a corrected-for-chance, consistent, and efficient estimator for normalized total correlation, by which we obtain a reliable, naturally interpretable, non-parametric measure for correlation over multivariate sets. For the discovery of the top-k correlated sets, we derive an effective algorithmic framework based on a tight bounding function. This framework offers exact, approximate, and heuristic search. Empirical evaluation shows that already for small sample sizes the estimator leads to low-regret optimization outcomes, while the algorithms are shown to be highly effective for both large and high-dimensional data. Through two case studies we confirm that our discovery framework identifies interesting and meaningful correlations.
Tasks
Published	2019-08-30
URL	https://arxiv.org/abs/1908.11682v1
PDF	https://arxiv.org/pdf/1908.11682v1.pdf
PWC	https://paperswithcode.com/paper/discovering-reliable-correlations-in
Repo	https://github.com/pmandros/wodiscovery
Framework	none

Psi-Net: Shape and boundary aware joint multi-task deep network for medical image segmentation


Title	Psi-Net: Shape and boundary aware joint multi-task deep network for medical image segmentation
Authors	Balamurali Murugesan, Kaushik Sarveswaran, Sharath M Shankaranarayana, Keerthi Ram, Mohanasankar Sivaprakasam
Abstract	Image segmentation is a primary task in many medical applications. Recently, many deep networks derived from U-Net have been extensively used in various medical image segmentation tasks. However, in most of the cases, networks similar to U-net produce coarse and non-smooth segmentations with lots of discontinuities. To improve and refine the performance of U-Net like networks, we propose the use of parallel decoders which along with performing the mask predictions also perform contour prediction and distance map estimation. The contour and distance map aid in ensuring smoothness in the segmentation predictions. To facilitate joint training of three tasks, we propose a novel architecture called Psi-Net with a single encoder and three parallel decoders (thus having a shape of $\Psi$), one decoder to learn the segmentation mask prediction and other two decoders to learn the auxiliary tasks of contour detection and distance map estimation. The learning of these auxiliary tasks helps in capturing the shape and the boundary information. We also propose a new joint loss function for the proposed architecture. The loss function consists of a weighted combination of Negative Log likelihood and Mean Square Error loss. We have used two publicly available datasets: 1) Origa dataset for the task of optic cup and disc segmentation and 2) Endovis segment dataset for the task of polyp segmentation to evaluate our model. We have conducted extensive experiments using our network to show our model gives better results in terms of segmentation, boundary and shape metrics.
Tasks	Contour Detection, Medical Image Segmentation, Semantic Segmentation
Published	2019-02-11
URL	https://arxiv.org/abs/1902.04099v3
PDF	https://arxiv.org/pdf/1902.04099v3.pdf
PWC	https://paperswithcode.com/paper/psi-net-shape-and-boundary-aware-joint-multi
Repo	https://github.com/Bala93/Multi-task-deep-network
Framework	pytorch
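
The joint loss mentioned in the abstract can be sketched as a weighted sum over the three decoder outputs. This is an illustration, not the paper's exact formulation: it assumes negative log-likelihood for the mask and contour outputs and mean squared error for the distance map, and the weights and function names are hypothetical.

```python
import math

def nll_loss(probs, labels):
    """Mean negative log-likelihood of the true class for each pixel."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

def mse_loss(pred, target):
    """Mean squared error between predicted and reference values."""
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(pred)

def joint_loss(mask_probs, mask_labels, contour_probs, contour_labels,
               dist_pred, dist_true, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three per-task losses (weights are assumed values)."""
    w_mask, w_contour, w_dist = weights
    return (w_mask * nll_loss(mask_probs, mask_labels)
            + w_contour * nll_loss(contour_probs, contour_labels)
            + w_dist * mse_loss(dist_pred, dist_true))
```

Here each `probs` entry is a per-pixel list of class probabilities and each label is a class index; in a real network these would be tensors produced by the three decoders.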

Sliced Score Matching: A Scalable Approach to Density and Score Estimation


Title	Sliced Score Matching: A Scalable Approach to Density and Score Estimation
Authors	Yang Song, Sahaj Garg, Jiaxin Shi, Stefano Ermon
Abstract	Score matching is a popular method for estimating unnormalized statistical models. However, it has been so far limited to simple, shallow models or low-dimensional data, due to the difficulty of computing the Hessian of log-density functions. We show this difficulty can be mitigated by projecting the scores onto random vectors before comparing them. This objective, called sliced score matching, only involves Hessian-vector products, which can be easily implemented using reverse-mode automatic differentiation. Therefore, sliced score matching is amenable to more complex models and higher dimensional data compared to score matching. Theoretically, we prove the consistency and asymptotic normality of sliced score matching estimators. Moreover, we demonstrate that sliced score matching can be used to learn deep score estimators for implicit distributions. In our experiments, we show sliced score matching can learn deep energy-based models effectively, and can produce accurate score estimates for applications such as variational inference with implicit distributions and training Wasserstein Auto-Encoders.
Tasks
Published	2019-05-17
URL	https://arxiv.org/abs/1905.07088v2
PDF	https://arxiv.org/pdf/1905.07088v2.pdf
PWC	https://paperswithcode.com/paper/sliced-score-matching-a-scalable-approach-to
Repo	https://github.com/ermongroup/ncsn
Framework	pytorch
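
For reference, the projection idea in the abstract is usually written as the objective below. The notation is an assumption made here for illustration: $s_\theta$ is the score model, $p_d$ the data distribution, and $p_v$ the distribution of random projection vectors.

```latex
J(\theta) = \mathbb{E}_{v \sim p_v}\, \mathbb{E}_{x \sim p_d}
\left[ v^\top \nabla_x s_\theta(x)\, v
       + \tfrac{1}{2} \left( v^\top s_\theta(x) \right)^2 \right]
```

The first term equals $v^\top \nabla_x \left( v^\top s_\theta(x) \right)$, so it can be obtained by differentiating the scalar $v^\top s_\theta(x)$ once more rather than forming the full Jacobian of the score, which is where the scalability comes from.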

Fastened CROWN: Tightened Neural Network Robustness Certificates


Title	Fastened CROWN: Tightened Neural Network Robustness Certificates
Authors	Zhaoyang Lyu, Ching-Yun Ko, Zhifeng Kong, Ngai Wong, Dahua Lin, Luca Daniel
Abstract	The rapid growth of deep learning applications in real life is accompanied by severe safety concerns. To mitigate this uneasy phenomenon, much research has been done providing reliable evaluations of the fragility level in different deep neural networks. Apart from devising adversarial attacks, quantifiers that certify safeguarded regions have also been designed in the past five years. The summarizing work of Salman et al. unifies a family of existing verifiers under a convex relaxation framework. We draw inspiration from such work and further demonstrate the optimality of deterministic CROWN (Zhang et al. 2018) solutions in a given linear programming problem under mild constraints. Given this theoretical result, the computationally expensive linear programming based method is shown to be unnecessary. We then propose an optimization-based approach FROWN (Fastened CROWN): a general algorithm to tighten robustness certificates for neural networks. Extensive experiments on various networks trained individually verify the effectiveness of FROWN in safeguarding larger robust regions.
Tasks
Published	2019-12-02
URL	https://arxiv.org/abs/1912.00574v1
PDF	https://arxiv.org/pdf/1912.00574v1.pdf
PWC	https://paperswithcode.com/paper/fastened-crown-tightened-neural-network
Repo	https://github.com/ZhaoyangLyu/FROWN
Framework	pytorch