Paper Group AWR 176
User Diverse Preference Modeling by Multimodal Attentive Metric Learning. Byte-Pair Encoding for Text-to-SQL Generation. MIST: A Novel Training Strategy for Low-latency Scalable Neural Net Decoders. Querying Knowledge via Multi-Hop English Questions. Robot Localization in Floor Plans Using a Room Layout Edge Extraction Network. Unsupervised Discover …
User Diverse Preference Modeling by Multimodal Attentive Metric Learning
Title | User Diverse Preference Modeling by Multimodal Attentive Metric Learning |
Authors | Fan Liu, Zhiyong Cheng, Changchang Sun, Yinglong Wang, Liqiang Nie, Mohan Kankanhalli |
Abstract | Most existing recommender systems represent a user’s preference with a feature vector, which is assumed to be fixed when predicting this user’s preferences for different items. However, the same vector cannot accurately capture a user’s varying preferences on all items, especially when considering the diverse characteristics of various items. To tackle this problem, in this paper, we propose a novel Multimodal Attentive Metric Learning (MAML) method to model user diverse preferences for various items. In particular, for each user-item pair, we propose an attention neural network, which exploits the item’s multimodal features to estimate the user’s special attention to different aspects of this item. The obtained attention is then integrated into a metric-based learning method to predict the user preference on this item. The advantage of metric learning is that it can naturally overcome the problem of dot product similarity, which is adopted by matrix factorization (MF) based recommendation models but does not satisfy the triangle inequality property. In addition, it is worth mentioning that the attention mechanism can not only help model a user’s diverse preferences towards different items, but also overcome the geometrically restrictive problem caused by collaborative metric learning. Extensive experiments on large-scale real-world datasets show that our model can substantially outperform the state-of-the-art baselines, demonstrating the potential of modeling user diverse preference for recommendation. |
Tasks | Metric Learning, Recommendation Systems |
Published | 2019-08-21 |
URL | https://arxiv.org/abs/1908.07738v1 |
https://arxiv.org/pdf/1908.07738v1.pdf | |
PWC | https://paperswithcode.com/paper/190807738 |
Repo | https://github.com/liufancs/MAML |
Framework | tf |
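The idea of an attention-weighted distance between a user and an item can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `attentive_distance`, the toy two-dimensional embeddings, and the hand-picked attention weights are all assumptions made for illustration, and in MAML the weights would come from an attention network fed with the item's multimodal features.

```python
import math


def attentive_distance(user, item, attention):
    """Euclidean distance after scaling each dimension by an attention weight.

    Hypothetical helper: a larger weight means the user cares more about
    that aspect of the item, so differences along it count more.
    """
    return math.sqrt(
        sum((a * (u - v)) ** 2 for u, v, a in zip(user, item, attention))
    )


# Toy embeddings (assumed values, not taken from the paper).
user = [1.0, 0.0]
item_a = [0.0, 0.0]
item_b = [1.0, 1.0]

uniform = [1.0, 1.0]   # every aspect weighted equally
focused = [2.0, 0.0]   # the user attends only to the first aspect

# With uniform weights both items are equally far from the user.
print(attentive_distance(user, item_a, uniform))
print(attentive_distance(user, item_b, uniform))

# With focused weights item_b moves closer and item_a moves further away.
print(attentive_distance(user, item_a, focused))
print(attentive_distance(user, item_b, focused))
```

Because the weights depend on the user-item pair, the same user vector can sit close to items that differ from each other, which is one way to read the abstract's remark about easing the geometric restriction of plain collaborative metric learning.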
Byte-Pair Encoding for Text-to-SQL Generation
Title | Byte-Pair Encoding for Text-to-SQL Generation |
Authors | Samuel Müller, Andreas Vlachos |
Abstract | Neural sequence-to-sequence models provide a competitive approach to the task of mapping a question in natural language to an SQL query, also referred to as text-to-SQL generation. The Byte-Pair Encoding algorithm (BPE) has previously been used to improve machine translation (MT) between natural languages. In this work, we adapt BPE for text-to-SQL generation. As the datasets for this task are rather small compared to MT, we present a novel stopping criterion that prevents overfitting the BPE encoding to the training set. Additionally, we present AST BPE, which is a version of BPE that uses the Abstract Syntax Tree (AST) of the SQL statement to guide BPE merges and therefore produce BPE encodings that generalize better. We improved the accuracy of a strong attentive seq2seq baseline on five out of six English text-to-SQL tasks while reducing training time by more than 50% on four of them due to the shortened targets. Finally, on two of these tasks we exceeded previously reported accuracies. |
Tasks | Machine Translation, Text-To-Sql |
Published | 2019-10-20 |
URL | https://arxiv.org/abs/1910.08962v2 |
https://arxiv.org/pdf/1910.08962v2.pdf | |
PWC | https://paperswithcode.com/paper/byte-pair-encoding-for-text-to-sql-generation |
Repo | https://github.com/SamuelGabriel/sqlbpe |
Framework | pytorch |
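A minimal sketch of plain BPE applied to tokenized SQL targets may help make the idea concrete. It is not the code from the linked repository: the helper names, the toy queries, and the `min_count` threshold are assumptions, and the threshold is only a simple stand-in for the paper's stopping criterion, which the abstract does not specify. AST-guided merging is not shown.

```python
from collections import Counter


def most_frequent_pair(sequences):
    """Return ((left, right), count) for the most frequent adjacent pair."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    return counts.most_common(1)[0] if counts else None


def merge_pair(seq, pair):
    """Replace every occurrence of `pair` in `seq` with one merged token."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + " " + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def learn_bpe(sequences, min_count=2):
    """Greedily merge frequent pairs until none occurs `min_count` times.

    `min_count` is a hypothetical stopping rule used for illustration only.
    """
    sequences = [list(s) for s in sequences]
    merges = []
    while True:
        best = most_frequent_pair(sequences)
        if best is None or best[1] < min_count:
            break
        pair = best[0]
        merges.append(pair)
        sequences = [merge_pair(s, pair) for s in sequences]
    return merges, sequences


# Toy SQL targets (assumed examples).
queries = [
    "SELECT name FROM city WHERE population > 1".split(),
    "SELECT name FROM river WHERE length > 1".split(),
]
merges, encoded = learn_bpe(queries)
print(merges)
print(encoded[0])
```

Merged tokens shorten the target sequence the decoder has to produce, which is the mechanism the abstract credits for the reduced training time.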
MIST: A Novel Training Strategy for Low-latency Scalable Neural Net Decoders
Title | MIST: A Novel Training Strategy for Low-latency Scalable Neural Net Decoders |
Authors | Kumar Yashashwi, Deepak Anand, Sibi Raj B Pillai, Prasanna Chaporkar, K Ganesh |
Abstract | In this paper, we propose a low latency, robust and scalable neural net based decoder for convolutional and low-density parity-check (LDPC) coding schemes. The proposed decoders are demonstrated to have bit error rate (BER) and block error rate (BLER) performances at par with the state-of-the-art neural net based decoders while achieving more than 8 times higher decoding speed. The enhanced decoding speed is due to the use of a convolutional neural network (CNN) as opposed to the recurrent neural network (RNN) used in the best known neural net based decoders. This contradicts existing doctrine that only RNN based decoders can provide a performance close to the optimal ones. The key ingredient to our approach is a novel Mixed-SNR Independent Samples based Training (MIST), which allows for training of CNN with only 1% of possible datawords, even for block length as high as 1000. The proposed decoder is robust as, once trained, the same decoder can be used for a wide range of SNR values. Finally, in the presence of channel outages, the proposed decoders outperform the best known decoders, viz. unquantized Viterbi decoder for convolutional code, and belief propagation for LDPC. This gives the CNN decoder a significant advantage in 5G millimeter wave systems, where channel outages are prevalent. |
Tasks | |
Published | 2019-05-22 |
URL | https://arxiv.org/abs/1905.08990v1 |
https://arxiv.org/pdf/1905.08990v1.pdf | |
PWC | https://paperswithcode.com/paper/mist-a-novel-training-strategy-for-low |
Repo | https://github.com/kryashashwi/MIST_CNN_Decoder |
Framework | tf |
Querying Knowledge via Multi-Hop English Questions
Title | Querying Knowledge via Multi-Hop English Questions |
Authors | Tiantian Gao, Paul Fodor, Michael Kifer |
Abstract | The inherent difficulty of knowledge specification and the lack of trained specialists are some of the key obstacles on the way to making intelligent systems based on the knowledge representation and reasoning (KRR) paradigm commonplace. Knowledge and query authoring using natural language, especially controlled natural language (CNL), is one of the promising approaches that could enable domain experts, who are not trained logicians, to both create formal knowledge and query it. In previous work, we introduced the KALM system (Knowledge Authoring Logic Machine) that supports knowledge authoring (and simple querying) with very high accuracy that at present is unachievable via machine learning approaches. The present paper expands on the question answering aspect of KALM and introduces KALM-QA (KALM for Question Answering) that is capable of answering much more complex English questions. We show that KALM-QA achieves 100% accuracy on an extensive suite of movie-related questions, called MetaQA, which contains almost 29,000 test questions and over 260,000 training questions. We contrast this with a published machine learning approach, which falls far short of this high mark. |
Tasks | Question Answering |
Published | 2019-07-18 |
URL | https://arxiv.org/abs/1907.08176v1 |
https://arxiv.org/pdf/1907.08176v1.pdf | |
PWC | https://paperswithcode.com/paper/querying-knowledge-via-multi-hop-english |
Repo | https://github.com/tiantiangao7/kalm-qa |
Framework | none |
Robot Localization in Floor Plans Using a Room Layout Edge Extraction Network
Title | Robot Localization in Floor Plans Using a Room Layout Edge Extraction Network |
Authors | Federico Boniardi, Abhinav Valada, Rohit Mohan, Tim Caselitz, Wolfram Burgard |
Abstract | Indoor localization is one of the crucial enablers for deployment of service robots. Although several successful techniques for indoor localization have been proposed, the majority of them relies on maps generated from data gathered with the same sensor modality used for localization. Typically, tedious labor by experts is needed to acquire this data, thus limiting the readiness of the system as well as its ease of installation for inexperienced operators. In this paper, we propose a memory and computationally efficient monocular camera-based localization system that allows a robot to estimate its pose given an architectural floor plan. Our method employs a convolutional neural network to predict room layout edges from a single camera image and estimates the robot pose using a particle filter that matches the extracted edges to the given floor plan. We evaluate our localization system using multiple real-world experiments and demonstrate that it has the robustness and accuracy required for reliable indoor navigation. |
Tasks | |
Published | 2019-03-05 |
URL | https://arxiv.org/abs/1903.01804v2 |
https://arxiv.org/pdf/1903.01804v2.pdf | |
PWC | https://paperswithcode.com/paper/robot-localization-in-floor-plans-using-a |
Repo | https://github.com/ayusefi/Localization-Papers |
Framework | none |
Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents
Title | Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents |
Authors | Jack Hessel, Lillian Lee, David Mimno |
Abstract | Images and text co-occur constantly on the web, but explicit links between images and sentences (or other intra-document textual units) are often not present. We present algorithms that discover image-sentence relationships without relying on explicit multimodal annotation in training. We experiment on seven datasets of varying difficulty, ranging from documents consisting of groups of images captioned post hoc by crowdworkers to naturally-occurring user-generated multimodal documents. We find that a structured training objective based on identifying whether collections of images and sentences co-occur in documents can suffice to predict links between specific sentences and specific images within the same document at test time. |
Tasks | |
Published | 2019-04-16 |
URL | https://arxiv.org/abs/1904.07826v2 |
https://arxiv.org/pdf/1904.07826v2.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-discovery-of-multimodal-links-in |
Repo | https://github.com/jmhessel/multi-retrieval |
Framework | tf |
Why Having 10,000 Parameters in Your Camera Model is Better Than Twelve
Title | Why Having 10,000 Parameters in Your Camera Model is Better Than Twelve |
Authors | Thomas Schöps, Viktor Larsson, Marc Pollefeys, Torsten Sattler |
Abstract | Camera calibration is an essential first step in setting up 3D Computer Vision systems. Commonly used parametric camera models are limited to a few degrees of freedom and thus often do not optimally fit to complex real lens distortion. In contrast, generic camera models allow for very accurate calibration due to their flexibility. Despite this, they have seen little use in practice. In this paper, we argue that this should change. We propose a calibration pipeline for generic models that is fully automated, easy to use, and can act as a drop-in replacement for parametric calibration, with a focus on accuracy. We compare our results to parametric calibrations. Considering stereo depth estimation and camera pose estimation as examples, we show that the calibration error acts as a bias on the results. We thus argue that in contrast to current common practice, generic models should be preferred over parametric ones whenever possible. To facilitate this, we released our calibration pipeline at https://github.com/puzzlepaint/camera_calibration, making both easy-to-use and accurate camera calibration available to everyone. |
Tasks | Calibration, Depth Estimation, Pose Estimation, Stereo Depth Estimation |
Published | 2019-12-05 |
URL | https://arxiv.org/abs/1912.02908v2 |
https://arxiv.org/pdf/1912.02908v2.pdf | |
PWC | https://paperswithcode.com/paper/why-having-10000-parameters-in-your-camera |
Repo | https://github.com/puzzlepaint/camera_calibration |
Framework | none |
PFLD: A Practical Facial Landmark Detector
Title | PFLD: A Practical Facial Landmark Detector |
Authors | Xiaojie Guo, Siyuan Li, Jinke Yu, Jiawan Zhang, Jiayi Ma, Lin Ma, Wei Liu, Haibin Ling |
Abstract | Being accurate, efficient, and compact is essential to a facial landmark detector for practical use. To simultaneously consider the three concerns, this paper investigates a neat model with promising detection accuracy under wild environments (e.g., unconstrained pose, expression, lighting, and occlusion conditions) and super real-time speed on a mobile device. More concretely, we customize an end-to-end single stage network associated with acceleration techniques. During the training phase, for each sample, rotation information is estimated for geometrically regularizing landmark localization, which is then NOT involved in the testing phase. A novel loss is designed to, besides considering the geometrical regularization, mitigate the issue of data imbalance by adjusting weights of samples to different states, such as large pose, extreme lighting, and occlusion, in the training set. Extensive experiments are conducted to demonstrate the efficacy of our design and reveal its superior performance over state-of-the-art alternatives on widely-adopted challenging benchmarks, i.e., 300W (including iBUG, LFPW, AFW, HELEN, and XM2VTS) and AFLW. Our model can be merely 2.1Mb of size and reach over 140 fps per face on a mobile phone (Qualcomm ARM 845 processor) with high precision, making it attractive for large-scale or real-time applications. We have made our practical system based on PFLD 0.25X model publicly available at http://sites.google.com/view/xjguo/fld for encouraging comparisons and improvements from the community. |
Tasks | |
Published | 2019-02-28 |
URL | http://arxiv.org/abs/1902.10859v2 |
http://arxiv.org/pdf/1902.10859v2.pdf | |
PWC | https://paperswithcode.com/paper/pfld-a-practical-facial-landmark-detector |
Repo | https://github.com/polarisZhao/PFLD-pytorch |
Framework | pytorch |
SEPT: Improving Scientific Named Entity Recognition with Span Representation
Title | SEPT: Improving Scientific Named Entity Recognition with Span Representation |
Authors | Tan Yan, Heyan Huang, Xian-Ling Mao |
Abstract | We introduce a new scientific named entity recognizer called SEPT, which stands for Span Extractor with Pre-trained Transformers. In recent papers, span extractors have been demonstrated to be a powerful model compared with sequence labeling models. However, we discover that with the development of pre-trained language models, the performance of span extractors appears to become similar to sequence labeling models. To keep the advantages of span representation, we modified the model by under-sampling to balance the positive and negative samples and reduce the search space. Furthermore, we simplify the original network architecture to combine the span extractor with BERT. Experiments demonstrate that even the simplified architecture achieves the same performance and SEPT achieves a new state-of-the-art result in scientific named entity recognition even without relation information involved. |
Tasks | Named Entity Recognition |
Published | 2019-11-08 |
URL | https://arxiv.org/abs/1911.03353v1 |
https://arxiv.org/pdf/1911.03353v1.pdf | |
PWC | https://paperswithcode.com/paper/sept-improving-scientific-named-entity |
Repo | https://github.com/Ethan-yt/sept |
Framework | pytorch |
What Object Should I Use? - Task Driven Object Detection
Title | What Object Should I Use? - Task Driven Object Detection |
Authors | Johann Sawatzky, Yaser Souri, Christian Grund, Juergen Gall |
Abstract | When humans have to solve everyday tasks, they simply pick the objects that are most suitable. While the question of which object one should use for a specific task sounds trivial for humans, it is very difficult to answer for robots or other autonomous systems. This issue, however, is not addressed by current benchmarks for object detection that focus on detecting object categories. We therefore introduce the COCO-Tasks dataset which comprises about 40,000 images where the most suitable objects for 14 tasks have been annotated. We furthermore propose an approach that detects the most suitable objects for a given task. The approach builds on a Gated Graph Neural Network to exploit the appearance of each object as well as the global context of all present objects in the scene. In our experiments, we show that the proposed approach outperforms other approaches that are evaluated on the dataset like classification or ranking approaches. |
Tasks | Object Detection |
Published | 2019-04-05 |
URL | http://arxiv.org/abs/1904.03000v1 |
http://arxiv.org/pdf/1904.03000v1.pdf | |
PWC | https://paperswithcode.com/paper/what-object-should-i-use-task-driven-object |
Repo | https://github.com/yassersouri/task-driven-object-detection |
Framework | pytorch |
GeNet: Deep Representations for Metagenomics
Title | GeNet: Deep Representations for Metagenomics |
Authors | Mateo Rojas-Carulla, Ilya Tolstikhin, Guillermo Luque, Nicholas Youngblut, Ruth Ley, Bernhard Schölkopf |
Abstract | We introduce GeNet, a method for shotgun metagenomic classification from raw DNA sequences that exploits the known hierarchical structure between labels for training. We provide a comparison with state-of-the-art methods Kraken and Centrifuge on datasets obtained from several sequencing technologies, in which dataset shift occurs. We show that GeNet obtains competitive precision and good recall, with orders of magnitude less memory requirements. Moreover, we show that a linear model trained on top of representations learned by GeNet achieves recall comparable to state-of-the-art methods on the aforementioned datasets, and achieves over 90% accuracy in a challenging pathogen detection problem. This provides evidence of the usefulness of the representations learned by GeNet for downstream biological tasks. |
Tasks | |
Published | 2019-01-30 |
URL | http://arxiv.org/abs/1901.11015v1 |
http://arxiv.org/pdf/1901.11015v1.pdf | |
PWC | https://paperswithcode.com/paper/genet-deep-representations-for-metagenomics |
Repo | https://github.com/mrojascarulla/GeNet |
Framework | tf |
Discovering Reliable Correlations in Categorical Data
Title | Discovering Reliable Correlations in Categorical Data |
Authors | Panagiotis Mandros, Mario Boley, Jilles Vreeken |
Abstract | In many scientific tasks we are interested in discovering whether there exist any correlations in our data. This raises many questions, such as how to reliably and interpretably measure correlation between a multivariate set of attributes, how to do so without having to make assumptions on distribution of the data or the type of correlation, and, how to efficiently discover the top-most reliably correlated attribute sets from data. In this paper we answer these questions for discovery tasks in categorical data. In particular, we propose a corrected-for-chance, consistent, and efficient estimator for normalized total correlation, by which we obtain a reliable, naturally interpretable, non-parametric measure for correlation over multivariate sets. For the discovery of the top-k correlated sets, we derive an effective algorithmic framework based on a tight bounding function. This framework offers exact, approximate, and heuristic search. Empirical evaluation shows that already for small sample sizes the estimator leads to low-regret optimization outcomes, while the algorithms are shown to be highly effective for both large and high-dimensional data. Through two case studies we confirm that our discovery framework identifies interesting and meaningful correlations. |
Tasks | |
Published | 2019-08-30 |
URL | https://arxiv.org/abs/1908.11682v1 |
https://arxiv.org/pdf/1908.11682v1.pdf | |
PWC | https://paperswithcode.com/paper/discovering-reliable-correlations-in |
Repo | https://github.com/pmandros/wodiscovery |
Framework | none |
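For reference, the quantity underlying this entry is total correlation, the sum of the marginal entropies minus the joint entropy. The sketch below computes the naive plug-in estimate for categorical rows; it deliberately omits the normalization and the correction for chance that the paper proposes, and the function names and toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Plug-in Shannon entropy (in bits) of a list of hashable values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def total_correlation(rows):
    """Sum of marginal entropies minus the joint entropy.

    `rows` is a list of equal-length tuples of categorical values.
    This is the uncorrected plug-in estimate, not the paper's estimator.
    """
    columns = list(zip(*rows))
    marginals = sum(entropy(list(col)) for col in columns)
    joint = entropy([tuple(row) for row in rows])
    return marginals - joint


# Two perfectly dependent binary attributes (toy data).
dependent = [(0, 0), (1, 1), (0, 0), (1, 1)]
# Two independent binary attributes (toy data).
independent = [(0, 0), (0, 1), (1, 0), (1, 1)]

print(total_correlation(dependent))
print(total_correlation(independent))
```

The plug-in estimate tends to overstate dependence on small samples, which is the kind of bias a corrected-for-chance estimator is meant to address.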
Psi-Net: Shape and boundary aware joint multi-task deep network for medical image segmentation
Title | Psi-Net: Shape and boundary aware joint multi-task deep network for medical image segmentation |
Authors | Balamurali Murugesan, Kaushik Sarveswaran, Sharath M Shankaranarayana, Keerthi Ram, Mohanasankar Sivaprakasam |
Abstract | Image segmentation is a primary task in many medical applications. Recently, many deep networks derived from U-Net have been extensively used in various medical image segmentation tasks. However, in most of the cases, networks similar to U-Net produce coarse and non-smooth segmentations with lots of discontinuities. To improve and refine the performance of U-Net like networks, we propose the use of parallel decoders which along with performing the mask predictions also perform contour prediction and distance map estimation. The contour and distance map aid in ensuring smoothness in the segmentation predictions. To facilitate joint training of three tasks, we propose a novel architecture called Psi-Net with a single encoder and three parallel decoders (thus having a shape of $\Psi$), one decoder to learn the segmentation mask prediction and the other two decoders to learn the auxiliary tasks of contour detection and distance map estimation. The learning of these auxiliary tasks helps in capturing the shape and the boundary information. We also propose a new joint loss function for the proposed architecture. The loss function consists of a weighted combination of Negative Log likelihood and Mean Square Error loss. We have used two publicly available datasets: 1) Origa dataset for the task of optic cup and disc segmentation and 2) Endovis segment dataset for the task of polyp segmentation to evaluate our model. We have conducted extensive experiments using our network to show our model gives better results in terms of segmentation, boundary and shape metrics. |
Tasks | Contour Detection, Medical Image Segmentation, Semantic Segmentation |
Published | 2019-02-11 |
URL | https://arxiv.org/abs/1902.04099v3 |
https://arxiv.org/pdf/1902.04099v3.pdf | |
PWC | https://paperswithcode.com/paper/psi-net-shape-and-boundary-aware-joint-multi |
Repo | https://github.com/Bala93/Multi-task-deep-network |
Framework | pytorch |
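The joint loss described in the abstract can be sketched as a weighted sum of per-task terms. The assignment below (negative log-likelihood for the mask and contour decoders, mean squared error for the distance map) and the default weights are assumptions made for illustration; the abstract only states that the loss is a weighted combination of the two loss types, and the linked repository should be consulted for the actual formulation.

```python
import math


def nll(probs, labels):
    """Mean negative log-likelihood of the true class for each pixel."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def mse(pred, target):
    """Mean squared error between two equal-length lists of floats."""
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(pred)


def joint_loss(mask_probs, mask_labels,
               contour_probs, contour_labels,
               dist_pred, dist_target,
               weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three task losses (weights are hypothetical)."""
    w_mask, w_contour, w_dist = weights
    return (w_mask * nll(mask_probs, mask_labels)
            + w_contour * nll(contour_probs, contour_labels)
            + w_dist * mse(dist_pred, dist_target))


# One-pixel toy example with two classes per classification task.
loss = joint_loss(
    mask_probs=[[0.5, 0.5]], mask_labels=[0],
    contour_probs=[[0.5, 0.5]], contour_labels=[1],
    dist_pred=[1.0], dist_target=[0.0],
)
print(loss)
```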
Sliced Score Matching: A Scalable Approach to Density and Score Estimation
Title | Sliced Score Matching: A Scalable Approach to Density and Score Estimation |
Authors | Yang Song, Sahaj Garg, Jiaxin Shi, Stefano Ermon |
Abstract | Score matching is a popular method for estimating unnormalized statistical models. However, it has been so far limited to simple, shallow models or low-dimensional data, due to the difficulty of computing the Hessian of log-density functions. We show this difficulty can be mitigated by projecting the scores onto random vectors before comparing them. This objective, called sliced score matching, only involves Hessian-vector products, which can be easily implemented using reverse-mode automatic differentiation. Therefore, sliced score matching is amenable to more complex models and higher dimensional data compared to score matching. Theoretically, we prove the consistency and asymptotic normality of sliced score matching estimators. Moreover, we demonstrate that sliced score matching can be used to learn deep score estimators for implicit distributions. In our experiments, we show sliced score matching can learn deep energy-based models effectively, and can produce accurate score estimates for applications such as variational inference with implicit distributions and training Wasserstein Auto-Encoders. |
Tasks | |
Published | 2019-05-17 |
URL | https://arxiv.org/abs/1905.07088v2 |
https://arxiv.org/pdf/1905.07088v2.pdf | |
PWC | https://paperswithcode.com/paper/sliced-score-matching-a-scalable-approach-to |
Repo | https://github.com/ermongroup/ncsn |
Framework | pytorch |
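For a single data point x and projection vector v, the sliced score matching objective is v^T ∇s(x) v + ½ (v^T s(x))^2, where s is the model's score function. The sketch below evaluates that quantity for a toy model. It swaps in a central finite difference for the Hessian-vector product that the paper computes with reverse-mode automatic differentiation, and the function names, the unit-variance Gaussian score, and the step size `eps` are all assumptions made for illustration.

```python
def directional_derivative(score, x, v, eps=1e-4):
    """Approximate v^T (ds/dx) v with a central finite difference along v."""
    x_plus = [xi + eps * vi for xi, vi in zip(x, v)]
    x_minus = [xi - eps * vi for xi, vi in zip(x, v)]
    s_plus, s_minus = score(x_plus), score(x_minus)
    return sum(
        vi * (sp - sm) / (2 * eps) for vi, sp, sm in zip(v, s_plus, s_minus)
    )


def sliced_score_matching_loss(score, x, v):
    """Per-sample objective: v^T grad(s)(x) v + 0.5 * (v^T s(x))^2."""
    projected_score = sum(vi * si for vi, si in zip(v, score(x)))
    return directional_derivative(score, x, v) + 0.5 * projected_score ** 2


def gaussian_score(x, mean=(0.0, 0.0)):
    """Score of a unit-variance Gaussian: grad_x log p(x) = -(x - mean)."""
    return [-(xi - mi) for xi, mi in zip(x, mean)]


# Fixed projection vectors stand in for random draws (e.g. Rademacher).
print(sliced_score_matching_loss(gaussian_score, [0.0, 0.0], [1.0, -1.0]))
print(sliced_score_matching_loss(gaussian_score, [1.0, 0.0], [1.0, 1.0]))
```

In practice the loss would be averaged over many data points and random projections and minimized with respect to the model parameters; the point of the projection is that only a vector-Jacobian style product is needed rather than the full Hessian.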
Fastened CROWN: Tightened Neural Network Robustness Certificates
Title | Fastened CROWN: Tightened Neural Network Robustness Certificates |
Authors | Zhaoyang Lyu, Ching-Yun Ko, Zhifeng Kong, Ngai Wong, Dahua Lin, Luca Daniel |
Abstract | The rapid growth of deep learning applications in real life is accompanied by severe safety concerns. To mitigate this uneasy phenomenon, much research has been done providing reliable evaluations of the fragility level in different deep neural networks. Apart from devising adversarial attacks, quantifiers that certify safeguarded regions have also been designed in the past five years. The summarizing work of Salman et al. unifies a family of existing verifiers under a convex relaxation framework. We draw inspiration from such work and further demonstrate the optimality of deterministic CROWN (Zhang et al. 2018) solutions in a given linear programming problem under mild constraints. Given this theoretical result, the computationally expensive linear programming based method is shown to be unnecessary. We then propose an optimization-based approach FROWN (Fastened CROWN): a general algorithm to tighten robustness certificates for neural networks. Extensive experiments on various networks trained individually verify the effectiveness of FROWN in safeguarding larger robust regions. |
Tasks | |
Published | 2019-12-02 |
URL | https://arxiv.org/abs/1912.00574v1 |
https://arxiv.org/pdf/1912.00574v1.pdf | |
PWC | https://paperswithcode.com/paper/fastened-crown-tightened-neural-network |
Repo | https://github.com/ZhaoyangLyu/FROWN |
Framework | pytorch |