Paper Group AWR 60
Cars Can’t Fly up in the Sky: Improving Urban-Scene Segmentation via Height-driven Attention Networks. Extremely Dense Point Correspondences using a Learned Feature Descriptor. Leveraging Photogrammetric Mesh Models for Aerial-Ground Feature Point Matching Toward Integrated 3D Reconstruction. Detecting Attended Visual Targets in Video. High-Resolution Daytime Translation Without Domain Labels. Revisiting Challenges in Data-to-Text Generation with Fact Grounding. Debugging Machine Learning Pipelines. Evaluating Weakly Supervised Object Localization Methods Right. Regression and Learning with Pixel-wise Attention for Retinal Fundus Glaucoma Segmentation and Detection. Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles. ZeroQ: A Novel Zero Shot Quantization Framework. Decentralized Policy-Based Private Analytics. Closing the Dequantization Gap: PixelCNN as a Single-Layer Flow. Learning by Semantic Similarity Makes Abstractive Summarization Better. Hybrid Semantic Recommender System for Chemical Compounds.
Cars Can’t Fly up in the Sky: Improving Urban-Scene Segmentation via Height-driven Attention Networks
Title | Cars Can’t Fly up in the Sky: Improving Urban-Scene Segmentation via Height-driven Attention Networks |
Authors | Sungha Choi, Joanne T. Kim, Jaegul Choo |
Abstract | This paper exploits the intrinsic features of urban-scene images and proposes a general add-on module, called height-driven attention networks (HANet), for improving semantic segmentation of urban-scene images. It selectively emphasizes informative features or classes according to the vertical position of a pixel, motivated by the observation that pixel-wise class distributions differ significantly among horizontally segmented sections of urban-scene images. Urban-scene images thus have their own distinct characteristics, yet most semantic segmentation networks do not reflect such unique attributes in their architecture. The proposed network architecture incorporates the capability to exploit these attributes and handle urban-scene datasets effectively. We validate a consistent performance (mIoU) increase for various semantic segmentation models on two datasets when HANet is adopted. This extensive quantitative analysis demonstrates that adding our module to existing models is easy and cost-effective. Our method achieves a new state-of-the-art performance on the Cityscapes benchmark by a large margin among ResNet-101-based segmentation models. Also, we show that the proposed model is coherent with facts observed in urban scenes by visualizing and interpreting the attention map. |
Tasks | Scene Segmentation, Semantic Segmentation |
Published | 2020-03-11 |
URL | https://arxiv.org/abs/2003.05128v1 |
https://arxiv.org/pdf/2003.05128v1.pdf | |
PWC | https://paperswithcode.com/paper/cars-cant-fly-up-in-the-sky-improving-urban |
Repo | https://github.com/shachoi/HANet |
Framework | pytorch |
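To make the height-wise attention idea concrete, here is a minimal PyTorch sketch: each feature row is pooled over the image width, a small 1D convolutional network predicts per-row channel weights, and the feature map is rescaled accordingly. This illustrates the concept only; the layer sizes and module structure are assumptions, not the authors' HANet (see the repo above for the real implementation).

```python
import torch
import torch.nn as nn

class HeightAttention(nn.Module):
    """Toy height-driven attention: per-row channel weights from width-pooled features."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv1d(channels, channels // reduction, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv1d(channels // reduction, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):                # x: (B, C, H, W)
        rows = x.mean(dim=3)             # (B, C, H), one descriptor per image row
        attn = self.mlp(rows)            # (B, C, H), per-row channel attention
        return x * attn.unsqueeze(3)     # broadcast the weights across the width

feat = torch.randn(2, 64, 32, 64)
out = HeightAttention(64)(feat)          # same shape as `feat`
```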
Extremely Dense Point Correspondences using a Learned Feature Descriptor
Title | Extremely Dense Point Correspondences using a Learned Feature Descriptor |
Authors | Xingtong Liu, Yiping Zheng, Benjamin Killeen, Masaru Ishii, Gregory D. Hager, Russell H. Taylor, Mathias Unberath |
Abstract | High-quality 3D reconstructions from endoscopy video play an important role in many clinical applications, including surgical navigation where they enable direct video-CT registration. While many methods exist for general multi-view 3D reconstruction, these methods often fail to deliver satisfactory performance on endoscopic video. Part of the reason is that local descriptors that establish pair-wise point correspondences, and thus drive reconstruction, struggle when confronted with the texture-scarce surface of anatomy. Learning-based dense descriptors usually have larger receptive fields enabling the encoding of global information, which can be used to disambiguate matches. In this work, we present an effective self-supervised training scheme and novel loss design for dense descriptor learning. In direct comparison to recent local and dense descriptors on an in-house sinus endoscopy dataset, we demonstrate that our proposed dense descriptor can generalize to unseen patients and scopes, thereby largely improving the performance of Structure from Motion (SfM) in terms of model density and completeness. We also evaluate our method on a public dense optical flow dataset and a small-scale SfM public dataset to further demonstrate the effectiveness and generality of our method. The source code is available at https://github.com/lppllppl920/DenseDescriptorLearning-Pytorch. |
Tasks | 3D Reconstruction, Optical Flow Estimation |
Published | 2020-03-02 |
URL | https://arxiv.org/abs/2003.00619v2 |
https://arxiv.org/pdf/2003.00619v2.pdf | |
PWC | https://paperswithcode.com/paper/extremely-dense-point-correspondences-using-a |
Repo | https://github.com/lppllppl920/DenseDescriptorLearning-Pytorch |
Framework | pytorch |
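As a concrete picture of what a dense descriptor buys you, the sketch below matches source keypoints into a target frame by taking the argmax of similarity over the target's dense descriptor map. Assuming L2-normalized descriptor maps, the dot product equals cosine similarity. This is the generic matching step only, not the paper's self-supervised training scheme or loss.

```python
import torch

def dense_match(desc_src, desc_tgt, keypoints):
    """Match source keypoints into the target via dense descriptor similarity.

    desc_src, desc_tgt: L2-normalized (C, H, W) descriptor maps (assumed).
    keypoints: (N, 2) long tensor of (y, x) locations in the source frame.
    """
    C, H, W = desc_tgt.shape
    src_vecs = desc_src[:, keypoints[:, 0], keypoints[:, 1]]  # (C, N)
    sim = src_vecs.t() @ desc_tgt.reshape(C, H * W)           # (N, H*W) cosine sims
    best = sim.argmax(dim=1)                                  # best target pixel per keypoint
    ys = torch.div(best, W, rounding_mode="floor")
    return torch.stack((ys, best % W), dim=1)                 # (N, 2) target (y, x)
```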
Leveraging Photogrammetric Mesh Models for Aerial-Ground Feature Point Matching Toward Integrated 3D Reconstruction
Title | Leveraging Photogrammetric Mesh Models for Aerial-Ground Feature Point Matching Toward Integrated 3D Reconstruction |
Authors | Qing Zhu, Zhendong Wang, Han Hu, Linfu Xie, Xuming Ge, Yeting Zhang |
Abstract | Integration of aerial and ground images has proven to be an efficient way to enhance surface reconstruction in urban environments. However, as the first step, feature point matching between aerial and ground images is remarkably difficult due to large differences in viewpoint and illumination conditions. Previous studies based on geometry-aware image rectification have alleviated this problem, but the performance and convenience of this strategy are limited by several flaws, e.g., quadratic image pairs, segregated extraction of descriptors, and occlusions. To address these problems, we propose a novel approach: leveraging photogrammetric mesh models for aerial-ground image matching. The proposed methods have linear time complexity with respect to the number of images, can explicitly handle low overlap using multi-view images, and can be directly injected into off-the-shelf structure-from-motion (SfM) and multi-view stereo (MVS) solutions. First, aerial and ground images are reconstructed separately and initially co-registered through weak georeferencing data. Second, aerial models are rendered to the initial ground views, from which color, depth, and normal images are obtained. Then, the synthesized color images and the corresponding ground images are matched by comparing descriptors, filtered by local geometric information, and propagated to the aerial views using the depth images and patch-based matching. Experimental evaluations on various datasets confirm the superior performance of the proposed methods in aerial-ground image matching. In addition, incorporating the existing SfM and MVS solutions enables more complete and accurate models to be obtained directly. |
Tasks | 3D Reconstruction |
Published | 2020-02-21 |
URL | https://arxiv.org/abs/2002.09085v1 |
https://arxiv.org/pdf/2002.09085v1.pdf | |
PWC | https://paperswithcode.com/paper/leveraging-photogrammetric-mesh-models-for |
Repo | https://github.com/saedrna/RenderMatch |
Framework | none |
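The key propagation step — lifting a 2D match found in a rendered ground view into an aerial view via the rendered depth — can be sketched as plain pinhole geometry. The function below is a hedged illustration; the matrix conventions (4x4 camera-to-world poses, 3x3 intrinsics) are assumptions, and the paper's pipeline adds patch-based refinement on top.

```python
import numpy as np

def propagate_match(pt, depth, K_ground, pose_ground, K_aerial, pose_aerial):
    """Lift a pixel from a rendered ground view to 3D and project it into an aerial view.

    pt: (u, v) pixel in the rendered ground view.
    depth: rendered depth image for that view.
    K_*: 3x3 intrinsics; pose_*: 4x4 camera-to-world matrices (assumed conventions).
    """
    u, v = pt
    z = depth[int(v), int(u)]                         # rendered depth at the match
    ray = np.linalg.inv(K_ground) @ np.array([u, v, 1.0])
    X_cam = ray * z                                   # 3D point in ground-camera frame
    X_world = pose_ground @ np.append(X_cam, 1.0)     # ground camera -> world
    X_aer = np.linalg.inv(pose_aerial) @ X_world      # world -> aerial camera
    uvw = K_aerial @ X_aer[:3]
    return uvw[:2] / uvw[2]                           # pixel location in the aerial image
```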
Detecting Attended Visual Targets in Video
Title | Detecting Attended Visual Targets in Video |
Authors | Eunji Chong, Yongxin Wang, Nataniel Ruiz, James M. Rehg |
Abstract | We address the problem of detecting attention targets in video. Our goal is to identify where each person in each frame of a video is looking, and correctly handle the case where the gaze target is out-of-frame. Our novel architecture models the dynamic interaction between the scene and head features and infers time-varying attention targets. We introduce a new annotated dataset, VideoAttentionTarget, containing complex and dynamic patterns of real-world gaze behavior. Our experiments show that our model can effectively infer dynamic attention in videos. In addition, we apply our predicted attention maps to two social gaze behavior recognition tasks, and show that the resulting classifiers significantly outperform existing methods. We achieve state-of-the-art performance on three datasets: GazeFollow (static images), VideoAttentionTarget (videos), and VideoCoAtt (videos), and obtain the first results for automatically classifying clinically-relevant gaze behavior without wearable cameras or eye trackers. |
Tasks | Deep Attention |
Published | 2020-03-05 |
URL | https://arxiv.org/abs/2003.02501v2 |
https://arxiv.org/pdf/2003.02501v2.pdf | |
PWC | https://paperswithcode.com/paper/detecting-attended-visual-targets-in-video |
Repo | https://github.com/ejcgt/attention-target-detection |
Framework | pytorch |
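One way to picture the scene/head interaction is a head-conditioned spatial attention: features of the person's head crop produce a query that weights scene features before a gaze-target heatmap is predicted. The sketch below is a simplification under assumed feature shapes, not the paper's full architecture (which also models temporal dynamics and the out-of-frame case).

```python
import torch
import torch.nn as nn

class HeadConditionedAttention(nn.Module):
    """Toy scene/head fusion: head features gate scene features spatially."""
    def __init__(self, c_scene=256, c_head=128):
        super().__init__()
        self.to_query = nn.Linear(c_head, c_scene)
        self.heatmap = nn.Conv2d(c_scene, 1, kernel_size=1)

    def forward(self, scene, head_vec):   # scene: (B, C, H, W), head_vec: (B, c_head)
        q = self.to_query(head_vec)[:, :, None, None]            # (B, C, 1, 1)
        attn = torch.sigmoid((scene * q).sum(1, keepdim=True))   # (B, 1, H, W)
        return self.heatmap(scene * attn)                        # gaze heatmap logits
```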
High-Resolution Daytime Translation Without Domain Labels
Title | High-Resolution Daytime Translation Without Domain Labels |
Authors | Ivan Anokhin, Pavel Solovev, Denis Korzhenkov, Alexey Kharlamov, Taras Khakhulin, Alexey Silvestrov, Sergey Nikolenko, Victor Lempitsky, Gleb Sterkin |
Abstract | Modeling daytime changes in high resolution photographs, e.g., re-rendering the same scene under different illuminations typical for day, night, or dawn, is a challenging image manipulation task. We present the high-resolution daytime translation (HiDT) model for this task. HiDT combines a generative image-to-image model and a new upsampling scheme that allows image translation to be applied at high resolution. The model demonstrates competitive results in terms of both commonly used GAN metrics and human evaluation. Importantly, this good performance comes as a result of training on a dataset of still landscape images with no daytime labels available. Our results are available at https://saic-mdal.github.io/HiDT/. |
Tasks | Image Super-Resolution, Image-to-Image Translation, Multimodal Unsupervised Image-To-Image Translation, Style Transfer, Unsupervised Image-To-Image Translation |
Published | 2020-03-19 |
URL | https://arxiv.org/abs/2003.08791v2 |
https://arxiv.org/pdf/2003.08791v2.pdf | |
PWC | https://paperswithcode.com/paper/high-resolution-daytime-translation-without |
Repo | https://github.com/saic-mdal/HiDT |
Framework | none |
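The enhancement scheme can be pictured as translating pixel-shifted subsampled copies of the high-resolution input and re-interleaving the results. The sketch below does exactly that; HiDT additionally merges the re-assembled outputs with a learned merging network, which is omitted here (the `merge_fn` hook stands in for it). Even spatial dimensions are assumed.

```python
import torch

def translate_highres(img, translate_fn, merge_fn=None):
    """Translate a high-res image with a low-res model via shifted subsampled grids.

    img: (B, C, H, W) with even H and W (assumed).
    translate_fn: the low-resolution image-to-image model.
    """
    B, C, H, W = img.shape
    shifts = [(0, 0), (0, 1), (1, 0), (1, 1)]
    subs = [img[:, :, i::2, j::2] for i, j in shifts]   # four half-res sub-images
    outs = [translate_fn(s) for s in subs]              # translate each at low res
    full = img.new_zeros(B, C, H, W)
    for (i, j), o in zip(shifts, outs):
        full[:, :, i::2, j::2] = o                      # interleave back to full res
    return merge_fn(full) if merge_fn is not None else full
```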
Revisiting Challenges in Data-to-Text Generation with Fact Grounding
Title | Revisiting Challenges in Data-to-Text Generation with Fact Grounding |
Authors | Hongmin Wang |
Abstract | Data-to-text generation models face challenges in ensuring data fidelity by referring to the correct input source. To inspire studies in this area, Wiseman et al. (2017) introduced the RotoWire corpus on generating NBA game summaries from box- and line-score tables. However, limited attempts have been made in this direction and the challenges remain. We observe a prominent bottleneck in the corpus: only about 60% of the summary contents can be grounded to the boxscore records. Such information deficiency tends to misguide a conditioned language model into producing unconditioned random facts and thus leads to factual hallucinations. In this work, we restore the information balance and revamp this task to focus on fact-grounded data-to-text generation. We introduce a purified and larger-scale dataset, RotoWire-FG (Fact-Grounding), with 50% more data from the years 2017-19 and enriched input tables, hoping to attract more research focus in this direction. Moreover, we achieve improved data fidelity over the state-of-the-art models by integrating a new form of table reconstruction as an auxiliary task to boost the generation quality. |
Tasks | Data-to-Text Generation, Language Modelling, Text Generation |
Published | 2020-01-12 |
URL | https://arxiv.org/abs/2001.03830v1 |
https://arxiv.org/pdf/2001.03830v1.pdf | |
PWC | https://paperswithcode.com/paper/revisiting-challenges-in-data-to-text-1 |
Repo | https://github.com/wanghm92/rw_fg |
Framework | pytorch |
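The auxiliary-task idea reduces to a weighted multi-task loss: the usual generation loss plus a table-reconstruction loss computed from the decoder's states. The sketch below shows the shape of such a training step; all names (`model`, `table_decoder`, the batch keys) and the weighting are placeholders, not the paper's code.

```python
def training_step(batch, model, table_decoder, optimizer, lam=0.5):
    """One joint update: summary generation loss + table reconstruction loss."""
    # model returns decoder states and the generation (cross-entropy) loss
    dec_states, gen_loss = model(batch["table"], batch["summary"])
    # auxiliary head tries to recover the input records from the decoder states
    recon_loss = table_decoder(dec_states, batch["records"])
    loss = gen_loss + lam * recon_loss   # lam balances the two objectives (assumed)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```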
Debugging Machine Learning Pipelines
Title | Debugging Machine Learning Pipelines |
Authors | Raoni Lourenço, Juliana Freire, Dennis Shasha |
Abstract | Machine learning tasks entail the use of complex computational pipelines to reach quantitative and qualitative conclusions. If some of the activities in a pipeline produce erroneous or uninformative outputs, the pipeline may fail or produce incorrect results. Inferring the root cause of failures and unexpected behavior is challenging; it usually requires much human thought and is both time-consuming and error-prone. We propose a new approach that uses iteration and provenance to automatically infer the root causes and derive succinct explanations of failures. Through a detailed experimental evaluation, we assess the cost, precision, and recall of our approach compared to the state of the art. Our source code and experimental data will be available for reproducibility and enhancement. |
Tasks | |
Published | 2020-02-11 |
URL | https://arxiv.org/abs/2002.04640v1 |
https://arxiv.org/pdf/2002.04640v1.pdf | |
PWC | https://paperswithcode.com/paper/debugging-machine-learning-pipelines |
Repo | https://github.com/raonilourenco/MLDebugger |
Framework | none |
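In spirit, provenance-based debugging asks: which minimal parameter-value combinations occur only in failing runs? A toy version under that framing is sketched below; the authors' system is considerably more sophisticated (it iteratively proposes new runs), and this brute-force enumeration is only illustrative.

```python
from itertools import combinations

def suspicious_configs(runs, max_size=2):
    """Return minimal parameter-value combos seen only in failing runs.

    runs: list of (params_dict, succeeded_bool) pairs from past executions.
    """
    failing = [p for p, ok in runs if not ok]
    passing = [p for p, ok in runs if ok]
    culprits = []
    for params in failing:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(params.items()), size):
                if any(all(p.get(k) == v for k, v in combo) for p in passing):
                    continue          # combo also appears in a passing run
                if any(set(c) <= set(combo) for c in culprits):
                    continue          # a smaller (or equal) culprit already covers it
                culprits.append(combo)
    return culprits

runs = [({"lr": 0.1, "opt": "sgd"}, False), ({"lr": 0.01, "opt": "sgd"}, True)]
print(suspicious_configs(runs))       # [(('lr', 0.1),)] -- only lr=0.1 runs fail
```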
Evaluating Weakly Supervised Object Localization Methods Right
Title | Evaluating Weakly Supervised Object Localization Methods Right |
Authors | Junsuk Choe, Seong Joon Oh, Seungho Lee, Sanghyuk Chun, Zeynep Akata, Hyunjung Shim |
Abstract | Weakly-supervised object localization (WSOL) has gained popularity in recent years for its promise to train localization models with only image-level labels. Since the seminal WSOL work on class activation mapping (CAM), the field has focused on how to expand the attention regions to cover objects more broadly and localize them better. However, these strategies rely on full localization supervision for validating hyperparameters and for model selection, which is in principle prohibited under the WSOL setup. In this paper, we argue that the WSOL task is ill-posed with only image-level labels, and propose a new evaluation protocol where full supervision is limited to a small held-out set that does not overlap with the test set. We observe that, under our protocol, the five most recent WSOL methods have not made a major improvement over the CAM baseline. Moreover, we report that existing WSOL methods have not reached the few-shot learning baseline, in which the full supervision available at validation time is used for model training instead. Based on our findings, we discuss some future directions for WSOL. |
Tasks | Few-Shot Learning, Model Selection, Object Localization, Weakly-Supervised Object Localization |
Published | 2020-01-21 |
URL | https://arxiv.org/abs/2001.07437v2 |
https://arxiv.org/pdf/2001.07437v2.pdf | |
PWC | https://paperswithcode.com/paper/evaluating-weakly-supervised-object |
Repo | https://github.com/clovaai/wsolevaluation |
Framework | pytorch |
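The protocol's core rule — tune localization hyperparameters only on a small, fully-annotated held-out set, never on test — can be illustrated with CAM threshold selection. The sketch below uses mask IoU for simplicity; the paper's actual metrics are box- and mask-based variants with their own definitions.

```python
import numpy as np

def pick_cam_threshold(cams, gt_masks, taus=np.linspace(0.05, 0.95, 19)):
    """Choose the CAM binarization threshold on a held-out, fully-labeled set.

    cams: list of [0, 1]-normalized score maps; gt_masks: binary masks.
    The returned threshold is then frozen and applied to the test set.
    """
    def mean_iou(tau):
        scores = []
        for cam, gt in zip(cams, gt_masks):
            pred = cam >= tau
            union = np.logical_or(pred, gt).sum()
            inter = np.logical_and(pred, gt).sum()
            scores.append(inter / union if union else 0.0)
        return np.mean(scores)
    return max(taus, key=mean_iou)
```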
Regression and Learning with Pixel-wise Attention for Retinal Fundus Glaucoma Segmentation and Detection
Title | Regression and Learning with Pixel-wise Attention for Retinal Fundus Glaucoma Segmentation and Detection |
Authors | Peng Liu, Ruogu Fang |
Abstract | Observing retinal fundus images is a major approach for ophthalmologists to diagnose glaucoma. However, it is still difficult to distinguish the features of the lesion through manual observation alone, especially in the early phase of glaucoma. In this paper, we present two deep learning-based automated algorithms, one for glaucoma detection and one for optic disc and cup segmentation. We utilize the attention mechanism to learn pixel-wise features for accurate prediction. In particular, we present two convolutional neural networks that can focus on learning various pixel-wise level features. In addition, we develop several attention strategies to guide the networks to learn the important features that have a major impact on prediction accuracy. We evaluate our methods on the validation dataset; the proposed solutions for both tasks achieve impressive results and outperform current state-of-the-art methods. The code is available at https://github.com/cswin/RLPA. |
Tasks | |
Published | 2020-01-06 |
URL | https://arxiv.org/abs/2001.01815v1 |
https://arxiv.org/pdf/2001.01815v1.pdf | |
PWC | https://paperswithcode.com/paper/regression-and-learning-with-pixel-wise |
Repo | https://github.com/cswin/RLPA |
Framework | none |
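A minimal version of pixel-wise attention in this setting is a learned per-pixel gate that re-weights the feature map before the segmentation and classification heads. The module below is illustrative only; the paper's two networks and attention strategies are richer than this.

```python
import torch
import torch.nn as nn

class PixelAttentionGate(nn.Module):
    """Toy pixel-wise attention: predict a per-pixel weight map and apply it."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):          # x: (B, C, H, W)
        return x * self.gate(x)    # emphasize pixels the gate deems informative
```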
Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles
Title | Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles |
Authors | Malte Ostendorff, Terry Ruas, Moritz Schubotz, Georg Rehm, Bela Gipp |
Abstract | Many digital libraries recommend literature to their users by considering the similarity between a query document and their repository. However, they often fail to identify the relationship that makes two documents alike. In this paper, we model the problem of finding the relationship between two documents as a pairwise document classification task. To find the semantic relation between documents, we apply a series of techniques, such as GloVe, Paragraph-Vectors, BERT, and XLNet under different configurations (e.g., sequence length, vector concatenation scheme), including a Siamese architecture for the Transformer-based systems. We perform our experiments on a newly proposed dataset of 32,168 Wikipedia article pairs and Wikidata properties that define the semantic document relations. Our results show vanilla BERT to be the best performing system, with an F1-score of 0.93, which we manually examine to better understand its applicability to other domains. Our findings suggest that classifying semantic relations between documents is a solvable task and motivate the development of recommender systems based on the evaluated techniques. The discussions in this paper serve as a first step toward exploring documents through SPARQL-like queries, such that one could find documents that are similar in one aspect but dissimilar in another. |
Tasks | Document Classification, Recommendation Systems |
Published | 2020-03-22 |
URL | https://arxiv.org/abs/2003.09881v1 |
https://arxiv.org/pdf/2003.09881v1.pdf | |
PWC | https://paperswithcode.com/paper/pairwise-multi-class-document-classification |
Repo | https://github.com/malteos/semantic-document-relations |
Framework | pytorch |
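Operationally, the vanilla-BERT baseline is sequence-pair classification: both articles go into one input separated by [SEP], and a classification head predicts the relation. A sketch with the Hugging Face API is below; the checkpoint and label count are assumptions, and the head is randomly initialized until fine-tuned on the article-pair dataset.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
# num_labels is a placeholder for the number of Wikidata relation classes
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=7)

# BERT sequence-pair input: [CLS] doc_a [SEP] doc_b [SEP], truncated to 512 tokens
inputs = tokenizer("Text of the seed article ...", "Text of the target article ...",
                   truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1))       # predicted relation id (after fine-tuning)
```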
ZeroQ: A Novel Zero Shot Quantization Framework
Title | ZeroQ: A Novel Zero Shot Quantization Framework |
Authors | Yaohui Cai, Zhewei Yao, Zhen Dong, Amir Gholami, Michael W. Mahoney, Kurt Keutzer |
Abstract | Quantization is a promising approach for reducing the inference time and memory footprint of neural networks. However, most existing quantization methods require access to the original training dataset for retraining during quantization. This is often not possible for applications with sensitive or proprietary data, e.g., due to privacy and security concerns. Existing zero-shot quantization methods use different heuristics to address this, but they result in poor performance, especially when quantizing to ultra-low precision. Here, we propose ZeroQ, a novel zero-shot quantization framework to address this. ZeroQ enables mixed-precision quantization without any access to the training or validation data. This is achieved by optimizing for a Distilled Dataset, which is engineered to match the statistics of batch normalization across different layers of the network. ZeroQ supports both uniform and mixed-precision quantization. For the latter, we introduce a novel Pareto frontier based method to automatically determine the mixed-precision bit setting for all layers, with no manual search involved. We extensively test our proposed method on a diverse set of models, including ResNet18/50/152, MobileNetV2, ShuffleNet, SqueezeNext, and InceptionV3 on ImageNet, as well as RetinaNet-ResNet50 on the Microsoft COCO dataset. In particular, we show that ZeroQ can achieve 1.71% higher accuracy on MobileNetV2, as compared to the recently proposed DFQ method. Importantly, ZeroQ has a very low computational overhead, and it can finish the entire quantization process in less than 30s (0.5% of one epoch training time of ResNet50 on ImageNet). We have open-sourced the ZeroQ framework at https://github.com/amirgholami/ZeroQ. |
Tasks | Quantization |
Published | 2020-01-01 |
URL | https://arxiv.org/abs/2001.00281v1 |
https://arxiv.org/pdf/2001.00281v1.pdf | |
PWC | https://paperswithcode.com/paper/zeroq-a-novel-zero-shot-quantization |
Repo | https://github.com/jakc4103/DFQ |
Framework | pytorch |
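The Distilled Dataset idea is concrete enough to sketch: optimize random inputs so that the activation statistics entering each BatchNorm layer match that layer's stored running statistics. Below is a simplified PyTorch version (batch size, learning rate, and step count are arbitrary choices); the released ZeroQ code adds refinements on top.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18(pretrained=True).eval()
bn_layers = [m for m in model.modules() if isinstance(m, nn.BatchNorm2d)]
acts = {}
for bn in bn_layers:   # record the input of every BN layer on each forward pass
    bn.register_forward_hook(lambda mod, inp, out: acts.__setitem__(mod, inp[0]))

x = torch.randn(32, 3, 224, 224, requires_grad=True)   # the "distilled" batch
opt = torch.optim.Adam([x], lr=0.1)
for _ in range(200):
    opt.zero_grad()
    model(x)
    # match per-channel mean/variance of activations to the BN running stats
    loss = sum(((acts[bn].mean((0, 2, 3)) - bn.running_mean) ** 2).sum()
               + ((acts[bn].var((0, 2, 3)) - bn.running_var) ** 2).sum()
               for bn in bn_layers)
    loss.backward()
    opt.step()
# x now approximates the training-data statistics and can calibrate quantization
```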
Decentralized Policy-Based Private Analytics
Title | Decentralized Policy-Based Private Analytics |
Authors | Kleomenis Katevas, Eugene Bagdasaryan, Jason Waterman, Mohamad Mounir Safadieh, Hamed Haddadi, Deborah Estrin |
Abstract | We are increasingly surrounded by applications, connected devices, services, and smart environments which require fine-grained access to various personal data. The inherent complexity of our personal and professional policies and preferences in interactions with these analytics services raises important privacy challenges. Moreover, due to the sensitivity of the data and regulatory and technical barriers, it is not always feasible to do these policy negotiations in a centralized manner. In this paper we present PoliBox, a decentralized, edge-based framework for policy-based personal data analytics. PoliBox brings together a number of established components to provide privacy-preserving analytics within a distributed setting. We evaluate our framework using a popular exemplar of private analytics, Federated Learning, and demonstrate that for varying model sizes and use cases, PoliBox is able to perform accurate model training and inference within very reasonable resource and time budgets. |
Tasks | |
Published | 2020-03-14 |
URL | https://arxiv.org/abs/2003.06612v1 |
https://arxiv.org/pdf/2003.06612v1.pdf | |
PWC | https://paperswithcode.com/paper/decentralized-policy-based-private-analytics |
Repo | https://github.com/minoskt/PoliBox |
Framework | none |
Closing the Dequantization Gap: PixelCNN as a Single-Layer Flow
Title | Closing the Dequantization Gap: PixelCNN as a Single-Layer Flow |
Authors | Didrik Nielsen, Ole Winther |
Abstract | Flow models have recently made great progress at modeling quantized sensor data such as images and audio. Due to the continuous nature of flow models, dequantization is typically applied when using them for such quantized data. In this paper, we propose subset flows, a class of flows which can tractably transform subsets of the input space in one pass. As a result, they can be applied directly to quantized data without the need for dequantization. Based on this class of flows, we present a novel interpretation of several existing autoregressive models, including WaveNet and PixelCNN, as single-layer flow models defined through an invertible transformation between uniform noise and data samples. This interpretation suggests that these existing models, 1) admit a latent representation of data and 2) can be stacked in multiple flow layers. We demonstrate this by exploring the latent space of a PixelCNN and by stacking PixelCNNs in multiple flow layers. |
Tasks | |
Published | 2020-02-06 |
URL | https://arxiv.org/abs/2002.02547v1 |
https://arxiv.org/pdf/2002.02547v1.pdf | |
PWC | https://paperswithcode.com/paper/closing-the-dequantization-gap-pixelcnn-as-a |
Repo | https://github.com/didriknielsen/pixelcnn_flow |
Framework | pytorch |
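The single-layer-flow view says a uniform noise value is pushed through the inverse CDF implied by each pixel's categorical distribution. The helper below implements that lookup; in a real PixelCNN the logits for each pixel depend on previously generated pixels, so sampling must proceed pixel by pixel, which is omitted here.

```python
import torch

def uniform_to_pixels(logits, u):
    """Map uniform noise to discrete pixel values via the inverse CDF.

    logits: (..., 256) per-pixel categorical logits; u: (...) uniform in [0, 1).
    """
    probs = torch.softmax(logits, dim=-1)
    cdf = probs.cumsum(dim=-1)                    # per-pixel cumulative distribution
    return (u.unsqueeze(-1) >= cdf).sum(dim=-1)   # count of bins below u = pixel value
```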
Learning by Semantic Similarity Makes Abstractive Summarization Better
Title | Learning by Semantic Similarity Makes Abstractive Summarization Better |
Authors | Wonjin Yoon, Yoon Sun Yeo, Minbyul Jeong, Bong-Jun Yi, Jaewoo Kang |
Abstract | One of the obstacles in abstractive summarization is the presence of multiple potentially correct predictions. Widely used objective functions for supervised learning, such as cross-entropy loss, cannot handle alternative answers effectively; rather, they act as training noise. In this paper, we propose a Semantic Similarity strategy that can account for the semantic meaning of generated summaries during training. Our training objective includes maximizing the semantic similarity score, which is calculated by an additional layer that estimates the semantic similarity between the generated summary and the reference summary. By leveraging pre-trained language models, our model achieves a new state-of-the-art performance, a ROUGE-L score of 41.5 on the CNN/DM dataset. To complement the automatic evaluation, we also conducted a human evaluation and received higher scores relative to both baseline and reference summaries. |
Tasks | Abstractive Text Summarization, Semantic Similarity, Semantic Textual Similarity |
Published | 2020-02-18 |
URL | https://arxiv.org/abs/2002.07767v1 |
https://arxiv.org/pdf/2002.07767v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-by-semantic-similarity-makes |
Repo | https://github.com/icml-2020-nlp/semsim |
Framework | pytorch |
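The proposed objective amounts to the usual token-level cross-entropy minus a reward for semantic similarity between the generated and reference summaries. A hedged sketch is below; the weighting and the source of the summary embeddings are assumptions, not the paper's exact layer.

```python
import torch.nn.functional as F

def semsim_loss(ce_loss, gen_emb, ref_emb, lam=1.0):
    """Cross-entropy plus a semantic-similarity reward (illustrative weighting).

    gen_emb, ref_emb: embeddings of the generated and reference summaries,
    e.g. from a pre-trained language model (assumed).
    """
    sim = F.cosine_similarity(gen_emb, ref_emb, dim=-1).mean()
    return ce_loss - lam * sim   # maximizing similarity lowers the loss
```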
Hybrid Semantic Recommender System for Chemical Compounds
Title | Hybrid Semantic Recommender System for Chemical Compounds |
Authors | Marcia Barros, André Moitinho, Francisco M. Couto |
Abstract | Recommending chemical compounds of interest to a particular researcher is a poorly explored field. The few existing datasets with information about researchers' preferences use implicit feedback. The lack of recommender systems in this particular field presents a challenge for the development of new recommendation models. In this work, we propose a hybrid recommender model for recommending chemical compounds. The model integrates collaborative-filtering algorithms for implicit feedback (Alternating Least Squares (ALS) and Bayesian Personalized Ranking (BPR)) and semantic similarity between the chemical compounds in the ChEBI ontology (ONTO). We evaluated the model on an implicit dataset of chemical compounds, CheRM. The hybrid model was able to improve the results of state-of-the-art collaborative-filtering algorithms, especially for Mean Reciprocal Rank, with an increase of 6.7% when comparing plain ALS with the hybrid ALS_ONTO. |
Tasks | Recommendation Systems, Semantic Similarity, Semantic Textual Similarity |
Published | 2020-01-21 |
URL | https://arxiv.org/abs/2001.07440v1 |
https://arxiv.org/pdf/2001.07440v1.pdf | |
PWC | https://paperswithcode.com/paper/hybrid-semantic-recommender-system-for |
Repo | https://github.com/lasigeBioTM/ChemRecSys |
Framework | none |
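The hybrid scoring can be pictured as a weighted blend: the collaborative-filtering score for each compound plus its ontology-based similarity to compounds the researcher has already interacted with. The sketch below uses a simple linear blend; the paper's actual integration may differ, and `alpha` is an assumption.

```python
import numpy as np

def hybrid_scores(cf_scores, onto_sim, user_items, alpha=0.5):
    """Blend CF scores with ontology similarity to a user's known items.

    cf_scores: (n_items,) ALS/BPR scores for one user.
    onto_sim: (n_items, n_items) ChEBI-derived semantic similarity matrix.
    user_items: indices of compounds the user has interacted with.
    """
    content = onto_sim[:, user_items].max(axis=1)   # closeness to known compounds
    return alpha * cf_scores + (1 - alpha) * content
```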