Paper Group AWR 60
Cars Can’t Fly up in the Sky: Improving Urban-Scene Segmentation via Height-driven Attention Networks. Extremely Dense Point Correspondences using a Learned Feature Descriptor. Leveraging Photogrammetric Mesh Models for Aerial-Ground Feature Point Matching Toward Integrated 3D Reconstruction. Detecting Attended Visual Targets in Video. High-Resolution Daytime Translation Without Domain Labels. Revisiting Challenges in Data-to-Text Generation with Fact Grounding. Debugging Machine Learning Pipelines. Evaluating Weakly Supervised Object Localization Methods Right. Regression and Learning with Pixel-wise Attention for Retinal Fundus Glaucoma Segmentation and Detection. Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles. ZeroQ: A Novel Zero Shot Quantization Framework. Decentralized Policy-Based Private Analytics. Closing the Dequantization Gap: PixelCNN as a Single-Layer Flow. Learning by Semantic Similarity Makes Abstractive Summarization Better. Hybrid Semantic Recommender System for Chemical Compounds.
Cars Can’t Fly up in the Sky: Improving Urban-Scene Segmentation via Height-driven Attention Networks
Title | Cars Can’t Fly up in the Sky: Improving Urban-Scene Segmentation via Height-driven Attention Networks |
Authors | Sungha Choi, Joanne T. Kim, Jaegul Choo |
Abstract | This paper exploits the intrinsic features of urban-scene images and proposes a general add-on module, called height-driven attention networks (HANet), for improving semantic segmentation of urban-scene images. It selectively emphasizes informative features or classes according to the vertical position of a pixel, motivated by the observation that pixel-wise class distributions differ significantly among horizontally segmented sections of urban-scene images. Urban-scene images thus have their own distinct characteristics, yet most semantic segmentation networks do not reflect such unique attributes in their architecture. The proposed network architecture incorporates the capability to exploit these attributes and handle urban-scene datasets effectively. We validate a consistent performance (mIoU) increase for various semantic segmentation models on two datasets when HANet is adopted. This extensive quantitative analysis demonstrates that adding our module to existing models is easy and cost-effective. Our method achieves a new state-of-the-art performance on the Cityscapes benchmark by a large margin among ResNet-101-based segmentation models. Also, we show that the proposed model is coherent with facts observed in urban scenes by visualizing and interpreting the attention map. |
Tasks | Scene Segmentation, Semantic Segmentation |
Published | 2020-03-11 |
URL | https://arxiv.org/abs/2003.05128v1 |
https://arxiv.org/pdf/2003.05128v1.pdf | |
PWC | https://paperswithcode.com/paper/cars-cant-fly-up-in-the-sky-improving-urban |
Repo | https://github.com/shachoi/HANet |
Framework | pytorch |
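To make the height-wise attention idea concrete, here is a minimal PyTorch sketch: each feature row is pooled over the image width, a small 1D convolutional network predicts per-row channel weights, and the feature map is rescaled accordingly. This illustrates the concept only; the layer sizes and module structure are assumptions, not the authors' HANet (see the repo above for the real implementation).

```python
import torch
import torch.nn as nn

class HeightAttention(nn.Module):
    """Toy height-driven attention: per-row channel weights from width-pooled features."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv1d(channels, channels // reduction, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv1d(channels // reduction, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):                # x: (B, C, H, W)
        rows = x.mean(dim=3)             # (B, C, H), one descriptor per image row
        attn = self.mlp(rows)            # (B, C, H), per-row channel attention
        return x * attn.unsqueeze(3)     # broadcast the weights across the width

feat = torch.randn(2, 64, 32, 64)
out = HeightAttention(64)(feat)          # same shape as `feat`
```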
Extremely Dense Point Correspondences using a Learned Feature Descriptor
Title | Extremely Dense Point Correspondences using a Learned Feature Descriptor |
Authors | Xingtong Liu, Yiping Zheng, Benjamin Killeen, Masaru Ishii, Gregory D. Hager, Russell H. Taylor, Mathias Unberath |
Abstract | High-quality 3D reconstructions from endoscopy video play an important role in many clinical applications, including surgical navigation where they enable direct video-CT registration. While many methods exist for general multi-view 3D reconstruction, these methods often fail to deliver satisfactory performance on endoscopic video. Part of the reason is that local descriptors that establish pair-wise point correspondences, and thus drive reconstruction, struggle when confronted with the texture-scarce surface of anatomy. Learning-based dense descriptors usually have larger receptive fields enabling the encoding of global information, which can be used to disambiguate matches. In this work, we present an effective self-supervised training scheme and novel loss design for dense descriptor learning. In direct comparison to recent local and dense descriptors on an in-house sinus endoscopy dataset, we demonstrate that our proposed dense descriptor can generalize to unseen patients and scopes, thereby largely improving the performance of Structure from Motion (SfM) in terms of model density and completeness. We also evaluate our method on a public dense optical flow dataset and a small-scale SfM public dataset to further demonstrate the effectiveness and generality of our method. The source code is available at https://github.com/lppllppl920/DenseDescriptorLearning-Pytorch. |
Tasks | 3D Reconstruction, Optical Flow Estimation |
Published | 2020-03-02 |
URL | https://arxiv.org/abs/2003.00619v2 |
https://arxiv.org/pdf/2003.00619v2.pdf | |
PWC | https://paperswithcode.com/paper/extremely-dense-point-correspondences-using-a |
Repo | https://github.com/lppllppl920/DenseDescriptorLearning-Pytorch |
Framework | pytorch |
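As a concrete picture of what a dense descriptor buys you, the sketch below matches source keypoints into a target frame by taking the argmax of similarity over the target's dense descriptor map. Assuming L2-normalized descriptor maps, the dot product equals cosine similarity. This is the generic matching step only, not the paper's self-supervised training scheme or loss.

```python
import torch

def dense_match(desc_src, desc_tgt, keypoints):
    """Match source keypoints into the target via dense descriptor similarity.

    desc_src, desc_tgt: L2-normalized (C, H, W) descriptor maps (assumed).
    keypoints: (N, 2) long tensor of (y, x) locations in the source frame.
    """
    C, H, W = desc_tgt.shape
    src_vecs = desc_src[:, keypoints[:, 0], keypoints[:, 1]]  # (C, N)
    sim = src_vecs.t() @ desc_tgt.reshape(C, H * W)           # (N, H*W) cosine sims
    best = sim.argmax(dim=1)                                  # best target pixel per keypoint
    ys = torch.div(best, W, rounding_mode="floor")
    return torch.stack((ys, best % W), dim=1)                 # (N, 2) target (y, x)
```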
Leveraging Photogrammetric Mesh Models for Aerial-Ground Feature Point Matching Toward Integrated 3D Reconstruction
Title | Leveraging Photogrammetric Mesh Models for Aerial-Ground Feature Point Matching Toward Integrated 3D Reconstruction |
Authors | Qing Zhu, Zhendong Wang, Han Hu, Linfu Xie, Xuming Ge, Yeting Zhang |
Abstract | Integration of aerial and ground images has proven to be an efficient way to enhance surface reconstruction in urban environments. However, as the first step, feature point matching between aerial and ground images is remarkably difficult due to large differences in viewpoint and illumination conditions. Previous studies based on geometry-aware image rectification have alleviated this problem, but the performance and convenience of this strategy are limited by several flaws, e.g., quadratic image pairs, segregated extraction of descriptors, and occlusions. To address these problems, we propose a novel approach: leveraging photogrammetric mesh models for aerial-ground image matching. The proposed methods have linear time complexity with respect to the number of images, can explicitly handle low overlap using multi-view images, and can be directly injected into off-the-shelf structure-from-motion (SfM) and multi-view stereo (MVS) solutions. First, aerial and ground images are reconstructed separately and initially co-registered through weak georeferencing data. Second, aerial models are rendered to the initial ground views, from which color, depth, and normal images are obtained. Then, the synthesized color images and the corresponding ground images are matched by comparing descriptors, filtered by local geometric information, and propagated to the aerial views using the depth images and patch-based matching. Experimental evaluations on various datasets confirm the superior performance of the proposed methods in aerial-ground image matching. In addition, incorporating the existing SfM and MVS solutions enables more complete and accurate models to be obtained directly. |
Tasks | 3D Reconstruction |
Published | 2020-02-21 |
URL | https://arxiv.org/abs/2002.09085v1 |
https://arxiv.org/pdf/2002.09085v1.pdf | |
PWC | https://paperswithcode.com/paper/leveraging-photogrammetric-mesh-models-for |
Repo | https://github.com/saedrna/RenderMatch |
Framework | none |
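The key propagation step — lifting a 2D match found in a rendered ground view into an aerial view via the rendered depth — can be sketched as plain pinhole geometry. The function below is a hedged illustration; the matrix conventions (4x4 camera-to-world poses, 3x3 intrinsics) are assumptions, and the paper's pipeline adds patch-based refinement on top.

```python
import numpy as np

def propagate_match(pt, depth, K_ground, pose_ground, K_aerial, pose_aerial):
    """Lift a pixel from a rendered ground view to 3D and project it into an aerial view.

    pt: (u, v) pixel in the rendered ground view.
    depth: rendered depth image for that view.
    K_*: 3x3 intrinsics; pose_*: 4x4 camera-to-world matrices (assumed conventions).
    """
    u, v = pt
    z = depth[int(v), int(u)]                         # rendered depth at the match
    ray = np.linalg.inv(K_ground) @ np.array([u, v, 1.0])
    X_cam = ray * z                                   # 3D point in ground-camera frame
    X_world = pose_ground @ np.append(X_cam, 1.0)     # ground camera -> world
    X_aer = np.linalg.inv(pose_aerial) @ X_world      # world -> aerial camera
    uvw = K_aerial @ X_aer[:3]
    return uvw[:2] / uvw[2]                           # pixel location in the aerial image
```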
Detecting Attended Visual Targets in Video
Title | Detecting Attended Visual Targets in Video |
Authors | Eunji Chong, Yongxin Wang, Nataniel Ruiz, James M. Rehg |
Abstract | We address the problem of detecting attention targets in video. Our goal is to identify where each person in each frame of a video is looking, and correctly handle the case where the gaze target is out-of-frame. Our novel architecture models the dynamic interaction between the scene and head features and infers time-varying attention targets. We introduce a new annotated dataset, VideoAttentionTarget, containing complex and dynamic patterns of real-world gaze behavior. Our experiments show that our model can effectively infer dynamic attention in videos. In addition, we apply our predicted attention maps to two social gaze behavior recognition tasks, and show that the resulting classifiers significantly outperform existing methods. We achieve state-of-the-art performance on three datasets: GazeFollow (static images), VideoAttentionTarget (videos), and VideoCoAtt (videos), and obtain the first results for automatically classifying clinically-relevant gaze behavior without wearable cameras or eye trackers. |
Tasks | Deep Attention |
Published | 2020-03-05 |
URL | https://arxiv.org/abs/2003.02501v2 |
https://arxiv.org/pdf/2003.02501v2.pdf | |
PWC | https://paperswithcode.com/paper/detecting-attended-visual-targets-in-video |
Repo | https://github.com/ejcgt/attention-target-detection |
Framework | pytorch |
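One way to picture the scene/head interaction is a head-conditioned spatial attention: features of the person's head crop produce a query that weights scene features before a gaze-target heatmap is predicted. The sketch below is a simplification under assumed feature shapes, not the paper's full architecture (which also models temporal dynamics and the out-of-frame case).

```python
import torch
import torch.nn as nn

class HeadConditionedAttention(nn.Module):
    """Toy scene/head fusion: head features gate scene features spatially."""
    def __init__(self, c_scene=256, c_head=128):
        super().__init__()
        self.to_query = nn.Linear(c_head, c_scene)
        self.heatmap = nn.Conv2d(c_scene, 1, kernel_size=1)

    def forward(self, scene, head_vec):   # scene: (B, C, H, W), head_vec: (B, c_head)
        q = self.to_query(head_vec)[:, :, None, None]            # (B, C, 1, 1)
        attn = torch.sigmoid((scene * q).sum(1, keepdim=True))   # (B, 1, H, W)
        return self.heatmap(scene * attn)                        # gaze heatmap logits
```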
High-Resolution Daytime Translation Without Domain Labels
Title | High-Resolution Daytime Translation Without Domain Labels |
Authors | Ivan Anokhin, Pavel Solovev, Denis Korzhenkov, Alexey Kharlamov, Taras Khakhulin, Alexey Silvestrov, Sergey Nikolenko, Victor Lempitsky, Gleb Sterkin |
Abstract | Modeling daytime changes in high resolution photographs, e.g., re-rendering the same scene under different illuminations typical for day, night, or dawn, is a challenging image manipulation task. We present the high-resolution daytime translation (HiDT) model for this task. HiDT combines a generative image-to-image model and a new upsampling scheme that allows image translation to be applied at high resolution. The model demonstrates competitive results in terms of both commonly used GAN metrics and human evaluation. Importantly, this good performance comes as a result of training on a dataset of still landscape images with no daytime labels available. Our results are available at https://saic-mdal.github.io/HiDT/. |
Tasks | Image Super-Resolution, Image-to-Image Translation, Multimodal Unsupervised Image-To-Image Translation, Style Transfer, Unsupervised Image-To-Image Translation |
Published | 2020-03-19 |
URL | https://arxiv.org/abs/2003.08791v2 |
https://arxiv.org/pdf/2003.08791v2.pdf | |
PWC | https://paperswithcode.com/paper/high-resolution-daytime-translation-without |
Repo | https://github.com/saic-mdal/HiDT |
Framework | none |
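The enhancement scheme can be pictured as translating pixel-shifted subsampled copies of the high-resolution input and re-interleaving the results. The sketch below does exactly that; HiDT additionally merges the re-assembled outputs with a learned merging network, which is omitted here (the `merge_fn` hook stands in for it). Even spatial dimensions are assumed.

```python
import torch

def translate_highres(img, translate_fn, merge_fn=None):
    """Translate a high-res image with a low-res model via shifted subsampled grids.

    img: (B, C, H, W) with even H and W (assumed).
    translate_fn: the low-resolution image-to-image model.
    """
    B, C, H, W = img.shape
    shifts = [(0, 0), (0, 1), (1, 0), (1, 1)]
    subs = [img[:, :, i::2, j::2] for i, j in shifts]   # four half-res sub-images
    outs = [translate_fn(s) for s in subs]              # translate each at low res
    full = img.new_zeros(B, C, H, W)
    for (i, j), o in zip(shifts, outs):
        full[:, :, i::2, j::2] = o                      # interleave back to full res
    return merge_fn(full) if merge_fn is not None else full
```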
Revisiting Challenges in Data-to-Text Generation with Fact Grounding
Title | Revisiting Challenges in Data-to-Text Generation with Fact Grounding |
Authors | Hongmin Wang |
Abstract | Data-to-text generation models face challenges in ensuring data fidelity by referring to the correct input source. To inspire studies in this area, Wiseman et al. (2017) introduced the RotoWire corpus on generating NBA game summaries from box- and line-score tables. However, limited attempts have been made in this direction and the challenges remain. We observe a prominent bottleneck in the corpus: only about 60% of the summary contents can be grounded to the boxscore records. Such information deficiency tends to misguide a conditioned language model into producing unconditioned random facts and thus leads to factual hallucinations. In this work, we restore the information balance and revamp this task to focus on fact-grounded data-to-text generation. We introduce a purified and larger-scale dataset, RotoWire-FG (Fact-Grounding), with 50% more data from the years 2017-19 and enriched input tables, hoping to attract more research focus in this direction. Moreover, we achieve improved data fidelity over the state-of-the-art models by integrating a new form of table reconstruction as an auxiliary task to boost the generation quality. |
Tasks | Data-to-Text Generation, Language Modelling, Text Generation |
Published | 2020-01-12 |
URL | https://arxiv.org/abs/2001.03830v1 |
https://arxiv.org/pdf/2001.03830v1.pdf | |
PWC | https://paperswithcode.com/paper/revisiting-challenges-in-data-to-text-1 |
Repo | https://github.com/wanghm92/rw_fg |
Framework | pytorch |
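The auxiliary-task idea reduces to a weighted multi-task loss: the usual generation loss plus a table-reconstruction loss computed from the decoder's states. The sketch below shows the shape of such a training step; all names (`model`, `table_decoder`, the batch keys) and the weighting are placeholders, not the paper's code.

```python
def training_step(batch, model, table_decoder, optimizer, lam=0.5):
    """One joint update: summary generation loss + table reconstruction loss."""
    # model returns decoder states and the generation (cross-entropy) loss
    dec_states, gen_loss = model(batch["table"], batch["summary"])
    # auxiliary head tries to recover the input records from the decoder states
    recon_loss = table_decoder(dec_states, batch["records"])
    loss = gen_loss + lam * recon_loss   # lam balances the two objectives (assumed)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```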
Debugging Machine Learning Pipelines
Title | Debugging Machine Learning Pipelines |
Authors | Raoni Lourenço, Juliana Freire, Dennis Shasha |
Abstract | Machine learning tasks entail the use of complex computational pipelines to reach quantitative and qualitative conclusions. If some of the activities in a pipeline produce erroneous or uninformative outputs, the pipeline may fail or produce incorrect results. Inferring the root cause of failures and unexpected behavior is challenging; it usually requires much human thought and is both time-consuming and error-prone. We propose a new approach that uses iteration and provenance to automatically infer the root causes and derive succinct explanations of failures. Through a detailed experimental evaluation, we assess the cost, precision, and recall of our approach compared to the state of the art. Our source code and experimental data will be available for reproducibility and enhancement. |
Tasks | |
Published | 2020-02-11 |
URL | https://arxiv.org/abs/2002.04640v1 |
https://arxiv.org/pdf/2002.04640v1.pdf | |
PWC | https://paperswithcode.com/paper/debugging-machine-learning-pipelines |
Repo | https://github.com/raonilourenco/MLDebugger |
Framework | none |
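In spirit, provenance-based debugging asks: which minimal parameter-value combinations occur only in failing runs? A toy version under that framing is sketched below; the authors' system is considerably more sophisticated (it iteratively proposes new runs), and this brute-force enumeration is only illustrative.

```python
from itertools import combinations

def suspicious_configs(runs, max_size=2):
    """Return minimal parameter-value combos seen only in failing runs.

    runs: list of (params_dict, succeeded_bool) pairs from past executions.
    """
    failing = [p for p, ok in runs if not ok]
    passing = [p for p, ok in runs if ok]
    culprits = []
    for params in failing:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(params.items()), size):
                if any(all(p.get(k) == v for k, v in combo) for p in passing):
                    continue          # combo also appears in a passing run
                if any(set(c) <= set(combo) for c in culprits):
                    continue          # a smaller (or equal) culprit already covers it
                culprits.append(combo)
    return culprits

runs = [({"lr": 0.1, "opt": "sgd"}, False), ({"lr": 0.01, "opt": "sgd"}, True)]
print(suspicious_configs(runs))       # [(('lr', 0.1),)] -- only lr=0.1 runs fail
```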
Evaluating Weakly Supervised Object Localization Methods Right
Title | Evaluating Weakly Supervised Object Localization Methods Right |
Authors | Junsuk Choe, Seong Joon Oh, Seungho Lee, Sanghyuk Chun, Zeynep Akata, Hyunjung Shim |
Abstract | Weakly-supervised object localization (WSOL) has gained popularity in recent years for its promise to train localization models with only image-level labels. Since the seminal WSOL work on class activation mapping (CAM), the field has focused on how to expand the attention regions to cover objects more broadly and localize them better. However, these strategies rely on full localization supervision for validating hyperparameters and for model selection, which is in principle prohibited under the WSOL setup. In this paper, we argue that the WSOL task is ill-posed with only image-level labels, and propose a new evaluation protocol where full supervision is limited to a small held-out set that does not overlap with the test set. We observe that, under our protocol, the five most recent WSOL methods have not made a major improvement over the CAM baseline. Moreover, we report that existing WSOL methods have not reached the few-shot learning baseline, in which the full supervision available at validation time is used for model training instead. Based on our findings, we discuss some future directions for WSOL. |
Tasks | Few-Shot Learning, Model Selection, Object Localization, Weakly-Supervised Object Localization |
Published | 2020-01-21 |
URL | https://arxiv.org/abs/2001.07437v2 |
https://arxiv.org/pdf/2001.07437v2.pdf | |
PWC | https://paperswithcode.com/paper/evaluating-weakly-supervised-object |
Repo | https://github.com/clovaai/wsolevaluation |
Framework | pytorch |
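The protocol's core rule — tune localization hyperparameters only on a small, fully-annotated held-out set, never on test — can be illustrated with CAM threshold selection. The sketch below uses mask IoU for simplicity; the paper's actual metrics are box- and mask-based variants with their own definitions.

```python
import numpy as np

def pick_cam_threshold(cams, gt_masks, taus=np.linspace(0.05, 0.95, 19)):
    """Choose the CAM binarization threshold on a held-out, fully-labeled set.

    cams: list of [0, 1]-normalized score maps; gt_masks: binary masks.
    The returned threshold is then frozen and applied to the test set.
    """
    def mean_iou(tau):
        scores = []
        for cam, gt in zip(cams, gt_masks):
            pred = cam >= tau
            union = np.logical_or(pred, gt).sum()
            inter = np.logical_and(pred, gt).sum()
            scores.append(inter / union if union else 0.0)
        return np.mean(scores)
    return max(taus, key=mean_iou)
```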
Regression and Learning with Pixel-wise Attention for Retinal Fundus Glaucoma Segmentation and Detection
Title | Regression and Learning with Pixel-wise Attention for Retinal Fundus Glaucoma Segmentation and Detection |
Authors | Peng Liu, Ruogu Fang |
Abstract | Observing retinal fundus images is a major approach for ophthalmologists to diagnose glaucoma. However, it is still difficult to distinguish the features of the lesion through manual observation alone, especially in the early phase of glaucoma. In this paper, we present two deep learning-based automated algorithms, one for glaucoma detection and one for optic disc and cup segmentation. We utilize the attention mechanism to learn pixel-wise features for accurate prediction. In particular, we present two convolutional neural networks that can focus on learning various pixel-wise level features. In addition, we develop several attention strategies to guide the networks to learn the important features that have a major impact on prediction accuracy. We evaluate our methods on the validation dataset; the proposed solutions for both tasks achieve impressive results and outperform current state-of-the-art methods. The code is available at https://github.com/cswin/RLPA. |
Tasks | |
Published | 2020-01-06 |
URL | https://arxiv.org/abs/2001.01815v1 |
https://arxiv.org/pdf/2001.01815v1.pdf | |
PWC | https://paperswithcode.com/paper/regression-and-learning-with-pixel-wise |
Repo | https://github.com/cswin/RLPA |
Framework | none |
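A minimal version of pixel-wise attention in this setting is a learned per-pixel gate that re-weights the feature map before the segmentation and classification heads. The module below is illustrative only; the paper's two networks and attention strategies are richer than this.

```python
import torch
import torch.nn as nn

class PixelAttentionGate(nn.Module):
    """Toy pixel-wise attention: predict a per-pixel weight map and apply it."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):          # x: (B, C, H, W)
        return x * self.gate(x)    # emphasize pixels the gate deems informative
```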
Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles
Title | Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles |
Authors | Malte Ostendorff, Terry Ruas, Moritz Schubotz, Georg Rehm, Bela Gipp |
Abstract | Many digital libraries recommend literature to their users by considering the similarity between a query document and their repository. However, they often fail to identify the relationship that makes two documents alike. In this paper, we model the problem of finding the relationship between two documents as a pairwise document classification task. To find the semantic relation between documents, we apply a series of techniques, such as GloVe, Paragraph-Vectors, BERT, and XLNet under different configurations (e.g., sequence length, vector concatenation scheme), including a Siamese architecture for the Transformer-based systems. We perform our experiments on a newly proposed dataset of 32,168 Wikipedia article pairs and Wikidata properties that define the semantic document relations. Our results show vanilla BERT to be the best performing system, with an F1-score of 0.93, which we manually examine to better understand its applicability to other domains. Our findings suggest that classifying semantic relations between documents is a solvable task and motivate the development of recommender systems based on the evaluated techniques. The discussions in this paper serve as a first step toward exploring documents through SPARQL-like queries, such that one could find documents that are similar in one aspect but dissimilar in another. |
Tasks | Document Classification, Recommendation Systems |
Published | 2020-03-22 |
URL | https://arxiv.org/abs/2003.09881v1 |
https://arxiv.org/pdf/2003.09881v1.pdf | |
PWC | https://paperswithcode.com/paper/pairwise-multi-class-document-classification |
Repo | https://github.com/malteos/semantic-document-relations |
Framework | pytorch |
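Operationally, the vanilla-BERT baseline is sequence-pair classification: both articles go into one input separated by [SEP], and a classification head predicts the relation. A sketch with the Hugging Face API is below; the checkpoint and label count are assumptions, and the head is randomly initialized until fine-tuned on the article-pair dataset.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
# num_labels is a placeholder for the number of Wikidata relation classes
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=7)

# BERT sequence-pair input: [CLS] doc_a [SEP] doc_b [SEP], truncated to 512 tokens
inputs = tokenizer("Text of the seed article ...", "Text of the target article ...",
                   truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1))       # predicted relation id (after fine-tuning)
```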
ZeroQ: A Novel Zero Shot Quantization Framework
Title | ZeroQ: A Novel Zero Shot Quantization Framework |
Authors | Yaohui Cai, Zhewei Yao, Zhen Dong, Amir Gholami, Michael W. Mahoney, Kurt Keutzer |
Abstract | Quantization is a promising approach for reducing the inference time and memory footprint of neural networks. However, most existing quantization methods require access to the original training dataset for retraining during quantization. This is often not possible for applications with sensitive or proprietary data, e.g., due to privacy and security concerns. Existing zero-shot quantization methods use different heuristics to address this, but they result in poor performance, especially when quantizing to ultra-low precision. Here, we propose ZeroQ, a novel zero-shot quantization framework to address this. ZeroQ enables mixed-precision quantization without any access to the training or validation data. This is achieved by optimizing for a Distilled Dataset, which is engineered to match the statistics of batch normalization across different layers of the network. ZeroQ supports both uniform and mixed-precision quantization. For the latter, we introduce a novel Pareto frontier based method to automatically determine the mixed-precision bit setting for all layers, with no manual search involved. We extensively test our proposed method on a diverse set of models, including ResNet18/50/152, MobileNetV2, ShuffleNet, SqueezeNext, and InceptionV3 on ImageNet, as well as RetinaNet-ResNet50 on the Microsoft COCO dataset. In particular, we show that ZeroQ can achieve 1.71% higher accuracy on MobileNetV2, as compared to the recently proposed DFQ method. Importantly, ZeroQ has a very low computational overhead, and it can finish the entire quantization process in less than 30s (0.5% of one epoch training time of ResNet50 on ImageNet). We have open-sourced the ZeroQ framework at https://github.com/amirgholami/ZeroQ. |
Tasks | Quantization |
Published | 2020-01-01 |
URL | https://arxiv.org/abs/2001.00281v1 |
https://arxiv.org/pdf/2001.00281v1.pdf | |
PWC | https://paperswithcode.com/paper/zeroq-a-novel-zero-shot-quantization |
Repo | https://github.com/jakc4103/DFQ |
Framework | pytorch |
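The Distilled Dataset idea is concrete enough to sketch: optimize random inputs so that the activation statistics entering each BatchNorm layer match that layer's stored running statistics. Below is a simplified PyTorch version (batch size, learning rate, and step count are arbitrary choices); the released ZeroQ code adds refinements on top.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18(pretrained=True).eval()
bn_layers = [m for m in model.modules() if isinstance(m, nn.BatchNorm2d)]
acts = {}
for bn in bn_layers:   # record the input of every BN layer on each forward pass
    bn.register_forward_hook(lambda mod, inp, out: acts.__setitem__(mod, inp[0]))

x = torch.randn(32, 3, 224, 224, requires_grad=True)   # the "distilled" batch
opt = torch.optim.Adam([x], lr=0.1)
for _ in range(200):
    opt.zero_grad()
    model(x)
    # match per-channel mean/variance of activations to the BN running stats
    loss = sum(((acts[bn].mean((0, 2, 3)) - bn.running_mean) ** 2).sum()
               + ((acts[bn].var((0, 2, 3)) - bn.running_var) ** 2).sum()
               for bn in bn_layers)
    loss.backward()
    opt.step()
# x now approximates the training-data statistics and can calibrate quantization
```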
Decentralized Policy-Based Private Analytics
Title | Decentralized Policy-Based Private Analytics |
Authors | Kleomenis Katevas, Eugene Bagdasaryan, Jason Waterman, Mohamad Mounir Safadieh, Hamed Haddadi, Deborah Estrin |
Abstract | We are increasingly surrounded by applications, connected devices, services, and smart environments which require fine-grained access to various personal data. The inherent complexity of our personal and professional policies and preferences in interactions with these analytics services raises important privacy challenges. Moreover, due to the sensitivity of the data and regulatory and technical barriers, it is not always feasible to do these policy negotiations in a centralized manner. In this paper we present PoliBox, a decentralized, edge-based framework for policy-based personal data analytics. PoliBox brings together a number of established components to provide privacy-preserving analytics within a distributed setting. We evaluate our framework using a popular exemplar of private analytics, Federated Learning, and demonstrate that for varying model sizes and use cases, PoliBox is able to perform accurate model training and inference within very reasonable resource and time budgets. |
Tasks | |
Published | 2020-03-14 |
URL | https://arxiv.org/abs/2003.06612v1 |
https://arxiv.org/pdf/2003.06612v1.pdf | |
PWC | https://paperswithcode.com/paper/decentralized-policy-based-private-analytics |
Repo | https://github.com/minoskt/PoliBox |
Framework | none |
Closing the Dequantization Gap: PixelCNN as a Single-Layer Flow
Title | Closing the Dequantization Gap: PixelCNN as a Single-Layer Flow |
Authors | Didrik Nielsen, Ole Winther |
Abstract | Flow models have recently made great progress at modeling quantized sensor data such as images and audio. Due to the continuous nature of flow models, dequantization is typically applied when using them for such quantized data. In this paper, we propose subset flows, a class of flows which can tractably transform subsets of the input space in one pass. As a result, they can be applied directly to quantized data without the need for dequantization. Based on this class of flows, we present a novel interpretation of several existing autoregressive models, including WaveNet and PixelCNN, as single-layer flow models defined through an invertible transformation between uniform noise and data samples. This interpretation suggests that these existing models, 1) admit a latent representation of data and 2) can be stacked in multiple flow layers. We demonstrate this by exploring the latent space of a PixelCNN and by stacking PixelCNNs in multiple flow layers. |
Tasks | |
Published | 2020-02-06 |
URL | https://arxiv.org/abs/2002.02547v1 |
https://arxiv.org/pdf/2002.02547v1.pdf | |
PWC | https://paperswithcode.com/paper/closing-the-dequantization-gap-pixelcnn-as-a |
Repo | https://github.com/didriknielsen/pixelcnn_flow |
Framework | pytorch |
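The single-layer-flow view says a uniform noise value is pushed through the inverse CDF implied by each pixel's categorical distribution. The helper below implements that lookup; in a real PixelCNN the logits for each pixel depend on previously generated pixels, so sampling must proceed pixel by pixel, which is omitted here.

```python
import torch

def uniform_to_pixels(logits, u):
    """Map uniform noise to discrete pixel values via the inverse CDF.

    logits: (..., 256) per-pixel categorical logits; u: (...) uniform in [0, 1).
    """
    probs = torch.softmax(logits, dim=-1)
    cdf = probs.cumsum(dim=-1)                    # per-pixel cumulative distribution
    return (u.unsqueeze(-1) >= cdf).sum(dim=-1)   # count of bins below u = pixel value
```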
Learning by Semantic Similarity Makes Abstractive Summarization Better
Title | Learning by Semantic Similarity Makes Abstractive Summarization Better |
Authors | Wonjin Yoon, Yoon Sun Yeo, Minbyul Jeong, Bong-Jun Yi, Jaewoo Kang |
Abstract | One of the obstacles in abstractive summarization is the presence of multiple potentially correct predictions. Widely used objective functions for supervised learning, such as cross-entropy loss, cannot handle alternative answers effectively; rather, they act as training noise. In this paper, we propose a Semantic Similarity strategy that can account for the semantic meaning of generated summaries during training. Our training objective includes maximizing the semantic similarity score, which is calculated by an additional layer that estimates the semantic similarity between the generated summary and the reference summary. By leveraging pre-trained language models, our model achieves a new state-of-the-art performance, a ROUGE-L score of 41.5 on the CNN/DM dataset. To complement the automatic evaluation, we also conducted a human evaluation and received higher scores relative to both baseline and reference summaries. |
Tasks | Abstractive Text Summarization, Semantic Similarity, Semantic Textual Similarity |
Published | 2020-02-18 |
URL | https://arxiv.org/abs/2002.07767v1 |
https://arxiv.org/pdf/2002.07767v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-by-semantic-similarity-makes |
Repo | https://github.com/icml-2020-nlp/semsim |
Framework | pytorch |
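The proposed objective amounts to the usual token-level cross-entropy minus a reward for semantic similarity between the generated and reference summaries. A hedged sketch is below; the weighting and the source of the summary embeddings are assumptions, not the paper's exact layer.

```python
import torch.nn.functional as F

def semsim_loss(ce_loss, gen_emb, ref_emb, lam=1.0):
    """Cross-entropy plus a semantic-similarity reward (illustrative weighting).

    gen_emb, ref_emb: embeddings of the generated and reference summaries,
    e.g. from a pre-trained language model (assumed).
    """
    sim = F.cosine_similarity(gen_emb, ref_emb, dim=-1).mean()
    return ce_loss - lam * sim   # maximizing similarity lowers the loss
```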
Hybrid Semantic Recommender System for Chemical Compounds
Title | Hybrid Semantic Recommender System for Chemical Compounds |
Authors | Marcia Barros, André Moitinho, Francisco M. Couto |
Abstract | Recommending chemical compounds of interest to a particular researcher is a poorly explored field. The few existing datasets with information about researchers' preferences use implicit feedback. The lack of recommender systems in this particular field presents a challenge for the development of new recommendation models. In this work, we propose a hybrid recommender model for recommending chemical compounds. The model integrates collaborative-filtering algorithms for implicit feedback (Alternating Least Squares (ALS) and Bayesian Personalized Ranking (BPR)) and semantic similarity between the chemical compounds in the ChEBI ontology (ONTO). We evaluated the model on an implicit dataset of chemical compounds, CheRM. The hybrid model was able to improve the results of state-of-the-art collaborative-filtering algorithms, especially for Mean Reciprocal Rank, with an increase of 6.7% when comparing plain ALS with the hybrid ALS_ONTO. |
Tasks | Recommendation Systems, Semantic Similarity, Semantic Textual Similarity |
Published | 2020-01-21 |
URL | https://arxiv.org/abs/2001.07440v1 |
https://arxiv.org/pdf/2001.07440v1.pdf | |
PWC | https://paperswithcode.com/paper/hybrid-semantic-recommender-system-for |
Repo | https://github.com/lasigeBioTM/ChemRecSys |
Framework | none |
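The hybrid scoring can be pictured as a weighted blend: the collaborative-filtering score for each compound plus its ontology-based similarity to compounds the researcher has already interacted with. The sketch below uses a simple linear blend; the paper's actual integration may differ, and `alpha` is an assumption.

```python
import numpy as np

def hybrid_scores(cf_scores, onto_sim, user_items, alpha=0.5):
    """Blend CF scores with ontology similarity to a user's known items.

    cf_scores: (n_items,) ALS/BPR scores for one user.
    onto_sim: (n_items, n_items) ChEBI-derived semantic similarity matrix.
    user_items: indices of compounds the user has interacted with.
    """
    content = onto_sim[:, user_items].max(axis=1)   # closeness to known compounds
    return alpha * cf_scores + (1 - alpha) * content
```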