Paper Group AWR 333
TechNet: Technology Semantic Network Based on Patent Data. Benchmark for Generic Product Detection: A Low Data Baseline for Dense Object Detection. Deep Session Interest Network for Click-Through Rate Prediction. CCMatrix: Mining Billions of High-Quality Parallel Sentences on the WEB. Learning Trajectory Dependencies for Human Motion Prediction. NL-FIIT at SemEval-2019 Task 9: Neural Model Ensemble for Suggestion Mining. Deep neural network or dermatologist? Enhanced free space detection in multiple lanes based on single CNN with scene identification. BERTScore: Evaluating Text Generation with BERT. Progressive Face Super-Resolution via Attention to Facial Landmark. One Network for Multi-Domains: Domain Adaptive Hashing with Intersectant Generative Adversarial Network. Dominant Set Clustering and Pooling for Multi-View 3D Object Recognition. Deep Constrained Dominant Sets for Person Re-identification. Decoupling Representation and Classifier for Long-Tailed Recognition. Improving Textual Network Learning with Variational Homophilic Embeddings.
TechNet: Technology Semantic Network Based on Patent Data
Title | TechNet: Technology Semantic Network Based on Patent Data |
Authors | Serhad Sarica, Jianxi Luo, Kristin L. Wood |
Abstract | The growing developments in general semantic networks, knowledge graphs and ontology databases have motivated us to build a large-scale comprehensive semantic network of technology-related data for engineering knowledge discovery, technology search and retrieval, and artificial intelligence for engineering design and innovation. Specifically, we constructed a technology semantic network (TechNet) that covers the elemental concepts in all domains of technology and their semantic associations by mining the complete U.S. patent database from 1976 onward. To derive the TechNet, natural language processing techniques were utilized to extract terms from massive patent texts, and recent word embedding algorithms were employed to vectorize such terms and establish their semantic relationships. We report and evaluate the TechNet for retrieving terms and their pairwise relevance that is meaningful from a technology and engineering design perspective. The TechNet may serve as an infrastructure to support a wide range of applications, e.g., technical text summaries, search query predictions, relational knowledge discovery, and design ideation support, in the context of engineering and technology, and complement or enrich existing semantic databases. To enable such applications, the TechNet is made public via an online interface and APIs for public users to retrieve technology-related terms and their relevance. |
Tasks | Knowledge Graphs, Semantic Textual Similarity |
Published | 2019-06-02 |
URL | https://arxiv.org/abs/1906.00411v4 |
PDF | https://arxiv.org/pdf/1906.00411v4.pdf |
PWC | https://paperswithcode.com/paper/190600411 |
Repo | https://github.com/SerhadS/TechNet |
Framework | none |
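As a rough illustration of the pipeline described above — extract technical terms from patent text, embed them, and expose pairwise relevance — here is a minimal skip-gram word2vec sketch with gensim. The two toy documents and term spellings are invented; TechNet's actual term vectors are served via its online interface and APIs.

```python
from gensim.models import Word2Vec

# Hypothetical tokenized patent texts: each document is a list of extracted
# multi-word technical terms. The paper compares several embedding methods;
# plain skip-gram word2vec stands in for them here.
patent_docs = [
    ["lithium_ion_battery", "anode", "electrolyte", "solid_state"],
    ["convolutional_neural_network", "image_recognition", "gpu"],
]

model = Word2Vec(sentences=patent_docs, vector_size=300, window=5,
                 min_count=1, sg=1, epochs=10)

# Pairwise relevance of two technology terms = cosine similarity of vectors
print(model.wv.similarity("anode", "electrolyte"))
print(model.wv.most_similar("lithium_ion_battery", topn=3))
```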
Benchmark for Generic Product Detection: A Low Data Baseline for Dense Object Detection
Title | Benchmark for Generic Product Detection: A Low Data Baseline for Dense Object Detection |
Authors | Srikrishna Varadarajan, Sonaal Kant, Muktabh Mayank Srivastava |
Abstract | Object detection in densely packed scenes is a new area where standard object detectors fail to train well. Dense object detectors like RetinaNet trained on large and dense datasets show great performance. We train a standard object detector on a small, normally packed dataset with data augmentation techniques. This dataset is 265 times smaller than the standard dataset, in terms of number of annotations. This low data baseline achieves satisfactory results (mAP=0.56) at standard IoU of 0.5. We also create a varied benchmark for generic SKU product detection by providing full annotations for multiple public datasets. It can be accessed at https://github.com/ParallelDots/generic-sku-detection-benchmark. We hope that this benchmark helps in building robust detectors that perform reliably across different settings in the wild. |
Tasks | Data Augmentation, Dense Object Detection, Object Detection |
Published | 2019-12-19 |
URL | https://arxiv.org/abs/1912.09476v2 |
PDF | https://arxiv.org/pdf/1912.09476v2.pdf |
PWC | https://paperswithcode.com/paper/benchmark-for-generic-product-detection-a |
Repo | https://github.com/ParallelDots/generic-sku-detection-benchmark |
Framework | none |
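The headline result, mAP = 0.56 at an IoU threshold of 0.5, is only meaningful if the threshold is concrete. A self-contained IoU helper using the standard definition (not code from the benchmark repo):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    A detection counts as correct at the paper's threshold when IoU >= 0.5."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```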
Deep Session Interest Network for Click-Through Rate Prediction
Title | Deep Session Interest Network for Click-Through Rate Prediction |
Authors | Yufei Feng, Fuyu Lv, Weichen Shen, Menghan Wang, Fei Sun, Yu Zhu, Keping Yang |
Abstract | Easy-to-use, modular, and extendible package of deep-learning-based CTR models: DeepFM, Deep Interest Network (DIN), Deep Interest Evolution Network (DIEN), Deep & Cross Network (DCN), Attentional Factorization Machine (AFM), Neural Factorization Machine (NFM), AutoInt, and Deep Session Interest Network (DSIN). |
Tasks | Click-Through Rate Prediction, Recommendation Systems |
Published | 2019-05-16 |
URL | https://arxiv.org/abs/1905.06482v1 |
PDF | https://arxiv.org/pdf/1905.06482v1.pdf |
PWC | https://paperswithcode.com/paper/deep-session-interest-network-for-click |
Repo | https://github.com/shenweichen/DSIN |
Framework | tf |
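The core preprocessing step implied by the title — carving a user's behavior sequence into sessions — is worth making concrete. A hypothetical helper, assuming events are `(timestamp, item)` pairs; the 30-minute gap is a common sessionization heuristic rather than a value taken from the table above.

```python
from datetime import timedelta

def split_into_sessions(events, gap=timedelta(minutes=30)):
    """Split a user's behavior sequence into sessions: contiguous runs of
    interactions separated by time gaps larger than `gap`. DSIN then models
    interest within each session and its evolution across sessions."""
    sessions, current = [], []
    for ts, item in sorted(events):
        if current and ts - current[-1][0] > gap:
            sessions.append(current)   # gap exceeded: close the session
            current = []
        current.append((ts, item))
    if current:
        sessions.append(current)
    return sessions
```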
CCMatrix: Mining Billions of High-Quality Parallel Sentences on the WEB
Title | CCMatrix: Mining Billions of High-Quality Parallel Sentences on the WEB |
Authors | Holger Schwenk, Guillaume Wenzek, Sergey Edunov, Edouard Grave, Armand Joulin |
Abstract | We show that margin-based bitext mining in a multilingual sentence space can be applied to monolingual corpora of billions of sentences. We use ten snapshots of a curated common crawl corpus (Wenzek et al., 2019) totaling 32.7 billion unique sentences. Using one unified approach for 38 languages, we were able to mine 3.5 billion parallel sentences, out of which 661 million are aligned with English. 17 language pairs have more than 30 million parallel sentences, 82 more than 10 million, and most more than one million, including direct alignments between many European or Asian languages. To evaluate the quality of the mined bitexts, we train NMT systems for most of the language pairs and evaluate them on TED, WMT and WAT test sets. Using our mined bitexts only and no human-translated parallel data, we achieve a new state of the art for a single system on the WMT’19 test set for translation between English and German, Russian and Chinese, as well as German/French. In particular, our English/German system outperforms the best single one by close to 4 BLEU points and is almost on par with the best WMT’19 evaluation system, which uses system combination and back-translation. We also achieve excellent results for distant language pairs like Russian/Japanese, outperforming the best submission at the 2019 Workshop on Asian Translation (WAT). |
Tasks | |
Published | 2019-11-10 |
URL | https://arxiv.org/abs/1911.04944v1 |
PDF | https://arxiv.org/pdf/1911.04944v1.pdf |
PWC | https://paperswithcode.com/paper/ccmatrix-mining-billions-of-high-quality |
Repo | https://github.com/kmkwon94/ainize-laser |
Framework | pytorch |
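The margin criterion behind the mining is compact enough to show. A numpy sketch of the ratio-margin score in the style of Artetxe & Schwenk (2019), which this line of work builds on; `x` and `y` stand for L2-normalized multilingual sentence embeddings (e.g. LASER), and a production pipeline would use approximate k-NN search (e.g. FAISS) instead of a dense similarity matrix.

```python
import numpy as np

def margin_scores(x, y, k=4):
    """Ratio-margin score: cosine similarity normalized by the average
    similarity to each side's k nearest neighbors. Candidate sentence
    pairs are mutual best matches whose score exceeds a threshold."""
    sim = x @ y.T                                        # cosine; rows: x, cols: y
    knn_x = np.sort(sim, axis=1)[:, -k:].mean(axis=1)    # avg sim of x_i to its k-NN in y
    knn_y = np.sort(sim, axis=0)[-k:, :].mean(axis=0)    # avg sim of y_j to its k-NN in x
    return sim / ((knn_x[:, None] + knn_y[None, :]) / 2)
```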
Learning Trajectory Dependencies for Human Motion Prediction
Title | Learning Trajectory Dependencies for Human Motion Prediction |
Authors | Wei Mao, Miaomiao Liu, Mathieu Salzmann, Hongdong Li |
Abstract | Human motion prediction, i.e., forecasting future body poses given an observed pose sequence, has typically been tackled with recurrent neural networks (RNNs). However, as evidenced by prior work, the resulting RNN models suffer from the accumulation of prediction errors, leading to undesired discontinuities in motion prediction. In this paper, we propose a simple feed-forward deep network for motion prediction, which takes into account both temporal smoothness and spatial dependencies among human body joints. In this context, we then propose to encode temporal information by working in trajectory space, instead of the traditionally-used pose space. This spares us from manually defining the range of temporal dependencies (or temporal convolutional filter size, as done in previous work). Moreover, spatial dependency of human pose is encoded by treating a human pose as a generic graph (rather than a human skeletal kinematic tree) formed by links between every pair of body joints. Instead of using a pre-defined graph structure, we design a new graph convolutional network to learn graph connectivity automatically. This allows the network to capture long-range dependencies beyond those of the human kinematic tree. We evaluate our approach on several standard benchmark datasets for motion prediction, including Human3.6M, the CMU motion capture dataset and 3DPW. Our experiments clearly demonstrate that the proposed approach achieves state-of-the-art performance, and is applicable to both angle-based and position-based pose representations. The code is available at https://github.com/wei-mao-2019/LearnTrajDep |
Tasks | Human Pose Forecasting, Motion Capture, motion prediction |
Published | 2019-08-15 |
URL | https://arxiv.org/abs/1908.05436v2 |
PDF | https://arxiv.org/pdf/1908.05436v2.pdf |
PWC | https://paperswithcode.com/paper/learning-trajectory-dependencies-for-human |
Repo | https://github.com/wei-mao-2019/LearnTrajDep |
Framework | pytorch |
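The abstract's key architectural idea — learning the graph connectivity among body joints instead of fixing the kinematic tree — reduces to making the adjacency matrix a trainable parameter. A minimal PyTorch sketch of such a layer (an illustration, not the authors' exact block; their full model also operates on DCT-encoded joint trajectories, see the repo):

```python
import torch
import torch.nn as nn

class LearnableGraphConv(nn.Module):
    """Graph convolution whose adjacency over the body joints is itself a
    learned parameter, so connectivity is discovered during training."""
    def __init__(self, num_nodes: int, in_features: int, out_features: int):
        super().__init__()
        self.adj = nn.Parameter(torch.eye(num_nodes))      # learned connectivity
        self.weight = nn.Parameter(torch.empty(in_features, out_features))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, x):
        # x: (batch, num_nodes, in_features)
        return torch.tanh(self.adj @ x @ self.weight)
```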
NL-FIIT at SemEval-2019 Task 9: Neural Model Ensemble for Suggestion Mining
Title | NL-FIIT at SemEval-2019 Task 9: Neural Model Ensemble for Suggestion Mining |
Authors | Samuel Pecar, Marian Simko, Maria Bielikova |
Abstract | In this paper, we present the neural model architecture submitted to the SemEval-2019 Task 9 competition: “Suggestion Mining from Online Reviews and Forums”. We participated in both subtasks, for domain-specific and also cross-domain suggestion mining. We proposed a recurrent neural network architecture that employs Bi-LSTM layers and a self-attention mechanism. Our architecture encodes words via ELMo word representations and ensembles multiple models to achieve better results. We performed experiments with different setups of our proposed model, involving weighting of prediction classes in the loss function. Our best model achieved an official test evaluation score of 0.6816 for subtask A and 0.6850 for subtask B. In the official results, we achieved 12th and 10th place in subtasks A and B, respectively. |
Tasks | |
Published | 2019-04-05 |
URL | http://arxiv.org/abs/1904.02981v1 |
PDF | http://arxiv.org/pdf/1904.02981v1.pdf |
PWC | https://paperswithcode.com/paper/nl-fiit-at-semeval-2019-task-9-neural-model |
Repo | https://github.com/SamuelPecar/NL-FIIT-SemEval19-Task9 |
Framework | pytorch |
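Two of the ingredients named above — class weighting in the loss and ensembling multiple trained models — are simple to make concrete. A hedged PyTorch sketch; the weight values are illustrative, not those of the submission.

```python
import torch
import torch.nn as nn

# Weighted cross-entropy: up-weight the rarer "suggestion" class so the
# loss does not get dominated by the majority class.
class_weights = torch.tensor([0.3, 0.7])          # illustrative values only
criterion = nn.CrossEntropyLoss(weight=class_weights)

def ensemble_predict(models, batch):
    """Average the softmax outputs of several trained models and take
    the argmax, a standard logit-level ensemble."""
    with torch.no_grad():
        probs = torch.stack([m(batch).softmax(dim=-1) for m in models])
    return probs.mean(dim=0).argmax(dim=-1)
```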
Deep neural network or dermatologist?
Title | Deep neural network or dermatologist? |
Authors | Kyle Young, Gareth Booth, Becks Simpson, Reuben Dutton, Sally Shrapnel |
Abstract | Deep learning techniques have proven high accuracy for identifying melanoma in digitised dermoscopic images. A strength is that these methods are not constrained by features that are pre-defined by human semantics. A downside is that it is difficult to understand the rationale of the model predictions and to identify potential failure modes. This is a major barrier to the adoption of deep learning in clinical practice. In this paper we ask if two existing local interpretability methods, Grad-CAM and Kernel SHAP, can shed light on convolutional neural networks trained in the context of melanoma detection. Our contributions are: (i) we first explore the domain space via a reproducible, end-to-end learning framework that creates a suite of 30 models, all trained on a publicly available data set (HAM10000); (ii) we next explore the reliability of Grad-CAM and Kernel SHAP in this context via some basic sanity-check experiments; and (iii) finally, we investigate a random selection of models from our suite using Grad-CAM and Kernel SHAP. We show that despite high accuracy, the models will occasionally assign importance to features that are not relevant to the diagnostic task. We also show that models of similar accuracy will produce different explanations as measured by these methods. This work represents first steps in bridging the gap between model accuracy and interpretability in the domain of skin cancer classification. |
Tasks | Skin Cancer Classification |
Published | 2019-08-19 |
URL | https://arxiv.org/abs/1908.06612v1 |
PDF | https://arxiv.org/pdf/1908.06612v1.pdf |
PWC | https://paperswithcode.com/paper/deep-neural-network-or-dermatologist |
Repo | https://github.com/KyleYoung1997/DNNorDermatologist |
Framework | tf |
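Of the two interpretability methods examined, Grad-CAM is compact enough to sketch in full. A minimal PyTorch version built on forward/backward hooks; `target_layer` would typically be the final convolutional block of the melanoma classifier, and this is the generic algorithm, not the authors' experiment code.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx):
    """Minimal Grad-CAM: weight the target layer's activations by the
    spatial mean of the class-score gradients, ReLU, then upsample."""
    acts, grads = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.append(o))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.append(go[0]))
    try:
        model.zero_grad()
        score = model(image.unsqueeze(0))[0, class_idx]
        score.backward()
    finally:
        h1.remove(); h2.remove()
    a, g = acts[0], grads[0]                         # (1, C, H, W)
    weights = g.mean(dim=(2, 3), keepdim=True)       # global-average-pooled grads
    cam = F.relu((weights * a).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[1:],
                        mode="bilinear", align_corners=False)
    return (cam / cam.max().clamp(min=1e-8)).squeeze()
```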
Enhanced free space detection in multiple lanes based on single CNN with scene identification
Title | Enhanced free space detection in multiple lanes based on single CNN with scene identification |
Authors | Fabio Pizzati, Fernando García |
Abstract | Many systems for autonomous vehicle navigation rely on lane detection. Traditional algorithms usually estimate only the position of the lanes on the road, but an autonomous control system may also need to know if a lane marking can be crossed or not, and what portion of space inside the lane is free from obstacles, to make safer control decisions. On the other hand, free space detection algorithms only detect navigable areas, without information about lanes. State-of-the-art algorithms use CNNs for both tasks, with significant consumption of computing resources. We propose a novel approach that estimates the free space inside each lane with a single CNN. Additionally, at the cost of only a small amount of extra GPU RAM, we infer the road type, which will be useful for path planning. To achieve this result, we train a multi-task CNN. Then, we further elaborate the output of the network to extract polygons that can be effectively used in navigation control. Finally, we provide a computationally efficient implementation, based on ROS, that can be executed in real time. Our code and trained models are available online. |
Tasks | Autonomous Vehicles, Lane Detection |
Published | 2019-05-02 |
URL | https://arxiv.org/abs/1905.00941v2 |
PDF | https://arxiv.org/pdf/1905.00941v2.pdf |
PWC | https://paperswithcode.com/paper/enhanced-free-space-detection-in-multiple |
Repo | https://github.com/fabvio/ld-lsi |
Framework | pytorch |
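The post-processing step the abstract mentions — turning the network's free-space output into polygons a controller can consume — can be sketched with standard OpenCV calls. A generic sketch, not the authors' ROS implementation.

```python
import cv2
import numpy as np

def mask_to_polygons(mask: np.ndarray, eps_frac: float = 0.01):
    """Turn a binary free-space mask (one channel of a multi-task CNN's
    segmentation output) into simplified polygons for navigation control."""
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    polygons = []
    for c in contours:
        # Simplify each contour; tolerance is a fraction of its perimeter.
        eps = eps_frac * cv2.arcLength(c, True)
        polygons.append(cv2.approxPolyDP(c, eps, True).reshape(-1, 2))
    return polygons
```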
BERTScore: Evaluating Text Generation with BERT
Title | BERTScore: Evaluating Text Generation with BERT |
Authors | Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, Yoav Artzi |
Abstract | We propose BERTScore, an automatic evaluation metric for text generation. Analogously to common metrics, BERTScore computes a similarity score for each token in the candidate sentence with each token in the reference sentence. However, instead of exact matches, we compute token similarity using contextual embeddings. We evaluate using the outputs of 363 machine translation and image captioning systems. BERTScore correlates better with human judgments and provides stronger model selection performance than existing metrics. Finally, we use an adversarial paraphrase detection task to show that BERTScore is more robust to challenging examples when compared to existing metrics. |
Tasks | Image Captioning, Machine Translation, Model Selection, Text Generation |
Published | 2019-04-21 |
URL | https://arxiv.org/abs/1904.09675v3 |
PDF | https://arxiv.org/pdf/1904.09675v3.pdf |
PWC | https://paperswithcode.com/paper/bertscore-evaluating-text-generation-with |
Repo | https://github.com/Tiiiger/bert_score |
Framework | pytorch |
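The greedy-matching computation the abstract describes fits in a few lines. A minimal sketch with Hugging Face Transformers; the official implementation additionally strips special tokens, selects a specific layer, applies optional IDF weighting, and rescales with a baseline, all omitted here.

```python
import torch
from transformers import AutoModel, AutoTokenizer

def bertscore(candidate: str, reference: str, model_name: str = "roberta-large"):
    """Greedy matching over contextual token embeddings: each token is
    matched to its most similar token on the other side."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    with torch.no_grad():
        c = model(**tok(candidate, return_tensors="pt")).last_hidden_state[0]
        r = model(**tok(reference, return_tensors="pt")).last_hidden_state[0]
    c = torch.nn.functional.normalize(c, dim=-1)
    r = torch.nn.functional.normalize(r, dim=-1)
    sim = c @ r.T                                # token-pair cosine similarities
    precision = sim.max(dim=1).values.mean()     # candidate -> best reference match
    recall = sim.max(dim=0).values.mean()        # reference -> best candidate match
    f1 = 2 * precision * recall / (precision + recall)
    return precision.item(), recall.item(), f1.item()
```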
Progressive Face Super-Resolution via Attention to Facial Landmark
Title | Progressive Face Super-Resolution via Attention to Facial Landmark |
Authors | Deokyun Kim, Minseon Kim, Gihyun Kwon, Dae-Shik Kim |
Abstract | Face Super-Resolution (SR) is a subfield of the SR domain that specifically targets the reconstruction of face images. The main challenge of face SR is to restore essential facial features without distortion. We propose a novel face SR method that generates photo-realistic 8x super-resolved face images with fully retained facial details. To that end, we adopt a progressive training method, which allows stable training by splitting the network into successive steps, each producing output with a progressively higher resolution. We also propose a novel facial attention loss and apply it at each step to focus on restoring facial attributes in greater detail, by multiplying the pixel difference and heatmap values. Lastly, we propose a compressed version of the state-of-the-art face alignment network (FAN) for landmark heatmap extraction. With the proposed FAN, we can extract the heatmaps suitable for face SR and also reduce the overall training time. Experimental results verify that our method outperforms state-of-the-art methods in both qualitative and quantitative measurements, especially in perceptual quality. |
Tasks | Face Alignment, Super-Resolution |
Published | 2019-08-22 |
URL | https://arxiv.org/abs/1908.08239v1 |
PDF | https://arxiv.org/pdf/1908.08239v1.pdf |
PWC | https://paperswithcode.com/paper/progressive-face-super-resolution-via |
Repo | https://github.com/DeokyunKim/Progressive-Face-Super-Resolution |
Framework | pytorch |
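The facial attention loss is described precisely enough to sketch: multiply the pixel difference by the landmark heatmap values so that errors near facial landmarks dominate the penalty. L1 distance is assumed here for the pixel term.

```python
import torch

def facial_attention_loss(sr, hr, heatmaps):
    """Pixel-wise difference between super-resolved and ground-truth images,
    weighted by landmark heatmap values.
    sr, hr: (B, C, H, W) images; heatmaps: (B, 1, H, W) landmark heatmaps."""
    return (heatmaps * (sr - hr).abs()).mean()
```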
One Network for Multi-Domains: Domain Adaptive Hashing with Intersectant Generative Adversarial Network
Title | One Network for Multi-Domains: Domain Adaptive Hashing with Intersectant Generative Adversarial Network |
Authors | Tao He, Yuan-Fang Li, Lianli Gao, Dongxiang Zhang, Jingkuan Song |
Abstract | With the recent explosive increase of digital data, image recognition and retrieval have become critical practical applications. Hashing is an effective solution to this problem, due to its low storage requirement and high query speed. However, most past works focus on hashing in a single (source) domain. Thus, the learned hash function may not adapt well in a new (target) domain that has a large distributional difference from the source domain. In this paper, we explore an end-to-end domain adaptive learning framework that simultaneously and precisely generates discriminative hash codes and classifies target domain images. Our method encodes images from the two domains into a common semantic space, followed by two independent generative adversarial networks aiming at crosswise reconstruction of the two domains’ images, reducing domain disparity and improving alignment in the shared space. We evaluate our framework on four public benchmark datasets, all of which show that our method is superior to the other state-of-the-art methods on the tasks of object recognition and image retrieval. |
Tasks | Image Retrieval, Object Recognition |
Published | 2019-07-01 |
URL | https://arxiv.org/abs/1907.00612v1 |
PDF | https://arxiv.org/pdf/1907.00612v1.pdf |
PWC | https://paperswithcode.com/paper/one-network-for-multi-domains-domain-adaptive |
Repo | https://github.com/htlsn/igan |
Framework | none |
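Once a hashing network is trained, retrieval itself is straightforward: binarize the encoder outputs and rank gallery items by Hamming distance. A generic sketch of that step, not the IGAN architecture itself.

```python
import numpy as np

def hamming_retrieve(query_code, gallery_codes, topk=10):
    """Hashing-based retrieval: binary codes are the signs of the encoder
    outputs; gallery indices are returned in order of Hamming distance."""
    q = query_code >= 0            # boolean code for the query, shape (bits,)
    g = gallery_codes >= 0         # boolean gallery codes, shape (N, bits)
    dists = (q != g).sum(axis=1)   # Hamming distance to each gallery item
    return np.argsort(dists)[:topk]
```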
Dominant Set Clustering and Pooling for Multi-View 3D Object Recognition
Title | Dominant Set Clustering and Pooling for Multi-View 3D Object Recognition |
Authors | Chu Wang, Marcello Pelillo, Kaleem Siddiqi |
Abstract | View-based strategies for 3D object recognition have proven to be very successful. The state-of-the-art methods now achieve over 90% correct category-level recognition performance on appearance images. We improve upon these methods by introducing a view clustering and pooling layer based on dominant sets. The key idea is to pool information from views which are similar and thus belong to the same cluster. The pooled feature vectors are then fed as inputs to the same layer, in a recurrent fashion. This recurrent clustering and pooling module, when inserted in an off-the-shelf pretrained CNN, boosts performance for multi-view 3D object recognition, achieving a new state-of-the-art test set recognition accuracy of 93.8% on the ModelNet40 database. We also explore a fast approximate learning strategy for our cluster-pooling CNN, which, while sacrificing end-to-end learning, greatly improves its training efficiency with only a slight reduction of recognition accuracy to 93.3%. Our implementation is available at https://github.com/fate3439/dscnn. |
Tasks | 3D Object Recognition, Object Recognition |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01592v1 |
PDF | https://arxiv.org/pdf/1906.01592v1.pdf |
PWC | https://paperswithcode.com/paper/dominant-set-clustering-and-pooling-for-multi |
Repo | https://github.com/fate3439/dscnn |
Framework | none |
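Dominant sets are conventionally extracted with replicator dynamics, so the clustering step of such a pooling layer can be sketched in a few lines of numpy (a generic dominant-set solver, not the authors' layer):

```python
import numpy as np

def dominant_set(affinity, iters=1000, tol=1e-6):
    """Extract a dominant set from a pairwise view-similarity matrix via
    replicator dynamics. `affinity` should be symmetric, non-negative,
    with a zero diagonal; views with non-negligible support in the
    returned vector belong to the cluster and get pooled together."""
    n = affinity.shape[0]
    x = np.full(n, 1.0 / n)                  # start from the barycenter
    for _ in range(iters):
        x_new = x * (affinity @ x)
        x_new /= x_new.sum()
        if np.linalg.norm(x_new - x, 1) < tol:
            break
        x = x_new
    return x
```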
Deep Constrained Dominant Sets for Person Re-identification
Title | Deep Constrained Dominant Sets for Person Re-identification |
Authors | Leulseged Tesfaye Alemu, Marcello Pelillo, Mubarak Shah |
Abstract | In this work, we propose an end-to-end constrained clustering scheme to tackle the person re-identification (re-id) problem. Deep neural networks (DNNs) have recently proven to be effective on the person re-identification task. In particular, rather than leveraging solely a probe-gallery similarity, diffusing the similarities among the gallery images in an end-to-end manner has proven to be effective in yielding a robust probe-gallery affinity. However, existing methods do not use the probe image as a constraint, and are prone to noise propagation during the similarity diffusion process. To overcome this, we propose an intriguing scheme which treats the person-image retrieval problem as a constrained clustering optimization problem, called deep constrained dominant sets (DCDS). Given a probe and gallery images, we re-formulate the person re-id problem as finding a constrained cluster, where the probe image is taken as a constraint (seed) and each cluster corresponds to a set of images of the same person. By optimizing the constrained clustering in an end-to-end manner, we naturally leverage the contextual knowledge of a set of images corresponding to the given person-images. We further enhance the performance by integrating an auxiliary net alongside DCDS, which employs a multi-scale ResNet. To validate the effectiveness of our method, we present experiments on several benchmark datasets and show that the proposed method can outperform state-of-the-art methods. |
Tasks | Image Retrieval, Person Re-Identification |
Published | 2019-04-25 |
URL | https://arxiv.org/abs/1904.11397v2 |
PDF | https://arxiv.org/pdf/1904.11397v2.pdf |
PWC | https://paperswithcode.com/paper/deep-constrained-dominant-sets-for-person-re |
Repo | https://github.com/leule/DCDS |
Framework | pytorch |
Decoupling Representation and Classifier for Long-Tailed Recognition
Title | Decoupling Representation and Classifier for Long-Tailed Recognition |
Authors | Bingyi Kang, Saining Xie, Marcus Rohrbach, Zhicheng Yan, Albert Gordo, Jiashi Feng, Yannis Kalantidis |
Abstract | The long-tail distribution of the visual world poses great challenges for deep learning based classification models on how to handle the class imbalance problem. Existing solutions usually involve class-balancing strategies, e.g., by loss re-weighting, data re-sampling, or transfer learning from head- to tail-classes, but most of them adhere to the scheme of jointly learning representations and classifiers. In this work, we decouple the learning procedure into representation learning and classification, and systematically explore how different balancing strategies affect them for long-tailed recognition. The findings are surprising: (1) data imbalance might not be an issue in learning high-quality representations; (2) with representations learned with the simplest instance-balanced (natural) sampling, it is also possible to achieve strong long-tailed recognition ability by adjusting only the classifier. We conduct extensive experiments and set new state-of-the-art performance on common long-tailed benchmarks like ImageNet-LT, Places-LT and iNaturalist, showing that it is possible to outperform carefully designed losses, sampling strategies, even complex modules with memory, by using a straightforward approach that decouples representation and classification. Our code is available at https://github.com/facebookresearch/classifier-balancing. |
Tasks | Representation Learning, Transfer Learning |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09217v2 |
PDF | https://arxiv.org/pdf/1910.09217v2.pdf |
PWC | https://paperswithcode.com/paper/decoupling-representation-and-classifier-for |
Repo | https://github.com/facebookresearch/classifier-balancing |
Framework | pytorch |
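The paper's simplest decoupled recipe, classifier re-training (cRT), is easy to state in code: freeze the instance-balanced representations and re-learn only the classifier under class-balanced sampling. A hedged PyTorch sketch; `backbone`, `classifier`, and `train_set` (with a `targets` list) are assumed to exist.

```python
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

# Stage 2 of cRT: freeze the representation learned with instance-balanced
# (natural) sampling, then re-train only the classifier.
for p in backbone.parameters():
    p.requires_grad = False

# Class-balanced sampling: weight each example inversely to its class count,
# so every class is drawn roughly equally often.
counts = torch.bincount(torch.tensor(train_set.targets)).float()
weights = (1.0 / counts)[train_set.targets]
loader = DataLoader(train_set, batch_size=256,
                    sampler=WeightedRandomSampler(weights, len(weights)))

opt = torch.optim.SGD(classifier.parameters(), lr=0.1, momentum=0.9)
for images, labels in loader:
    opt.zero_grad()
    loss = torch.nn.functional.cross_entropy(classifier(backbone(images)), labels)
    loss.backward()
    opt.step()
```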
Improving Textual Network Learning with Variational Homophilic Embeddings
Title | Improving Textual Network Learning with Variational Homophilic Embeddings |
Authors | Wenlin Wang, Chenyang Tao, Zhe Gan, Guoyin Wang, Liqun Chen, Xinyuan Zhang, Ruiyi Zhang, Qian Yang, Ricardo Henao, Lawrence Carin |
Abstract | The performance of many network learning applications crucially hinges on the success of network embedding algorithms, which aim to encode rich network information into low-dimensional vertex-based vector representations. This paper considers a novel variational formulation of network embeddings, with special focus on textual networks. Different from most existing methods that optimize a discriminative objective, we introduce Variational Homophilic Embedding (VHE), a fully generative model that learns network embeddings by modeling the semantic (textual) information with a variational autoencoder, while accounting for the structural (topology) information through a novel homophilic prior design. Homophilic vertex embeddings encourage similar embedding vectors for related (connected) vertices. The proposed VHE promises better generalization for downstream tasks, robustness to incomplete observations, and the ability to generalize to unseen vertices. Extensive experiments on real-world networks, for multiple tasks, demonstrate that the proposed method consistently achieves superior performance relative to competing state-of-the-art approaches. |
Tasks | Network Embedding |
Published | 2019-09-30 |
URL | https://arxiv.org/abs/1909.13456v1 |
PDF | https://arxiv.org/pdf/1909.13456v1.pdf |
PWC | https://paperswithcode.com/paper/improving-textual-network-learning-with |
Repo | https://github.com/Wenlin-Wang/VHE19 |
Framework | none |
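The homophily idea — similar embedding vectors for connected vertices — can be illustrated with a simple penalty on the variational posteriors, though the paper's actual homophilic prior design is more involved than this stand-in.

```python
import torch

def homophilic_penalty(mu, logvar, edges):
    """Illustration only, not VHE itself: pull the Gaussian posteriors of
    connected vertices together with a symmetrized KL term.
    mu, logvar: (N, d) posterior parameters; edges: (E, 2) long tensor."""
    i, j = edges[:, 0], edges[:, 1]
    var_i, var_j = logvar[i].exp(), logvar[j].exp()
    kl_ij = 0.5 * (var_i / var_j + (mu[j] - mu[i]).pow(2) / var_j
                   + logvar[j] - logvar[i] - 1).sum(dim=-1)
    kl_ji = 0.5 * (var_j / var_i + (mu[i] - mu[j]).pow(2) / var_i
                   + logvar[i] - logvar[j] - 1).sum(dim=-1)
    return (kl_ij + kl_ji).mean() / 2
```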