February 1, 2020


Paper Group AWR 333


TechNet: Technology Semantic Network Based on Patent Data


Title	TechNet: Technology Semantic Network Based on Patent Data
Authors	Serhad Sarica, Jianxi Luo, Kristin L. Wood
Abstract	The growing developments in general semantic networks, knowledge graphs and ontology databases have motivated us to build a large-scale comprehensive semantic network of technology-related data for engineering knowledge discovery, technology search and retrieval, and artificial intelligence for engineering design and innovation. Specifically, we constructed a technology semantic network (TechNet) that covers the elemental concepts in all domains of technology and their semantic associations by mining the complete U.S. patent database from 1976. To derive the TechNet, natural language processing techniques were utilized to extract terms from massive patent texts and recent word embedding algorithms were employed to vectorize such terms and establish their semantic relationships. We report and evaluate the TechNet for retrieving terms and their pairwise relevance that is meaningful from a technology and engineering design perspective. The TechNet may serve as an infrastructure to support a wide range of applications, e.g., technical text summaries, search query predictions, relational knowledge discovery, and design ideation support, in the context of engineering and technology, and complement or enrich existing semantic databases. To enable such applications, the TechNet is made public via an online interface and APIs for public users to retrieve technology-related terms and their relevancies.
Tasks	Knowledge Graphs, Semantic Textual Similarity
Published	2019-06-02
URL	https://arxiv.org/abs/1906.00411v4
PDF	https://arxiv.org/pdf/1906.00411v4.pdf
PWC	https://paperswithcode.com/paper/190600411
Repo	https://github.com/SerhadS/TechNet
Framework	none
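
The abstract above describes vectorizing terms with word embeddings and retrieving terms by pairwise relevance. A minimal sketch of that idea follows, assuming cosine similarity as the relevance measure (the abstract does not name one). The toy vectors and the helper names cosine_similarity and most_relevant are hypothetical and are not part of the TechNet API.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_u * norm_v)


def most_relevant(term, embeddings, top_k=3):
    """Rank all other terms by cosine similarity to `term`."""
    query = embeddings[term]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in embeddings.items()
        if other != term
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up 3-dimensional vectors, for illustration only.
TOY_EMBEDDINGS = {
    "gear": [0.9, 0.1, 0.0],
    "sprocket": [0.8, 0.2, 0.1],
    "antenna": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    print(most_relevant("gear", TOY_EMBEDDINGS, top_k=2))
```

Real term vectors would come from a model trained on patent text; only the ranking step is shown here.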

Benchmark for Generic Product Detection: A Low Data Baseline for Dense Object Detection


Title	Benchmark for Generic Product Detection: A Low Data Baseline for Dense Object Detection
Authors	Srikrishna Varadarajan, Sonaal Kant, Muktabh Mayank Srivastava
Abstract	Object detection in densely packed scenes is a new area where standard object detectors fail to train well. Dense object detectors like RetinaNet trained on large and dense datasets show great performance. We train a standard object detector on a small, normally packed dataset with data augmentation techniques. This dataset is 265 times smaller than the standard dataset, in terms of number of annotations. This low data baseline achieves satisfactory results (mAP=0.56) at standard IoU of 0.5. We also create a varied benchmark for generic SKU product detection by providing full annotations for multiple public datasets. It can be accessed at https://github.com/ParallelDots/generic-sku-detection-benchmark. We hope that this benchmark helps in building robust detectors that perform reliably across different settings in the wild.
Tasks	Data Augmentation, Dense Object Detection, Object Detection
Published	2019-12-19
URL	https://arxiv.org/abs/1912.09476v2
PDF	https://arxiv.org/pdf/1912.09476v2.pdf
PWC	https://paperswithcode.com/paper/benchmark-for-generic-product-detection-a
Repo	https://github.com/ParallelDots/generic-sku-detection-benchmark
Framework	none
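
The abstract above reports mAP at the standard IoU threshold of 0.5. As a reminder of what that threshold measures, here is a minimal sketch of intersection over union for axis-aligned boxes. The (x1, y1, x2, y2) box layout and the function names are assumptions made for illustration and are not taken from the benchmark repository.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A prediction matches a ground-truth box when IoU reaches the threshold."""
    return iou(pred_box, gt_box) >= threshold


if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (1, 0, 3, 2)))
```

A full mAP computation would additionally sort predictions by confidence and integrate the precision-recall curve; that part is omitted here.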

Deep Session Interest Network for Click-Through Rate Prediction


Title	Deep Session Interest Network for Click-Through Rate Prediction
Authors	Yufei Feng, Fuyu Lv, Weichen Shen, Menghan Wang, Fei Sun, Yu Zhu, Keping Yang
Abstract	Easy-to-use, modular and extendible package of deep-learning based CTR models: DeepFM, Deep Interest Network (DIN), Deep Interest Evolution Network (DIEN), Deep Cross Network (DCN), Attentional Factorization Machine (AFM), Neural Factorization Machine (NFM), AutoInt, Deep Session Interest Network (DSIN).
Tasks	Click-Through Rate Prediction, Recommendation Systems
Published	2019-05-16
URL	https://arxiv.org/abs/1905.06482v1
PDF	https://arxiv.org/pdf/1905.06482v1.pdf
PWC	https://paperswithcode.com/paper/deep-session-interest-network-for-click
Repo	https://github.com/shenweichen/DSIN
Framework	tf

CCMatrix: Mining Billions of High-Quality Parallel Sentences on the WEB


Title	CCMatrix: Mining Billions of High-Quality Parallel Sentences on the WEB
Authors	Holger Schwenk, Guillaume Wenzek, Sergey Edunov, Edouard Grave, Armand Joulin
Abstract	We show that margin-based bitext mining in a multilingual sentence space can be applied to monolingual corpora of billions of sentences. We are using ten snapshots of a curated common crawl corpus (Wenzek et al., 2019) totaling 32.7 billion unique sentences. Using one unified approach for 38 languages, we were able to mine 3.5 billion parallel sentences, out of which 661 million are aligned with English. 17 language pairs have more than 30 million parallel sentences, 82 more than 10 million, and most more than one million, including direct alignments between many European or Asian languages. To evaluate the quality of the mined bitexts, we train NMT systems for most of the language pairs and evaluate them on TED, WMT and WAT test sets. Using our mined bitexts only and no human translated parallel data, we achieve a new state-of-the-art for a single system on the WMT’19 test set for translation between English and German, Russian and Chinese, as well as German/French. In particular, our English/German system outperforms the best single one by close to 4 BLEU points and is almost on par with the best WMT’19 evaluation system, which uses system combination and back-translation. We also achieve excellent results for distant language pairs like Russian/Japanese, outperforming the best submission at the 2019 Workshop on Asian Translation (WAT).
Tasks
Published	2019-11-10
URL	https://arxiv.org/abs/1911.04944v1
PDF	https://arxiv.org/pdf/1911.04944v1.pdf
PWC	https://paperswithcode.com/paper/ccmatrix-mining-billions-of-high-quality
Repo	https://github.com/kmkwon94/ainize-laser
Framework	pytorch
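
The abstract above relies on margin-based bitext mining in a shared sentence embedding space. The sketch below illustrates one common form of that idea, the ratio margin: the cosine similarity of a candidate pair divided by the average similarity of each sentence to its k nearest neighbours in the other language. The choice of the ratio variant, the threshold, the toy vectors, and all function names are assumptions for illustration; the abstract does not specify them.

```python
import math


def cos(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def margin_score(x, y, sources, targets, k=2):
    """Ratio margin: cos(x, y) over the mean similarity to k nearest neighbours."""
    nn_x = sorted((cos(x, t) for t in targets), reverse=True)[:k]
    nn_y = sorted((cos(y, s) for s in sources), reverse=True)[:k]
    denom = sum(nn_x) / (2 * len(nn_x)) + sum(nn_y) / (2 * len(nn_y))
    return cos(x, y) / denom


def mine_pairs(sources, targets, k=2, threshold=1.0):
    """For each source sentence keep its best target if the margin is high enough."""
    mined = []
    for i, x in enumerate(sources):
        scores = [margin_score(x, y, sources, targets, k) for y in targets]
        j = max(range(len(targets)), key=lambda idx: scores[idx])
        if scores[j] >= threshold:
            mined.append((i, j, scores[j]))
    return mined


# Made-up 2-dimensional "sentence embeddings", for illustration only.
TOY_SOURCES = [[1.0, 0.0], [0.0, 1.0]]
TOY_TARGETS = [[0.9, 0.1], [0.1, 0.9]]

if __name__ == "__main__":
    print(mine_pairs(TOY_SOURCES, TOY_TARGETS))
```

This brute-force version compares every source with every target; mining at the scale of billions of sentences would need approximate nearest-neighbour search instead.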

Learning Trajectory Dependencies for Human Motion Prediction


Title	Learning Trajectory Dependencies for Human Motion Prediction
Authors	Wei Mao, Miaomiao Liu, Mathieu Salzmann, Hongdong Li
Abstract	Human motion prediction, i.e., forecasting future body poses given an observed pose sequence, has typically been tackled with recurrent neural networks (RNNs). However, as evidenced by prior work, the resulting RNN models suffer from accumulating prediction errors, leading to undesired discontinuities in motion prediction. In this paper, we propose a simple feed-forward deep network for motion prediction, which takes into account both temporal smoothness and spatial dependencies among human body joints. In this context, we then propose to encode temporal information by working in trajectory space, instead of the traditionally-used pose space. This relieves us from manually defining the range of temporal dependencies (or temporal convolutional filter size, as done in previous work). Moreover, spatial dependency of human pose is encoded by treating a human pose as a generic graph (rather than a human skeletal kinematic tree) formed by links between every pair of body joints. Instead of using a pre-defined graph structure, we design a new graph convolutional network to learn graph connectivity automatically. This allows the network to capture long range dependencies beyond that of the human kinematic tree. We evaluate our approach on several standard benchmark datasets for motion prediction, including Human3.6M, the CMU motion capture dataset and 3DPW. Our experiments clearly demonstrate that the proposed approach achieves state of the art performance, and is applicable to both angle-based and position-based pose representations. The code is available at https://github.com/wei-mao-2019/LearnTrajDep
Tasks	Human Pose Forecasting, Motion Capture, motion prediction
Published	2019-08-15
URL	https://arxiv.org/abs/1908.05436v2
PDF	https://arxiv.org/pdf/1908.05436v2.pdf
PWC	https://paperswithcode.com/paper/learning-trajectory-dependencies-for-human
Repo	https://github.com/wei-mao-2019/LearnTrajDep
Framework	pytorch

NL-FIIT at SemEval-2019 Task 9: Neural Model Ensemble for Suggestion Mining


Title	NL-FIIT at SemEval-2019 Task 9: Neural Model Ensemble for Suggestion Mining
Authors	Samuel Pecar, Marian Simko, Maria Bielikova
Abstract	In this paper, we present the neural model architecture submitted to the SemEval-2019 Task 9 competition: “Suggestion Mining from Online Reviews and Forums”. We participated in both subtasks, for domain-specific and also cross-domain suggestion mining. We proposed a recurrent neural network architecture that employs Bi-LSTM layers and also a self-attention mechanism. Our architecture tries to encode words via word representations using ELMo and ensembles multiple models to achieve better results. We performed experiments with different setups of our proposed model involving weighting of prediction classes for the loss function. In the official test evaluation, our best model achieved a score of 0.6816 for subtask A and 0.6850 for subtask B. In the official results, we achieved 12th and 10th place in subtasks A and B, respectively.
Tasks
Published	2019-04-05
URL	http://arxiv.org/abs/1904.02981v1
PDF	http://arxiv.org/pdf/1904.02981v1.pdf
PWC	https://paperswithcode.com/paper/nl-fiit-at-semeval-2019-task-9-neural-model
Repo	https://github.com/SamuelPecar/NL-FIIT-SemEval19-Task9
Framework	pytorch

Deep neural network or dermatologist?


Title	Deep neural network or dermatologist?
Authors	Kyle Young, Gareth Booth, Becks Simpson, Reuben Dutton, Sally Shrapnel
Abstract	Deep learning techniques have demonstrated high accuracy for identifying melanoma in digitised dermoscopic images. A strength is that these methods are not constrained by features that are pre-defined by human semantics. A down-side is that it is difficult to understand the rationale of the model predictions and to identify potential failure modes. This is a major barrier to adoption of deep learning in clinical practice. In this paper we ask if two existing local interpretability methods, Grad-CAM and Kernel SHAP, can shed light on convolutional neural networks trained in the context of melanoma detection. Our contributions are (i) we first explore the domain space via a reproducible, end-to-end learning framework that creates a suite of 30 models, all trained on a publicly available data set (HAM10000), (ii) we next explore the reliability of Grad-CAM and Kernel SHAP in this context via some basic sanity check experiments, and (iii) finally, we investigate a random selection of models from our suite using Grad-CAM and Kernel SHAP. We show that despite high accuracy, the models will occasionally assign importance to features that are not relevant to the diagnostic task. We also show that models of similar accuracy will produce different explanations as measured by these methods. This work represents first steps in bridging the gap between model accuracy and interpretability in the domain of skin cancer classification.
Tasks	Skin Cancer Classification
Published	2019-08-19
URL	https://arxiv.org/abs/1908.06612v1
PDF	https://arxiv.org/pdf/1908.06612v1.pdf
PWC	https://paperswithcode.com/paper/deep-neural-network-or-dermatologist
Repo	https://github.com/KyleYoung1997/DNNorDermatologist
Framework	tf

Enhanced free space detection in multiple lanes based on single CNN with scene identification


Title	Enhanced free space detection in multiple lanes based on single CNN with scene identification
Authors	Fabio Pizzati, Fernando García
Abstract	Many systems for autonomous vehicles’ navigation rely on lane detection. Traditional algorithms usually estimate only the position of the lanes on the road, but an autonomous control system may also need to know if a lane marking can be crossed or not, and what portion of space inside the lane is free from obstacles, to make safer control decisions. On the other hand, free space detection algorithms only detect navigable areas, without information about lanes. State-of-the-art algorithms use CNNs for both tasks, with significant consumption of computing resources. We propose a novel approach that estimates the free space inside each lane, with a single CNN. Additionally, adding only a small requirement concerning GPU RAM, we infer the road type, which will be useful for path planning. To achieve this result, we train a multi-task CNN. Then, we further process the output of the network to extract polygons that can be effectively used in navigation control. Finally, we provide a computationally efficient implementation, based on ROS, that can be executed in real time. Our code and trained models are available online.
Tasks	Autonomous Vehicles, Lane Detection
Published	2019-05-02
URL	https://arxiv.org/abs/1905.00941v2
PDF	https://arxiv.org/pdf/1905.00941v2.pdf
PWC	https://paperswithcode.com/paper/enhanced-free-space-detection-in-multiple
Repo	https://github.com/fabvio/ld-lsi
Framework	pytorch

BERTScore: Evaluating Text Generation with BERT


Title	BERTScore: Evaluating Text Generation with BERT
Authors	Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, Yoav Artzi
Abstract	We propose BERTScore, an automatic evaluation metric for text generation. Analogously to common metrics, BERTScore computes a similarity score for each token in the candidate sentence with each token in the reference sentence. However, instead of exact matches, we compute token similarity using contextual embeddings. We evaluate using the outputs of 363 machine translation and image captioning systems. BERTScore correlates better with human judgments and provides stronger model selection performance than existing metrics. Finally, we use an adversarial paraphrase detection task to show that BERTScore is more robust to challenging examples when compared to existing metrics.
Tasks	Image Captioning, Machine Translation, Model Selection, Text Generation
Published	2019-04-21
URL	https://arxiv.org/abs/1904.09675v3
PDF	https://arxiv.org/pdf/1904.09675v3.pdf
PWC	https://paperswithcode.com/paper/bertscore-evaluating-text-generation-with
Repo	https://github.com/Tiiiger/bert_score
Framework	pytorch
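
The abstract above describes scoring each candidate token against each reference token with contextual embeddings rather than exact matches. A minimal sketch of that matching step follows, assuming greedy matching (each token takes its best-scoring counterpart) and combining precision and recall into F1. The toy vectors stand in for BERT embeddings, and greedy_match_scores is a hypothetical name, not the bert_score package API; any importance weighting or rescaling is left out.

```python
import math


def cos(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def greedy_match_scores(cand_vecs, ref_vecs):
    """Return (precision, recall, f1) from pairwise token similarities."""
    sim = [[cos(c, r) for r in ref_vecs] for c in cand_vecs]
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(max(row) for row in sim) / len(cand_vecs)
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(
        max(sim[i][j] for i in range(len(cand_vecs)))
        for j in range(len(ref_vecs))
    ) / len(ref_vecs)
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up token vectors: one candidate token, two reference tokens.
    print(greedy_match_scores([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]))
```

In the toy call the single candidate token matches one reference token perfectly, so precision is high, while the unmatched second reference token pulls recall down.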

Progressive Face Super-Resolution via Attention to Facial Landmark


Title	Progressive Face Super-Resolution via Attention to Facial Landmark
Authors	Deokyun Kim, Minseon Kim, Gihyun Kwon, Dae-Shik Kim
Abstract	Face Super-Resolution (SR) is a subfield of the SR domain that specifically targets the reconstruction of face images. The main challenge of face SR is to restore essential facial features without distortion. We propose a novel face SR method that generates photo-realistic 8x super-resolved face images with fully retained facial details. To that end, we adopt a progressive training method, which allows stable training by splitting the network into successive steps, each producing output with a progressively higher resolution. We also propose a novel facial attention loss and apply it at each step to focus on restoring facial attributes in greater detail by multiplying the pixel difference and heatmap values. Lastly, we propose a compressed version of the state-of-the-art face alignment network (FAN) for landmark heatmap extraction. With the proposed FAN, we can extract the heatmaps suitable for face SR and also reduce the overall training time. Experimental results verify that our method outperforms state-of-the-art methods in both qualitative and quantitative measurements, especially in perceptual quality.
Tasks	Face Alignment, Super-Resolution
Published	2019-08-22
URL	https://arxiv.org/abs/1908.08239v1
PDF	https://arxiv.org/pdf/1908.08239v1.pdf
PWC	https://paperswithcode.com/paper/progressive-face-super-resolution-via
Repo	https://github.com/DeokyunKim/Progressive-Face-Super-Resolution
Framework	pytorch
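
The abstract above describes a facial attention loss formed by multiplying the pixel difference by landmark heatmap values. A minimal sketch of that weighting follows, assuming an absolute (L1) pixel difference averaged over all pixels and a single-channel heatmap; both choices, and the function name, are assumptions for illustration rather than the authors' implementation.

```python
def facial_attention_loss(sr_image, hr_image, heatmap):
    """Mean of heatmap-weighted absolute pixel differences.

    All three arguments are 2-D lists of floats with the same shape.
    Pixels where the heatmap is large (near landmarks) contribute more.
    """
    total = 0.0
    count = 0
    for sr_row, hr_row, hm_row in zip(sr_image, hr_image, heatmap):
        for sr_px, hr_px, weight in zip(sr_row, hr_row, hm_row):
            total += weight * abs(sr_px - hr_px)
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # Toy 2x2 example: two pixels differ, one of them under a strong heatmap value.
    sr = [[1.0, 0.0], [0.0, 0.0]]
    hr = [[0.0, 0.0], [0.0, 1.0]]
    hm = [[1.0, 0.0], [0.0, 0.5]]
    print(facial_attention_loss(sr, hr, hm))
```

With an all-zero heatmap the loss vanishes regardless of the pixel error, which is why such a term would normally be used alongside a plain reconstruction loss.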

One Network for Multi-Domains: Domain Adaptive Hashing with Intersectant Generative Adversarial Network


Title	One Network for Multi-Domains: Domain Adaptive Hashing with Intersectant Generative Adversarial Network
Authors	Tao He, Yuan-Fang Li, Lianli Gao, Dongxiang Zhang, Jingkuan Song
Abstract	With the recent explosive increase of digital data, image recognition and retrieval have become critical practical applications. Hashing is an effective solution to this problem, due to its low storage requirement and high query speed. However, most past work focuses on hashing in a single (source) domain. Thus, the learned hash function may not adapt well to a new (target) domain that has a large distributional difference from the source domain. In this paper, we explore an end-to-end domain adaptive learning framework that simultaneously and precisely generates discriminative hash codes and classifies target domain images. Our method encodes images from the two domains into a common semantic space, followed by two independent generative adversarial networks aiming at crosswise reconstruction of the two domains’ images, reducing domain disparity and improving alignment in the shared space. We evaluate our framework on four public benchmark datasets, all of which show that our method is superior to the other state-of-the-art methods on the tasks of object recognition and image retrieval.
Tasks	Image Retrieval, Object Recognition
Published	2019-07-01
URL	https://arxiv.org/abs/1907.00612v1
PDF	https://arxiv.org/pdf/1907.00612v1.pdf
PWC	https://paperswithcode.com/paper/one-network-for-multi-domains-domain-adaptive
Repo	https://github.com/htlsn/igan
Framework	none
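
The abstract above motivates hashing by its low storage requirement and high query speed. The sketch below illustrates why: real-valued features are binarized into a compact code, and retrieval reduces to counting differing bits. Thresholding at zero, the toy codes, and the function names are assumptions for illustration and do not reproduce the paper's learned hash function.

```python
def to_hash_code(features):
    """Binarize a feature vector by sign and pack the bits into one integer."""
    code = 0
    for value in features:
        code = (code << 1) | (1 if value > 0 else 0)
    return code


def hamming_distance(code_a, code_b):
    """Number of bit positions in which two codes differ."""
    return bin(code_a ^ code_b).count("1")


def retrieve(query_code, database, top_k=2):
    """Return the names of the top_k database items closest in Hamming distance."""
    ranked = sorted(
        database.items(),
        key=lambda item: hamming_distance(query_code, item[1]),
    )
    return [name for name, _ in ranked[:top_k]]


# Made-up 4-bit codes, for illustration only.
TOY_DATABASE = {"img_a": 0b1001, "img_b": 0b0100, "img_c": 0b1011}

if __name__ == "__main__":
    query = to_hash_code([0.3, -1.2, 0.8, 0.1])
    print(bin(query), retrieve(query, TOY_DATABASE))
```

Each item is stored as a single integer and compared with one XOR plus a bit count, which is the storage and speed advantage the abstract refers to.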

Dominant Set Clustering and Pooling for Multi-View 3D Object Recognition


Title	Dominant Set Clustering and Pooling for Multi-View 3D Object Recognition
Authors	Chu Wang, Marcello Pelillo, Kaleem Siddiqi
Abstract	View based strategies for 3D object recognition have proven to be very successful. The state-of-the-art methods now achieve over 90% correct category level recognition performance on appearance images. We improve upon these methods by introducing a view clustering and pooling layer based on dominant sets. The key idea is to pool information from views which are similar and thus belong to the same cluster. The pooled feature vectors are then fed as inputs to the same layer, in a recurrent fashion. This recurrent clustering and pooling module, when inserted in an off-the-shelf pretrained CNN, boosts performance for multi-view 3D object recognition, achieving a new state of the art test set recognition accuracy of 93.8% on the ModelNet 40 database. We also explore a fast approximate learning strategy for our cluster-pooling CNN, which, while sacrificing end-to-end learning, greatly improves its training efficiency with only a slight reduction of recognition accuracy to 93.3%. Our implementation is available at https://github.com/fate3439/dscnn.
Tasks	3D Object Recognition, Object Recognition
Published	2019-06-04
URL	https://arxiv.org/abs/1906.01592v1
PDF	https://arxiv.org/pdf/1906.01592v1.pdf
PWC	https://paperswithcode.com/paper/dominant-set-clustering-and-pooling-for-multi
Repo	https://github.com/fate3439/dscnn
Framework	none

Deep Constrained Dominant Sets for Person Re-identification


Title	Deep Constrained Dominant Sets for Person Re-identification
Authors	Leulseged Tesfaye Alemu, Marcello Pelillo, Mubarak Shah
Abstract	In this work, we propose an end-to-end constrained clustering scheme to tackle the person re-identification (re-id) problem. Deep neural networks (DNN) have recently proven to be effective on the person re-identification task. In particular, rather than leveraging solely a probe-gallery similarity, diffusing the similarities among the gallery images in an end-to-end manner has proven to be effective in yielding a robust probe-gallery affinity. However, existing methods do not apply the probe image as a constraint, and are prone to noise propagation during the similarity diffusion process. To overcome this, we propose an intriguing scheme which treats the person-image retrieval problem as a constrained clustering optimization problem, called deep constrained dominant sets (DCDS). Given a probe and gallery images, we re-formulate the person re-id problem as finding a constrained cluster, where the probe image is taken as a constraint (seed) and each cluster corresponds to a set of images corresponding to the same person. By optimizing the constrained clustering in an end-to-end manner, we naturally leverage the contextual knowledge of a set of images corresponding to the given person-images. We further enhance the performance by integrating an auxiliary net alongside DCDS, which employs a multi-scale ResNet. To validate the effectiveness of our method we present experiments on several benchmark datasets and show that the proposed method can outperform state-of-the-art methods.
Tasks	Image Retrieval, Person Re-Identification
Published	2019-04-25
URL	https://arxiv.org/abs/1904.11397v2
PDF	https://arxiv.org/pdf/1904.11397v2.pdf
PWC	https://paperswithcode.com/paper/deep-constrained-dominant-sets-for-person-re
Repo	https://github.com/leule/DCDS
Framework	pytorch

Decoupling Representation and Classifier for Long-Tailed Recognition


Title	Decoupling Representation and Classifier for Long-Tailed Recognition
Authors	Bingyi Kang, Saining Xie, Marcus Rohrbach, Zhicheng Yan, Albert Gordo, Jiashi Feng, Yannis Kalantidis
Abstract	The long-tail distribution of the visual world poses great challenges for deep learning based classification models on how to handle the class imbalance problem. Existing solutions usually involve class-balancing strategies, e.g., by loss re-weighting, data re-sampling, or transfer learning from head- to tail-classes, but most of them adhere to the scheme of jointly learning representations and classifiers. In this work, we decouple the learning procedure into representation learning and classification, and systematically explore how different balancing strategies affect them for long-tailed recognition. The findings are surprising: (1) data imbalance might not be an issue in learning high-quality representations; (2) with representations learned with the simplest instance-balanced (natural) sampling, it is also possible to achieve strong long-tailed recognition ability by adjusting only the classifier. We conduct extensive experiments and set new state-of-the-art performance on common long-tailed benchmarks like ImageNet-LT, Places-LT and iNaturalist, showing that it is possible to outperform carefully designed losses, sampling strategies, even complex modules with memory, by using a straightforward approach that decouples representation and classification. Our code is available at https://github.com/facebookresearch/classifier-balancing.
Tasks	Representation Learning, Transfer Learning
Published	2019-10-21
URL	https://arxiv.org/abs/1910.09217v2
PDF	https://arxiv.org/pdf/1910.09217v2.pdf
PWC	https://paperswithcode.com/paper/decoupling-representation-and-classifier-for
Repo	https://github.com/facebookresearch/classifier-balancing
Framework	pytorch
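
The abstract above contrasts instance-balanced (natural) sampling with class-balancing strategies. One compact way to express both is a single exponent q applied to the class counts, with the probability of drawing class j proportional to n_j ** q: q = 1 gives instance-balanced sampling and q = 0 gives class-balanced sampling. This parameterization, the toy counts, and the function name are assumptions used here for illustration.

```python
def sampling_probabilities(class_counts, q):
    """Probability of drawing each class, proportional to count ** q.

    q = 1.0 -> instance-balanced (classes drawn in proportion to their size)
    q = 0.0 -> class-balanced (every class equally likely)
    Values in between interpolate between the two.
    """
    weights = [count ** q for count in class_counts]
    total = sum(weights)
    return [weight / total for weight in weights]


if __name__ == "__main__":
    counts = [900, 90, 10]  # made-up long-tailed class sizes
    print("instance-balanced:", sampling_probabilities(counts, 1.0))
    print("class-balanced:   ", sampling_probabilities(counts, 0.0))
    print("square-root:      ", sampling_probabilities(counts, 0.5))
```

Under the decoupled scheme the abstract describes, the representation would be trained with the instance-balanced setting and only the classifier adjusted afterwards.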

Improving Textual Network Learning with Variational Homophilic Embeddings


Title	Improving Textual Network Learning with Variational Homophilic Embeddings
Authors	Wenlin Wang, Chenyang Tao, Zhe Gan, Guoyin Wang, Liqun Chen, Xinyuan Zhang, Ruiyi Zhang, Qian Yang, Ricardo Henao, Lawrence Carin
Abstract	The performance of many network learning applications crucially hinges on the success of network embedding algorithms, which aim to encode rich network information into low-dimensional vertex-based vector representations. This paper considers a novel variational formulation of network embeddings, with special focus on textual networks. Different from most existing methods that optimize a discriminative objective, we introduce Variational Homophilic Embedding (VHE), a fully generative model that learns network embeddings by modeling the semantic (textual) information with a variational autoencoder, while accounting for the structural (topology) information through a novel homophilic prior design. Homophilic vertex embeddings encourage similar embedding vectors for related (connected) vertices. The proposed VHE promises better generalization for downstream tasks, robustness to incomplete observations, and the ability to generalize to unseen vertices. Extensive experiments on real-world networks, for multiple tasks, demonstrate that the proposed method consistently achieves superior performance relative to competing state-of-the-art approaches.
Tasks	Network Embedding
Published	2019-09-30
URL	https://arxiv.org/abs/1909.13456v1
PDF	https://arxiv.org/pdf/1909.13456v1.pdf
PWC	https://paperswithcode.com/paper/improving-textual-network-learning-with
Repo	https://github.com/Wenlin-Wang/VHE19
Framework	none