January 31, 2020

3050 words 15 mins read

Paper Group ANR 75


Multi-scale Microaneurysms Segmentation Using Embedding Triplet Loss

Title Multi-scale Microaneurysms Segmentation Using Embedding Triplet Loss
Authors Mhd Hasan Sarhan, Shadi Albarqouni, Mehmet Yigitsoy, Nassir Navab, Abouzar Eslami
Abstract Deep learning techniques have recently been applied to fundus image analysis and diabetic retinopathy detection. Microaneurysms are an important indicator of diabetic retinopathy progression. We introduce a two-stage deep learning approach for microaneurysm segmentation using multiple scales of the input with selective sampling and an embedding triplet loss. The model first segments on two scales, and the segmentations are then refined with a classification model. To enhance the discriminative power of the classification model, we incorporate a triplet embedding loss with a selective sampling routine. The model is evaluated quantitatively to assess the segmentation performance and qualitatively to analyze the model predictions. This approach introduces a 30.29% relative improvement over the fully convolutional neural network.
Tasks Diabetic Retinopathy Detection
Published 2019-04-18
URL https://arxiv.org/abs/1904.12732v2
PDF https://arxiv.org/pdf/1904.12732v2.pdf
PWC https://paperswithcode.com/paper/190412732
Repo
Framework
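The core of the second stage is a triplet embedding loss over patch embeddings. Below is a minimal PyTorch sketch of such a loss with in-batch hard-example selection; it stands in for, rather than reproduces, the paper's selective sampling routine, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def selective_triplet_loss(embeddings, labels, margin=1.0):
    """Triplet loss with in-batch hard-example mining (illustrative only).

    embeddings: (N, D) L2-normalized patch embeddings
    labels:     (N,) binary labels (1 = microaneurysm, 0 = background)
    """
    dist = torch.cdist(embeddings, embeddings)          # (N, N) pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)   # same-class mask
    eye = torch.eye(len(labels), dtype=torch.bool)

    # Hardest positive: farthest same-class sample for each anchor.
    pos = dist.masked_fill(~same | eye, float("-inf")).max(dim=1).values
    # Hardest negative: closest different-class sample for each anchor.
    neg = dist.masked_fill(same, float("inf")).min(dim=1).values
    return F.relu(pos - neg + margin).mean()

emb = F.normalize(torch.randn(32, 64), dim=1)
lab = torch.randint(0, 2, (32,))
print(selective_triplet_loss(emb, lab))
```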

Bayesian Evidential Deep Learning with PAC Regularization

Title Bayesian Evidential Deep Learning with PAC Regularization
Authors Manuel Haussmann, Sebastian Gerwinn, Melih Kandemir
Abstract We propose a novel method for closed-form predictive distribution modeling with neural nets. To quantify prediction uncertainty, we build on Evidential Deep Learning (EDL), which has been impactful for being both simple to implement and giving closed-form access to predictive uncertainty. We employ EDL to model aleatoric uncertainty and extend it to account also for epistemic uncertainty by converting it to a Bayesian Neural Net (BNN). While extending its uncertainty quantification capabilities, we maintain its analytically accessible predictive distribution model by performing progressive moment matching, for the first time, for approximate weight marginalization. The eventual model introduces a prohibitively large number of hyperparameters for stable training. We overcome this drawback by deriving a vacuous PAC bound that comprises the marginal likelihood of the predictor and a complexity penalty. We observe on regression, classification, and out-of-domain detection benchmarks that our method improves model fit and uncertainty quantification.
Tasks
Published 2019-06-03
URL https://arxiv.org/abs/1906.00816v2
PDF https://arxiv.org/pdf/1906.00816v2.pdf
PWC https://paperswithcode.com/paper/190600816
Repo
Framework
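For context, the appeal of EDL is that predictive moments come in closed form from the network's evidential outputs. The sketch below shows those moments for a Normal-Inverse-Gamma head as used in common EDL-for-regression formulations; the paper's Bayesian extension via moment matching is not reproduced, and the parameter names follow convention rather than the paper.

```python
# Closed-form predictive moments of a Normal-Inverse-Gamma (NIG)
# evidential head (a sketch of the standard EDL regression setup).
def nig_predictive(gamma, nu, alpha, beta):
    """gamma: predicted mean; nu, alpha, beta: evidence parameters (alpha > 1)."""
    mean = gamma
    aleatoric = beta / (alpha - 1.0)          # expected data noise E[sigma^2]
    epistemic = beta / (nu * (alpha - 1.0))   # variance of the mean Var[mu]
    return mean, aleatoric, epistemic

mean, alea, epis = nig_predictive(gamma=0.3, nu=5.0, alpha=3.0, beta=0.8)
print(f"mean={mean}, aleatoric={alea:.3f}, epistemic={epis:.3f}")
```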

Anomaly Detection with Joint Representation Learning of Content and Connection

Title Anomaly Detection with Joint Representation Learning of Content and Connection
Authors Junhao Wang, Renhao Wang, Aayushi Kulshrestha, Reihaneh Rabbany
Abstract Social media sites are becoming a key factor in politics. These platforms are easy to manipulate for the purpose of distorting the information space to confuse and distract voters. Past work on identifying disruptive patterns has mostly focused on analyzing the content of tweets. In this study, we jointly embed the information from both user-posted content and a user’s follower network to detect groups of densely connected users in an unsupervised fashion. We then investigate these dense sub-blocks of users to flag anomalous behavior. In our experiments, we study tweets related to the upcoming 2019 Canadian Elections, and observe a set of densely connected users engaging in local politics in different provinces and exhibiting troll-like behavior.
Tasks Anomaly Detection, Representation Learning
Published 2019-06-16
URL https://arxiv.org/abs/1906.12328v1
PDF https://arxiv.org/pdf/1906.12328v1.pdf
PWC https://paperswithcode.com/paper/anomaly-detection-with-joint-representation
Repo
Framework
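As a rough illustration of the joint-embedding idea (not the authors' model), one can concatenate a content embedding with a connection embedding per user and then look for unusually dense clusters; all components and data below are stand-ins.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import DBSCAN

def joint_user_embedding(tweets_per_user, adjacency):
    # Content side: TF-IDF over each user's concatenated tweets, reduced by SVD.
    tfidf = TfidfVectorizer().fit_transform(
        [" ".join(t) for t in tweets_per_user])
    content = TruncatedSVD(n_components=16).fit_transform(tfidf)
    # Connection side: a low-rank embedding of the follower adjacency matrix.
    graph = TruncatedSVD(n_components=16).fit_transform(adjacency)
    return np.hstack([content, graph])

rng = np.random.default_rng(0)
vocab = np.array([f"w{i}" for i in range(200)])
tweets = [list(rng.choice(vocab, 8)) for _ in range(100)]   # fake tweets
adj = rng.integers(0, 2, size=(100, 100)).astype(float)     # fake follow graph
labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(joint_user_embedding(tweets, adj))
print(np.unique(labels))   # dense sub-blocks surface as non-noise clusters
```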

Label-Agnostic Sequence Labeling by Copying Nearest Neighbors

Title Label-Agnostic Sequence Labeling by Copying Nearest Neighbors
Authors Sam Wiseman, Karl Stratos
Abstract Retrieve-and-edit based approaches to structured prediction, where structures associated with retrieved neighbors are edited to form new structures, have recently attracted increased interest. However, much recent work merely conditions on retrieved structures (e.g., in a sequence-to-sequence framework), rather than explicitly manipulating them. We show we can perform accurate sequence labeling by explicitly (and only) copying labels from retrieved neighbors. Moreover, because this copying is label-agnostic, we can achieve impressive performance in zero-shot sequence-labeling tasks. We additionally consider a dynamic programming approach to sequence labeling in the presence of retrieved neighbors, which allows for controlling the number of distinct (copied) segments used to form a prediction, and leads to both more interpretable and accurate predictions.
Tasks Structured Prediction
Published 2019-06-10
URL https://arxiv.org/abs/1906.04225v1
PDF https://arxiv.org/pdf/1906.04225v1.pdf
PWC https://paperswithcode.com/paper/label-agnostic-sequence-labeling-by-copying
Repo
Framework
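The copying step itself can be made concrete in a few lines: label each query token with a vote over the labels of its nearest retrieved token embeddings. This skeleton ignores the paper's contextual scoring and dynamic program; the embeddings and tag set are made up.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def copy_labels(query_emb, retrieved_emb, retrieved_labels, k=3):
    """Assign each query token the majority label of its k nearest neighbors."""
    nn = NearestNeighbors(n_neighbors=k).fit(retrieved_emb)
    _, idx = nn.kneighbors(query_emb)
    votes = np.asarray(retrieved_labels)[idx]            # (n_tokens, k)
    return [max(set(row.tolist()), key=row.tolist().count) for row in votes]

retrieved = np.random.randn(50, 16)                      # neighbor token embeddings
tags = np.random.choice(["B-PER", "I-PER", "O"], size=50)
print(copy_labels(np.random.randn(4, 16), retrieved, tags))
```

Because only labels are copied, swapping in an unseen tag set requires no retraining, which is what enables the zero-shot setting.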

2nd Place and 2nd Place Solution to Kaggle Landmark Recognition and Retrieval Competition 2019

Title 2nd Place and 2nd Place Solution to Kaggle Landmark Recognition and Retrieval Competition 2019
Authors Kaibing Chen, Cheng Cui, Yuning Du, Xianglong Meng, Hui Ren
Abstract We present a retrieval-based system for the landmark retrieval and recognition challenges. The retrieval system has five parts: feature extraction and matching to obtain a candidate queue; database augmentation and query extension search; and reranking based on recognition results and local feature matching. The recognition system includes landmark and non-landmark recognition, voting over multiple recognition results, and reranking using a combination of recognition and retrieval results. All models were trained and used for prediction with the PaddlePaddle framework. Using this method, we achieved 2nd place in the Google Landmark Recognition 2019 and 2nd place in the Google Landmark Retrieval 2019 challenges on Kaggle. The source code is available here.
Tasks
Published 2019-06-10
URL https://arxiv.org/abs/1906.03990v2
PDF https://arxiv.org/pdf/1906.03990v2.pdf
PWC https://paperswithcode.com/paper/2nd-place-and-2nd-place-solution-to-kaggle
Repo
Framework
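One representative stage of such a pipeline is query expansion over global descriptors. The sketch below implements plain average query expansion (AQE) with cosine similarity; it is a generic illustration, not the authors' exact variant.

```python
import numpy as np

def average_query_expansion(query, db, k=5):
    """query: (D,) global descriptor; db: (N, D); all rows L2-normalized."""
    sims = db @ query                        # cosine similarities
    topk = np.argsort(-sims)[:k]             # indices of the k best matches
    expanded = query + db[topk].sum(axis=0)  # fold the neighbors into the query
    expanded /= np.linalg.norm(expanded)
    return db @ expanded                     # re-scored similarities

db = np.random.randn(1000, 128)
db /= np.linalg.norm(db, axis=1, keepdims=True)
print(np.argsort(-average_query_expansion(db[0], db))[:5])
```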

Long Range 3D with Quadocular Thermal (LWIR) Camera

Title Long Range 3D with Quadocular Thermal (LWIR) Camera
Authors Andrey Filippov, Oleg Dzhimiev
Abstract Long Wave Infrared (LWIR) cameras provide images regardless of the ambient illumination, tolerate fog, and are not blinded by incoming car headlights. These features make LWIR cameras attractive for autonomous navigation, security, and military applications. Thermal images can be used similarly to visible-range ones, including for 3D scene reconstruction with two or more such cameras mounted on a rigid frame. There are two additional challenges for this spectral range: lower image resolution and lower contrast of the textures. In this work, we demonstrate a quadocular LWIR camera setup, its calibration, and the image capture and processing that result in long-range 3D perception with 0.077 pix disparity error over 90% of the depth map. With low-resolution (160 x 120) LWIR sensors, we achieved 10% range accuracy at 28 m with a 56-degree horizontal field of view (HFoV) and a 150 mm baseline. Scaled to the now-standard 640 x 512 resolution and a 200 mm baseline suitable for head-mounted applications, the result would be 10% accuracy at 130 m.
Tasks 3D Scene Reconstruction, Autonomous Navigation, Calibration
Published 2019-11-16
URL https://arxiv.org/abs/1911.06975v2
PDF https://arxiv.org/pdf/1911.06975v2.pdf
PWC https://paperswithcode.com/paper/long-range-3d-with-quadocular-thermal-lwir
Repo
Framework
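The reported numbers are consistent with the standard stereo range-error relation Z = fB/d, which gives dZ/Z = Z·dd/(fB). The check below assumes a simple pinhole model for the 160-pixel-wide sensor; it reproduces the roughly 10% figure at 28 m.

```python
import math

# Stereo range error: Z = f*B/d  =>  dZ/Z = Z * dd / (f * B).
width_px, hfov_deg = 160, 56.0
f_px = (width_px / 2) / math.tan(math.radians(hfov_deg / 2))   # ~150 px
baseline_m, z_m, disp_err_px = 0.150, 28.0, 0.077

rel_err = z_m * disp_err_px / (f_px * baseline_m)
print(f"focal = {f_px:.1f} px, relative range error at {z_m} m = {rel_err:.1%}")
# -> about 10%, matching the reported accuracy
```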

Following Social Groups: Socially Compliant Autonomous Navigation in Dense Crowds

Title Following Social Groups: Socially Compliant Autonomous Navigation in Dense Crowds
Authors Xinjie Yao, Ji Zhang, Jean Oh
Abstract In densely populated environments, socially compliant navigation is critical for autonomous robots, as driving close to people is unavoidable. This manner of social navigation is challenging given the constraints of human comfort and social rules. Traditional methods based on hand-crafted cost functions have difficulty operating in the complex real world. Other learning-based approaches fail to address naturalness from the perspective of collective formation behaviors. We present an autonomous navigation system capable of operating in dense crowds and utilizing information about social groups. The underlying system incorporates a deep neural network to track social groups and joins the flow of a social group to facilitate navigation. A collision avoidance layer in the system further ensures navigation safety. In experiments, our method generates socially compliant behaviors on par with state-of-the-art methods. More importantly, the system is capable of navigating safely in a densely populated area (10+ people in a 10 m x 20 m area), following crowd flows to reach the goal.
Tasks Autonomous Navigation
Published 2019-11-27
URL https://arxiv.org/abs/1911.12063v1
PDF https://arxiv.org/pdf/1911.12063v1.pdf
PWC https://paperswithcode.com/paper/following-social-groups-socially-compliant
Repo
Framework

Training a code-switching language model with monolingual data

Title Training a code-switching language model with monolingual data
Authors Shun-Po Chuang, Tzu-Wei Sung, Hung-Yi Lee
Abstract A lack of code-switching data complicates the training of code-switching (CS) language models. We propose an approach to train such CS language models on monolingual data only. By constraining and normalizing the output projection matrix in RNN-based language models, we bring the embeddings of different languages closer to each other. Numerical and visualization results show that the proposed approaches remarkably improve the performance of CS language models trained on monolingual data. The proposed approaches are comparable to, or even better than, training CS language models with artificially generated CS data. We additionally use unsupervised bilingual word translation to analyze whether semantically equivalent words in different languages are mapped together.
Tasks Language Modelling
Published 2019-11-14
URL https://arxiv.org/abs/1911.06003v1
PDF https://arxiv.org/pdf/1911.06003v1.pdf
PWC https://paperswithcode.com/paper/training-a-code-switching-language-model-with
Repo
Framework
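The constrained output projection can be pictured as follows: if the rows of the softmax weight matrix are L2-normalized, word vectors from both languages live on a common sphere, which pulls cross-lingual embeddings closer. This PyTorch sketch is one plausible reading; the paper's exact constraint and any temperature term may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalizedProjection(nn.Module):
    """Output layer with L2-normalized word vectors (illustrative)."""
    def __init__(self, hidden_dim, vocab_size, temperature=0.05):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(vocab_size, hidden_dim))
        self.temperature = temperature

    def forward(self, h):                        # h: (batch, hidden_dim)
        w = F.normalize(self.weight, dim=1)      # unit-norm output embeddings
        h = F.normalize(h, dim=1)
        return (h @ w.t()) / self.temperature    # logits over the joint vocab

logits = NormalizedProjection(256, 10000)(torch.randn(8, 256))
print(logits.shape)   # torch.Size([8, 10000])
```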

Automatic Programming of Cellular Automata and Artificial Neural Networks Guided by Philosophy

Title Automatic Programming of Cellular Automata and Artificial Neural Networks Guided by Philosophy
Authors Patrik Christen, Olivier Del Fabbro
Abstract Many computer models such as cellular automata and artificial neural networks have been developed and successfully applied. However, in some cases, these models might restrict the possible solutions or make their solutions difficult to interpret. To overcome this problem, we outline a new approach, the so-called allagmatic method, that automatically programs and executes models with as few limitations as possible while maintaining human interpretability. We earlier described a metamodel and its building blocks according to the philosophical concepts of structure (spatial dimension) and operation (temporal dimension). These are the entity, milieu, and update function, which together abstractly describe cellular automata, artificial neural networks, and possibly any kind of computer model. By automatically combining these building blocks in an evolutionary computation, interpretability might be increased through the relationship to the metamodel, and models might be translated into more interpretable models via the metamodel. We propose generic and object-oriented programming to implement the entities and their milieus as dynamic and generic arrays and the update function as a method. We show two experiments in which a simple cellular automaton and an artificial neural network are automatically programmed, compiled, and executed. A target state is successfully evolved and learned in the cellular automaton and the artificial neural network, respectively. We conclude that the allagmatic method can create and execute cellular automaton and artificial neural network models in an automated manner with the guidance of philosophy.
Tasks
Published 2019-05-10
URL https://arxiv.org/abs/1905.04232v4
PDF https://arxiv.org/pdf/1905.04232v4.pdf
PWC https://paperswithcode.com/paper/automatic-programming-of-cellular-automata
Repo
Framework
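The metamodel's three building blocks translate naturally into code. The toy below renders entity, milieu, and update function as generic Python structures and instantiates them as elementary cellular automaton rule 110; the authors' implementation uses generic and object-oriented programming rather than this exact form.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Model:
    entities: List[int]                               # one state per entity
    milieu: Callable[[List[int], int], List[int]]     # neighborhood lookup
    update: Callable[[int, List[int]], int]           # phi(state, neighbors)

    def step(self):
        self.entities = [self.update(s, self.milieu(self.entities, i))
                         for i, s in enumerate(self.entities)]

def ring(cells, i):                                   # circular neighborhood
    return [cells[(i - 1) % len(cells)], cells[(i + 1) % len(cells)]]

def rule110(state, nbrs):                             # elementary CA rule 110
    pattern = (nbrs[0] << 2) | (state << 1) | nbrs[1]
    return (110 >> pattern) & 1

ca = Model(entities=[0] * 15 + [1] + [0] * 15, milieu=ring, update=rule110)
for _ in range(8):
    ca.step()
    print("".join("#" if c else "." for c in ca.entities))
```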

Landmark Assisted CycleGAN for Cartoon Face Generation

Title Landmark Assisted CycleGAN for Cartoon Face Generation
Authors Ruizheng Wu, Xiaodong Gu, Xin Tao, Xiaoyong Shen, Yu-Wing Tai, Jiaya Jia
Abstract In this paper, we are interested in generating a cartoon face of a person using unpaired training data of real and cartoon faces. A major challenge of this task is that the structures of real and cartoon faces lie in two different domains, whose appearances differ greatly from each other. Without explicit correspondence, it is difficult to generate a high-quality cartoon face that captures the essential facial features of a person. To solve this problem, we propose a landmark-assisted CycleGAN, which utilizes face landmarks to define a landmark consistency loss and to guide the training of the local discriminator in CycleGAN. To enforce structural consistency in landmarks, we utilize a conditional generator and discriminator. Our approach is capable of generating high-quality cartoon faces, even indistinguishable from those drawn by artists, and largely improves on the state of the art.
Tasks Face Generation
Published 2019-07-02
URL https://arxiv.org/abs/1907.01424v1
PDF https://arxiv.org/pdf/1907.01424v1.pdf
PWC https://paperswithcode.com/paper/landmark-assisted-cyclegan-for-cartoon-face
Repo
Framework
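The landmark consistency loss can be sketched as penalizing displacement between landmarks detected on the source face and on its translated cartoon. The generator and landmark detector below are hypothetical stand-ins for the paper's networks; only the loss structure is illustrated.

```python
import torch
import torch.nn.functional as F

def landmark_consistency_loss(generator, landmark_detector, real_face):
    """Landmarks of the generated cartoon should match those of the source."""
    cartoon = generator(real_face)
    src_pts = landmark_detector(real_face)   # (B, K, 2) landmark coordinates
    gen_pts = landmark_detector(cartoon)
    return F.mse_loss(gen_pts, src_pts)

# Dummy stand-ins, just to show the shapes involved:
gen = lambda x: x + 0.1 * torch.randn_like(x)            # fake "generator"
det = lambda x: x.flatten(1)[:, :10].reshape(-1, 5, 2)   # fake 5-point detector
print(landmark_consistency_loss(gen, det, torch.randn(4, 3, 64, 64)).item())
```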

JRDB: A Dataset and Benchmark for Visual Perception for Navigation in Human Environments

Title JRDB: A Dataset and Benchmark for Visual Perception for Navigation in Human Environments
Authors Roberto Martín-Martín, Hamid Rezatofighi, Abhijeet Shenoi, Mihir Patel, JunYoung Gwak, Nathan Dass, Alan Federman, Patrick Goebel, Silvio Savarese
Abstract We present JRDB, a novel dataset collected from our social mobile manipulator JackRabbot. The dataset includes 64 minutes of multimodal sensor data, including stereo cylindrical 360° RGB video at 15 fps, 3D point clouds from two Velodyne 16 Lidars, line 3D point clouds from two Sick Lidars, audio signal, RGBD video at 30 fps, a 360° spherical image from a fisheye camera, and encoder values from the robot’s wheels. Our dataset includes data from traditionally underrepresented scenes such as indoor environments and pedestrian areas, captured from both stationary and navigating robot platforms. The dataset has been annotated with over 2.3 million bounding boxes spread over 5 individual cameras and 1.8 million associated 3D cuboids around all people in the scenes, totalling over 3500 time-consistent trajectories. Together with our dataset and the annotations, we launch a benchmark and metrics for 2D and 3D person detection and tracking. With this dataset, which we plan to annotate further in the future, we hope to provide a new source of data and a test bench for research in the areas of autonomous robot navigation and all perceptual tasks around social robotics in human environments.
Tasks Autonomous Navigation, Human Detection
Published 2019-10-25
URL https://arxiv.org/abs/1910.11792v1
PDF https://arxiv.org/pdf/1910.11792v1.pdf
PWC https://paperswithcode.com/paper/jrdb-a-dataset-and-benchmark-for-visual
Repo
Framework

Automated Discovery and Classification of Training Videos for Career Progression

Title Automated Discovery and Classification of Training Videos for Career Progression
Authors Alan Chern, Phuong Hoang, Madhav Sigdel, Janani Balaji, Mohammed Korayem
Abstract Job transitions and upskilling are common actions taken by many industry professionals throughout their careers. In the current rapidly changing job landscape, where requirements are constantly changing and new industry sectors are emerging, it is especially difficult to plan and navigate a predetermined career path. In this work, we implemented a system to automate the collection and classification of training videos to help job seekers identify and acquire the skills necessary to transition to the next step in their career. We extracted educational videos and built a machine learning classifier to predict video relevancy. This system allows us to discover relevant videos at a large scale for job title-skill pairs. Our experiments show significant improvements in model performance from incorporating embedding vectors associated with the video attributes. Additionally, we evaluated the optimal probability threshold to extract as many videos as possible while keeping the false positive rate minimal.
Tasks
Published 2019-07-23
URL https://arxiv.org/abs/1907.11086v1
PDF https://arxiv.org/pdf/1907.11086v1.pdf
PWC https://paperswithcode.com/paper/automated-discovery-and-classification-of
Repo
Framework
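The threshold-selection step can be sketched with scikit-learn: sweep the ROC curve and keep the threshold giving the highest recall while the false positive rate stays under a cap. The cap value and synthetic scores below are made up for illustration.

```python
import numpy as np
from sklearn.metrics import roc_curve

def pick_threshold(y_true, scores, max_fpr=0.05):
    """Highest-recall threshold whose false positive rate is <= max_fpr."""
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    ok = fpr <= max_fpr
    return thresholds[ok][np.argmax(tpr[ok])]

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 2000)                             # fake relevancy labels
s = np.clip(y * 0.4 + rng.normal(0.3, 0.2, 2000), 0, 1)  # fake classifier scores
print(pick_threshold(y, s))
```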

IPGuard: Protecting the Intellectual Property of Deep Neural Networks via Fingerprinting the Classification Boundary

Title IPGuard: Protecting the Intellectual Property of Deep Neural Networks via Fingerprinting the Classification Boundary
Authors Xiaoyu Cao, Jinyuan Jia, Neil Zhenqiang Gong
Abstract A deep neural network (DNN) classifier represents a model owner’s intellectual property, as training a DNN classifier often requires substantial resources. Watermarking was recently proposed to protect the intellectual property of DNN classifiers. However, watermarking suffers from a key limitation: it sacrifices the utility/accuracy of the model owner’s classifier because it tampers with the classifier’s training or fine-tuning process. In this work, we propose IPGuard, the first method to protect the intellectual property of DNN classifiers that provably incurs no accuracy loss for the classifiers. Our key observation is that a DNN classifier can be uniquely represented by its classification boundary. Based on this observation, IPGuard extracts data points near the classification boundary of the model owner’s classifier and uses them to fingerprint the classifier. A DNN classifier is said to be a pirated version of the model owner’s classifier if the two predict the same labels for most fingerprinting data points. IPGuard is qualitatively different from watermarking: IPGuard extracts fingerprinting data points near the classification boundary of a classifier that is already trained, while watermarking embeds watermarks into a classifier during its training or fine-tuning process. We extensively evaluate IPGuard on the CIFAR-10, CIFAR-100, and ImageNet datasets. Our results show that IPGuard can robustly identify post-processed versions of the model owner’s classifier as pirated, and can identify classifiers that are neither the model owner’s classifier nor post-processed versions of it as non-pirated.
Tasks
Published 2019-10-28
URL https://arxiv.org/abs/1910.12903v2
PDF https://arxiv.org/pdf/1910.12903v2.pdf
PWC https://paperswithcode.com/paper/ipguard-protecting-the-intellectual-property
Repo
Framework
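The verification side of the scheme is simple to state in code: a suspect classifier is flagged as pirated when its predictions agree with the owner's on at least a threshold fraction of the fingerprinting points. The sketch below covers only this matching step (with dummy linear models), not the boundary-point generation.

```python
import numpy as np

def matching_rate(owner_predict, suspect_predict, fingerprints):
    """Fraction of fingerprinting points on which the two models agree."""
    return np.mean(owner_predict(fingerprints) == suspect_predict(fingerprints))

def is_pirated(owner_predict, suspect_predict, fingerprints, tau=0.9):
    return matching_rate(owner_predict, suspect_predict, fingerprints) >= tau

w = np.random.randn(10)
owner = lambda X: (X @ w > 0).astype(int)
suspect = lambda X: (X @ (w + 0.001) > 0).astype(int)   # lightly post-processed copy
X = np.random.randn(1000, 10)                           # stand-in fingerprints
print(matching_rate(owner, suspect, X), is_pirated(owner, suspect, X))
```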

FTGAN: A Fully-trained Generative Adversarial Networks for Text to Face Generation

Title FTGAN: A Fully-trained Generative Adversarial Networks for Text to Face Generation
Authors Xiang Chen, Lingbo Qing, Xiaohai He, Xiaodong Luo, Yining Xu
Abstract As a sub-domain of text-to-image synthesis, text-to-face generation has huge potential in the public safety domain. Owing to a lack of datasets, there has been almost no research focusing on text-to-face synthesis. In this paper, we propose a fully-trained Generative Adversarial Network (FTGAN) that trains the text encoder and image decoder at the same time for fine-grained text-to-face generation. With a novel fully-trained generative network, FTGAN can synthesize higher-quality images and encourages its outputs to be more relevant to the input sentences. In addition, we build a dataset called SCU-Text2face for text-to-face synthesis. Through extensive experiments, FTGAN shows its superiority in boosting both the quality of generated images and their similarity to the input descriptions. The proposed FTGAN outperforms the previous state of the art, boosting the best reported Inception Score to 4.63 on the CUB dataset. On SCU-Text2face, the face images generated by FTGAN from input descriptions alone have an average similarity of 59% to the ground truth, which sets a baseline for text-to-face synthesis.
Tasks Face Generation, Image Generation
Published 2019-04-11
URL http://arxiv.org/abs/1904.05729v1
PDF http://arxiv.org/pdf/1904.05729v1.pdf
PWC https://paperswithcode.com/paper/ftgan-a-fully-trained-generative-adversarial
Repo
Framework
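The "fully-trained" idea reduces to an optimization detail: the text encoder's parameters sit in the same optimizer as the generator's, instead of the encoder being pretrained and frozen. The modules below are small stand-ins, not the paper's architectures.

```python
import torch
from torch import nn, optim

text_encoder = nn.GRU(input_size=300, hidden_size=128, batch_first=True)
generator = nn.Sequential(nn.Linear(128, 1024), nn.ReLU(),
                          nn.Linear(1024, 3 * 64 * 64), nn.Tanh())

# One optimizer over both modules: encoder and decoder train jointly.
opt = optim.Adam(list(text_encoder.parameters()) + list(generator.parameters()),
                 lr=2e-4)

tokens = torch.randn(8, 16, 300)        # a batch of embedded captions
_, h = text_encoder(tokens)             # h: (1, 8, 128) sentence code
fake = generator(h.squeeze(0)).view(8, 3, 64, 64)
print(fake.shape)                       # torch.Size([8, 3, 64, 64])
```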

Video Segment Copy Detection Using Memory Constrained Hierarchical Batch-Normalized LSTM Autoencoder

Title Video Segment Copy Detection Using Memory Constrained Hierarchical Batch-Normalized LSTM Autoencoder
Authors Arjun Krishna, A S Akil Arif Ibrahim
Abstract In this report, we introduce a video hashing method for scalable video segment copy detection. The objective of video segment copy detection is to find the video(s) in a large database one of whose segments (cropped in time) is a (transformed) copy of the given query video. This transformation may be temporal (for example, frame dropping or a change in frame rate) or spatial (brightness and contrast changes, addition of noise, etc.) in nature, although the primary focus of this report is detecting temporal attacks. Our video hashing method uses a deep neural network to learn variable-length binary hash codes for the entire video, taking both temporal and spatial features into account. This is in contrast to most existing video hashing methods, which use conventional image hashing techniques to obtain hash codes for a video after extracting features for every frame or certain key frames, in which case the temporal information present in the video is not exploited. Our hashing method is specifically resilient to time cropping, making it extremely useful for video segment copy detection. Experimental results obtained on a large augmented dataset consisting of around 25,000 videos with segment copies demonstrate the efficacy of our proposed video hashing method.
Tasks
Published 2019-11-20
URL https://arxiv.org/abs/1911.09518v1
PDF https://arxiv.org/pdf/1911.09518v1.pdf
PWC https://paperswithcode.com/paper/video-segment-copy-detection-using-memory
Repo
Framework
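The core idea can be sketched as an LSTM autoencoder over per-frame features whose bottleneck state is binarized into the video's hash code. This omits the paper's hierarchical, batch-normalized design and the training details (a straight-through estimator or similar is needed to backpropagate through the sign).

```python
import torch
from torch import nn

class VideoHasher(nn.Module):
    """LSTM autoencoder with a binarized bottleneck (illustrative sketch)."""
    def __init__(self, feat_dim=512, code_bits=64):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, code_bits, batch_first=True)
        self.decoder = nn.LSTM(code_bits, feat_dim, batch_first=True)

    def forward(self, frames):                 # frames: (B, T, feat_dim)
        _, (h, _) = self.encoder(frames)       # final hidden state: (1, B, code_bits)
        code = torch.sign(h.squeeze(0))        # binary hash of the whole video
        rep = code.unsqueeze(1).expand(-1, frames.size(1), -1)
        recon, _ = self.decoder(rep)           # reconstruct the frame sequence
        return code, recon

code, recon = VideoHasher()(torch.randn(2, 30, 512))
print(code.shape, recon.shape)   # (2, 64) and (2, 30, 512)
```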