July 29, 2019

Paper Group AWR 195

Learning to Fuse Music Genres with Generative Adversarial Dual Learning. Skin Lesion Analysis Toward Melanoma Detection: A Challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), Hosted by the International Skin Imaging Collaboration (ISIC). GPU-acceleration for Large-scale Tree Boosting. Re3 : Real-Time Recurrent Regression Net …

Learning to Fuse Music Genres with Generative Adversarial Dual Learning

Title Learning to Fuse Music Genres with Generative Adversarial Dual Learning
Authors Zhiqian Chen, Chih-Wei Wu, Yen-Cheng Lu, Alexander Lerch, Chang-Tien Lu
Abstract FusionGAN is a novel genre fusion framework for music generation that integrates the strengths of generative adversarial networks and dual learning. In particular, the proposed method offers a dual learning extension that can effectively integrate the styles of the given domains. To efficiently quantify the difference among diverse domains and avoid the vanishing gradient issue, FusionGAN provides a Wasserstein based metric to approximate the distance between the target domain and the existing domains. Adopting the Wasserstein distance, a new domain is created by combining the patterns of the existing domains using adversarial learning. Experimental results on public music datasets demonstrated that our approach could effectively merge two genres.
Tasks Music Generation
Published 2017-12-05
URL http://arxiv.org/abs/1712.01456v1
PDF http://arxiv.org/pdf/1712.01456v1.pdf
PWC https://paperswithcode.com/paper/learning-to-fuse-music-genres-with-generative
Repo https://github.com/aquastar/fusion_gan
Framework tf

Skin Lesion Analysis Toward Melanoma Detection: A Challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), Hosted by the International Skin Imaging Collaboration (ISIC)

Title Skin Lesion Analysis Toward Melanoma Detection: A Challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), Hosted by the International Skin Imaging Collaboration (ISIC)
Authors Noel C. F. Codella, David Gutman, M. Emre Celebi, Brian Helba, Michael A. Marchetti, Stephen W. Dusza, Aadi Kalloo, Konstantinos Liopyris, Nabin Mishra, Harald Kittler, Allan Halpern
Abstract This article describes the design, implementation, and results of the latest installment of the dermoscopic image analysis benchmark challenge. The goal is to support research and development of algorithms for automated diagnosis of melanoma, the most lethal skin cancer. The challenge was divided into 3 tasks: lesion segmentation, feature detection, and disease classification. Participation involved 593 registrations, 81 pre-submissions, 46 finalized submissions (including a 4-page manuscript), and approximately 50 attendees, making this the largest standardized and comparative study in this field to date. While the official challenge duration and ranking of participants has concluded, the dataset snapshots remain available for further research and development.
Tasks Lesion Segmentation
Published 2017-10-13
URL http://arxiv.org/abs/1710.05006v3
PDF http://arxiv.org/pdf/1710.05006v3.pdf
PWC https://paperswithcode.com/paper/skin-lesion-analysis-toward-melanoma
Repo https://github.com/dropoutlabs/encrypted-skin-cancer-detection
Framework tf

GPU-acceleration for Large-scale Tree Boosting

Title GPU-acceleration for Large-scale Tree Boosting
Authors Huan Zhang, Si Si, Cho-Jui Hsieh
Abstract In this paper, we present a novel massively parallel algorithm for accelerating the decision tree building procedure on GPUs (Graphics Processing Units), which is a crucial step in Gradient Boosted Decision Tree (GBDT) and random forests training. Previous GPU based tree building algorithms are based on parallel multi-scan or radix sort to find the exact tree split, and thus suffer from scalability and performance issues. We show that using a histogram based algorithm to approximately find the best split is more efficient and scalable on GPU. By identifying the difference between classical GPU-based image histogram construction and the feature histogram construction in decision tree training, we develop a fast feature histogram building kernel on GPU with carefully designed computational and memory access sequence to reduce atomic update conflict and maximize GPU utilization. Our algorithm can be used as a drop-in replacement for histogram construction in popular tree boosting systems to improve their scalability. As an example, to train GBDT on epsilon dataset, our method using a main-stream GPU is 7-8 times faster than histogram based algorithm on CPU in LightGBM and 25 times faster than the exact-split finding algorithm in XGBoost on a dual-socket 28-core Xeon server, while achieving similar prediction accuracy.
Tasks
Published 2017-06-26
URL http://arxiv.org/abs/1706.08359v1
PDF http://arxiv.org/pdf/1706.08359v1.pdf
PWC https://paperswithcode.com/paper/gpu-acceleration-for-large-scale-tree
Repo https://github.com/ibr11/LightGBM
Framework none
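
The histogram-based split search described in the abstract above can be illustrated on the CPU. The following is a minimal NumPy sketch, not the paper's GPU kernel: the function name `best_histogram_split`, the equal-width binning, and the regularizer `reg_lambda` are illustrative assumptions, and the gain formula is the usual second-order gain used in gradient boosting.

```python
import numpy as np

def best_histogram_split(feature, grad, hess, n_bins=16, reg_lambda=1.0):
    """Approximate the best split of one feature from a gradient histogram.

    Hypothetical helper for illustration; equal-width bins are an assumption.
    """
    feature = np.asarray(feature, dtype=float)
    # Bucket feature values into n_bins equal-width bins.
    edges = np.linspace(feature.min(), feature.max(), n_bins + 1)
    bins = np.clip(np.digitize(feature, edges[1:-1]), 0, n_bins - 1)
    # Accumulate per-bin gradient and Hessian sums (the "feature histogram").
    g_hist = np.bincount(bins, weights=grad, minlength=n_bins)
    h_hist = np.bincount(bins, weights=hess, minlength=n_bins)
    g_total, h_total = g_hist.sum(), h_hist.sum()
    # Prefix sums give the left-child statistics for every candidate split.
    g_left = np.cumsum(g_hist)[:-1]
    h_left = np.cumsum(h_hist)[:-1]
    g_right = g_total - g_left
    h_right = h_total - h_left
    # Second-order gain of splitting versus keeping the node whole.
    gain = (g_left ** 2 / (h_left + reg_lambda)
            + g_right ** 2 / (h_right + reg_lambda)
            - g_total ** 2 / (h_total + reg_lambda))
    k = int(np.argmax(gain))
    # Samples with feature < edges[k + 1] go to the left child.
    return float(edges[k + 1]), float(gain[k])
```

Only the histogram accumulation touches every sample; the split scan runs over `n_bins` entries, which is why the approach scales better than exact split finding over sorted feature values.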

Re3 : Real-Time Recurrent Regression Networks for Visual Tracking of Generic Objects

Title Re3 : Real-Time Recurrent Regression Networks for Visual Tracking of Generic Objects
Authors Daniel Gordon, Ali Farhadi, Dieter Fox
Abstract Robust object tracking requires knowledge and understanding of the object being tracked: its appearance, its motion, and how it changes over time. A tracker must be able to modify its underlying model and adapt to new observations. We present Re3, a real-time deep object tracker capable of incorporating temporal information into its model. Rather than focusing on a limited set of objects or training a model at test-time to track a specific instance, we pretrain our generic tracker on a large variety of objects and efficiently update on the fly; Re3 simultaneously tracks and updates the appearance model with a single forward pass. This lightweight model is capable of tracking objects at 150 FPS, while attaining competitive results on challenging benchmarks. We also show that our method handles temporary occlusion better than other comparable trackers using experiments that directly measure performance on sequences with occlusion.
Tasks Object Tracking, Visual Tracking
Published 2017-05-17
URL http://arxiv.org/abs/1705.06368v3
PDF http://arxiv.org/pdf/1705.06368v3.pdf
PWC https://paperswithcode.com/paper/re3-real-time-recurrent-regression-networks
Repo https://github.com/natdebru/OpenCV-Video-Label
Framework tf

Diagonal Rescaling For Neural Networks

Title Diagonal Rescaling For Neural Networks
Authors Jean Lafond, Nicolas Vasilache, Léon Bottou
Abstract We define a second-order neural network stochastic gradient training algorithm whose block-diagonal structure effectively amounts to normalizing the unit activations. Investigating why this algorithm lacks in robustness then reveals two interesting insights. The first insight suggests a new way to scale the stepsizes, clarifying popular algorithms such as RMSProp as well as old neural network tricks such as fanin stepsize scaling. The second insight stresses the practical importance of dealing with fast changes of the curvature of the cost.
Tasks
Published 2017-05-25
URL http://arxiv.org/abs/1705.09319v1
PDF http://arxiv.org/pdf/1705.09319v1.pdf
PWC https://paperswithcode.com/paper/diagonal-rescaling-for-neural-networks
Repo https://github.com/Thrandis/EKFAC-pytorch
Framework pytorch

Hybrid Oracle: Making Use of Ambiguity in Transition-based Chinese Dependency Parsing

Title Hybrid Oracle: Making Use of Ambiguity in Transition-based Chinese Dependency Parsing
Authors Xuancheng Ren, Xu Sun
Abstract In the training of transition-based dependency parsers, an oracle is used to predict a transition sequence for a sentence and its gold tree. However, the transition system may exhibit ambiguity, that is, there can be multiple correct transition sequences that form the gold tree. We propose to make use of the property in the training of neural dependency parsers, and present the Hybrid Oracle. The new oracle gives all the correct transitions for a parsing state, which are used in the cross entropy loss function to provide better supervisory signal. It is also used to generate different transition sequences for a sentence to better explore the training data and improve the generalization ability of the parser. Evaluations show that the parsers trained using the hybrid oracle outperform the parsers using the traditional oracle in Chinese dependency parsing. We provide analysis from a linguistic view. The code is available at https://github.com/lancopku/nndep .
Tasks Dependency Parsing
Published 2017-11-28
URL http://arxiv.org/abs/1711.10163v2
PDF http://arxiv.org/pdf/1711.10163v2.pdf
PWC https://paperswithcode.com/paper/hybrid-oracle-making-use-of-ambiguity-in
Repo https://github.com/lancopku/nndep
Framework none
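
The abstract above says that all correct transitions for a parsing state enter the cross-entropy loss, but it does not give the formula. One plausible reading, shown here purely as an assumption, is to minimize the negative log of the total probability assigned to the set of correct transitions. The function name `multi_correct_cross_entropy` is hypothetical.

```python
import numpy as np

def multi_correct_cross_entropy(logits, correct):
    """Negative log of the summed softmax probability of all correct classes.

    Assumed formulation for illustration; the paper may define the loss
    differently (for example, with a uniform target over correct transitions).
    """
    logits = np.asarray(logits, dtype=float)
    # Numerically stable log-softmax over all transitions.
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    # Log-sum-exp over the correct transitions only.
    selected = log_probs[list(correct)]
    m = selected.max()
    return float(-(m + np.log(np.exp(selected - m).sum())))
```

With a single correct transition this reduces to the ordinary cross-entropy loss, so a traditional oracle is the special case where `correct` has one element.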

Colorization as a Proxy Task for Visual Understanding

Title Colorization as a Proxy Task for Visual Understanding
Authors Gustav Larsson, Michael Maire, Gregory Shakhnarovich
Abstract We investigate and improve self-supervision as a drop-in replacement for ImageNet pretraining, focusing on automatic colorization as the proxy task. Self-supervised training has been shown to be more promising for utilizing unlabeled data than other, traditional unsupervised learning methods. We build on this success and evaluate the ability of our self-supervised network in several contexts. On VOC segmentation and classification tasks, we present results that are state-of-the-art among methods not using ImageNet labels for pretraining representations. Moreover, we present the first in-depth analysis of self-supervision via colorization, concluding that formulation of the loss, training details and network architecture play important roles in its effectiveness. This investigation is further expanded by revisiting the ImageNet pretraining paradigm, asking questions such as: How much training data is needed? How many labels are needed? How much do features change when fine-tuned? We relate these questions back to self-supervision by showing that colorization provides a similarly powerful supervisory signal as various flavors of ImageNet pretraining.
Tasks Colorization
Published 2017-03-11
URL http://arxiv.org/abs/1703.04044v3
PDF http://arxiv.org/pdf/1703.04044v3.pdf
PWC https://paperswithcode.com/paper/colorization-as-a-proxy-task-for-visual
Repo https://github.com/gustavla/self-supervision
Framework tf

Learning from Video and Text via Large-Scale Discriminative Clustering

Title Learning from Video and Text via Large-Scale Discriminative Clustering
Authors Antoine Miech, Jean-Baptiste Alayrac, Piotr Bojanowski, Ivan Laptev, Josef Sivic
Abstract Discriminative clustering has been successfully applied to a number of weakly-supervised learning tasks. Such applications include person and action recognition, text-to-video alignment, object co-segmentation and colocalization in videos and images. One drawback of discriminative clustering, however, is its limited scalability. We address this issue and propose an online optimization algorithm based on the Block-Coordinate Frank-Wolfe algorithm. We apply the proposed method to the problem of weakly supervised learning of actions and actors from movies together with corresponding movie scripts. The scaling up of the learning problem to 66 feature length movies enables us to significantly improve weakly supervised action recognition.
Tasks Temporal Action Localization, Video Alignment, Video Retrieval
Published 2017-07-27
URL http://arxiv.org/abs/1707.09074v1
PDF http://arxiv.org/pdf/1707.09074v1.pdf
PWC https://paperswithcode.com/paper/learning-from-video-and-text-via-large-scale
Repo https://github.com/antoine77340/iccv17learning
Framework none

Logical Learning Through a Hybrid Neural Network with Auxiliary Inputs

Title Logical Learning Through a Hybrid Neural Network with Auxiliary Inputs
Authors Fang Wan, Chaoyang Song
Abstract The human reasoning process is seldom a one-way process from an input leading to an output. Instead, it often involves a systematic deduction by ruling out other possible outcomes as a self-checking mechanism. In this paper, we describe the design of a hybrid neural network for logical learning that is similar to the human reasoning through the introduction of an auxiliary input, namely the indicators, that act as the hints to suggest logical outcomes. We generate these indicators by digging into the hidden information buried underneath the original training data for direct or indirect suggestions. We used the MNIST data to demonstrate the design and use of these indicators in a convolutional neural network. We trained a series of such hybrid neural networks with variations of the indicators. Our results show that these hybrid neural networks are very robust in generating logical outcomes with inherently higher prediction accuracy than the direct use of the original input and output in apparent models. Such improved predictability with reassured logical confidence is obtained through the exhaustion of all possible indicators to rule out all illogical outcomes, which is not available in the apparent models. Our logical learning process can effectively cope with the unknown unknowns using a full exploitation of all existing knowledge available for learning. The design and implementation of the hints, namely the indicators, become an essential part of artificial intelligence for logical learning. We also introduce an ongoing application setup for this hybrid neural network in an autonomous grasping robot, namely as_DeepClaw, aiming at learning an optimized grasping pose through logical learning.
Tasks
Published 2017-05-23
URL http://arxiv.org/abs/1705.08200v1
PDF http://arxiv.org/pdf/1705.08200v1.pdf
PWC https://paperswithcode.com/paper/logical-learning-through-a-hybrid-neural
Repo https://github.com/as-wanfang/as_HybridNN
Framework tf

Deep Nonparametric Estimation of Discrete Conditional Distributions via Smoothed Dyadic Partitioning

Title Deep Nonparametric Estimation of Discrete Conditional Distributions via Smoothed Dyadic Partitioning
Authors Wesley Tansey, Karl Pichotta, James G. Scott
Abstract We present an approach to deep estimation of discrete conditional probability distributions. Such models have several applications, including generative modeling of audio, image, and video data. Our approach combines two main techniques: dyadic partitioning and graph-based smoothing of the discrete space. By recursively decomposing each dimension into a series of binary splits and smoothing over the resulting distribution using graph-based trend filtering, we impose a strict structure to the model and achieve much higher sample efficiency. We demonstrate the advantages of our model through a series of benchmarks on both synthetic and real-world datasets, in some cases reducing the error by nearly half in comparison to other popular methods in the literature. All of our models are implemented in Tensorflow and publicly available at https://github.com/tansey/sdp .
Tasks
Published 2017-02-23
URL http://arxiv.org/abs/1702.07398v2
PDF http://arxiv.org/pdf/1702.07398v2.pdf
PWC https://paperswithcode.com/paper/deep-nonparametric-estimation-of-discrete
Repo https://github.com/tansey/sdp
Framework tf
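
The dyadic partitioning idea in the abstract above, decomposing a discrete variable into a series of binary splits, can be sketched as follows. This is a minimal illustration that omits the graph-based smoothing entirely; the function name `dyadic_distribution` and the heap-order layout of the split probabilities are assumptions, and in the paper's setting a network would predict the per-node probabilities.

```python
import numpy as np

def dyadic_distribution(p_right, depth):
    """Turn binary split probabilities into a distribution over 2**depth bins.

    p_right[i] is the probability of taking the right branch at internal
    node i, with nodes stored in heap order (root at index 0, children of
    node i at 2*i + 1 and 2*i + 2). Hypothetical layout for illustration.
    """
    n_leaves = 2 ** depth
    probs = np.ones(n_leaves)
    for leaf in range(n_leaves):
        node = 0
        for level in range(depth):
            # The bits of the leaf index, most significant first, give the path.
            bit = (leaf >> (depth - 1 - level)) & 1
            probs[leaf] *= p_right[node] if bit else 1.0 - p_right[node]
            node = 2 * node + 1 + bit
    return probs
```

For example, `dyadic_distribution([0.25, 0.5, 1.0], depth=2)` yields a distribution over four bins that sums to one by construction, since each internal node splits its mass between its two children.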

Context-based Normalization of Histological Stains using Deep Convolutional Features

Title Context-based Normalization of Histological Stains using Deep Convolutional Features
Authors Daniel Bug, Steffen Schneider, Anne Grote, Eva Oswald, Friedrich Feuerhake, Julia Schüler, Dorit Merhof
Abstract While human observers are able to cope with variations in color and appearance of histological stains, digital pathology algorithms commonly require a well-normalized setting to achieve peak performance, especially when a limited amount of labeled data is available. This work provides a fully automated, end-to-end learning-based setup for normalizing histological stains, which considers the texture context of the tissue. We introduce Feature Aware Normalization, which extends the framework of batch normalization in combination with gating elements from Long Short-Term Memory units for normalization among different spatial regions of interest. By incorporating a pretrained deep neural network as a feature extractor steering a pixelwise processing pipeline, we achieve excellent normalization results and ensure a consistent representation of color and texture. The evaluation comprises a comparison of color histogram deviations, structural similarity and measures the color volume obtained by the different methods.
Tasks
Published 2017-08-14
URL http://arxiv.org/abs/1708.04099v1
PDF http://arxiv.org/pdf/1708.04099v1.pdf
PWC https://paperswithcode.com/paper/context-based-normalization-of-histological
Repo https://github.com/stes/fan
Framework pytorch

Towards Building an Intelligent Anti-Malware System: A Deep Learning Approach using Support Vector Machine (SVM) for Malware Classification

Title Towards Building an Intelligent Anti-Malware System: A Deep Learning Approach using Support Vector Machine (SVM) for Malware Classification
Authors Abien Fred Agarap
Abstract Effective and efficient mitigation of malware is a long-time endeavor in the information security community. The development of an anti-malware system that can counteract an unknown malware is a prolific activity that may benefit several sectors. We envision an intelligent anti-malware system that utilizes the power of deep learning (DL) models. Using such models would enable the detection of newly-released malware through mathematical generalization. That is, finding the relationship between a given malware $x$ and its corresponding malware family $y$, $f: x \mapsto y$. To accomplish this feat, we used the Malimg dataset (Nataraj et al., 2011) which consists of malware images that were processed from malware binaries, and then we trained the following DL models to classify each malware family: CNN-SVM (Tang, 2013), GRU-SVM (Agarap, 2017), and MLP-SVM. Empirical evidence has shown that the GRU-SVM stands out among the DL models with a predictive accuracy of ~84.92%. This stands to reason for the mentioned model had the relatively most sophisticated architecture design among the presented models. The exploration of an even more optimal DL-SVM model is the next stage towards the engineering of an intelligent anti-malware system.
Tasks Malware Classification
Published 2017-12-31
URL http://arxiv.org/abs/1801.00318v2
PDF http://arxiv.org/pdf/1801.00318v2.pdf
PWC https://paperswithcode.com/paper/towards-building-an-intelligent-anti-malware
Repo https://github.com/AFAgarap/malware-classification
Framework tf

What Actions are Needed for Understanding Human Actions in Videos?

Title What Actions are Needed for Understanding Human Actions in Videos?
Authors Gunnar A. Sigurdsson, Olga Russakovsky, Abhinav Gupta
Abstract What is the right way to reason about human activities? What directions forward are most promising? In this work, we analyze the current state of human activity understanding in videos. The goal of this paper is to examine datasets, evaluation metrics, algorithms, and potential future directions. We look at the qualitative attributes that define activities such as pose variability, brevity, and density. The experiments consider multiple state-of-the-art algorithms and multiple datasets. The results demonstrate that while there is inherent ambiguity in the temporal extent of activities, current datasets still permit effective benchmarking. We discover that fine-grained understanding of objects and pose when combined with temporal reasoning is likely to yield substantial improvements in algorithmic accuracy. We present the many kinds of information that will be needed to achieve substantial gains in activity understanding: objects, verbs, intent, and sequential reasoning. The software and additional information will be made available to provide other researchers detailed diagnostics to understand their own algorithms.
Tasks
Published 2017-08-09
URL http://arxiv.org/abs/1708.02696v1
PDF http://arxiv.org/pdf/1708.02696v1.pdf
PWC https://paperswithcode.com/paper/what-actions-are-needed-for-understanding
Repo https://github.com/gsig/actions-for-actions
Framework none

Mobile Video Object Detection with Temporally-Aware Feature Maps

Title Mobile Video Object Detection with Temporally-Aware Feature Maps
Authors Mason Liu, Menglong Zhu
Abstract This paper introduces an online model for object detection in videos designed to run in real-time on low-powered mobile and embedded devices. Our approach combines fast single-image object detection with convolutional long short term memory (LSTM) layers to create an interweaved recurrent-convolutional architecture. Additionally, we propose an efficient Bottleneck-LSTM layer that significantly reduces computational cost compared to regular LSTMs. Our network achieves temporal awareness by using Bottleneck-LSTMs to refine and propagate feature maps across frames. This approach is substantially faster than existing detection methods in video, outperforming the fastest single-frame models in model size and computational cost while attaining accuracy comparable to much more expensive single-frame models on the Imagenet VID 2015 dataset. Our model reaches a real-time inference speed of up to 15 FPS on a mobile CPU.
Tasks Object Detection, Video Object Detection
Published 2017-11-17
URL http://arxiv.org/abs/1711.06368v2
PDF http://arxiv.org/pdf/1711.06368v2.pdf
PWC https://paperswithcode.com/paper/mobile-video-object-detection-with-temporally
Repo https://github.com/vikrant7/mobile-vod-bottleneck-lstm
Framework pytorch

LiDAR-Camera Calibration using 3D-3D Point correspondences

Title LiDAR-Camera Calibration using 3D-3D Point correspondences
Authors Ankit Dhall, Kunal Chelani, Vishnu Radhakrishnan, K. M. Krishna
Abstract With the advent of autonomous vehicles, LiDAR and cameras have become an indispensable combination of sensors. They both provide rich and complementary data which can be used by various algorithms and machine learning to sense and make vital inferences about the surroundings. We propose a novel pipeline and experimental setup to find accurate rigid-body transformation for extrinsically calibrating a LiDAR and a camera. The pipeline uses 3D-3D point correspondences in LiDAR and camera frame and gives a closed-form solution. We further show the accuracy of the estimate by fusing point clouds from two stereo cameras which align perfectly with the rotation and translation estimated by our method, confirming the accuracy of our method’s estimates both mathematically and visually. Taking our idea of extrinsic LiDAR-camera calibration forward, we demonstrate how two cameras with no overlapping field-of-view can also be calibrated extrinsically using 3D point correspondences. The code has been made available as open-source software in the form of a ROS package, more information about which can be sought here: https://github.com/ankitdhall/lidar_camera_calibration .
Tasks Autonomous Vehicles, Calibration
Published 2017-05-27
URL http://arxiv.org/abs/1705.09785v1
PDF http://arxiv.org/pdf/1705.09785v1.pdf
PWC https://paperswithcode.com/paper/lidar-camera-calibration-using-3d-3d-point
Repo https://github.com/agarwa65/lidar_camera_calibration
Framework tf
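
The abstract above describes recovering a rigid-body transformation from 3D-3D point correspondences in closed form, without naming the solver. A standard closed-form choice for this problem is the SVD-based Kabsch method, sketched below as an assumption about how such a step could look; the function name `rigid_transform_3d` is hypothetical and the ROS package may implement it differently.

```python
import numpy as np

def rigid_transform_3d(src, dst):
    """Find R, t minimizing sum ||R @ src_i + t - dst_i||^2 (Kabsch method).

    src and dst are (N, 3) arrays of corresponding points, for example the
    same markers seen in the LiDAR frame and in the camera frame.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    # Center both point sets on their centroids.
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # 3x3 cross-covariance of the centered correspondences.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t
```

At least three non-collinear correspondences are needed for the rotation to be determined; with noisy measurements, more points average out the error.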