February 1, 2020

3151 words 15 mins read

Paper Group AWR 259


A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning

Title A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning
Authors Tal Ben-Nun, Maciej Besta, Simon Huber, Alexandros Nikolaos Ziogas, Daniel Peter, Torsten Hoefler
Abstract We introduce Deep500: the first customizable benchmarking infrastructure that enables fair comparison of the plethora of deep learning frameworks, algorithms, libraries, and techniques. The key idea behind Deep500 is its modular design, where deep learning is factorized into four distinct levels: operators, network processing, training, and distributed training. Our evaluation illustrates that Deep500 is customizable (enables combining and benchmarking different deep learning codes) and fair (uses carefully selected metrics). Moreover, Deep500 is fast (incurs negligible overheads), verifiable (offers infrastructure to analyze correctness), and reproducible. Finally, as the first distributed and reproducible benchmarking system for deep learning, Deep500 provides software infrastructure to utilize the most powerful supercomputers for extreme-scale workloads.
Tasks
Published 2019-01-29
URL https://arxiv.org/abs/1901.10183v2
PDF https://arxiv.org/pdf/1901.10183v2.pdf
PWC https://paperswithcode.com/paper/a-modular-benchmarking-infrastructure-for
Repo https://github.com/deep500/deep500
Framework tf
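
To give a flavor of what benchmarking at Deep500's lowest level (operators) involves, here is a minimal sketch of timing a single operator with warm-up runs and a robust metric. This is an illustration under simplifying assumptions, not the Deep500 API; the operator, tensor sizes, and reported metrics are arbitrary choices.

```python
# Hypothetical operator-level benchmark in the spirit of Deep500's
# "operators" level; NOT the actual Deep500 API.
import time
import torch

def benchmark_operator(op, inputs, warmup=5, runs=50):
    """Time a single operator, discarding warm-up iterations."""
    for _ in range(warmup):          # warm up caches / cuDNN autotuning
        op(*inputs)
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        op(*inputs)
        times.append(time.perf_counter() - start)
    return {"median_s": sorted(times)[len(times) // 2],
            "min_s": min(times)}

conv = torch.nn.Conv2d(64, 128, kernel_size=3, padding=1)
x = torch.randn(32, 64, 56, 56)
print(benchmark_operator(conv, (x,)))
```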

DeepTemporalSeg: Temporally Consistent Semantic Segmentation of 3D LiDAR Scans

Title DeepTemporalSeg: Temporally Consistent Semantic Segmentation of 3D LiDAR Scans
Authors Ayush Dewan, Wolfram Burgard
Abstract Understanding the semantic characteristics of the environment is a key enabler for autonomous robot operation. In this paper, we propose a deep convolutional neural network (DCNN) for the semantic segmentation of a LiDAR scan into the classes car, pedestrian, or bicyclist. The architecture is based on dense blocks and efficiently utilizes depthwise separable convolutions to limit the number of parameters while still maintaining state-of-the-art performance. To make the predictions of the DCNN temporally consistent, we propose a Bayes filter-based method. This method uses the predictions from the neural network to recursively estimate the current semantic state of a point in a scan. The recursive estimation uses the knowledge gained from previous scans, making the predictions temporally consistent and robust against isolated erroneous predictions. We compare the performance of our proposed architecture with other state-of-the-art neural network architectures and report substantial improvement. For the proposed Bayes filter approach, we show results on various sequences in the KITTI tracking benchmark.
Tasks Semantic Segmentation
Published 2019-06-17
URL https://arxiv.org/abs/1906.06962v3
PDF https://arxiv.org/pdf/1906.06962v3.pdf
PWC https://paperswithcode.com/paper/deeptemporalseg-temporally-consistent
Repo https://github.com/ayushais/DBLiDARNet
Framework tf
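
The recursive estimation can be illustrated in a few lines of NumPy: a per-point class belief is fused with each new softmax prediction, so one erroneous scan barely moves the belief. This is a minimal sketch assuming a static-state filter with a hand-picked mixing constant, not the paper's exact measurement model.

```python
# Minimal recursive Bayes filter over per-point class probabilities,
# using the DCNN softmax output as the measurement likelihood.
# The mixing constant alpha is a made-up assumption.
import numpy as np

def bayes_update(prior, likelihood, alpha=0.9):
    """Fuse the previous belief with the current network prediction."""
    # Soften the prior so the filter can recover from past mistakes.
    prior = alpha * prior + (1 - alpha) / len(prior)
    posterior = prior * likelihood          # Bayes rule (unnormalized)
    return posterior / posterior.sum()      # normalize over classes

belief = np.full(3, 1 / 3)                  # uniform over {car, pedestrian, bicyclist}
for scan_probs in [np.array([0.7, 0.2, 0.1]),
                   np.array([0.2, 0.5, 0.3]),   # isolated erroneous prediction
                   np.array([0.8, 0.1, 0.1])]:
    belief = bayes_update(belief, scan_probs)
print(belief)  # temporally smoothed class belief for this point
```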

DenseBody: Directly Regressing Dense 3D Human Pose and Shape From a Single Color Image

Title DenseBody: Directly Regressing Dense 3D Human Pose and Shape From a Single Color Image
Authors Pengfei Yao, Zheng Fang, Fan Wu, Yao Feng, Jiwei Li
Abstract Recovering 3D human body shape and pose from 2D images is a challenging task due to the high complexity and flexibility of the human body and the relative scarcity of 3D-labeled data. Previous methods addressing these issues typically rely on predicting intermediate results such as body part segmentation, 2D/3D joints, or silhouette masks to decompose the problem into multiple sub-tasks and thereby exploit more 2D labels. Most previous works incorporate a parametric body shape model and predict parameters in a low-dimensional space to represent the human body. In this paper, we propose to directly regress the 3D human mesh from a single color image using a Convolutional Neural Network (CNN). We use an efficient representation of 3D human shape and pose which can be predicted through an encoder-decoder neural network. The proposed method achieves state-of-the-art performance on several 3D human body datasets, including Human3.6M, SURREAL, and UP-3D, while running faster.
Tasks
Published 2019-03-25
URL http://arxiv.org/abs/1903.10153v3
PDF http://arxiv.org/pdf/1903.10153v3.pdf
PWC https://paperswithcode.com/paper/densebody-directly-regressing-dense-3d-human
Repo https://github.com/yongyct/densebody-poc
Framework pytorch
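
The efficient representation here is a UV position map: a 3-channel image whose pixels store the (x, y, z) coordinates of mesh surface points, which lets a standard encoder-decoder regress the full mesh. The toy model below sketches that setup; the layer sizes and plain L1 loss are placeholders, not the paper's network or weighting.

```python
# Hedged sketch of UV-position-map regression; architecture is illustrative.
import torch
import torch.nn as nn

class UVRegressor(nn.Module):
    """Tiny encoder-decoder mapping an RGB image to a 3-channel UV position map."""
    def __init__(self):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU())
        self.decode = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1))

    def forward(self, img):
        return self.decode(self.encode(img))

model = UVRegressor()
img = torch.randn(2, 3, 256, 256)       # input color images
gt_uv = torch.randn(2, 3, 256, 256)     # ground-truth UV position maps
loss = nn.functional.l1_loss(model(img), gt_uv)
loss.backward()
```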

Deep Neural Models for Medical Concept Normalization in User-Generated Texts

Title Deep Neural Models for Medical Concept Normalization in User-Generated Texts
Authors Zulfat Miftahutdinov, Elena Tutubalina
Abstract In this work, we consider the medical concept normalization problem, i.e., the problem of mapping a health-related entity mention in free-form text to a concept in a controlled vocabulary, usually the standard thesaurus in the Unified Medical Language System (UMLS). This is a challenging task since medical terminology differs greatly between health-care professionals and the general public writing in social media. We approach it as a sequence learning problem with powerful neural networks such as recurrent neural networks and contextualized word representation models trained to obtain semantic representations of social media expressions. Our experimental evaluation over three different benchmarks shows that neural architectures leverage the semantic meaning of the entity mention and significantly outperform existing state-of-the-art models.
Tasks Medical Concept Normalization
Published 2019-07-18
URL https://arxiv.org/abs/1907.07972v1
PDF https://arxiv.org/pdf/1907.07972v1.pdf
PWC https://paperswithcode.com/paper/deep-neural-models-for-medical-concept
Repo https://github.com/dartrevan/medical_concept_normalization
Framework none
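
A minimal sketch of this sequence-learning framing (not the authors' code): a bidirectional GRU encodes the mention tokens and a linear layer scores every concept in the vocabulary. All sizes are arbitrary assumptions.

```python
# Illustrative mention-to-concept classifier; sizes are placeholders.
import torch
import torch.nn as nn

class ConceptNormalizer(nn.Module):
    def __init__(self, vocab_size, n_concepts, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_concepts)

    def forward(self, token_ids):
        _, h = self.gru(self.emb(token_ids))   # h: (2, batch, hidden)
        h = torch.cat([h[0], h[1]], dim=-1)    # concatenate both directions
        return self.out(h)                     # logits over concept vocabulary

model = ConceptNormalizer(vocab_size=30000, n_concepts=5000)
mention = torch.randint(0, 30000, (4, 12))    # batch of tokenized mentions
print(model(mention).shape)                   # torch.Size([4, 5000])
```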

Temporal Attentive Alignment for Large-Scale Video Domain Adaptation

Title Temporal Attentive Alignment for Large-Scale Video Domain Adaptation
Authors Min-Hung Chen, Zsolt Kira, Ghassan AlRegib, Jaekwon Yoo, Ruxin Chen, Jian Zheng
Abstract Although various image-based domain adaptation (DA) techniques have been proposed in recent years, domain shift in videos is still not well-explored. Most previous works only evaluate performance on small-scale datasets which are saturated. Therefore, we first propose two large-scale video DA datasets with much larger domain discrepancy: UCF-HMDB_full and Kinetics-Gameplay. Second, we investigate different DA integration methods for videos, and show that simultaneously aligning and learning temporal dynamics achieves effective alignment even without sophisticated DA methods. Finally, we propose the Temporal Attentive Adversarial Adaptation Network (TA3N), which explicitly attends to the temporal dynamics using domain discrepancy for more effective domain alignment, achieving state-of-the-art performance on four video DA datasets (e.g., a 7.9% accuracy gain over “Source only”, from 73.9% to 81.8%, on “HMDB → UCF”, and a 10.3% gain on “Kinetics → Gameplay”). The code and data are released at http://github.com/cmhungsteve/TA3N.
Tasks Domain Adaptation
Published 2019-07-30
URL https://arxiv.org/abs/1907.12743v6
PDF https://arxiv.org/pdf/1907.12743v6.pdf
PWC https://paperswithcode.com/paper/temporal-attentive-alignment-for-large-scale
Repo https://github.com/olivesgatech/TA3N
Framework pytorch
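
The adversarial alignment underlying TA3N reduces to the gradient reversal trick from DANN-style domain adaptation, sketched below; the temporal attention weighted by domain discrepancy, which is the paper's actual contribution, is omitted for brevity.

```python
# Gradient reversal layer: the standard building block of adversarial
# domain alignment (DANN), on which TA3N builds.
import torch
from torch.autograd import Function

class GradReverse(Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Flip the gradient sign so the feature extractor learns to
        # *confuse* the domain classifier.
        return -ctx.lambd * grad_output, None

features = torch.randn(8, 256, requires_grad=True)  # video-level features
domain_head = torch.nn.Linear(256, 2)               # source vs. target
logits = domain_head(GradReverse.apply(features, 1.0))
loss = torch.nn.functional.cross_entropy(logits, torch.randint(0, 2, (8,)))
loss.backward()  # features.grad now pushes toward domain confusion
```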

Fine-Grained Argument Unit Recognition and Classification

Title Fine-Grained Argument Unit Recognition and Classification
Authors Dietrich Trautmann, Johannes Daxenberger, Christian Stab, Hinrich Schütze, Iryna Gurevych
Abstract Prior work has commonly defined argument retrieval from heterogeneous document collections as a sentence-level classification task. Consequently, argument retrieval suffers both from low recall and from sentence segmentation errors that make it difficult for humans and machines to consume the arguments. In this work, we argue that the task should be performed at a more fine-grained level of sequence labeling. To this end, we define the task as Argument Unit Recognition and Classification (AURC). We present a dataset of arguments from heterogeneous sources annotated as spans of tokens within a sentence, together with a corresponding stance. We show how such difficult argument annotations can be collected effectively through crowdsourcing with high inter-annotator agreement. The new benchmark, AURC-8, contains up to 15% more arguments per topic than sentence-level annotations. We identify a number of methods targeted at AURC sequence labeling, achieving close to human performance on known domains. Further analysis also reveals that, contrary to previous approaches, our methods are more robust against sentence segmentation errors. We publicly release our code and the AURC-8 dataset.
Tasks Argument Mining
Published 2019-04-22
URL https://arxiv.org/abs/1904.09688v4
PDF https://arxiv.org/pdf/1904.09688v4.pdf
PWC https://paperswithcode.com/paper/robust-argument-unit-recognition-and
Repo https://github.com/trtm/AURC
Framework none
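
The token-level framing is easy to picture: each argument span plus its stance becomes BIO tags over the sentence. The helper below is a hypothetical illustration; the exact label scheme of AURC-8 may differ.

```python
# Hypothetical span-to-BIO conversion; label names are assumptions.
def spans_to_bio(tokens, spans):
    """spans: list of (start, end, stance) token ranges, end exclusive."""
    tags = ["O"] * len(tokens)
    for start, end, stance in spans:
        tags[start] = f"B-{stance}"
        for i in range(start + 1, end):
            tags[i] = f"I-{stance}"
    return tags

tokens = "nuclear power is cheap but the waste is dangerous".split()
spans = [(0, 4, "PRO"), (5, 9, "CON")]
print(list(zip(tokens, spans_to_bio(tokens, spans))))
```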

Automated Deep Photo Style Transfer

Title Automated Deep Photo Style Transfer
Authors Sebastian Penhouët, Paul Sanzenbacher
Abstract Photorealism is a complex concept that cannot easily be formulated mathematically. Deep Photo Style Transfer is an attempt to transfer the style of a reference image to a content image while preserving the latter’s photorealism. This is achieved by introducing a constraint that prevents distortions in the content image and by applying the style transfer independently to semantically different parts of the images. In addition, an automated segmentation process is presented that consists of a neural-network-based segmentation method followed by a semantic grouping step. To further improve the results, a measure of image aesthetics is used and elaborated. If the content and style images are sufficiently similar, the resulting images look very realistic. With the automation of the image segmentation, the pipeline becomes completely independent of any user interaction, which enables new applications.
Tasks Semantic Segmentation, Style Transfer
Published 2019-01-12
URL http://arxiv.org/abs/1901.03915v1
PDF http://arxiv.org/pdf/1901.03915v1.pdf
PWC https://paperswithcode.com/paper/automated-deep-photo-style-transfer
Repo https://github.com/ryant18/StyleTransfer
Framework tf
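
Applying style transfer independently per semantic region boils down to restricting the Gram-matrix style loss to matching masks, as in the sketch below, so sky style only transfers to sky and so on. The feature extractor and weights are placeholders, and the paper's photorealism constraint is not shown.

```python
# Hedged sketch of a per-segment style loss over masked Gram matrices.
import torch

def masked_gram(features, mask):
    """features: (C, H, W); mask: (H, W) boolean for one semantic class."""
    f = features[:, mask]                    # (C, N) pixels in the region
    return (f @ f.t()) / max(f.shape[1], 1)  # (C, C) Gram matrix

def segment_style_loss(content_feats, style_feats, content_mask, style_mask):
    g_c = masked_gram(content_feats, content_mask)
    g_s = masked_gram(style_feats, style_mask)
    return torch.nn.functional.mse_loss(g_c, g_s)

feats_c = torch.randn(64, 32, 32, requires_grad=True)
feats_s = torch.randn(64, 32, 32)
mask = torch.zeros(32, 32, dtype=torch.bool)
mask[:16] = True                             # e.g. the "sky" region
segment_style_loss(feats_c, feats_s, mask, mask).backward()
```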

Temporal coding in spiking neural networks with alpha synaptic function

Title Temporal coding in spiking neural networks with alpha synaptic function
Authors Iulia M. Comsa, Krzysztof Potempa, Luca Versari, Thomas Fischbacher, Andrea Gesmundo, Jyrki Alakuijala
Abstract The timing of individual neuronal spikes is essential for biological brains to make fast responses to sensory stimuli. However, conventional artificial neural networks lack the intrinsic temporal coding ability present in biological networks. We propose a spiking neural network model that encodes information in the relative timing of individual neuron spikes. In classification tasks, the output of the network is indicated by the first neuron to spike in the output layer. This temporal coding scheme allows the supervised training of the network with backpropagation, using locally exact derivatives of the postsynaptic spike times with respect to presynaptic spike times. The network operates using a biologically plausible alpha synaptic transfer function. Additionally, we use trainable synchronisation pulses that provide bias, add flexibility during training, and exploit the decay part of the alpha function. We show that such networks can be trained successfully on noisy Boolean logic tasks and on the MNIST dataset encoded in time. The results show that the spiking neural network outperforms comparable spiking models on MNIST and achieves similar quality to fully connected conventional networks with the same architecture. We also find that the spiking network spontaneously discovers two operating regimes, mirroring the accuracy-speed trade-off observed in human decision-making: a slow regime, where a decision is taken after all hidden neurons have spiked and the accuracy is very high, and a fast regime, where a decision is taken very quickly but the accuracy is lower. These results demonstrate the computational power of spiking networks with biological characteristics that encode information in the timing of individual neurons. By studying temporal coding in spiking networks, we aim to create building blocks towards energy-efficient and more complex biologically-inspired neural architectures.
Tasks Decision Making
Published 2019-07-30
URL https://arxiv.org/abs/1907.13223v2
PDF https://arxiv.org/pdf/1907.13223v2.pdf
PWC https://paperswithcode.com/paper/temporal-coding-in-spiking-neural-networks
Repo https://github.com/snisher/Spiking-Neural-Network
Framework none
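
One common parameterization of the alpha-shaped synaptic kernel is eps(t) = t·exp(−t/τ) for t ≥ 0, which rises, peaks at t = τ, and then decays; the membrane potential is a weighted sum of such kernels triggered by presynaptic spikes. A minimal sketch (the paper's exact constants may differ):

```python
# Alpha postsynaptic kernel and membrane potential; illustrative constants.
import numpy as np

def alpha_kernel(t, tau=1.0):
    """Rises then decays; peaks at t = tau."""
    return np.where(t >= 0, t * np.exp(-t / tau), 0.0)

def membrane_potential(t, spike_times, weights, tau=1.0):
    return sum(w * alpha_kernel(t - ts, tau)
               for w, ts in zip(weights, spike_times))

t = np.linspace(0, 5, 6)
v = membrane_potential(t, spike_times=[0.5, 1.0], weights=[1.0, 0.8])
print(v)  # the output neuron "fires" when v first crosses a threshold
```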

Surrogate Gradient Learning in Spiking Neural Networks

Title Surrogate Gradient Learning in Spiking Neural Networks
Authors Emre O. Neftci, Hesham Mostafa, Friedemann Zenke
Abstract Spiking neural networks are nature’s versatile solution to fault-tolerant and energy-efficient signal processing. To translate these benefits into hardware, a growing number of neuromorphic spiking neural network processors attempt to emulate biological neural networks. These developments have created an imminent need for methods and tools that enable such systems to solve real-world signal processing problems. Like conventional neural networks, spiking neural networks can be trained on real, domain-specific data. However, their training requires overcoming a number of challenges linked to their binary and dynamical nature. This article elucidates step-by-step the problems typically encountered when training spiking neural networks, and guides the reader through the key concepts of synaptic plasticity and data-driven learning in the spiking setting. To that end, it gives an overview of existing approaches and provides an introduction to surrogate gradient methods as a particularly flexible and efficient way to overcome the aforementioned challenges.
Tasks
Published 2019-01-28
URL https://arxiv.org/abs/1901.09948v2
PDF https://arxiv.org/pdf/1901.09948v2.pdf
PWC https://paperswithcode.com/paper/surrogate-gradient-learning-in-spiking-neural
Repo https://github.com/surrogate-gradient-learning/spytorch
Framework pytorch
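
The core trick is easy to show: the forward pass keeps the hard spike threshold, while the backward pass substitutes a smooth surrogate derivative. The sketch below uses a fast-sigmoid surrogate in the spirit of the companion SpyTorch tutorials; the scale constant is a tunable assumption.

```python
# Surrogate-gradient spike nonlinearity: hard threshold forward,
# fast-sigmoid derivative backward.
import torch

class SurrGradSpike(torch.autograd.Function):
    scale = 10.0  # steepness of the surrogate; a tunable assumption

    @staticmethod
    def forward(ctx, membrane_potential):
        ctx.save_for_backward(membrane_potential)
        return (membrane_potential > 0).float()   # binary spike

    @staticmethod
    def backward(ctx, grad_output):
        (v,) = ctx.saved_tensors
        # Smooth stand-in for the true (zero-almost-everywhere)
        # derivative of the step function.
        surrogate = 1.0 / (SurrGradSpike.scale * v.abs() + 1.0) ** 2
        return grad_output * surrogate

v = torch.randn(5, requires_grad=True)   # membrane potentials
spikes = SurrGradSpike.apply(v)
spikes.sum().backward()
print(v.grad)                             # nonzero, trainable gradients
```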

Performing Structured Improvisations with pre-trained Deep Learning Models

Title Performing Structured Improvisations with pre-trained Deep Learning Models
Authors Pablo Samuel Castro
Abstract The quality of outputs produced by deep generative models for music has seen dramatic improvement in the last few years. However, most deep learning models operate in “offline” mode, with few restrictions on processing time. Integrating these types of models into a live, structured performance poses a challenge because of the need to respect the beat and harmony. Further, these deep models tend to be agnostic to the style of a performer, which often renders them impractical for live performance. In this paper we propose a system which enables the integration of out-of-the-box generative models by leveraging the musician’s creativity and expertise.
Tasks
Published 2019-04-30
URL http://arxiv.org/abs/1904.13285v1
PDF http://arxiv.org/pdf/1904.13285v1.pdf
PWC https://paperswithcode.com/paper/performing-structured-improvisations-with-pre
Repo https://github.com/psc-g/Psc2
Framework tf

HighEr-Resolution Network for Image Demosaicing and Enhancing

Title HighEr-Resolution Network for Image Demosaicing and Enhancing
Authors Kangfu Mei, Juncheng Li, Jiajie Zhang, Haoyu Wu, Jie Li, Rui Huang
Abstract Neural-network-based image restoration methods tend to use low-resolution image patches for training. Although higher-resolution image patches can provide more global information, state-of-the-art methods cannot utilize them due to their huge GPU memory usage and an unstable training process. However, plenty of studies have shown that global information is crucial for image restoration tasks such as image demosaicing and enhancing. In this work, we propose a HighEr-Resolution Network (HERN) to fully learn global information in high-resolution image patches. To achieve this, HERN employs two parallel paths that learn image features at two different resolutions. By combining global-aware features and multi-scale features, HERN is able to learn global information with feasible GPU memory usage. In addition, we introduce a progressive training method to resolve the instability issue and accelerate model convergence. On the task of image demosaicing and enhancing, HERN achieves state-of-the-art performance in the AIM2019 RAW-to-RGB mapping challenge. The source code of our implementation is available at https://github.com/MKFMIKU/RAW2RGBNet.
Tasks Demosaicking, Image Restoration
Published 2019-11-19
URL https://arxiv.org/abs/1911.08098v1
PDF https://arxiv.org/pdf/1911.08098v1.pdf
PWC https://paperswithcode.com/paper/higher-resolution-network-for-image
Repo https://github.com/MKFMIKU/RAW2RGBNet
Framework pytorch
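
The two-path idea can be sketched as one branch over the full-resolution patch and one over a downsampled view, fused before the output head. Layer sizes below are illustrative only, not HERN's architecture.

```python
# Hedged sketch of a dual-resolution two-path network.
import torch
import torch.nn as nn

class TwoPathNet(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.full_path = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU())
        self.low_path = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU())
        self.fuse = nn.Conv2d(2 * ch, 3, 3, padding=1)

    def forward(self, x):
        full = self.full_path(x)
        low = self.low_path(nn.functional.interpolate(x, scale_factor=0.5))
        low = nn.functional.interpolate(low, size=full.shape[-2:])
        return self.fuse(torch.cat([full, low], dim=1))

out = TwoPathNet()(torch.randn(1, 3, 128, 128))
print(out.shape)  # torch.Size([1, 3, 128, 128])
```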

A Unified Framework for Structured Graph Learning via Spectral Constraints

Title A Unified Framework for Structured Graph Learning via Spectral Constraints
Authors Sandeep Kumar, Jiaxi Ying, José Vinícius de M. Cardoso, Daniel Palomar
Abstract Graph learning from data represents a canonical problem that has received substantial attention in the literature. However, insufficient work has been done on incorporating prior structural knowledge into the learning of the underlying graphical models from data. Learning a graph with a specific structure is essential for interpretability and for identifying the relationships among data. Useful structured graphs include the multi-component graph, bipartite graph, connected graph, sparse graph, and regular graph. In general, structured graph learning is an NP-hard combinatorial problem; therefore, designing a general tractable optimization method is extremely challenging. In this paper, we introduce a unified graph learning framework that integrates Gaussian graphical models and spectral graph theory. To impose a particular structure on a graph, we first show how to formulate the combinatorial constraints as an analytical property of the graph matrix. We then develop an optimization framework that leverages graph learning with specific structures via spectral constraints on graph matrices. The proposed algorithms are provably convergent, computationally efficient, and practically amenable for numerous graph-based tasks. Extensive numerical experiments with both synthetic and real data sets illustrate the effectiveness of the proposed algorithms. The code for all the simulations is made available as an open source repository.
Tasks
Published 2019-04-22
URL http://arxiv.org/abs/1904.09792v1
PDF http://arxiv.org/pdf/1904.09792v1.pdf
PWC https://paperswithcode.com/paper/a-unified-framework-for-structured-graph
Repo https://github.com/cran/spectralGraphTopology
Framework none
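
The spectral handle the framework exploits is classical: the multiplicity of the zero eigenvalue of the graph Laplacian equals the number of connected components, so constraining the spectrum constrains the structure. A small NumPy check of that fact:

```python
# Zero Laplacian eigenvalues count connected components.
import numpy as np

def laplacian(W):
    return np.diag(W.sum(axis=1)) - W

# Adjacency of a graph with two components: {0, 1} and {2, 3}.
W = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
eigvals = np.linalg.eigvalsh(laplacian(W))
n_components = int(np.sum(eigvals < 1e-9))
print(eigvals, n_components)  # two zero eigenvalues -> two components
```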

Lending Orientation to Neural Networks for Cross-view Geo-localization

Title Lending Orientation to Neural Networks for Cross-view Geo-localization
Authors Liu Liu, Hongdong Li
Abstract This paper studies the image-based geo-localization (IBL) problem using ground-to-aerial cross-view matching. The goal is to predict the spatial location of a ground-level query image by matching it to a large geotagged aerial image database (e.g., satellite imagery). This is a challenging task due to the drastic differences in viewpoint and visual appearance. Existing deep learning methods for this problem focus on maximizing the feature similarity of spatially close-by image pairs while minimizing it for pairs that are far apart, via deep feature embeddings based on the visual appearance of the ground and aerial images. However, in everyday life humans commonly use orientation information as an important cue for spatial localization. Inspired by this insight, this paper proposes a novel method which endows deep neural networks with a ‘commonsense’ of orientation. Given a ground-level spherical panoramic image as the query (and a large georeferenced satellite image database), we design a Siamese network which explicitly encodes the orientation (i.e., spherical direction) of each pixel of the images. Our method significantly boosts the discriminative power of the learned deep features, leading to much higher recall and precision, outperforming all previous methods. Our network is also more compact, using only one-fifth as many parameters as the previously best-performing network. To evaluate the generalization of our method, we also created a large-scale cross-view localization benchmark containing 100K geotagged ground-aerial pairs covering a city. Our code and datasets are available at https://github.com/Liumouliu/OriCNN.
Tasks
Published 2019-03-29
URL http://arxiv.org/abs/1903.12351v1
PDF http://arxiv.org/pdf/1903.12351v1.pdf
PWC https://paperswithcode.com/paper/lending-orientation-to-neural-networks-for
Repo https://github.com/Liumouliu/OriCNN
Framework tf
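
The orientation encoding can be pictured as two extra per-pixel channels, azimuth and altitude for an equirectangular panorama, stacked onto the RGB input of the Siamese encoder. The sketch below is one plausible such encoding, stated as an assumption; the paper's exact construction may differ.

```python
# Hypothetical per-pixel orientation channels for a panoramic input.
import numpy as np

def orientation_maps(height, width):
    azimuth = np.linspace(-np.pi, np.pi, width)             # left-right angle
    altitude = np.linspace(np.pi / 2, -np.pi / 2, height)   # up-down angle
    az, alt = np.meshgrid(azimuth, altitude)
    return np.stack([az, alt], axis=0)                      # (2, H, W)

rgb = np.random.rand(3, 128, 512)                           # panoramic image
x = np.concatenate([rgb, orientation_maps(128, 512)], axis=0)
print(x.shape)  # (5, 128, 512): RGB plus two orientation channels
```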

SSN: Learning Sparse Switchable Normalization via SparsestMax

Title SSN: Learning Sparse Switchable Normalization via SparsestMax
Authors Wenqi Shao, Tianjian Meng, Jingyu Li, Ruimao Zhang, Yudian Li, Xiaogang Wang, Ping Luo
Abstract Normalization methods improve both optimization and generalization of ConvNets. To further boost performance, the recently proposed switchable normalization (SN) provides a new perspective for deep learning: it learns to select different normalizers for different convolution layers of a ConvNet. However, SN uses a softmax function to learn the importance ratios that combine normalizers, leading to redundant computations compared to a single normalizer. This work addresses the issue by presenting Sparse Switchable Normalization (SSN), where the importance ratios are constrained to be sparse. Unlike $\ell_1$ and $\ell_0$ constraints, which impose difficulties in optimization, we turn this constrained optimization problem into feed-forward computation by proposing SparsestMax, a sparse version of softmax. SSN has several appealing properties. (1) It inherits all benefits from SN, such as applicability to various tasks and robustness to a wide range of batch sizes. (2) It is guaranteed to select only one normalizer for each normalization layer, avoiding redundant computations. (3) SSN can be transferred to various tasks in an end-to-end manner. Extensive experiments show that SSN outperforms its counterparts on various challenging benchmarks such as ImageNet, Cityscapes, ADE20K, and Kinetics.
Tasks
Published 2019-03-09
URL http://arxiv.org/abs/1903.03793v1
PDF http://arxiv.org/pdf/1903.03793v1.pdf
PWC https://paperswithcode.com/paper/ssn-learning-sparse-switchable-normalization
Repo https://github.com/switchablenorms/Sparse_SwitchNorm
Framework pytorch
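
For background, the standard sparse alternative to softmax is sparsemax (Martins & Astudillo, 2016), a Euclidean projection onto the probability simplex; SparsestMax builds on such sparse projections and additionally anneals toward a strictly one-hot solution, a step omitted in the sketch below.

```python
# Plain sparsemax: projection of logits onto the probability simplex.
# This is a known building block, not SparsestMax itself.
import torch

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex."""
    z_sorted, _ = torch.sort(z, descending=True)
    k = torch.arange(1, z.numel() + 1, dtype=z.dtype)
    cumsum = torch.cumsum(z_sorted, dim=0)
    support = k * z_sorted > cumsum - 1      # coordinates that stay nonzero
    k_max = support.nonzero().max() + 1
    tau = (cumsum[k_max - 1] - 1) / k_max
    return torch.clamp(z - tau, min=0)

ratios = sparsemax(torch.tensor([1.5, 1.0, 0.2]))
print(ratios, ratios.sum())  # tensor([0.75, 0.25, 0.00]), sums to 1
```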

Clotho: An Audio Captioning Dataset

Title Clotho: An Audio Captioning Dataset
Authors Konstantinos Drossos, Samuel Lipping, Tuomas Virtanen
Abstract Audio captioning is the novel task of describing general audio content using free text. It is an intermodal translation task (not speech-to-text): a system accepts an audio signal as input and outputs a textual description (i.e., the caption) of that signal. In this paper we present Clotho, a dataset for audio captioning consisting of 4981 audio samples of 15 to 30 seconds in duration and 24,905 captions of eight to 20 words in length, along with a baseline method to provide initial results. Clotho is built with a focus on audio content and caption diversity, and the data splits do not hamper the training or evaluation of methods. All sounds come from the Freesound platform, and captions are crowdsourced using Amazon Mechanical Turk with annotators from English-speaking countries. Unique words, named entities, and speech transcriptions are removed in post-processing. Clotho is freely available online (https://zenodo.org/record/3490684).
Tasks
Published 2019-10-21
URL https://arxiv.org/abs/1910.09387v1
PDF https://arxiv.org/pdf/1910.09387v1.pdf
PWC https://paperswithcode.com/paper/clotho-an-audio-captioning-dataset
Repo https://github.com/dr-costas/clotho-baseline-dataset
Framework none