January 28, 2020

Paper Group ANR 970

Approximation capabilities of neural networks on unbounded domains

Title Approximation capabilities of neural networks on unbounded domains
Authors Ming-Xi Wang, Yang Qu
Abstract We prove that if $p \in [2, \infty)$ and if the activation function is a monotone sigmoid, relu, elu, softplus or leaky relu, then the shallow neural network is a universal approximator in $L^{p}(\mathbb{R} \times [0, 1]^n)$. This generalizes classical universal approximation theorems on $[0,1]^n$. We also prove that if $p \in [1, \infty)$ and if the activation function is a sigmoid, relu, elu, softplus or leaky relu, then the shallow neural network expresses no non-zero functions in $L^{p}(\mathbb{R} \times \mathbb{R}^+)$. Consequently a shallow relu network expresses no non-zero functions in $L^{p}(\mathbb{R}^n)$ $(n \ge 2)$. Some authors, on the other hand, have shown that a deep relu network is a universal approximator in $L^{p}(\mathbb{R}^n)$. Together we obtain a qualitative viewpoint which justifies the benefit of depth in the context of relu networks.
Tasks
Published 2019-10-21
URL https://arxiv.org/abs/1910.09293v5
PDF https://arxiv.org/pdf/1910.09293v5.pdf
PWC https://paperswithcode.com/paper/approximation-capabilities-of-neural-networks
Repo
Framework
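
The non-expressibility statement in the abstract above can be made plausible with the simplest special case, a single neuron on $\mathbb{R}^n$. The following is only a heuristic sketch under that assumption, not the authors' proof, which has to handle finite sums of neurons and the half-space domain $\mathbb{R} \times \mathbb{R}^+$. The symbols $w$, $b$, $t$, $y$ and $I(\sigma)$ are introduced here for illustration.

```latex
% Single neuron f(x) = \sigma(w \cdot x + b) on \mathbb{R}^n, with n \ge 2 and w \neq 0.
% Decompose x = t\,\frac{w}{\|w\|} + y with t \in \mathbb{R} and y \perp w.
\[
\int_{\mathbb{R}^n} |f(x)|^p \, dx
  = \int_{w^{\perp}} \left( \int_{\mathbb{R}} \bigl|\sigma(\|w\|\, t + b)\bigr|^p \, dt \right) dy
  = \operatorname{vol}_{n-1}\!\left(w^{\perp}\right) \cdot I(\sigma),
\qquad
I(\sigma) = \int_{\mathbb{R}} \bigl|\sigma(\|w\|\, t + b)\bigr|^p \, dt .
\]
% The hyperplane w^{\perp} has infinite (n-1)-dimensional volume, so the integral
% is finite only if I(\sigma) = 0, that is, only if f = 0 almost everywhere.
```

A single neuron is constant along every direction orthogonal to $w$, so it cannot decay in those directions, and this is what forces the norm to be infinite.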

Time/Accuracy Tradeoffs for Learning a ReLU with respect to Gaussian Marginals

Title Time/Accuracy Tradeoffs for Learning a ReLU with respect to Gaussian Marginals
Authors Surbhi Goel, Sushrut Karmalkar, Adam Klivans
Abstract We consider the problem of computing the best-fitting ReLU with respect to square-loss on a training set when the examples have been drawn according to a spherical Gaussian distribution (the labels can be arbitrary). Let $\mathsf{opt} < 1$ be the population loss of the best-fitting ReLU. We prove: 1. Finding a ReLU with square-loss $\mathsf{opt} + \epsilon$ is as hard as the problem of learning sparse parities with noise, widely thought to be computationally intractable. This is the first hardness result for learning a ReLU with respect to Gaussian marginals, and our results imply, unconditionally, that gradient descent cannot converge to the global minimum in polynomial time. 2. There exists an efficient approximation algorithm for finding the best-fitting ReLU that achieves error $O(\mathsf{opt}^{2/3})$. The algorithm uses a novel reduction to noisy halfspace learning with respect to $0/1$ loss. Prior work due to Soltanolkotabi [Sol17] showed that gradient descent can find the best-fitting ReLU with respect to Gaussian marginals, if the training set is exactly labeled by a ReLU.
Tasks
Published 2019-11-04
URL https://arxiv.org/abs/1911.01462v1
PDF https://arxiv.org/pdf/1911.01462v1.pdf
PWC https://paperswithcode.com/paper/timeaccuracy-tradeoffs-for-learning-a-relu
Repo
Framework

Hybrid Deep Network for Anomaly Detection

Title Hybrid Deep Network for Anomaly Detection
Authors Trong Nguyen Nguyen, Jean Meunier
Abstract In this paper, we propose a deep convolutional neural network (CNN) for anomaly detection in surveillance videos. The model is adapted from a typical auto-encoder working on video patches under the perspective of sparse combination learning. Our CNN focuses on learning, in an unsupervised manner, common characteristics of normal events, with an emphasis on their spatial locations (via supervised losses). To our knowledge, this is the first work that directly adapts the patch position as the target of a classification sub-network. The model is capable of providing an anomaly score for each video frame. Our experiments were performed on 4 benchmark datasets with various anomalous events and the obtained results were competitive with state-of-the-art studies.
Tasks Anomaly Detection, Anomaly Detection In Surveillance Videos
Published 2019-08-17
URL https://arxiv.org/abs/1908.06347v1
PDF https://arxiv.org/pdf/1908.06347v1.pdf
PWC https://paperswithcode.com/paper/hybrid-deep-network-for-anomaly-detection
Repo
Framework

Attention-Aware Linear Depthwise Convolution for Single Image Super-Resolution

Title Attention-Aware Linear Depthwise Convolution for Single Image Super-Resolution
Authors Seongmin Hwang, Gwanghuyn Yu, Cheolkon Jung, Jinyoung Kim
Abstract Although deep convolutional neural networks (CNNs) have obtained outstanding performance in image super-resolution (SR), their computational cost increases geometrically as CNN models get deeper and wider. Meanwhile, the features of intermediate layers are treated equally across the channel, thus hindering the representational capability of CNNs. In this paper, we propose an attention-aware linear depthwise network, named ALDNet, to address these problems for single image SR. Specifically, linear depthwise convolution allows CNN-based SR models to preserve useful information for reconstructing a super-resolved image while reducing computational burden. Furthermore, we design an attention-aware branch that enhances the representation ability of depthwise convolution layers by making full use of depthwise filter interdependency. Experiments on publicly available benchmark datasets show that ALDNet achieves superior performance to traditional depthwise separable convolutions in terms of quantitative measurements and visual quality.
Tasks Image Super-Resolution, Super-Resolution
Published 2019-08-07
URL https://arxiv.org/abs/1908.02648v3
PDF https://arxiv.org/pdf/1908.02648v3.pdf
PWC https://paperswithcode.com/paper/linear-depthwise-convolution-for-single-image
Repo
Framework

How to Pick the Best Source Data? Measuring Transferability for Heterogeneous Domains

Title How to Pick the Best Source Data? Measuring Transferability for Heterogeneous Domains
Authors Seungcheol Park, Huiwen Xu, Taehun Kim, Inhwan Hwang, Kyung-Jun Kim, U Kang
Abstract Given a set of source data with pre-trained classification models, how can we quickly and accurately select the most useful source data to improve the performance of a target task? We address the problem of measuring transferability for heterogeneous domains, where the source and the target data have different feature spaces and distributions. We propose Transmeter, a novel method to efficiently and accurately measure transferability of two datasets. Transmeter utilizes a pre-trained source classifier and a reconstruction loss to increase its efficiency and performance. Furthermore, Transmeter uses feature transformation layers, label-wise discriminators, and a mean distance loss to learn common representations for source and target domains. As a result, Transmeter and its variant give the most accurate performance in measuring transferability, while giving running times comparable to those of competitors.
Tasks
Published 2019-12-23
URL https://arxiv.org/abs/1912.13366v1
PDF https://arxiv.org/pdf/1912.13366v1.pdf
PWC https://paperswithcode.com/paper/how-to-pick-the-best-source-data-measuring
Repo
Framework

A Survey of Code-switched Speech and Language Processing

Title A Survey of Code-switched Speech and Language Processing
Authors Sunayana Sitaram, Khyathi Raghavi Chandu, Sai Krishna Rallabandi, Alan W Black
Abstract Code-switching, the alternation of languages within a conversation or utterance, is a common communicative phenomenon that occurs in multilingual communities across the world. This survey reviews computational approaches for code-switched Speech and Natural Language Processing. We motivate why processing code-switched text and speech is essential for building intelligent agents and systems that interact with users in multilingual communities. As code-switching data and resources are scarce, we list what is available in various code-switched language pairs with the language processing tasks they can be used for. We review code-switching research in various Speech and NLP applications, including language processing tools and end-to-end systems. We conclude with future directions and open problems in the field.
Tasks
Published 2019-03-25
URL http://arxiv.org/abs/1904.00784v2
PDF http://arxiv.org/pdf/1904.00784v2.pdf
PWC https://paperswithcode.com/paper/a-survey-of-code-switched-speech-and-language
Repo
Framework

Audio2Face: Generating Speech/Face Animation from Single Audio with Attention-Based Bidirectional LSTM Networks

Title Audio2Face: Generating Speech/Face Animation from Single Audio with Attention-Based Bidirectional LSTM Networks
Authors Guanzhong Tian, Yi Yuan, Yong Liu
Abstract We propose an end-to-end deep learning approach for generating real-time facial animation from just audio. Specifically, our deep architecture employs a deep bidirectional long short-term memory network and an attention mechanism to discover the latent representations of time-varying contextual information within the speech and recognize the significance of different information contributed to certain face status. Therefore, our model is able to drive different levels of facial movements at inference and automatically keep up with the corresponding pitch and latent speaking style in the input audio, with no assumption or further human intervention. Evaluation results show that our method could not only generate accurate lip movements from audio, but also successfully regress the speaker’s time-varying facial movements.
Tasks
Published 2019-05-27
URL https://arxiv.org/abs/1905.11142v1
PDF https://arxiv.org/pdf/1905.11142v1.pdf
PWC https://paperswithcode.com/paper/audio2face-generating-speechface-animation
Repo
Framework

STGRAT: A Spatio-Temporal Graph Attention Network for Traffic Forecasting

Title STGRAT: A Spatio-Temporal Graph Attention Network for Traffic Forecasting
Authors Cheonbok Park, Chunggi Lee, Hyojin Bahng, Taeyun Won, Kihwan Kim, Seungmin Jin, Sungahn Ko, Jaegul Choo
Abstract Predicting the road traffic speed is a challenging task due to different types of roads, abrupt speed changes, and spatial dependencies between roads, which requires the modeling of dynamically changing spatial dependencies among roads and temporal patterns over long input sequences. This paper proposes a novel Spatio-Temporal Graph Attention (STGRAT) that effectively captures the spatio-temporal dynamics in road networks. The features of our approach mainly include spatial attention, temporal attention, and spatial sentinel vectors. The spatial attention takes the graph structure information (e.g., distance between roads) and dynamically adjusts spatial correlation based on road states. The temporal attention is responsible for capturing traffic speed changes, while the sentinel vectors allow the model to retrieve new features from spatially correlated nodes or preserve existing features. The experimental results show that STGRAT outperforms existing models, especially in difficult conditions where traffic speeds rapidly change (e.g., rush hours). We additionally provide a qualitative study to analyze when and where STGRAT mainly attended to make accurate predictions during a rush-hour time.
Tasks
Published 2019-11-29
URL https://arxiv.org/abs/1911.13181v1
PDF https://arxiv.org/pdf/1911.13181v1.pdf
PWC https://paperswithcode.com/paper/stgrat-a-spatio-temporal-graph-attention
Repo
Framework

DDNet: Dual-path Decoder Network for Occlusion Relationship Reasoning

Title DDNet: Dual-path Decoder Network for Occlusion Relationship Reasoning
Authors Panhe Feng, Xuejing Kang, Lizhu Ye, Lei Zhu, Chunpeng Li, Anlong Ming
Abstract Occlusion relationship reasoning based on convolutional neural networks consists of two subtasks: occlusion boundary extraction and occlusion orientation inference. Due to the essential differences between the two subtasks in the feature expression at the higher and lower stages, it is challenging to carry them out simultaneously in one network. To address this issue, we propose a novel Dual-path Decoder Network, which uniformly extracts occlusion information at higher stages and separates into two paths to recover boundary and occlusion orientation respectively in lower stages. Besides, considering the restriction of occlusion orientation presentation to occlusion orientation learning, we design a new orthogonal representation for occlusion orientation and propose the Orthogonal Orientation Regression loss, which removes the mismatch between occlusion representation and learning and further promotes occlusion orientation learning. Finally, we apply a multi-scale loss together with our proposed orientation regression loss to guide the boundary and orientation path learning respectively. Experiments demonstrate that our proposed method achieves state-of-the-art results on PIOD and BSDS ownership datasets.
Tasks
Published 2019-11-26
URL https://arxiv.org/abs/1911.11582v1
PDF https://arxiv.org/pdf/1911.11582v1.pdf
PWC https://paperswithcode.com/paper/ddnet-dual-path-decoder-network-for-occlusion
Repo
Framework

High Resolution Medical Image Analysis with Spatial Partitioning

Title High Resolution Medical Image Analysis with Spatial Partitioning
Authors Le Hou, Youlong Cheng, Noam Shazeer, Niki Parmar, Yeqing Li, Panagiotis Korfiatis, Travis M. Drucker, Daniel J. Blezek, Xiaodan Song
Abstract Medical images such as 3D computerized tomography (CT) scans and pathology images have hundreds of millions or billions of voxels/pixels. It is infeasible to train CNN models directly on such high resolution images, because neural activations of a single image do not fit in the memory of a single GPU/TPU, and naive data and model parallelism approaches do not work. Existing image analysis approaches alleviate this problem by cropping or down-sampling input images, which leads to complicated implementation and sub-optimal performance due to information loss. In this paper, we implement spatial partitioning, which internally distributes the input and output of convolutional layers across GPUs/TPUs. Our implementation is based on the Mesh-TensorFlow framework and the computation distribution is transparent to end users. With this technique, we train a 3D Unet on up to 512 by 512 by 512 resolution data. To the best of our knowledge, this is the first work for handling such high resolution images end-to-end.
Tasks
Published 2019-09-06
URL https://arxiv.org/abs/1909.03108v3
PDF https://arxiv.org/pdf/1909.03108v3.pdf
PWC https://paperswithcode.com/paper/high-resolution-medical-image-analysis-with
Repo
Framework
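
The idea of distributing a convolution's input and output across devices can be illustrated in one dimension. The sketch below is a pure-Python toy, not the paper's Mesh-TensorFlow implementation: it assumes an odd kernel length, zero padding, and an input length divisible by the number of partitions, and the function names are hypothetical. Each partition reads its own slice plus a small "halo" of neighbouring cells, and the concatenated partial outputs match the unpartitioned result.

```python
def conv_valid(x, kernel):
    """1D cross-correlation without padding."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k))
            for i in range(len(x) - k + 1)]


def conv_same(x, kernel):
    """Reference: zero-padded convolution computed in one piece."""
    h = (len(kernel) - 1) // 2
    return conv_valid([0.0] * h + list(x) + [0.0] * h, kernel)


def conv_same_partitioned(x, kernel, parts):
    """Same result as conv_same, computed partition by partition.

    Assumes an odd kernel length and len(x) divisible by parts.
    """
    if len(kernel) % 2 == 0 or len(x) % parts != 0:
        raise ValueError("need odd kernel and len(x) divisible by parts")
    h = (len(kernel) - 1) // 2
    padded = [0.0] * h + list(x) + [0.0] * h
    size = len(x) // parts
    out = []
    for p in range(parts):
        start = p * size
        # Own slice plus h halo cells on each side (borrowed from neighbours
        # or from the zero padding at the outer edges).
        chunk = padded[start:start + size + 2 * h]
        out.extend(conv_valid(chunk, kernel))
    return out


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
kernel = [1.0, 2.0, 1.0]
print(conv_same_partitioned(signal, kernel, parts=4) == conv_same(signal, kernel))
```

In a real multi-device setting the halo cells would be exchanged between devices rather than sliced from a shared list, but the bookkeeping is the same.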

Occlusion Robust Face Recognition Based on Mask Learning with Pairwise Differential Siamese Network

Title Occlusion Robust Face Recognition Based on Mask Learning with Pairwise Differential Siamese Network
Authors Lingxue Song, Dihong Gong, Zhifeng Li, Changsong Liu, Wei Liu
Abstract Deep Convolutional Neural Networks (CNNs) have been pushing the frontier of face recognition research in the past years. However, existing general CNN face models generalize poorly to the scenario of occlusions on variable facial areas. Inspired by the fact that a human visual system explicitly ignores occlusions and only focuses on non-occluded facial areas, we propose a mask learning strategy to find and discard the corrupted feature elements for face recognition. A mask dictionary is first established by exploiting the differences between the top convoluted features of occluded and occlusion-free face pairs using an innovatively designed Pairwise Differential Siamese Network (PDSN). Each item of this dictionary captures the correspondence between occluded facial areas and corrupted feature elements, which is named Feature Discarding Mask (FDM). When dealing with a face image with random partial occlusions, we generate its FDM by combining relevant dictionary items and then multiply it with the original features to eliminate those corrupted feature elements. Comprehensive experiments on both synthesized and realistic occluded face datasets show that the proposed approach significantly outperforms the state of the art.
Tasks Face Recognition, Robust Face Recognition
Published 2019-08-17
URL https://arxiv.org/abs/1908.06290v1
PDF https://arxiv.org/pdf/1908.06290v1.pdf
PWC https://paperswithcode.com/paper/occlusion-robust-face-recognition-based-on
Repo
Framework
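
The last step described in the abstract above, combining dictionary items into one mask and multiplying it with the features, can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: masks are taken to be binary, the combination rule is assumed to be an element-wise minimum (a feature is kept only if every relevant mask keeps it), and the dictionary keys and function names are hypothetical.

```python
def combine_masks(dictionary, occluded_areas):
    """Merge the masks of all occluded areas with an element-wise minimum.

    The merge rule is an assumption made for this sketch.
    """
    masks = [dictionary[area] for area in occluded_areas]
    if not masks:
        raise ValueError("at least one occluded area is required")
    return [min(values) for values in zip(*masks)]


def apply_mask(features, mask):
    """Zero out the feature elements that the mask marks as corrupted."""
    return [f * m for f, m in zip(features, mask)]


# Hypothetical dictionary: 1 keeps a feature element, 0 discards it.
mask_dictionary = {
    "eyes": [0, 1, 1, 1],
    "mouth": [1, 1, 0, 1],
}
features = [0.5, -1.0, 2.0, 3.0]

fdm = combine_masks(mask_dictionary, ["eyes", "mouth"])
print(fdm)                        # [0, 1, 0, 1]
print(apply_mask(features, fdm))  # [0.0, -1.0, 0.0, 3.0]
```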

Artificial Intelligence in Surgery

Title Artificial Intelligence in Surgery
Authors Xiao-Yun Zhou, Yao Guo, Mali Shen, Guang-Zhong Yang
Abstract Artificial Intelligence (AI) is gradually changing the practice of surgery with the advanced technological development of imaging, navigation and robotic intervention. In this article, the recent successful and influential applications of AI in surgery are reviewed from pre-operative planning and intra-operative guidance to the integration of surgical robots. We end with summarizing the current state, emerging trends and major challenges in the future development of AI in surgery.
Tasks
Published 2019-12-23
URL https://arxiv.org/abs/2001.00627v1
PDF https://arxiv.org/pdf/2001.00627v1.pdf
PWC https://paperswithcode.com/paper/artificial-intelligence-in-surgery
Repo
Framework

Single Image Reflection Removal through Cascaded Refinement

Title Single Image Reflection Removal through Cascaded Refinement
Authors Chao Li, Yixiao Yang, Kun He, Stephen Lin, John E. Hopcroft
Abstract We address the problem of removing undesirable reflections from a single image captured through a glass surface, which is an ill-posed, challenging but practically important problem for photo enhancement. Inspired by iterative structure reduction for hidden community detection in social networks, we propose an Iterative Boost Convolutional LSTM Network (IBCLN) that enables cascaded prediction for reflection removal. IBCLN iteratively refines estimates of the transmission and reflection layers at each step in a manner that they can boost the prediction quality for each other. The intuition is that progressive refinement of the transmission or reflection layer is aided by increasingly better estimates of these quantities as input, and that transmission and reflection are complementary to each other in a single image and thus provide helpful auxiliary information for each other’s prediction. To facilitate training over multiple cascade steps, we employ LSTM to address the vanishing gradient problem, and incorporate a reconstruction loss as further training guidance at each step. In addition, we create a dataset of real-world images with reflection and ground-truth transmission layers to mitigate the problem of insufficient data. Through comprehensive experiments, IBCLN demonstrates performance that surpasses state-of-the-art reflection removal methods.
Tasks Community Detection
Published 2019-11-15
URL https://arxiv.org/abs/1911.06634v1
PDF https://arxiv.org/pdf/1911.06634v1.pdf
PWC https://paperswithcode.com/paper/single-image-reflection-removal-through
Repo
Framework

Efficient Convolutional Neural Networks for Diacritic Restoration

Title Efficient Convolutional Neural Networks for Diacritic Restoration
Authors Sawsan Alqahtani, Ajay Mishra, Mona Diab
Abstract Diacritic restoration has gained importance with the growing need for machines to understand written texts. The task is typically modeled as a sequence labeling problem and currently Bidirectional Long Short Term Memory (BiLSTM) models provide state-of-the-art results. Recently, Bai et al. (2018) showed the advantages of Temporal Convolutional Neural Networks (TCN) over Recurrent Neural Networks (RNN) for sequence modeling in terms of performance and computational resources. As diacritic restoration benefits from both previous as well as subsequent timesteps, we further apply and evaluate a variant of TCN, Acausal TCN (A-TCN), which incorporates context from both directions (previous and future) rather than strictly incorporating previous context as in the case of TCN. A-TCN yields significant improvement over TCN for diacritization in three different languages: Arabic, Yoruba, and Vietnamese. Furthermore, A-TCN and BiLSTM have comparable performance, making A-TCN an efficient alternative to BiLSTM since convolutions can be trained in parallel. A-TCN is significantly faster than BiLSTM at inference time (270%-334% improvement in the amount of text diacritized per minute).
Tasks
Published 2019-12-14
URL https://arxiv.org/abs/1912.06900v1
PDF https://arxiv.org/pdf/1912.06900v1.pdf
PWC https://paperswithcode.com/paper/efficient-convolutional-neural-networks-for-3
Repo
Framework
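
The difference between a causal and an acausal temporal convolution comes down to where the padding goes. The toy function below is a minimal sketch of that one point, assuming zero padding, an odd kernel length for the acausal case, and no dilation or residual blocks, all of which a full TCN would add. The name `conv1d` and its `causal` flag are hypothetical.

```python
def conv1d(signal, kernel, causal):
    """Same-length 1D convolution (cross-correlation) with zero padding.

    causal=True pads only on the left, so output[t] depends on inputs up to t.
    causal=False pads on both sides, so output[t] also depends on later inputs.
    """
    k = len(kernel)
    left = k - 1 if causal else (k - 1) // 2
    right = (k - 1) - left
    padded = [0.0] * left + list(signal) + [0.0] * right
    return [sum(kernel[j] * padded[t + j] for j in range(k))
            for t in range(len(signal))]


impulse = [0.0, 0.0, 1.0, 0.0, 0.0]   # a single event at timestep 2
kernel = [1.0, 1.0, 1.0]

print(conv1d(impulse, kernel, causal=True))   # [0.0, 0.0, 1.0, 1.0, 1.0]
print(conv1d(impulse, kernel, causal=False))  # [0.0, 1.0, 1.0, 1.0, 0.0]
```

With causal padding the event at timestep 2 influences only outputs from timestep 2 onward, while with symmetric padding the output at timestep 1 already sees it, which is the kind of look-ahead that helps when a diacritic depends on the following characters.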

Are Girls Neko or Shōjo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization

Title Are Girls Neko or Shōjo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization
Authors Mozhi Zhang, Keyulu Xu, Ken-ichi Kawarabayashi, Stefanie Jegelka, Jordan Boyd-Graber
Abstract Cross-lingual word embeddings (CLWE) underlie many multilingual natural language processing systems, often through orthogonal transformations of pre-trained monolingual embeddings. However, orthogonal mapping only works on language pairs whose embeddings are naturally isomorphic. For non-isomorphic pairs, our method (Iterative Normalization) transforms monolingual embeddings to make orthogonal alignment easier by simultaneously enforcing that (1) individual word vectors are unit length, and (2) each language’s average vector is zero. Iterative Normalization consistently improves word translation accuracy of three CLWE methods, with the largest improvement observed on English-Japanese (from 2% to 44% test accuracy).
Tasks Word Embeddings
Published 2019-06-04
URL https://arxiv.org/abs/1906.01622v3
PDF https://arxiv.org/pdf/1906.01622v3.pdf
PWC https://paperswithcode.com/paper/are-girls-neko-or-shojo-cross-lingual
Repo
Framework
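
The two constraints named in the Iterative Normalization abstract above, unit-length word vectors and a zero average vector, can be enforced by alternating two simple steps. The following is a minimal pure-Python sketch of that alternation, not the authors' released code: the function name, the fixed iteration count, and the toy vectors are assumptions made for illustration.

```python
import math


def iterative_normalization(vectors, n_iter=50):
    """Alternate unit-length scaling and mean-centering on a list of vectors."""
    vecs = [list(v) for v in vectors]
    dim = len(vecs[0])
    for _ in range(n_iter):
        # Step 1: scale every vector to unit length.
        for v in vecs:
            norm = math.sqrt(sum(x * x for x in v))
            if norm > 0:
                for i in range(dim):
                    v[i] /= norm
        # Step 2: subtract the average vector so the mean becomes zero.
        mean = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
        for v in vecs:
            for i in range(dim):
                v[i] -= mean[i]
    return vecs


# Toy "embeddings" for one language (hypothetical values).
embeddings = [[2.0, 0.0], [-1.0, 0.0], [0.0, 3.0], [0.0, -1.0]]
normalized = iterative_normalization(embeddings)
for vec in normalized:
    print(vec, math.sqrt(sum(x * x for x in vec)))
```

Each step alone undoes part of the other (centering changes lengths, rescaling shifts the mean), which is why the procedure is repeated rather than applied once. In a cross-lingual setup the same routine would be run separately on each language's embeddings before learning the orthogonal map.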