February 1, 2020

3235 words 16 mins read

Paper Group AWR 159

EventGAN: Leveraging Large Scale Image Datasets for Event Cameras. Relationships from Entity Stream. Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet. On the Benefit of Adversarial Training for Monocular Depth Estimation. Y-Net: A Hybrid Deep Learning Reconstruction Framework for Photoacoustic Imaging in vivo …

EventGAN: Leveraging Large Scale Image Datasets for Event Cameras

Title EventGAN: Leveraging Large Scale Image Datasets for Event Cameras
Authors Alex Zihao Zhu, Ziyun Wang, Kaung Khant, Kostas Daniilidis
Abstract Event cameras provide a number of benefits over traditional cameras, such as the ability to track incredibly fast motions, high dynamic range, and low power consumption. However, their application to computer vision problems, many of which are primarily dominated by deep learning solutions, has been limited by the lack of labeled training data for events. In this work, we propose a method which leverages the existing labeled data for images by simulating events from a pair of temporal image frames, using a convolutional neural network. We train this network on pairs of images and events, using an adversarial discriminator loss and a pair of cycle consistency losses. The cycle consistency losses utilize a pair of pre-trained self-supervised networks which perform optical flow estimation and image reconstruction from events, and constrain our network to generate events which result in accurate outputs from both of these networks. Trained fully end-to-end, our network learns a generative model for events from images without the accurate scene-motion modeling that model-based methods require, while also implicitly modeling event noise. Using this simulator, we train a pair of downstream networks on object detection and 2D human pose estimation from events, using simulated data from large-scale image datasets, and demonstrate the networks’ ability to generalize to datasets with real events.
Tasks Image Reconstruction, Object Detection, Optical Flow Estimation, Pose Estimation
Published 2019-12-03
URL https://arxiv.org/abs/1912.01584v2
PDF https://arxiv.org/pdf/1912.01584v2.pdf
PWC https://paperswithcode.com/paper/eventgan-leveraging-large-scale-image
Repo https://github.com/alexzzhu/EventGAN
Framework pytorch
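
The loss structure described above (one adversarial term plus two cycle-consistency terms computed through frozen, pre-trained networks) can be sketched as follows. This is a minimal PyTorch illustration of ours, not the authors' code: `gen`, `disc`, `flow_net`, and `recon_net` are assumed to be supplied, and the simple photometric warp check stands in for whatever flow-consistency loss the paper actually uses.

```python
import torch
import torch.nn.functional as F

def warp(img, flow):
    # Backward-warp img by a dense flow field via a normalized sampling grid.
    b, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    gx = (xs.to(img) + flow[:, 0]) * 2 / (w - 1) - 1   # x to [-1, 1]
    gy = (ys.to(img) + flow[:, 1]) * 2 / (h - 1) - 1   # y to [-1, 1]
    return F.grid_sample(img, torch.stack((gx, gy), -1), align_corners=True)

def eventgan_loss(gen, disc, flow_net, recon_net, img_prev, img_next):
    fake_events = gen(torch.cat([img_prev, img_next], dim=1))
    # Adversarial term: the discriminator should score fake events as real.
    d_out = disc(fake_events)
    adv = F.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))
    # Cycle terms through the frozen flow/reconstruction networks: generated
    # events must yield a flow that warps frame t onto frame t+1, and an
    # image reconstruction close to the real next frame.
    flow_cycle = F.l1_loss(warp(img_prev, flow_net(fake_events)), img_next)
    recon_cycle = F.l1_loss(recon_net(fake_events), img_next)
    return adv + flow_cycle + recon_cycle
```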

Relationships from Entity Stream

Title Relationships from Entity Stream
Authors Martin Andrews, Sam Witteveen
Abstract Relational reasoning is a central component of intelligent behavior, but has proven difficult for neural networks to learn. The Relation Network (RN) module was recently proposed by DeepMind to solve such problems, and demonstrated state-of-the-art results on a number of datasets. However, the RN module scales quadratically in the size of the input, since it calculates relationship factors between every patch in the visual field, including those that do not correspond to entities. In this paper, we describe an architecture that enables relationships to be determined from a stream of entities obtained by an attention mechanism over the input field. The model is trained end-to-end, and demonstrates equivalent performance with greater interpretability while requiring only a fraction of the model parameters of the original RN module.
Tasks Relational Reasoning
Published 2019-09-07
URL https://arxiv.org/abs/1909.03315v1
PDF https://arxiv.org/pdf/1909.03315v1.pdf
PWC https://paperswithcode.com/paper/relationships-from-entity-stream
Repo https://github.com/mdda/relationships-from-entity-stream
Framework pytorch
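
As a rough sketch of the idea, assuming attention distills a fixed number K of entity vectors from N image patches so the pairwise relation step costs K^2 rather than N^2; all layer sizes and the einsum-based pooling are illustrative, not the paper's architecture:

```python
import torch
import torch.nn as nn

class EntityStreamRN(nn.Module):
    def __init__(self, d=64, k=6, n_classes=10):
        super().__init__()
        self.k = k
        self.attn = nn.Linear(d, k)          # one attention map per entity slot
        self.g = nn.Sequential(nn.Linear(2 * d, 128), nn.ReLU(),
                               nn.Linear(128, 128), nn.ReLU())
        self.f = nn.Linear(128, n_classes)   # task head

    def forward(self, feats):                # feats: (B, N, d) flattened patches
        w = self.attn(feats).softmax(dim=1)  # (B, N, K) attention over patches
        ents = torch.einsum("bnk,bnd->bkd", w, feats)   # K entities, not N patches
        i = ents.unsqueeze(2).expand(-1, -1, self.k, -1)
        j = ents.unsqueeze(1).expand(-1, self.k, -1, -1)
        pairs = torch.cat([i, j], dim=-1).flatten(1, 2)  # (B, K*K, 2d)
        return self.f(self.g(pairs).sum(dim=1))  # aggregate relations, classify
```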

Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet

Title Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet
Authors Wieland Brendel, Matthias Bethge
Abstract Deep Neural Networks (DNNs) excel on many complex perceptual tasks but it has proven notoriously difficult to understand how they reach their decisions. We here introduce a high-performance DNN architecture on ImageNet whose decisions are considerably easier to explain. Our model, a simple variant of the ResNet-50 architecture called BagNet, classifies an image based on the occurrences of small local image features without taking into account their spatial ordering. This strategy is closely related to the bag-of-feature (BoF) models popular before the onset of deep learning and reaches a surprisingly high accuracy on ImageNet (87.6% top-5 for 33 x 33 px features and AlexNet performance for 17 x 17 px features). The constraint on local features makes it straightforward to analyse how exactly each part of the image influences the classification. Furthermore, the BagNets behave similarly to state-of-the-art deep neural networks such as VGG-16, ResNet-152 or DenseNet-169 in terms of feature sensitivity, error distribution and interactions between image parts. This suggests that the improvements of DNNs over previous bag-of-feature classifiers in the last few years are mostly achieved by better fine-tuning rather than by qualitatively different decision strategies.
Tasks
Published 2019-03-20
URL http://arxiv.org/abs/1904.00760v1
PDF http://arxiv.org/pdf/1904.00760v1.pdf
PWC https://paperswithcode.com/paper/approximating-cnns-with-bag-of-local-features-1
Repo https://github.com/rui-yan/CS229-final-project
Framework pytorch
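
A hedged sketch of the decision rule: per-patch class logits from a network whose receptive field is capped at the patch size, spatially averaged with no positional information. The toy backbone below is a stand-in for the modified ResNet-50 the paper uses.

```python
import torch
import torch.nn as nn

class TinyBagNet(nn.Module):
    def __init__(self, num_classes=1000, patch=17):
        super().__init__()
        # Local feature extractor whose receptive field stays <= patch px:
        # one patch-sized conv followed only by 1x1 convs.
        self.local = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=patch, stride=8), nn.ReLU(),
            nn.Conv2d(64, 256, kernel_size=1), nn.ReLU())
        self.head = nn.Conv2d(256, num_classes, kernel_size=1)  # per-patch logits

    def forward(self, x):                  # x: (B, 3, H, W)
        logits = self.head(self.local(x))  # (B, C, H', W'): one logit map per class
        return logits.mean(dim=(2, 3))     # spatial average = bag of local evidence
```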

On the Benefit of Adversarial Training for Monocular Depth Estimation

Title On the Benefit of Adversarial Training for Monocular Depth Estimation
Authors Rick Groenendijk, Sezer Karaoglu, Theo Gevers, Thomas Mensink
Abstract In this paper we address the benefit of adding adversarial training to the task of monocular depth estimation. A model can be trained in a self-supervised setting on stereo pairs of images, where depth (disparity) is an intermediate result in a right-to-left image reconstruction pipeline. For the quality of the image reconstruction and disparity prediction, a combination of different losses is used, including L1 image reconstruction losses and left-right disparity smoothness. These are local pixel-wise losses, while depth prediction requires global consistency. Therefore, we extend the self-supervised network to become a Generative Adversarial Network (GAN) by including a discriminator which should tell apart reconstructed (fake) images from real images. We evaluate Vanilla GANs, LSGANs and Wasserstein GANs in combination with different pixel-wise reconstruction losses. Based on extensive experimental evaluation, we conclude that adversarial training is beneficial if and only if the reconstruction loss is not too constrained. Even though adversarial training seems promising because it promotes global consistency, non-adversarial training outperforms (or is on par with) any method trained with a GAN when a constrained reconstruction loss is used in combination with batch normalisation. Based on the insights of our experimental evaluation, we obtain state-of-the-art monocular depth estimation results by using batch normalisation and different output scales.
Tasks
Published 2019-10-29
URL https://arxiv.org/abs/1910.13340v1
PDF https://arxiv.org/pdf/1910.13340v1.pdf
PWC https://paperswithcode.com/paper/on-the-benefit-of-adversarial-training-for
Repo https://github.com/rickgroen/depthgan
Framework pytorch
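
A minimal sketch of the generator-side objective under our own simplifications: the network predicts a left disparity map, the left view is reconstructed from the right by bilinear sampling, and a discriminator term is added on top of the constrained L1 reconstruction loss. Names and the lambda weighting are illustrative, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def warp_right_to_left(right, disp):
    # Sample the right image at x - d (pixels) to synthesize the left view.
    b, _, h, w = right.shape
    xs = torch.linspace(-1, 1, w, device=right.device).view(1, 1, w).expand(b, h, w)
    ys = torch.linspace(-1, 1, h, device=right.device).view(1, h, 1).expand(b, h, w)
    grid = torch.stack((xs - 2 * disp.squeeze(1) / w, ys), dim=-1)
    return F.grid_sample(right, grid, align_corners=True)

def depth_gan_g_loss(gen, disc, left, right, lambda_adv=0.01):
    disp = gen(left)                              # predicted left disparity (px)
    left_rec = warp_right_to_left(right, disp)    # image reconstruction
    rec = F.l1_loss(left_rec, left)               # constrained pixel-wise loss
    d_out = disc(left_rec)                        # vanilla-GAN generator term
    adv = F.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))
    # The paper's finding: the adversarial term only helps when the
    # reconstruction loss is not too constrained.
    return rec + lambda_adv * adv
```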

Y-Net: A Hybrid Deep Learning Reconstruction Framework for Photoacoustic Imaging in vivo

Title Y-Net: A Hybrid Deep Learning Reconstruction Framework for Photoacoustic Imaging in vivo
Authors Hengrong Lan, Daohuai Jiang, Changchun Yang, Fei Gao
Abstract Photoacoustic imaging (PAI) is an emerging non-invasive imaging modality combining the advantages of deep ultrasound penetration and high optical contrast. Image reconstruction is an essential topic in PAI, which is unfortunately an ill-posed problem due to the complex and unknown optical/acoustic parameters in tissue. Conventional algorithms used in PAI (e.g., delay-and-sum) provide a fast solution while many artifacts remain, especially for linear-array probes with the limited-view issue. Convolutional neural networks (CNNs) have shown state-of-the-art results in computer vision, and a growing body of CNN-based work has recently appeared in medical image processing. In this paper, we present a non-iterative scheme filling the gap between existing direct-processing and post-processing methods, and propose a new framework, Y-Net: a CNN architecture that reconstructs the PA image by optimizing both raw data and beamformed images at once. The network connects two encoders with one decoder path, which optimally utilizes more information from the raw data and the beamformed image. Results on the test set show good performance compared with conventional reconstruction algorithms and other deep learning methods. Our method is also validated with both in vitro and in vivo experiments, where it still performs better than other existing methods. The proposed Y-Net architecture also has high potential in medical image reconstruction for other imaging modalities beyond PAI.
Tasks Image Reconstruction
Published 2019-08-02
URL https://arxiv.org/abs/1908.00975v1
PDF https://arxiv.org/pdf/1908.00975v1.pdf
PWC https://paperswithcode.com/paper/y-net-a-hybrid-deep-learning-reconstruction
Repo https://github.com/chenyilan/Y-Net
Framework pytorch
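
An illustrative two-encoder/one-decoder wiring in the spirit of Y-Net, with toy layer sizes: one encoder consumes the raw sensor data, the other the delay-and-sum beamformed image, and their codes are fused and decoded into the reconstructed PA image. For simplicity both inputs are assumed to have the same spatial size here.

```python
import torch
import torch.nn as nn

class TinyYNet(nn.Module):
    def __init__(self):
        super().__init__()
        enc = lambda c_in: nn.Sequential(
            nn.Conv2d(c_in, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.enc_raw, self.enc_bf = enc(1), enc(1)   # two encoder branches
        self.dec = nn.Sequential(                    # one shared decoder path
            nn.ConvTranspose2d(128, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1))

    def forward(self, raw, beamformed):   # both (B, 1, H, W)
        z = torch.cat([self.enc_raw(raw), self.enc_bf(beamformed)], dim=1)
        return self.dec(z)                # reconstructed PA image
```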

Easy Transfer Learning By Exploiting Intra-domain Structures

Title Easy Transfer Learning By Exploiting Intra-domain Structures
Authors Jindong Wang, Yiqiang Chen, Han Yu, Meiyu Huang, Qiang Yang
Abstract Transfer learning aims at transferring knowledge from a well-labeled domain to a similar but different domain with limited or no labels. Unfortunately, existing learning-based methods often involve intensive model selection and hyperparameter tuning to obtain good results. Moreover, cross-validation is not possible for tuning hyperparameters since there are often no labels in the target domain. This restricts the wide applicability of transfer learning, especially on computationally constrained devices such as wearables. In this paper, we propose a practically Easy Transfer Learning (EasyTL) approach which requires no model selection and hyperparameter tuning, while achieving competitive performance. By exploiting intra-domain structures, EasyTL is able to learn both non-parametric transfer features and classifiers. Extensive experiments demonstrate that, compared to state-of-the-art traditional and deep methods, EasyTL satisfies the Occam’s Razor principle: it is extremely easy to implement and use while achieving comparable or better performance in classification accuracy and much better computational efficiency. Additionally, it is shown that EasyTL can increase the performance of existing transfer feature learning methods.
Tasks Model Selection, Transfer Learning
Published 2019-04-02
URL http://arxiv.org/abs/1904.01376v2
PDF http://arxiv.org/pdf/1904.01376v2.pdf
PWC https://paperswithcode.com/paper/easy-transfer-learning-by-exploiting-intra
Repo https://github.com/jindongwang/transferlearning
Framework pytorch
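
Only one ingredient of EasyTL is sketched here, as we read the abstract: a hyperparameter-free, non-parametric classifier driven by intra-domain structure (nearest source-class centroid). The feature-alignment step that would precede it in the full method is omitted.

```python
import numpy as np

def centroid_transfer(Xs, ys, Xt):
    """Xs: (n_s, d) source features, ys: (n_s,) labels, Xt: (n_t, d) target."""
    classes = np.unique(ys)
    # One centroid per source class: the intra-domain structure being exploited.
    centroids = np.stack([Xs[ys == c].mean(axis=0) for c in classes])
    # Assign each target sample to its nearest class centroid; nothing to tune.
    dists = np.linalg.norm(Xt[:, None, :] - centroids[None], axis=2)
    return classes[dists.argmin(axis=1)]
```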

On the Downstream Performance of Compressed Word Embeddings

Title On the Downstream Performance of Compressed Word Embeddings
Authors Avner May, Jian Zhang, Tri Dao, Christopher Ré
Abstract Compressing word embeddings is important for deploying NLP models in memory-constrained settings. However, understanding what makes compressed embeddings perform well on downstream tasks is challenging—existing measures of compression quality often fail to distinguish between embeddings that perform well and those that do not. We thus propose the eigenspace overlap score as a new measure. We relate the eigenspace overlap score to downstream performance by developing generalization bounds for the compressed embeddings in terms of this score, in the context of linear and logistic regression. We then show that we can lower bound the eigenspace overlap score for a simple uniform quantization compression method, helping to explain the strong empirical performance of this method. Finally, we show that by using the eigenspace overlap score as a selection criterion between embeddings drawn from a representative set we compressed, we can efficiently identify the better performing embedding with up to 2x lower selection error rates than the next best measure of compression quality, and avoid the cost of training a model for each task of interest.
Tasks Quantization, Word Embeddings
Published 2019-09-03
URL https://arxiv.org/abs/1909.01264v2
PDF https://arxiv.org/pdf/1909.01264v2.pdf
PWC https://paperswithcode.com/paper/on-the-downstream-performance-of-compressed
Repo https://github.com/HazyResearch/smallfry
Framework pytorch
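
A sketch of the eigenspace overlap score as we read the abstract: compare the left singular subspaces of the original and compressed embedding matrices. The exact normalization in the paper may differ from the one assumed below.

```python
import numpy as np

def eigenspace_overlap(X, X_tilde):
    """X, X_tilde: (n_words, dim) original and compressed embedding matrices."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)        # orthonormal basis
    U_t, _, _ = np.linalg.svd(X_tilde, full_matrices=False)
    # Frobenius overlap of the two subspaces, scaled into [0, 1].
    return np.linalg.norm(U.T @ U_t, "fro") ** 2 / max(U.shape[1], U_t.shape[1])
```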

FALCON 2.0: An Entity and Relation Linking Tool over Wikidata

Title FALCON 2.0: An Entity and Relation Linking Tool over Wikidata
Authors Ahmad Sakor, Kuldeep Singh, Anery Patel, Maria-Esther Vidal
Abstract Natural Language Processing (NLP) tools and frameworks have contributed significant solutions to the problems of extracting entities and relations and linking them to the related knowledge graphs. Albeit effective, the majority of existing tools are available for only one knowledge graph. In this paper, we present Falcon 2.0, a rule-based tool capable of accurately mapping entities and relations in short texts to resources in both DBpedia and Wikidata following the same approach in both cases. The input of Falcon 2.0 is a short natural language text in the English language. Falcon 2.0 resorts to fundamental principles of English morphology (e.g., N-Gram tiling and N-Gram splitting) and background knowledge of label alignments obtained from the studied knowledge graph; the resulting entity and relation resources belong to either the DBpedia or the Wikidata knowledge graph. We have empirically studied the impact of using only Wikidata on Falcon 2.0, and observed it is knowledge graph agnostic, i.e., Falcon 2.0 performance and behavior are not affected by the knowledge graph used as background knowledge. Falcon 2.0 is public and can be reused by the community. Additionally, Falcon 2.0 and its background knowledge bases are available as resources at https://labs.tib.eu/falcon/falcon2/.
Tasks Knowledge Graphs
Published 2019-12-24
URL https://arxiv.org/abs/1912.11270v4
PDF https://arxiv.org/pdf/1912.11270v4.pdf
PWC https://paperswithcode.com/paper/falcon-20-an-entity-and-relation-linking
Repo https://github.com/SDM-TIB/Falcon2.0
Framework none
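
The N-Gram splitting mentioned above can be illustrated in a few lines of plain Python: enumerate the token n-grams of a short text (longest spans first) as candidate surface forms to look up against knowledge-graph labels. The label matching itself, against Wikidata or DBpedia label indexes, is out of scope here.

```python
def ngram_candidates(text, max_n=4):
    tokens = text.split()
    cands = []
    for n in range(min(max_n, len(tokens)), 0, -1):   # longest spans first
        for i in range(len(tokens) - n + 1):
            cands.append(" ".join(tokens[i:i + n]))
    return cands

print(ngram_candidates("Who painted The Starry Night"))
```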

PULP-NN: Accelerating Quantized Neural Networks on Parallel Ultra-Low-Power RISC-V Processors

Title PULP-NN: Accelerating Quantized Neural Networks on Parallel Ultra-Low-Power RISC-V Processors
Authors Angelo Garofalo, Manuele Rusci, Francesco Conti, Davide Rossi, Luca Benini
Abstract We present PULP-NN, an optimized computing library for a parallel ultra-low-power tightly coupled cluster of RISC-V processors. The key innovation in PULP-NN is a set of kernels for Quantized Neural Network (QNN) inference, targeting byte and sub-byte data types, down to INT-1, tuned for the recent trend toward aggressive quantization in deep neural network inference. The proposed library exploits both the digital signal processing (DSP) extensions available in the PULP RISC-V processors and the cluster’s parallelism, achieving up to 15.5 MACs/cycle on INT-8 and improving performance by up to 63x with respect to a sequential implementation on a single RISC-V core implementing the baseline RV32IMC ISA. Using PULP-NN, a CIFAR-10 network on an octa-core cluster runs in 30x and 19.6x fewer clock cycles than the current state-of-the-art ARM CMSIS-NN library running on STM32L4 and STM32H7 MCUs, respectively. The proposed library, when running on the GAP-8 processor, outperforms by 36.8x and by 7.45x the execution on energy-efficient MCUs such as the STM32L4 and high-end MCUs such as the STM32H7, respectively, when operating at the maximum frequency. The energy efficiency on GAP-8 is 14.1x higher than STM32L4 and 39.5x higher than STM32H7, at the maximum efficiency operating point.
Tasks Quantization
Published 2019-08-29
URL https://arxiv.org/abs/1908.11263v1
PDF https://arxiv.org/pdf/1908.11263v1.pdf
PWC https://paperswithcode.com/paper/pulp-nn-accelerating-quantized-neural
Repo https://github.com/pulp-platform/pulp-nn
Framework none
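
As a back-of-the-envelope model of what such INT-8 kernels compute (in plain Python/NumPy rather than the optimized RISC-V C code): int8 inputs and weights, a 32-bit accumulator, then requantization with saturation back to int8. The scale value below is an arbitrary example.

```python
import numpy as np

def qlinear_int8(x_q, w_q, scale, zero_point=0):
    """x_q: (n,) int8 activations, w_q: (m, n) int8 weights -> (m,) int8."""
    acc = w_q.astype(np.int32) @ x_q.astype(np.int32)   # 32-bit MAC accumulator
    out = np.round(acc * scale) + zero_point            # requantize the result
    return np.clip(out, -128, 127).astype(np.int8)      # saturate back to INT-8

x = np.random.randint(-128, 128, 64, dtype=np.int8)
w = np.random.randint(-128, 128, (16, 64), dtype=np.int8)
print(qlinear_int8(x, w, scale=2 ** -10))
```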

Fake News Detection as Natural Language Inference

Title Fake News Detection as Natural Language Inference
Authors Kai-Chou Yang, Timothy Niven, Hung-Yu Kao
Abstract This report describes the entry by the Intelligent Knowledge Management (IKM) Lab in the WSDM 2019 Fake News Classification challenge. We treat the task as natural language inference (NLI). We individually train a number of the strongest NLI models as well as BERT. We ensemble these results and retrain with noisy labels in two stages. We analyze transitivity relations in the train and test sets and determine a set of test cases that can be reliably classified on this basis. The remainder of test cases are classified by our ensemble. Our entry achieves test set accuracy of 88.063% for 3rd place in the competition.
Tasks Fake News Detection, Natural Language Inference
Published 2019-07-17
URL https://arxiv.org/abs/1907.07347v1
PDF https://arxiv.org/pdf/1907.07347v1.pdf
PWC https://paperswithcode.com/paper/fake-news-detection-as-natural-language
Repo https://github.com/zake7749/WSDM-Cup-2019
Framework pytorch
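
Our reading of the transitivity analysis, as a hedged sketch: known pair labels induce labels for unseen pairs, e.g. agree(a, b) and agree(b, c) imply agree(a, c). The composition rules for "disagreed" below are our assumption, not necessarily the authors' exact rules.

```python
def infer_by_transitivity(known, query):
    """known: dict[(a, b)] -> 'agreed' or 'disagreed'; query: (a, c) pair."""
    a, c = query
    for (x, b), lab_ab in known.items():
        if x != a:
            continue
        lab_bc = known.get((b, c))
        if lab_bc is None:
            continue
        if lab_ab == "agreed":
            return lab_bc                 # agree(a, b): a inherits b's relation to c
        if lab_ab == "disagreed":
            # Assumption: a disagreement flips the downstream relation.
            return "agreed" if lab_bc == "disagreed" else "disagreed"
    return None                           # no chain found: fall back to the ensemble

known = {("t1", "t2"): "agreed", ("t2", "t3"): "disagreed"}
print(infer_by_transitivity(known, ("t1", "t3")))   # -> 'disagreed'
```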

Learnable Parameter Similarity

Title Learnable Parameter Similarity
Authors Guangcong Wang, Jianhuang Lai, Wenqi Liang, Guangrun Wang
Abstract Most of the existing approaches focus on specific visual tasks while ignoring the relations between them. Estimating task relation sheds light on the learning of high-order semantic concepts, e.g., transfer learning. How to reveal the underlying relations between different visual tasks remains largely unexplored. In this paper, we propose a novel Learnable Parameter Similarity (LPS) method that learns an effective metric to measure the similarity of second-order semantics hidden in trained models. LPS is achieved by using a second-order neural network to align high-dimensional model parameters and learning second-order similarity in an end-to-end way. In addition, we create a model set called ModelSet500 as a parameter similarity learning benchmark that contains 500 trained models. Extensive experiments on ModelSet500 validate the effectiveness of the proposed method. Code will be released at https://github.com/Wanggcong/learnable-parameter-similarity.
Tasks Transfer Learning
Published 2019-07-27
URL https://arxiv.org/abs/1907.11943v1
PDF https://arxiv.org/pdf/1907.11943v1.pdf
PWC https://paperswithcode.com/paper/learnable-parameter-similarity
Repo https://github.com/Wanggcong/learnable-parameter-similarity
Framework none
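
The abstract leaves the architecture open, so the following is only a guess at the shape of the idea: a small network consuming (slices of) two models' flattened parameter vectors and emitting a similarity score. The fixed-size slicing stands in for the paper's parameter-alignment step, which is not reproduced here, and it assumes both models have at least `slice_dim` parameters.

```python
import torch
import torch.nn as nn

class ParamSimilarity(nn.Module):
    def __init__(self, slice_dim=1024):
        super().__init__()
        self.slice_dim = slice_dim
        self.mlp = nn.Sequential(nn.Linear(2 * slice_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 1))

    def forward(self, model_a, model_b):
        flat = lambda m: torch.cat([p.detach().flatten() for p in m.parameters()])
        a = flat(model_a)[: self.slice_dim]    # crude stand-in for alignment
        b = flat(model_b)[: self.slice_dim]
        return torch.sigmoid(self.mlp(torch.cat([a, b])))  # similarity in (0, 1)
```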

RealMix: Towards Realistic Semi-Supervised Deep Learning Algorithms

Title RealMix: Towards Realistic Semi-Supervised Deep Learning Algorithms
Authors Varun Nair, Javier Fuentes Alonso, Tony Beltramelli
Abstract Semi-Supervised Learning (SSL) algorithms have shown great potential in training regimes when access to labeled data is scarce but access to unlabeled data is plentiful. However, our experiments illustrate several shortcomings that prior SSL algorithms suffer from, in particular poor performance when the unlabeled and labeled data distributions differ. To address these observations, we develop RealMix, which achieves state-of-the-art results on standard benchmark datasets across different labeled and unlabeled set sizes while overcoming the aforementioned challenges. Notably, RealMix achieves an error rate of 9.79% on CIFAR10 with 250 labels and is the only SSL method tested able to surpass baseline performance when there is significant mismatch in the labeled and unlabeled data distributions. RealMix demonstrates how SSL can be used in real world situations with limited access to both data and compute and guides further research in SSL with practical applicability in mind.
Tasks Semi-Supervised Image Classification
Published 2019-12-18
URL https://arxiv.org/abs/1912.08766v1
PDF https://arxiv.org/pdf/1912.08766v1.pdf
PWC https://paperswithcode.com/paper/realmix-towards-realistic-semi-supervised
Repo https://github.com/uizard-technologies/realmix
Framework tf
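
The sketch below is not RealMix itself (whose exact recipe we do not reproduce), but the generic consistency-regularization core that recent SSL methods share and that the abstract builds on: a supervised loss on labeled data plus a term that makes predictions on unlabeled data invariant to augmentation.

```python
import torch
import torch.nn.functional as F

def ssl_step(model, x_lab, y_lab, x_unlab, augment, lambda_u=1.0):
    sup = F.cross_entropy(model(x_lab), y_lab)       # supervised term
    with torch.no_grad():
        target = model(x_unlab).softmax(dim=1)       # "guessed" soft labels
    pred = model(augment(x_unlab)).log_softmax(dim=1)
    # Consistency term: augmented predictions should match the guesses.
    consistency = F.kl_div(pred, target, reduction="batchmean")
    return sup + lambda_u * consistency
```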

PyKaldi2: Yet another speech toolkit based on Kaldi and PyTorch

Title PyKaldi2: Yet another speech toolkit based on Kaldi and PyTorch
Authors Liang Lu, Xiong Xiao, Zhuo Chen, Yifan Gong
Abstract We introduce PyKaldi2, a speech recognition toolkit implemented on top of Kaldi and PyTorch. While similar toolkits built on top of the two are available, a key feature of PyKaldi2 is sequence training with criteria such as MMI, sMBR and MPE. In particular, we implemented the sequence training module with on-the-fly lattice generation during model training in order to simplify the training pipeline. To address the challenging acoustic environments in real applications, PyKaldi2 also supports on-the-fly noise and reverberation simulation to improve the model robustness. With this feature, it is possible to backpropagate the gradients from the sequence-level loss to the front-end feature extraction module, which, hopefully, can foster more research in the direction of joint front-end and back-end learning. We performed benchmark experiments on Librispeech, and show that PyKaldi2 can achieve reasonable recognition accuracy. The toolkit is released under the MIT license.
Tasks Speech Recognition
Published 2019-07-12
URL https://arxiv.org/abs/1907.05955v3
PDF https://arxiv.org/pdf/1907.05955v3.pdf
PWC https://paperswithcode.com/paper/pykaldi2-yet-another-speech-toolkit-based-on
Repo https://github.com/jzlianglu/pykaldi2
Framework pytorch

Evaluation of Sentence Representations in Polish

Title Evaluation of Sentence Representations in Polish
Authors Sławomir Dadas, Michał Perełkiewicz, Rafał Poświata
Abstract Methods for learning sentence representations have been actively developed in recent years. However, the lack of pre-trained models and datasets annotated at the sentence level has been a problem for low-resource languages such as Polish, which has led to less interest in applying these methods to language-specific tasks. In this study, we introduce two new Polish datasets for evaluating sentence embeddings and provide a comprehensive evaluation of eight sentence representation methods including Polish and multilingual models. We consider classic word embedding models, recently developed contextual embeddings and multilingual sentence encoders, showing strengths and weaknesses of specific approaches. We also examine different methods of aggregating word vectors into a single sentence vector.
Tasks Sentence Embeddings
Published 2019-10-25
URL https://arxiv.org/abs/1910.11834v2
PDF https://arxiv.org/pdf/1910.11834v2.pdf
PWC https://paperswithcode.com/paper/evaluation-of-sentence-representations-in
Repo https://github.com/sdadas/polish-sentence-evaluation
Framework pytorch
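
The aggregation comparison mentioned in the last sentence of the abstract can be illustrated with the two simplest baselines, mean and max pooling of word vectors; the random vectors below are stand-ins for real embeddings.

```python
import numpy as np

def sentence_vector(word_vecs, how="mean"):
    """word_vecs: (n_words, dim) array of word embeddings -> (dim,) vector."""
    return word_vecs.mean(axis=0) if how == "mean" else word_vecs.max(axis=0)

vecs = np.random.randn(5, 300)            # e.g. five 300-d word vectors
print(sentence_vector(vecs).shape)        # (300,)
```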

Exploring the Capacity of Sequential-free Box Discretization Network for Omnidirectional Scene Text Detection

Title Exploring the Capacity of Sequential-free Box Discretization Network for Omnidirectional Scene Text Detection
Authors Yuliang Liu, Tong He, Hao Chen, Xinyu Wang, Canjie Luo, Shuaitao Zhang, Chunhua Shen, Lianwen Jin
Abstract Omnidirectional scene text detection has received increasing research attention. Previous methods directly predict words or text lines of quadrilateral shapes. However, most methods neglect the significance of consistent labeling, which is important to maintain a stable training process, especially when a large amount of data is included. For the first time, we solve the problem in this paper by proposing a novel method termed Sequential-free Box Discretization (SBD). The proposed SBD first discretizes the quadrilateral box into several key edges, which contain all potential horizontal and vertical positions. In order to decode accurate vertex positions, a simple yet effective matching procedure is proposed to reconstruct the quadrilateral bounding boxes. This removes the labeling ambiguity that otherwise significantly disturbs the learning process. Exhaustive ablation studies have been conducted to quantitatively validate the effectiveness of our proposed method. More importantly, built upon SBD, we provide a detailed analysis of the impact of a collection of refinements, in the hope of inspiring others to build state-of-the-art networks. Combining both SBD and these useful refinements, we achieve state-of-the-art performance on various benchmarks, including ICDAR 2015 and MLT. Our method also won first place in the text detection task of the recent ICDAR 2019 Robust Reading Challenge on Reading Chinese Text on Signboard, further demonstrating its powerful generalization ability. Code is available at https://tinyurl.com/sbdnet.
Tasks Scene Text Detection
Published 2019-12-20
URL https://arxiv.org/abs/1912.09629v1
PDF https://arxiv.org/pdf/1912.09629v1.pdf
PWC https://paperswithcode.com/paper/191209629
Repo https://github.com/Yuliang-Liu/Box_Discretization_Network
Framework pytorch
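
A hedged reading of the discretization step: instead of regressing four vertices in a fixed (ambiguous) order, predict the order-invariant sets of x- and y-coordinates ("key edges") and let a separate matching step re-pair them into vertices. Only the discretization half is sketched here, on a toy quadrilateral.

```python
def discretize_quad(quad):
    """quad: [(x1, y1), ..., (x4, y4)] in any vertex order."""
    xs = sorted(x for x, _ in quad)   # key x-positions, free of vertex ordering
    ys = sorted(y for _, y in quad)   # key y-positions, free of vertex ordering
    return xs, ys                     # a matching step re-pairs xs with ys

print(discretize_quad([(10, 5), (40, 8), (38, 30), (8, 27)]))
```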