July 30, 2019

Paper Group AWR 15

Analysis and Optimization of Convolutional Neural Network Architectures

Title Analysis and Optimization of Convolutional Neural Network Architectures
Authors Martin Thoma
Abstract Convolutional Neural Networks (CNNs) have dominated various computer vision tasks since Alex Krizhevsky showed that they can be trained effectively and reduced the top-5 error from 26.2 % to 15.3 % on the ImageNet large scale visual recognition challenge. Many aspects of CNNs are examined in various publications, but literature about the analysis and construction of neural network architectures is rare. This work is one step to close this gap. A comprehensive overview of existing techniques for CNN analysis and topology construction is provided. A novel way to visualize classification errors with confusion matrices was developed. Based on this method, hierarchical classifiers are described and evaluated. Additionally, some results are confirmed and quantified for CIFAR-100, for example the positive impact of smaller batch sizes, averaging ensembles, data augmentation and test-time transformations on accuracy. Other results, such as the positive impact of learned color transformation on the test accuracy, could not be confirmed. A model was developed which has only one million learned parameters for an input size of 32x32x3 and 100 classes and which beats the state of the art on the benchmark datasets Asirra, GTSRB, HASYv2 and STL-10.
Tasks Data Augmentation, Object Recognition
Published 2017-07-31
URL http://arxiv.org/abs/1707.09725v1
PDF http://arxiv.org/pdf/1707.09725v1.pdf
PWC https://paperswithcode.com/paper/analysis-and-optimization-of-convolutional
Repo https://github.com/MartinThoma/clana
Framework none

Batch-based Activity Recognition from Egocentric Photo-Streams Revisited

Title Batch-based Activity Recognition from Egocentric Photo-Streams Revisited
Authors Alejandro Cartas, Juan Marin, Petia Radeva, Mariella Dimiccoli
Abstract Wearable cameras can gather large amounts of image data that provide rich visual information about the daily activities of the wearer. Motivated by the large number of health applications that could be enabled by the automatic recognition of daily activities, such as lifestyle characterization for habit improvement, context-aware personal assistance and tele-rehabilitation services, we propose a system to classify 21 daily activities from photo-streams acquired by a wearable photo-camera. Our approach combines the advantages of a Late Fusion Ensemble strategy relying on convolutional neural networks at image level with the ability of recurrent neural networks to account for the temporal evolution of high level features in photo-streams without relying on event boundaries. The proposed batch-based approach achieved an overall accuracy of 89.85%, outperforming state of the art end-to-end methodologies. These results were achieved on a dataset consisting of 44,902 egocentric pictures from three persons, captured over 26 days on average.
Tasks Activity Recognition
Published 2017-10-11
URL http://arxiv.org/abs/1710.04112v2
PDF http://arxiv.org/pdf/1710.04112v2.pdf
PWC https://paperswithcode.com/paper/batch-based-activity-recognition-from-1
Repo https://github.com/gorayni/egocentric_photostreams
Framework none

Temporally-Reweighted Chinese Restaurant Process Mixtures for Clustering, Imputing, and Forecasting Multivariate Time Series

Title Temporally-Reweighted Chinese Restaurant Process Mixtures for Clustering, Imputing, and Forecasting Multivariate Time Series
Authors Feras A. Saad, Vikash K. Mansinghka
Abstract This article proposes a Bayesian nonparametric method for forecasting, imputation, and clustering in sparsely observed, multivariate time series data. The method is appropriate for jointly modeling hundreds of time series with widely varying, non-stationary dynamics. Given a collection of $N$ time series, the Bayesian model first partitions them into independent clusters using a Chinese restaurant process prior. Within a cluster, all time series are modeled jointly using a novel “temporally-reweighted” extension of the Chinese restaurant process mixture. Markov chain Monte Carlo techniques are used to obtain samples from the posterior distribution, which are then used to form predictive inferences. We apply the technique to challenging forecasting and imputation tasks using seasonal flu data from the US Center for Disease Control and Prevention, demonstrating superior forecasting accuracy and competitive imputation accuracy as compared to multiple widely used baselines. We further show that the model discovers interpretable clusters in datasets with hundreds of time series, using macroeconomic data from the Gapminder Foundation.
Tasks Imputation, Time Series
Published 2017-10-18
URL http://arxiv.org/abs/1710.06900v2
PDF http://arxiv.org/pdf/1710.06900v2.pdf
PWC https://paperswithcode.com/paper/temporally-reweighted-chinese-restaurant
Repo https://github.com/probcomp/trcrpm
Framework none
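
The abstract above builds on the Chinese restaurant process (CRP) prior. As a rough illustration only, the following sketch samples a partition from the standard CRP: each new item joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to a concentration parameter. This is the textbook CRP, not the paper's temporally-reweighted extension, and the function name and parameters are hypothetical.

```python
import random


def sample_crp_partition(n_items, alpha, seed=0):
    """Sample cluster assignments from a standard Chinese restaurant process.

    Illustrative sketch only: this is the plain CRP, not the
    temporally-reweighted variant proposed in the paper.
    """
    rng = random.Random(seed)
    assignments = []    # cluster index chosen for each item
    cluster_sizes = []  # number of items currently in each cluster
    for _ in range(n_items):
        # Existing clusters are weighted by size; a new cluster by alpha.
        weights = cluster_sizes + [alpha]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(cluster_sizes):
            cluster_sizes.append(1)  # open a new cluster
        else:
            cluster_sizes[choice] += 1
        assignments.append(choice)
    return assignments, cluster_sizes


if __name__ == "__main__":
    labels, sizes = sample_crp_partition(n_items=20, alpha=1.0)
    print(labels, sizes)
```

Larger values of alpha tend to produce more clusters; the first item always opens cluster 0.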

Generating Multi-label Discrete Patient Records using Generative Adversarial Networks

Title Generating Multi-label Discrete Patient Records using Generative Adversarial Networks
Authors Edward Choi, Siddharth Biswal, Bradley Malin, Jon Duke, Walter F. Stewart, Jimeng Sun
Abstract Access to electronic health record (EHR) data has motivated computational advances in medical research. However, various concerns, particularly over privacy, can limit access to and collaborative use of EHR data. Sharing synthetic EHR data could mitigate risk. In this paper, we propose a new approach, medical Generative Adversarial Network (medGAN), to generate realistic synthetic patient records. Based on input real patient records, medGAN can generate high-dimensional discrete variables (e.g., binary and count features) via a combination of an autoencoder and generative adversarial networks. We also propose minibatch averaging to efficiently avoid mode collapse, and increase the learning efficiency with batch normalization and shortcut connections. To demonstrate feasibility, we showed that medGAN generates synthetic patient records that achieve comparable performance to real data on many experiments including distribution statistics, predictive modeling tasks and a medical expert review. We also empirically observe a limited privacy risk in both identity and attribute disclosure using medGAN.
Tasks
Published 2017-03-19
URL http://arxiv.org/abs/1703.06490v3
PDF http://arxiv.org/pdf/1703.06490v3.pdf
PWC https://paperswithcode.com/paper/generating-multi-label-discrete-patient
Repo https://github.com/astorfi/cor-gan
Framework pytorch
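
The abstract mentions minibatch averaging without describing it. One plausible reading, assumed here rather than taken from the text above, is that each sample shown to the discriminator is concatenated with the feature-wise mean of its minibatch, so the discriminator can notice batches that lack diversity. The helper name below is hypothetical and the sketch uses plain lists instead of a deep learning framework.

```python
def with_minibatch_average(batch):
    """Append the feature-wise minibatch mean to every sample.

    Assumption: this mirrors one reading of "minibatch averaging";
    it is not the authors' reference implementation.
    """
    n = len(batch)
    # zip(*batch) iterates over feature columns of the minibatch.
    mean = [sum(column) / n for column in zip(*batch)]
    return [list(row) + mean for row in batch]


if __name__ == "__main__":
    records = [[1.0, 0.0], [0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
    for row in with_minibatch_average(records):
        print(row)
```

Under this reading the discriminator input dimension doubles, since every record carries the batch mean alongside its own features.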

Adaptive Cardinality Estimation

Title Adaptive Cardinality Estimation
Authors Oleg Ivanov, Sergey Bartunov
Abstract In this paper we address the cardinality estimation problem, which is an important subproblem in query optimization. Query optimization is the part of every relational DBMS responsible for finding the best way to execute a given query. These ways are called plans. The execution time of different plans may differ by several orders of magnitude, so the query optimizer has a great influence on overall DBMS performance. We consider the cost-based query optimization approach, as it is the most popular one. It has been observed that the quality of cost-based optimization depends heavily on the quality of cardinality estimation. The cardinality of a plan node is the number of tuples returned by it. In the paper we propose a novel cardinality estimation approach based on machine learning methods. The main point of the approach is to use execution statistics of previously executed queries to improve cardinality estimates. We call this approach adaptive cardinality estimation to reflect this point. The approach is general, flexible, and easy to implement. The experimental evaluation shows that this approach significantly increases the quality of cardinality estimation, and therefore increases DBMS performance for some queries by several times or even by several dozen times.
Tasks
Published 2017-11-22
URL http://arxiv.org/abs/1711.08330v1
PDF http://arxiv.org/pdf/1711.08330v1.pdf
PWC https://paperswithcode.com/paper/adaptive-cardinality-estimation
Repo https://github.com/tigvarts/aqo
Framework none

Comparative Study of CNN and RNN for Natural Language Processing

Title Comparative Study of CNN and RNN for Natural Language Processing
Authors Wenpeng Yin, Katharina Kann, Mo Yu, Hinrich Schütze
Abstract Deep neural networks (DNN) have revolutionized the field of natural language processing (NLP). Convolutional neural network (CNN) and recurrent neural network (RNN), the two main types of DNN architectures, are widely explored to handle various NLP tasks. CNN is supposed to be good at extracting position-invariant features and RNN at modeling units in sequence. The state of the art on many NLP tasks often switches due to the battle between CNNs and RNNs. This work is the first systematic comparison of CNN and RNN on a wide range of representative NLP tasks, aiming to give basic guidance for DNN selection.
Tasks
Published 2017-02-07
URL http://arxiv.org/abs/1702.01923v1
PDF http://arxiv.org/pdf/1702.01923v1.pdf
PWC https://paperswithcode.com/paper/comparative-study-of-cnn-and-rnn-for-natural
Repo https://github.com/Msundarv/TwiLoc
Framework none

Semantic Structure and Interpretability of Word Embeddings

Title Semantic Structure and Interpretability of Word Embeddings
Authors Lutfi Kerem Senel, Ihsan Utlu, Veysel Yucesoy, Aykut Koc, Tolga Cukur
Abstract Dense word embeddings, which encode semantic meanings of words into low dimensional vector spaces, have become very popular in natural language processing (NLP) research due to their state-of-the-art performance in many NLP tasks. Word embeddings are substantially successful in capturing semantic relations among words, so a meaningful semantic structure must be present in the respective vector spaces. However, in many cases, this semantic structure is broadly and heterogeneously distributed across the embedding dimensions, which makes interpretation a big challenge. In this study, we propose a statistical method to uncover the latent semantic structure in the dense word embeddings. To perform our analysis we introduce a new dataset (SEMCAT) that contains more than 6500 words semantically grouped under 110 categories. We further propose a method to quantify the interpretability of the word embeddings; the proposed method is a practical alternative to the classical word intrusion test that requires human intervention.
Tasks Word Embeddings
Published 2017-11-01
URL http://arxiv.org/abs/1711.00331v3
PDF http://arxiv.org/pdf/1711.00331v3.pdf
PWC https://paperswithcode.com/paper/semantic-structure-and-interpretability-of
Repo https://github.com/avaapm/SEMCATdataset2018
Framework none

On human motion prediction using recurrent neural networks

Title On human motion prediction using recurrent neural networks
Authors Julieta Martinez, Michael J. Black, Javier Romero
Abstract Human motion modelling is a classical problem at the intersection of graphics and computer vision, with applications spanning human-computer interaction, motion synthesis, and motion prediction for virtual and augmented reality. Following the success of deep learning methods in several computer vision tasks, recent work has focused on using deep recurrent neural networks (RNNs) to model human motion, with the goal of learning time-dependent representations that perform tasks such as short-term motion prediction and long-term human motion synthesis. We examine recent work, with a focus on the evaluation methodologies commonly used in the literature, and show that, surprisingly, state-of-the-art performance can be achieved by a simple baseline that does not attempt to model motion at all. We investigate this result, and analyze recent RNN methods by looking at the architectures, loss functions, and training procedures used in state-of-the-art approaches. We propose three changes to the standard RNN models typically used for human motion, which result in a simple and scalable RNN architecture that obtains state-of-the-art performance on human motion prediction.
Tasks Motion Estimation, motion prediction
Published 2017-05-06
URL http://arxiv.org/abs/1705.02445v1
PDF http://arxiv.org/pdf/1705.02445v1.pdf
PWC https://paperswithcode.com/paper/on-human-motion-prediction-using-recurrent
Repo https://github.com/una-dinosauria/human-motion-prediction
Framework tf
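
The abstract refers to a simple baseline that does not attempt to model motion at all. A minimal sketch of such a baseline, assuming it simply repeats the last observed pose for every future frame (often called a zero-velocity baseline), might look as follows. The function name and the list-of-floats pose format are hypothetical.

```python
def zero_velocity_forecast(observed_poses, horizon):
    """Predict future poses by repeating the last observed pose.

    Assumption: poses are lists of floats (for example joint angles).
    This is a sketch of a no-motion baseline, not the paper's code.
    """
    last_pose = observed_poses[-1]
    # Copy the pose so later edits to one frame do not affect the others.
    return [list(last_pose) for _ in range(horizon)]


if __name__ == "__main__":
    history = [[0.0, 0.1], [0.1, 0.2], [0.2, 0.3]]
    print(zero_velocity_forecast(history, horizon=3))
```

A baseline like this is useful as a sanity check: a learned model that cannot beat it on short horizons is probably not capturing motion dynamics.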

AlignedReID: Surpassing Human-Level Performance in Person Re-Identification

Title AlignedReID: Surpassing Human-Level Performance in Person Re-Identification
Authors Xuan Zhang, Hao Luo, Xing Fan, Weilai Xiang, Yixiao Sun, Qiqi Xiao, Wei Jiang, Chi Zhang, Jian Sun
Abstract In this paper, we propose a novel method called AlignedReID that extracts a global feature which is jointly learned with local features. Global feature learning benefits greatly from local feature learning, which performs an alignment/matching by calculating the shortest path between two sets of local features, without requiring extra supervision. After the joint learning, we only keep the global feature to compute the similarities between images. Our method achieves rank-1 accuracy of 94.4% on Market1501 and 97.8% on CUHK03, outperforming state-of-the-art methods by a large margin. We also evaluate human-level performance and demonstrate that our method is the first to surpass human-level performance on Market1501 and CUHK03, two widely used Person ReID datasets.
Tasks Person Re-Identification
Published 2017-11-22
URL http://arxiv.org/abs/1711.08184v2
PDF http://arxiv.org/pdf/1711.08184v2.pdf
PWC https://paperswithcode.com/paper/alignedreid-surpassing-human-level
Repo https://github.com/Proxim123/aligned-reID-No-2-
Framework pytorch
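
The alignment step in the abstract is described as a shortest path between two sets of local features. A minimal sketch of that idea, assuming a precomputed matrix of pairwise distances between local parts and a path that may only move right or down from the top-left to the bottom-right cell, is a small dynamic program. The function name is hypothetical, and any distance normalization the paper may apply is omitted.

```python
def shortest_path_cost(dist):
    """Total cost of the cheapest monotone path through a distance matrix.

    dist[i][j] is an assumed precomputed distance between local part i of
    one image and local part j of another. The path starts at (0, 0),
    ends at the last cell, and moves only right or down.
    """
    rows, cols = len(dist), len(dist[0])
    cost = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if i == 0 and j == 0:
                best_previous = 0.0
            elif i == 0:
                best_previous = cost[0][j - 1]      # can only come from the left
            elif j == 0:
                best_previous = cost[i - 1][0]      # can only come from above
            else:
                best_previous = min(cost[i - 1][j], cost[i][j - 1])
            cost[i][j] = best_previous + dist[i][j]
    return cost[-1][-1]


if __name__ == "__main__":
    example = [[1, 3, 1], [1, 5, 1], [4, 2, 1]]
    print(shortest_path_cost(example))
```

The monotone path lets corresponding body parts match even when one image is vertically shifted relative to the other, without any extra alignment labels.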

On the Diversity of Realistic Image Synthesis

Title On the Diversity of Realistic Image Synthesis
Authors Zichen Yang, Haifeng Liu, Deng Cai
Abstract Many image processing tasks can be formulated as translating images between two image domains, such as colorization, super resolution and conditional image synthesis. In most of these tasks, an input image may correspond to multiple outputs. However, existing approaches show only very minor diversity in their outputs. In this paper, we present a novel approach to synthesize diverse realistic images corresponding to a semantic layout. We introduce a diversity loss objective, which maximizes the distance between synthesized image pairs and links the input noise to the semantic segments in the synthesized images. Thus, our approach can not only produce diverse images, but also allow users to manipulate the output images by adjusting the noise manually. Experimental results show that images synthesized by our approach are significantly more diverse than those of existing works, and that adding our diversity loss does not degrade the realism of the base networks.
Tasks Colorization, Image Generation, Super-Resolution
Published 2017-12-20
URL http://arxiv.org/abs/1712.07329v1
PDF http://arxiv.org/pdf/1712.07329v1.pdf
PWC https://paperswithcode.com/paper/on-the-diversity-of-realistic-image-synthesis
Repo https://github.com/ZJULearning/diverse_image_synthesis
Framework pytorch

Residual Attention Network for Image Classification

Title Residual Attention Network for Image Classification
Authors Fei Wang, Mengqing Jiang, Chen Qian, Shuo Yang, Cheng Li, Honggang Zhang, Xiaogang Wang, Xiaoou Tang
Abstract In this work, we propose “Residual Attention Network”, a convolutional neural network using an attention mechanism which can be incorporated into state-of-the-art feed-forward network architectures in an end-to-end training fashion. Our Residual Attention Network is built by stacking Attention Modules which generate attention-aware features. The attention-aware features from different modules change adaptively as layers go deeper. Inside each Attention Module, a bottom-up top-down feedforward structure is used to unfold the feedforward and feedback attention process into a single feedforward process. Importantly, we propose attention residual learning to train very deep Residual Attention Networks which can be easily scaled up to hundreds of layers. Extensive analyses are conducted on the CIFAR-10 and CIFAR-100 datasets to verify the effectiveness of every module mentioned above. Our Residual Attention Network achieves state-of-the-art object recognition performance on three benchmark datasets including CIFAR-10 (3.90% error), CIFAR-100 (20.45% error) and ImageNet (4.8% single model and single crop, top-5 error). Note that our method achieves a 0.6% top-1 accuracy improvement with 46% trunk depth and 69% forward FLOPs compared to ResNet-200. The experiment also demonstrates that our network is robust against noisy labels.
Tasks Image Classification, Object Recognition
Published 2017-04-23
URL http://arxiv.org/abs/1704.06904v1
PDF http://arxiv.org/pdf/1704.06904v1.pdf
PWC https://paperswithcode.com/paper/residual-attention-network-for-image
Repo https://github.com/PistonY/ResidualAttentionNetwork
Framework tf

Learning Important Features Through Propagating Activation Differences

Title Learning Important Features Through Propagating Activation Differences
Authors Avanti Shrikumar, Peyton Greenside, Anshul Kundaje
Abstract The purported “black box” nature of neural networks is a barrier to adoption in applications where interpretability is essential. Here we present DeepLIFT (Deep Learning Important FeaTures), a method for decomposing the output prediction of a neural network on a specific input by backpropagating the contributions of all neurons in the network to every feature of the input. DeepLIFT compares the activation of each neuron to its ‘reference activation’ and assigns contribution scores according to the difference. By optionally giving separate consideration to positive and negative contributions, DeepLIFT can also reveal dependencies which are missed by other approaches. Scores can be computed efficiently in a single backward pass. We apply DeepLIFT to models trained on MNIST and simulated genomic data, and show significant advantages over gradient-based methods. Video tutorial: http://goo.gl/qKb7pL, ICML slides: bit.ly/deeplifticmlslides, ICML talk: https://vimeo.com/238275076, code: http://goo.gl/RM8jvH.
Tasks Interpretable Machine Learning
Published 2017-04-10
URL https://arxiv.org/abs/1704.02685v2
PDF https://arxiv.org/pdf/1704.02685v2.pdf
PWC https://paperswithcode.com/paper/learning-important-features-through
Repo https://github.com/saivarunr/xshap
Framework tf
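
The core idea in the abstract is to score each input by how much its difference from a reference input changes the output. For a single linear unit this can be sketched directly: the contribution of input i is its weight times the difference between the input and its reference value, and the contributions sum to the difference between the output and the reference output. The sketch below covers only this linear case under that assumption; the rules DeepLIFT uses for nonlinearities are not shown, and the function name is hypothetical.

```python
def linear_contributions(weights, bias, x, reference):
    """Difference-from-reference contribution scores for one linear unit.

    Sketch of the linear case only; handling of nonlinear activations
    is deliberately left out.
    """
    def output(values):
        return sum(w * v for w, v in zip(weights, values)) + bias

    # Each input's contribution is its weight times its deviation
    # from the reference input.
    contributions = [w * (xi - ri) for w, xi, ri in zip(weights, x, reference)]
    delta_output = output(x) - output(reference)
    return contributions, delta_output


if __name__ == "__main__":
    scores, delta = linear_contributions(
        weights=[2.0, -1.0, 0.5],
        bias=0.1,
        x=[1.0, 3.0, 4.0],
        reference=[0.0, 1.0, 2.0],
    )
    print(scores, delta)
```

Note that the bias cancels in the output difference, so the scores depend only on the weights and on how far the input is from the chosen reference.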

Online Learning Sensing Matrix and Sparsifying Dictionary Simultaneously for Compressive Sensing

Title Online Learning Sensing Matrix and Sparsifying Dictionary Simultaneously for Compressive Sensing
Authors Tao Hong, Zhihui Zhu
Abstract This paper considers the problem of simultaneously learning the Sensing Matrix and Sparsifying Dictionary (SMSD) on a large training dataset. To address the formulated joint learning problem, we propose an online algorithm that consists of a closed-form solution for optimizing the sensing matrix with a fixed sparsifying dictionary and a stochastic method for learning the sparsifying dictionary on a large dataset when the sensing matrix is given. Benefiting from training on a large dataset, the obtained compressive sensing (CS) system by the proposed algorithm yields a much better performance in terms of signal recovery accuracy than the existing ones. The simulation results on natural images demonstrate the effectiveness of the suggested online algorithm compared with the existing methods.
Tasks Compressive Sensing
Published 2017-01-04
URL http://arxiv.org/abs/1701.01000v4
PDF http://arxiv.org/pdf/1701.01000v4.pdf
PWC https://paperswithcode.com/paper/online-learning-sensing-matrix-and
Repo https://github.com/happyhongt/Online-Learning-SMSD-Large-Dataset
Framework none

Look, Listen and Learn

Title Look, Listen and Learn
Authors Relja Arandjelović, Andrew Zisserman
Abstract We consider the question: what can be learnt by looking at and listening to a large number of unlabelled videos? There is a valuable, but so far untapped, source of information contained in the video itself – the correspondence between the visual and the audio streams, and we introduce a novel “Audio-Visual Correspondence” learning task that makes use of this. Training visual and audio networks from scratch, without any additional supervision other than the raw unconstrained videos themselves, is shown to successfully solve this task, and, more interestingly, result in good visual and audio representations. These features set the new state-of-the-art on two sound classification benchmarks, and perform on par with the state-of-the-art self-supervised approaches on ImageNet classification. We also demonstrate that the network is able to localize objects in both modalities, as well as perform fine-grained recognition tasks.
Tasks
Published 2017-05-23
URL http://arxiv.org/abs/1705.08168v2
PDF http://arxiv.org/pdf/1705.08168v2.pdf
PWC https://paperswithcode.com/paper/look-listen-and-learn
Repo https://github.com/marl/l3embedding
Framework tf

Generating Memorable Mnemonic Encodings of Numbers

Title Generating Memorable Mnemonic Encodings of Numbers
Authors Vincent Fiorentini, Megan Shao, Julie Medero
Abstract The major system is a mnemonic system that can be used to memorize sequences of numbers. In this work, we present a method to automatically generate sentences that encode a given number. We propose several encoding models and compare the most promising ones in a password memorability study. The results of the study show that a model combining part-of-speech sentence templates with an $n$-gram language model produces the most memorable password representations.
Tasks Language Modelling
Published 2017-05-07
URL http://arxiv.org/abs/1705.02700v1
PDF http://arxiv.org/pdf/1705.02700v1.pdf
PWC https://paperswithcode.com/paper/generating-memorable-mnemonic-encodings-of
Repo https://github.com/VinceFior/major-system
Framework none
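
The major system maps consonant sounds to digits and ignores vowels, so a word can stand for a number. The sketch below decodes a word into digits using a crude letter-based approximation of the usual mapping (s/z to 0, t/d to 1, n to 2, m to 3, r to 4, l to 5, j to 6, k/g/c/q to 7, f/v to 8, p/b to 9). The real system works on pronunciation rather than spelling, so this table is an assumption made for illustration and will mis-handle cases such as soft "c" or "sh".

```python
# Letter-based approximation of the major system (assumption: the real
# system is phonetic, so spelling-based decoding is only a rough guide).
LETTER_TO_DIGIT = {
    "s": "0", "z": "0",
    "t": "1", "d": "1",
    "n": "2",
    "m": "3",
    "r": "4",
    "l": "5",
    "j": "6",
    "k": "7", "g": "7", "c": "7", "q": "7",
    "f": "8", "v": "8",
    "p": "9", "b": "9",
}


def word_to_digits(word):
    """Decode a word into the digits it encodes under the table above."""
    digits = []
    previous = ""
    for letter in word.lower():
        # Doubled letters usually represent one sound, so count them once.
        if letter in LETTER_TO_DIGIT and letter != previous:
            digits.append(LETTER_TO_DIGIT[letter])
        previous = letter
    return "".join(digits)


if __name__ == "__main__":
    for word in ["tomato", "summer", "lion"]:
        print(word, word_to_digits(word))
```

Generating a sentence for a given number, as the paper does, is the inverse problem: searching for words whose decoded digits concatenate to the target, ranked by some notion of memorability.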