Paper Group AWR 15
Analysis and Optimization of Convolutional Neural Network Architectures. Batch-based Activity Recognition from Egocentric Photo-Streams Revisited. Temporally-Reweighted Chinese Restaurant Process Mixtures for Clustering, Imputing, and Forecasting Multivariate Time Series. Generating Multi-label Discrete Patient Records using Generative Adversarial …
Analysis and Optimization of Convolutional Neural Network Architectures
Title | Analysis and Optimization of Convolutional Neural Network Architectures |
Authors | Martin Thoma |
Abstract | Convolutional Neural Networks (CNNs) have dominated various computer vision tasks since Alex Krizhevsky showed that they can be trained effectively and reduced the top-5 error from 26.2 % to 15.3 % on the ImageNet large scale visual recognition challenge. Many aspects of CNNs are examined in various publications, but literature about the analysis and construction of neural network architectures is rare. This work is one step toward closing this gap. A comprehensive overview of existing techniques for CNN analysis and topology construction is provided. A novel way to visualize classification errors with confusion matrices was developed. Based on this method, hierarchical classifiers are described and evaluated. Additionally, some results are confirmed and quantified for CIFAR-100, for example the positive impact of smaller batch sizes, averaging ensembles, data augmentation and test-time transformations on accuracy. Other results, such as the positive impact of a learned color transformation on the test accuracy, could not be confirmed. A model which has only one million learned parameters for an input size of 32x32x3 and 100 classes and which beats the state of the art on the benchmark datasets Asirra, GTSRB, HASYv2 and STL-10 was developed. |
Tasks | Data Augmentation, Object Recognition |
Published | 2017-07-31 |
URL | http://arxiv.org/abs/1707.09725v1 |
PDF | http://arxiv.org/pdf/1707.09725v1.pdf
PWC | https://paperswithcode.com/paper/analysis-and-optimization-of-convolutional |
Repo | https://github.com/MartinThoma/clana |
Framework | none |
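A large share of the reported gains come from averaging ensembles and test-time transformations. Below is a minimal sketch of test-time augmentation by probability averaging; `predict_proba` is a hypothetical callable standing in for any trained classifier, and the horizontal flip is just one example transformation.

```python
import numpy as np

def predict_with_tta(predict_proba, images):
    """Average class probabilities over simple test-time transformations.

    images: float array of shape (N, H, W, C).
    predict_proba: callable mapping such an array to (N, num_classes) probabilities.
    """
    variants = [
        images,                 # original images
        images[:, :, ::-1, :],  # horizontally flipped copies
    ]
    probs = np.mean([predict_proba(v) for v in variants], axis=0)
    return probs.argmax(axis=1)
```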
Batch-based Activity Recognition from Egocentric Photo-Streams Revisited
Title | Batch-based Activity Recognition from Egocentric Photo-Streams Revisited |
Authors | Alejandro Cartas, Juan Marin, Petia Radeva, Mariella Dimiccoli |
Abstract | Wearable cameras can gather large amounts of image data that provide rich visual information about the daily activities of the wearer. Motivated by the large number of health applications that could be enabled by the automatic recognition of daily activities, such as lifestyle characterization for habit improvement, context-aware personal assistance and tele-rehabilitation services, we propose a system to classify 21 daily activities from photo-streams acquired by a wearable photo-camera. Our approach combines the advantages of a Late Fusion Ensemble strategy relying on convolutional neural networks at image level with the ability of recurrent neural networks to account for the temporal evolution of high-level features in photo-streams without relying on event boundaries. The proposed batch-based approach achieved an overall accuracy of 89.85%, outperforming state-of-the-art end-to-end methodologies. These results were achieved on a dataset consisting of 44,902 egocentric pictures from three persons, captured over 26 days on average. |
Tasks | Activity Recognition |
Published | 2017-10-11 |
URL | http://arxiv.org/abs/1710.04112v2 |
PDF | http://arxiv.org/pdf/1710.04112v2.pdf
PWC | https://paperswithcode.com/paper/batch-based-activity-recognition-from-1 |
Repo | https://github.com/gorayni/egocentric_photostreams |
Framework | none |
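The architecture described above pairs an image-level CNN with a recurrent network over the photo-stream. The sketch below shows that general pattern under illustrative layer sizes; it is not the authors' exact model, and the tiny convolutional stack stands in for a pretrained CNN backbone.

```python
import torch
import torch.nn as nn

class PhotoStreamClassifier(nn.Module):
    """Sketch: per-image CNN features -> LSTM over the stream -> per-image activity logits."""

    def __init__(self, feat_dim=512, hidden_dim=256, num_classes=21):
        super().__init__()
        self.cnn = nn.Sequential(                # stand-in for a pretrained image CNN
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, feat_dim),
        )
        self.rnn = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, streams):                  # streams: (batch, time, 3, H, W)
        b, t = streams.shape[:2]
        feats = self.cnn(streams.flatten(0, 1)).view(b, t, -1)
        out, _ = self.rnn(feats)                 # temporal evolution of high-level features
        return self.head(out)                    # (batch, time, num_classes)

logits = PhotoStreamClassifier()(torch.randn(2, 8, 3, 64, 64))
```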
Temporally-Reweighted Chinese Restaurant Process Mixtures for Clustering, Imputing, and Forecasting Multivariate Time Series
Title | Temporally-Reweighted Chinese Restaurant Process Mixtures for Clustering, Imputing, and Forecasting Multivariate Time Series |
Authors | Feras A. Saad, Vikash K. Mansinghka |
Abstract | This article proposes a Bayesian nonparametric method for forecasting, imputation, and clustering in sparsely observed, multivariate time series data. The method is appropriate for jointly modeling hundreds of time series with widely varying, non-stationary dynamics. Given a collection of $N$ time series, the Bayesian model first partitions them into independent clusters using a Chinese restaurant process prior. Within a cluster, all time series are modeled jointly using a novel “temporally-reweighted” extension of the Chinese restaurant process mixture. Markov chain Monte Carlo techniques are used to obtain samples from the posterior distribution, which are then used to form predictive inferences. We apply the technique to challenging forecasting and imputation tasks using seasonal flu data from the US Centers for Disease Control and Prevention, demonstrating superior forecasting accuracy and competitive imputation accuracy as compared to multiple widely used baselines. We further show that the model discovers interpretable clusters in datasets with hundreds of time series, using macroeconomic data from the Gapminder Foundation. |
Tasks | Imputation, Time Series |
Published | 2017-10-18 |
URL | http://arxiv.org/abs/1710.06900v2 |
PDF | http://arxiv.org/pdf/1710.06900v2.pdf
PWC | https://paperswithcode.com/paper/temporally-reweighted-chinese-restaurant |
Repo | https://github.com/probcomp/trcrpm |
Framework | none |
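For context, the paper's “temporally-reweighted” construction extends the standard Chinese restaurant process prior, which assigns a new series to an existing cluster in proportion to its size, or to a new cluster in proportion to a concentration parameter $\alpha$ (the temporal reweighting itself is specific to the paper and not reproduced here):

```latex
P(z_n = k \mid z_{1:n-1}) =
\begin{cases}
\dfrac{n_k}{\,n - 1 + \alpha\,} & \text{for an existing cluster } k \text{ with } n_k \text{ members},\\[6pt]
\dfrac{\alpha}{\,n - 1 + \alpha\,} & \text{for a new cluster}.
\end{cases}
```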
Generating Multi-label Discrete Patient Records using Generative Adversarial Networks
Title | Generating Multi-label Discrete Patient Records using Generative Adversarial Networks |
Authors | Edward Choi, Siddharth Biswal, Bradley Malin, Jon Duke, Walter F. Stewart, Jimeng Sun |
Abstract | Access to electronic health record (EHR) data has motivated computational advances in medical research. However, various concerns, particularly over privacy, can limit access to and collaborative use of EHR data. Sharing synthetic EHR data could mitigate risk. In this paper, we propose a new approach, medical Generative Adversarial Network (medGAN), to generate realistic synthetic patient records. Given real patient records as input, medGAN can generate high-dimensional discrete variables (e.g., binary and count features) via a combination of an autoencoder and generative adversarial networks. We also propose minibatch averaging to efficiently avoid mode collapse, and increase the learning efficiency with batch normalization and shortcut connections. To demonstrate feasibility, we show that medGAN generates synthetic patient records that achieve comparable performance to real data on many experiments, including distribution statistics, predictive modeling tasks and a medical expert review. We also empirically observe a limited privacy risk in both identity and attribute disclosure using medGAN. |
Tasks | |
Published | 2017-03-19 |
URL | http://arxiv.org/abs/1703.06490v3 |
PDF | http://arxiv.org/pdf/1703.06490v3.pdf
PWC | https://paperswithcode.com/paper/generating-multi-label-discrete-patient |
Repo | https://github.com/astorfi/cor-gan |
Framework | pytorch |
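The minibatch averaging mentioned in the abstract can be illustrated by letting the discriminator see each record concatenated with the minibatch mean, so it can detect the degenerate batch statistics produced by mode collapse. The sketch below is a hedged illustration of that idea in PyTorch, not the released medGAN code; layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

class MinibatchAveragingDiscriminator(nn.Module):
    """Discriminator that scores each record concatenated with the minibatch mean."""

    def __init__(self, input_dim, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, x):                                    # x: (batch, input_dim)
        batch_mean = x.mean(dim=0, keepdim=True).expand_as(x)
        return self.net(torch.cat([x, batch_mean], dim=1))   # (batch, 1) real/fake logits

scores = MinibatchAveragingDiscriminator(input_dim=100)(torch.rand(32, 100))
```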
Adaptive Cardinality Estimation
Title | Adaptive Cardinality Estimation |
Authors | Oleg Ivanov, Sergey Bartunov |
Abstract | In this paper we address the cardinality estimation problem, an important subproblem of query optimization. Query optimization is the part of every relational DBMS responsible for finding the best way to execute a given query; these ways are called plans. The execution times of different plans may differ by several orders of magnitude, so the query optimizer has a great influence on overall DBMS performance. We consider cost-based query optimization, the most popular approach. It has been observed that the quality of cost-based optimization depends heavily on the quality of cardinality estimation. The cardinality of a plan node is the number of tuples it returns. In this paper we propose a novel cardinality estimation approach that uses machine learning methods. The main point of the approach is to use execution statistics of previously executed queries to improve cardinality estimates; we call the approach adaptive cardinality estimation to reflect this. The approach is general, flexible, and easy to implement. The experimental evaluation shows that it significantly increases the quality of cardinality estimation, and therefore increases DBMS performance for some queries by several times or even by several dozens of times. |
Tasks | |
Published | 2017-11-22 |
URL | http://arxiv.org/abs/1711.08330v1 |
PDF | http://arxiv.org/pdf/1711.08330v1.pdf
PWC | https://paperswithcode.com/paper/adaptive-cardinality-estimation |
Repo | https://github.com/tigvarts/aqo |
Framework | none |
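The core idea — regressing cardinalities from statistics of previously executed queries — can be sketched with an off-the-shelf learner. The features (per-clause log-selectivities) and the k-nearest-neighbors model below are illustrative assumptions, not necessarily what the aqo extension implements.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Toy training data: log-selectivities of a plan node's clauses -> observed log-cardinality.
X_train = rng.uniform(-6, 0, size=(500, 4))                 # 4 hypothetical predicate selectivities (log scale)
y_train = X_train.sum(axis=1) + np.log(1e6) + rng.normal(0, 0.3, 500)

model = KNeighborsRegressor(n_neighbors=5).fit(X_train, y_train)

# Replace the planner's static estimate with the learned one for a new, similar node.
new_node = rng.uniform(-6, 0, size=(1, 4))
predicted_rows = np.exp(model.predict(new_node))[0]
print(f"learned cardinality estimate: {predicted_rows:.0f} rows")
```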
Comparative Study of CNN and RNN for Natural Language Processing
Title | Comparative Study of CNN and RNN for Natural Language Processing |
Authors | Wenpeng Yin, Katharina Kann, Mo Yu, Hinrich Schütze |
Abstract | Deep neural networks (DNNs) have revolutionized the field of natural language processing (NLP). Convolutional neural networks (CNNs) and recurrent neural networks (RNNs), the two main types of DNN architectures, are widely explored to handle various NLP tasks. CNNs are supposed to be good at extracting position-invariant features and RNNs at modeling units in sequence. The state of the art on many NLP tasks often switches due to the battle between CNNs and RNNs. This work is the first systematic comparison of CNNs and RNNs on a wide range of representative NLP tasks, aiming to give basic guidance for DNN selection. |
Tasks | |
Published | 2017-02-07 |
URL | http://arxiv.org/abs/1702.01923v1 |
PDF | http://arxiv.org/pdf/1702.01923v1.pdf
PWC | https://paperswithcode.com/paper/comparative-study-of-cnn-and-rnn-for-natural |
Repo | https://github.com/Msundarv/TwiLoc |
Framework | none |
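The contrast drawn in the abstract — CNNs extracting position-invariant n-gram features versus RNNs modeling a sequence — can be made concrete with two minimal sentence encoders; all dimensions below are illustrative.

```python
import torch
import torch.nn as nn

emb_dim, hidden, vocab = 50, 64, 1000
tokens = torch.randint(0, vocab, (8, 20))            # batch of 8 sentences, 20 tokens each
embed = nn.Embedding(vocab, emb_dim)
x = embed(tokens)                                    # (8, 20, 50)

# CNN encoder: position-invariant n-gram features via convolution + max-pooling over time.
conv = nn.Conv1d(emb_dim, hidden, kernel_size=3, padding=1)
cnn_repr = conv(x.transpose(1, 2)).relu().max(dim=2).values   # (8, 64)

# RNN encoder: order-sensitive representation from the final hidden state.
rnn = nn.LSTM(emb_dim, hidden, batch_first=True)
_, (h_n, _) = rnn(x)
rnn_repr = h_n[-1]                                   # (8, 64)
```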
Semantic Structure and Interpretability of Word Embeddings
Title | Semantic Structure and Interpretability of Word Embeddings |
Authors | Lutfi Kerem Senel, Ihsan Utlu, Veysel Yucesoy, Aykut Koc, Tolga Cukur |
Abstract | Dense word embeddings, which encode the semantic meanings of words in low-dimensional vector spaces, have become very popular in natural language processing (NLP) research due to their state-of-the-art performance on many NLP tasks. Word embeddings are substantially successful in capturing semantic relations among words, so a meaningful semantic structure must be present in the respective vector spaces. However, in many cases this semantic structure is broadly and heterogeneously distributed across the embedding dimensions, which makes interpretation a significant challenge. In this study, we propose a statistical method to uncover the latent semantic structure in dense word embeddings. To perform our analysis we introduce a new dataset (SEMCAT) that contains more than 6,500 words semantically grouped under 110 categories. We further propose a method to quantify the interpretability of word embeddings; the proposed method is a practical alternative to the classical word intrusion test, which requires human intervention. |
Tasks | Word Embeddings |
Published | 2017-11-01 |
URL | http://arxiv.org/abs/1711.00331v3 |
PDF | http://arxiv.org/pdf/1711.00331v3.pdf
PWC | https://paperswithcode.com/paper/semantic-structure-and-interpretability-of |
Repo | https://github.com/avaapm/SEMCATdataset2018 |
Framework | none |
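One simple way to score how well a single embedding dimension aligns with a semantic category (an illustrative metric, not necessarily the paper's) is to check how many of the dimension's top-ranked words fall inside the category:

```python
import numpy as np

def dimension_category_score(embeddings, vocab, category_words, dim, k=50):
    """Fraction of the top-k words on one embedding dimension that belong to a category."""
    order = np.argsort(embeddings[:, dim])[::-1][:k]   # words with the largest value on this dimension
    top_words = {vocab[i] for i in order}
    return len(top_words & set(category_words)) / k

# Toy example with random embeddings and a hypothetical category of three words.
rng = np.random.default_rng(1)
vocab = [f"word{i}" for i in range(1000)]
embeddings = rng.normal(size=(1000, 300))
print(dimension_category_score(embeddings, vocab, ["word1", "word2", "word3"], dim=7))
```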
On human motion prediction using recurrent neural networks
Title | On human motion prediction using recurrent neural networks |
Authors | Julieta Martinez, Michael J. Black, Javier Romero |
Abstract | Human motion modelling is a classical problem at the intersection of graphics and computer vision, with applications spanning human-computer interaction, motion synthesis, and motion prediction for virtual and augmented reality. Following the success of deep learning methods in several computer vision tasks, recent work has focused on using deep recurrent neural networks (RNNs) to model human motion, with the goal of learning time-dependent representations that perform tasks such as short-term motion prediction and long-term human motion synthesis. We examine recent work, with a focus on the evaluation methodologies commonly used in the literature, and show that, surprisingly, state-of-the-art performance can be achieved by a simple baseline that does not attempt to model motion at all. We investigate this result, and analyze recent RNN methods by looking at the architectures, loss functions, and training procedures used in state-of-the-art approaches. We propose three changes to the standard RNN models typically used for human motion, which result in a simple and scalable RNN architecture that obtains state-of-the-art performance on human motion prediction. |
Tasks | Motion Estimation, motion prediction |
Published | 2017-05-06 |
URL | http://arxiv.org/abs/1705.02445v1 |
PDF | http://arxiv.org/pdf/1705.02445v1.pdf
PWC | https://paperswithcode.com/paper/on-human-motion-prediction-using-recurrent |
Repo | https://github.com/una-dinosauria/human-motion-prediction |
Framework | tf |
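In this line of work the “simple baseline that does not attempt to model motion at all” is typically a zero-velocity baseline that just repeats the last observed pose; a sketch:

```python
import numpy as np

def zero_velocity_baseline(observed_poses, horizon):
    """Predict future poses by repeating the last observed pose.

    observed_poses: array of shape (T, D) with D pose parameters per frame.
    Returns an array of shape (horizon, D).
    """
    return np.repeat(observed_poses[-1:, :], horizon, axis=0)

past = np.random.randn(50, 54)          # e.g. 50 observed frames of 54 joint-angle parameters
future = zero_velocity_baseline(past, horizon=25)
```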
AlignedReID: Surpassing Human-Level Performance in Person Re-Identification
Title | AlignedReID: Surpassing Human-Level Performance in Person Re-Identification |
Authors | Xuan Zhang, Hao Luo, Xing Fan, Weilai Xiang, Yixiao Sun, Qiqi Xiao, Wei Jiang, Chi Zhang, Jian Sun |
Abstract | In this paper, we propose a novel method called AlignedReID that extracts a global feature which is jointly learned with local features. Global feature learning benefits greatly from local feature learning, which performs an alignment/matching by calculating the shortest path between two sets of local features, without requiring extra supervision. After the joint learning, we only keep the global feature to compute the similarities between images. Our method achieves rank-1 accuracy of 94.4% on Market1501 and 97.8% on CUHK03, outperforming state-of-the-art methods by a large margin. We also evaluate human-level performance and demonstrate that our method is the first to surpass human-level performance on Market1501 and CUHK03, two widely used Person ReID datasets. |
Tasks | Person Re-Identification |
Published | 2017-11-22 |
URL | http://arxiv.org/abs/1711.08184v2 |
PDF | http://arxiv.org/pdf/1711.08184v2.pdf
PWC | https://paperswithcode.com/paper/alignedreid-surpassing-human-level |
Repo | https://github.com/Proxim123/aligned-reID-No-2- |
Framework | pytorch |
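The alignment step described above — a shortest path through the matrix of local (stripe-to-stripe) distances — can be sketched as a small dynamic program. The plain Euclidean distances below are a simplification; the paper applies an additional distance normalization that is omitted here.

```python
import numpy as np

def aligned_local_distance(local_a, local_b):
    """Shortest-path alignment cost between two sets of local (stripe) features.

    local_a: (m, d) and local_b: (n, d) stripe descriptors for two images.
    """
    d = np.linalg.norm(local_a[:, None, :] - local_b[None, :, :], axis=2)  # (m, n) pairwise distances
    m, n = d.shape
    cost = np.full((m, n), np.inf)
    cost[0, 0] = d[0, 0]
    for i in range(m):                         # monotone path moving only right or down
        for j in range(n):
            if i == 0 and j == 0:
                continue
            best_prev = min(cost[i - 1, j] if i > 0 else np.inf,
                            cost[i, j - 1] if j > 0 else np.inf)
            cost[i, j] = d[i, j] + best_prev
    return cost[-1, -1]

print(aligned_local_distance(np.random.rand(7, 128), np.random.rand(7, 128)))
```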
On the Diversity of Realistic Image Synthesis
Title | On the Diversity of Realistic Image Synthesis |
Authors | Zichen Yang, Haifeng Liu, Deng Cai |
Abstract | Many image processing tasks can be formulated as translating images between two image domains, such as colorization, super-resolution and conditional image synthesis. In most of these tasks, an input image may correspond to multiple outputs. However, existing approaches show only very minor diversity in their outputs. In this paper, we present a novel approach to synthesize diverse realistic images corresponding to a semantic layout. We introduce a diversity loss objective, which maximizes the distance between synthesized image pairs and links the input noise to the semantic segments in the synthesized images. Thus, our approach can not only produce diverse images, but also allow users to manipulate the output images by adjusting the noise manually. Experimental results show that images synthesized by our approach are significantly more diverse than those of existing works, and that adding our diversity loss does not degrade the realism of the base networks. |
Tasks | Colorization, Image Generation, Super-Resolution |
Published | 2017-12-20 |
URL | http://arxiv.org/abs/1712.07329v1 |
PDF | http://arxiv.org/pdf/1712.07329v1.pdf
PWC | https://paperswithcode.com/paper/on-the-diversity-of-realistic-image-synthesis |
Repo | https://github.com/ZJULearning/diverse_image_synthesis |
Framework | pytorch |
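One common way to instantiate a diversity objective of this kind (a hedged sketch, not necessarily the paper's exact loss) is to penalize small distances between images synthesized from different noise draws for the same layout:

```python
import torch

def pairwise_diversity_penalty(outputs):
    """Negative mean L1 distance between outputs generated from different noise draws.

    outputs: (k, C, H, W) images synthesized for the same semantic layout.
    Adding this term to the generator loss pushes the k outputs apart.
    """
    k = outputs.shape[0]
    dists = [(outputs[i] - outputs[j]).abs().mean()
             for i in range(k) for j in range(i + 1, k)]
    return -torch.stack(dists).mean()

penalty = pairwise_diversity_penalty(torch.rand(4, 3, 64, 64))
```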
Residual Attention Network for Image Classification
Title | Residual Attention Network for Image Classification |
Authors | Fei Wang, Mengqing Jiang, Chen Qian, Shuo Yang, Cheng Li, Honggang Zhang, Xiaogang Wang, Xiaoou Tang |
Abstract | In this work, we propose the “Residual Attention Network”, a convolutional neural network with an attention mechanism that can be incorporated into state-of-the-art feed-forward network architectures in an end-to-end training fashion. Our Residual Attention Network is built by stacking Attention Modules which generate attention-aware features. The attention-aware features from different modules change adaptively as layers go deeper. Inside each Attention Module, a bottom-up top-down feedforward structure is used to unfold the feedforward and feedback attention process into a single feedforward process. Importantly, we propose attention residual learning to train very deep Residual Attention Networks which can be easily scaled up to hundreds of layers. Extensive analyses are conducted on the CIFAR-10 and CIFAR-100 datasets to verify the effectiveness of every module mentioned above. Our Residual Attention Network achieves state-of-the-art object recognition performance on three benchmark datasets: CIFAR-10 (3.90% error), CIFAR-100 (20.45% error) and ImageNet (4.8% single-model, single-crop top-5 error). Notably, our method achieves a 0.6% top-1 accuracy improvement with 46% of the trunk depth and 69% of the forward FLOPs compared to ResNet-200. The experiments also demonstrate that our network is robust against noisy labels. |
Tasks | Image Classification, Object Recognition |
Published | 2017-04-23 |
URL | http://arxiv.org/abs/1704.06904v1 |
PDF | http://arxiv.org/pdf/1704.06904v1.pdf
PWC | https://paperswithcode.com/paper/residual-attention-network-for-image |
Repo | https://github.com/PistonY/ResidualAttentionNetwork |
Framework | tf |
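The attention residual learning rule from the abstract, output = (1 + M(x)) * F(x), keeps the trunk features usable even when the soft mask is near zero. The module below is a simplified sketch: the single-convolution mask branch stands in for the paper's bottom-up top-down structure.

```python
import torch
import torch.nn as nn

class AttentionResidual(nn.Module):
    """Attention residual learning: output = (1 + M(x)) * F(x)."""

    def __init__(self, channels):
        super().__init__()
        self.trunk = nn.Sequential(                         # F(x): feature (trunk) branch
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.mask = nn.Sequential(                           # M(x): simplified soft mask branch
            nn.Conv2d(channels, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        f = self.trunk(x)
        m = self.mask(x)                     # values in (0, 1)
        return (1 + m) * f                   # mask modulates features, identity-like when m is near 0

y = AttentionResidual(16)(torch.randn(1, 16, 32, 32))
```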
Learning Important Features Through Propagating Activation Differences
Title | Learning Important Features Through Propagating Activation Differences |
Authors | Avanti Shrikumar, Peyton Greenside, Anshul Kundaje |
Abstract | The purported “black box” nature of neural networks is a barrier to adoption in applications where interpretability is essential. Here we present DeepLIFT (Deep Learning Important FeaTures), a method for decomposing the output prediction of a neural network on a specific input by backpropagating the contributions of all neurons in the network to every feature of the input. DeepLIFT compares the activation of each neuron to its ‘reference activation’ and assigns contribution scores according to the difference. By optionally giving separate consideration to positive and negative contributions, DeepLIFT can also reveal dependencies which are missed by other approaches. Scores can be computed efficiently in a single backward pass. We apply DeepLIFT to models trained on MNIST and simulated genomic data, and show significant advantages over gradient-based methods. Video tutorial: http://goo.gl/qKb7pL, ICML slides: bit.ly/deeplifticmlslides, ICML talk: https://vimeo.com/238275076, code: http://goo.gl/RM8jvH. |
Tasks | Interpretable Machine Learning |
Published | 2017-04-10 |
URL | https://arxiv.org/abs/1704.02685v2 |
PDF | https://arxiv.org/pdf/1704.02685v2.pdf
PWC | https://paperswithcode.com/paper/learning-important-features-through |
Repo | https://github.com/saivarunr/xshap |
Framework | tf |
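For a purely linear unit the difference-from-reference decomposition is exact and easy to state; the sketch below shows that flavor of attribution. It is not the full DeepLIFT implementation, which propagates contributions through nonlinearities with its Rescale and RevealCancel rules.

```python
import numpy as np

def linear_contributions(w, x, x_ref):
    """Per-feature contributions to the change in a linear unit's output.

    For y = w @ x, the contributions w * (x - x_ref) sum exactly to y(x) - y(x_ref).
    """
    return w * (x - x_ref)

w = np.array([0.5, -1.0, 2.0])
x, x_ref = np.array([1.0, 2.0, 0.5]), np.zeros(3)
contribs = linear_contributions(w, x, x_ref)
assert np.isclose(contribs.sum(), w @ x - w @ x_ref)   # contributions account for the full output change
```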
Online Learning Sensing Matrix and Sparsifying Dictionary Simultaneously for Compressive Sensing
Title | Online Learning Sensing Matrix and Sparsifying Dictionary Simultaneously for Compressive Sensing |
Authors | Tao Hong, Zhihui Zhu |
Abstract | This paper considers the problem of simultaneously learning the Sensing Matrix and Sparsifying Dictionary (SMSD) on a large training dataset. To address the formulated joint learning problem, we propose an online algorithm that consists of a closed-form solution for optimizing the sensing matrix with a fixed sparsifying dictionary and a stochastic method for learning the sparsifying dictionary on a large dataset when the sensing matrix is given. Benefiting from training on a large dataset, the obtained compressive sensing (CS) system by the proposed algorithm yields a much better performance in terms of signal recovery accuracy than the existing ones. The simulation results on natural images demonstrate the effectiveness of the suggested online algorithm compared with the existing methods. |
Tasks | Compressive Sensing |
Published | 2017-01-04 |
URL | http://arxiv.org/abs/1701.01000v4 |
PDF | http://arxiv.org/pdf/1701.01000v4.pdf
PWC | https://paperswithcode.com/paper/online-learning-sensing-matrix-and |
Repo | https://github.com/happyhongt/Online-Learning-SMSD-Large-Dataset |
Framework | none |
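The stochastic dictionary-learning half of the pipeline can be illustrated with scikit-learn's mini-batch dictionary learner — an off-the-shelf stand-in rather than the authors' algorithm, and without the closed-form sensing-matrix update:

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
patches = rng.normal(size=(5000, 64))          # e.g. 8x8 image patches, flattened

# Learn an overcomplete sparsifying dictionary from mini-batches of training patches.
dico = MiniBatchDictionaryLearning(n_components=128, alpha=1.0, batch_size=256,
                                   transform_algorithm="omp",
                                   transform_n_nonzero_coefs=5, random_state=0)
D = dico.fit(patches).components_              # (128, 64) dictionary atoms
codes = dico.transform(patches[:10])           # sparse codes with at most 5 nonzeros each
```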
Look, Listen and Learn
Title | Look, Listen and Learn |
Authors | Relja Arandjelović, Andrew Zisserman |
Abstract | We consider the question: what can be learnt by looking at and listening to a large number of unlabelled videos? There is a valuable, but so far untapped, source of information contained in the video itself – the correspondence between the visual and the audio streams, and we introduce a novel “Audio-Visual Correspondence” learning task that makes use of this. Training visual and audio networks from scratch, without any additional supervision other than the raw unconstrained videos themselves, is shown to successfully solve this task, and, more interestingly, result in good visual and audio representations. These features set the new state-of-the-art on two sound classification benchmarks, and perform on par with the state-of-the-art self-supervised approaches on ImageNet classification. We also demonstrate that the network is able to localize objects in both modalities, as well as perform fine-grained recognition tasks. |
Tasks | |
Published | 2017-05-23 |
URL | http://arxiv.org/abs/1705.08168v2 |
PDF | http://arxiv.org/pdf/1705.08168v2.pdf
PWC | https://paperswithcode.com/paper/look-listen-and-learn |
Repo | https://github.com/marl/l3embedding |
Framework | tf |
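The Audio-Visual Correspondence task reduces to binary classification over (frame, audio) pairs. The sketch below shows how such training pairs are typically constructed — an illustration of the task setup, not the authors' data pipeline:

```python
import random

def make_avc_pairs(videos, num_pairs):
    """Build (frame, audio, label) pairs for audio-visual correspondence training.

    videos: list of (frames, audio_clips) tuples with frames[i] temporally aligned to audio_clips[i].
    Label 1: the frame and the audio clip come from the same position of the same video.
    Label 0: the audio clip comes from a different video. Assumes at least two videos.
    """
    pairs = []
    for _ in range(num_pairs):
        vi = random.randrange(len(videos))
        frames, audio = videos[vi]
        t = random.randrange(len(frames))
        if random.random() < 0.5:                            # positive: aligned frame and audio
            pairs.append((frames[t], audio[t], 1))
        else:                                                # negative: audio from another video
            vj = random.choice([j for j in range(len(videos)) if j != vi])
            pairs.append((frames[t], random.choice(videos[vj][1]), 0))
    return pairs
```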
Generating Memorable Mnemonic Encodings of Numbers
Title | Generating Memorable Mnemonic Encodings of Numbers |
Authors | Vincent Fiorentini, Megan Shao, Julie Medero |
Abstract | The major system is a mnemonic system that can be used to memorize sequences of numbers. In this work, we present a method to automatically generate sentences that encode a given number. We propose several encoding models and compare the most promising ones in a password memorability study. The results of the study show that a model combining part-of-speech sentence templates with an $n$-gram language model produces the most memorable password representations. |
Tasks | Language Modelling |
Published | 2017-05-07 |
URL | http://arxiv.org/abs/1705.02700v1 |
PDF | http://arxiv.org/pdf/1705.02700v1.pdf
PWC | https://paperswithcode.com/paper/generating-memorable-mnemonic-encodings-of |
Repo | https://github.com/VinceFior/major-system |
Framework | none |
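The major system maps each digit to a set of consonant sounds. The sketch below is a simplified, letter-based version (the real system works on phonemes, and the paper builds sentence templates and an $n$-gram language model on top of it):

```python
# Simplified, letter-based version of the major system's digit -> consonant mapping.
DIGIT_TO_LETTERS = {
    "0": "sz", "1": "td", "2": "n", "3": "m", "4": "r",
    "5": "l", "6": "jg", "7": "kc", "8": "fv", "9": "pb",
}
LETTER_TO_DIGIT = {c: d for d, letters in DIGIT_TO_LETTERS.items() for c in letters}

def encodes(word, number):
    """True if the word's mapped consonants spell out the digits of the number (other letters are free)."""
    digits = "".join(LETTER_TO_DIGIT.get(c, "") for c in word.lower())
    return digits == str(number)

print(encodes("rock", 47))    # r=4, c=7, k=7 -> "477": False under this crude letter mapping
print(encodes("rain", 42))    # r=4, n=2 -> "42": True
```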