Paper Group AWR 15
Analysis and Optimization of Convolutional Neural Network Architectures. Batch-based Activity Recognition from Egocentric Photo-Streams Revisited. Temporally-Reweighted Chinese Restaurant Process Mixtures for Clustering, Imputing, and Forecasting Multivariate Time Series. Generating Multi-label Discrete Patient Records using Generative Adversarial …
Analysis and Optimization of Convolutional Neural Network Architectures
Title | Analysis and Optimization of Convolutional Neural Network Architectures |
Authors | Martin Thoma |
Abstract | Convolutional Neural Networks (CNNs) have dominated various computer vision tasks since Alex Krizhevsky showed that they can be trained effectively and reduced the top-5 error from 26.2 % to 15.3 % on the ImageNet large scale visual recognition challenge. Many aspects of CNNs are examined in various publications, but literature about the analysis and construction of neural network architectures is rare. This work is one step toward closing this gap. A comprehensive overview of existing techniques for CNN analysis and topology construction is provided. A novel way to visualize classification errors with confusion matrices was developed. Based on this method, hierarchical classifiers are described and evaluated. Additionally, some results are confirmed and quantified for CIFAR-100, for example the positive impact of smaller batch sizes, averaging ensembles, data augmentation and test-time transformations on accuracy. Other results, such as the positive impact of a learned color transformation on the test accuracy, could not be confirmed. A model which has only one million learned parameters for an input size of 32x32x3 and 100 classes and which beats the state of the art on the benchmark datasets Asirra, GTSRB, HASYv2 and STL-10 was developed. |
Tasks | Data Augmentation, Object Recognition |
Published | 2017-07-31 |
URL | http://arxiv.org/abs/1707.09725v1 |
PDF | http://arxiv.org/pdf/1707.09725v1.pdf
PWC | https://paperswithcode.com/paper/analysis-and-optimization-of-convolutional |
Repo | https://github.com/MartinThoma/clana |
Framework | none |
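A large share of the reported gains come from averaging ensembles and test-time transformations. Below is a minimal sketch of test-time augmentation by probability averaging; `predict_proba` is a hypothetical callable standing in for any trained classifier, and the horizontal flip is just one example transformation.

```python
import numpy as np

def predict_with_tta(predict_proba, images):
    """Average class probabilities over simple test-time transformations.

    images: float array of shape (N, H, W, C).
    predict_proba: callable mapping such an array to (N, num_classes) probabilities.
    """
    variants = [
        images,                 # original images
        images[:, :, ::-1, :],  # horizontally flipped copies
    ]
    probs = np.mean([predict_proba(v) for v in variants], axis=0)
    return probs.argmax(axis=1)
```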
Batch-based Activity Recognition from Egocentric Photo-Streams Revisited
Title | Batch-based Activity Recognition from Egocentric Photo-Streams Revisited |
Authors | Alejandro Cartas, Juan Marin, Petia Radeva, Mariella Dimiccoli |
Abstract | Wearable cameras can gather large amounts of image data that provide rich visual information about the daily activities of the wearer. Motivated by the large number of health applications that could be enabled by the automatic recognition of daily activities, such as lifestyle characterization for habit improvement, context-aware personal assistance and tele-rehabilitation services, we propose a system to classify 21 daily activities from photo-streams acquired by a wearable photo-camera. Our approach combines the advantages of a Late Fusion Ensemble strategy relying on convolutional neural networks at image level with the ability of recurrent neural networks to account for the temporal evolution of high-level features in photo-streams without relying on event boundaries. The proposed batch-based approach achieved an overall accuracy of 89.85%, outperforming state-of-the-art end-to-end methodologies. These results were achieved on a dataset consisting of 44,902 egocentric pictures from three persons, captured over 26 days on average. |
Tasks | Activity Recognition |
Published | 2017-10-11 |
URL | http://arxiv.org/abs/1710.04112v2 |
PDF | http://arxiv.org/pdf/1710.04112v2.pdf
PWC | https://paperswithcode.com/paper/batch-based-activity-recognition-from-1 |
Repo | https://github.com/gorayni/egocentric_photostreams |
Framework | none |
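The architecture described above pairs an image-level CNN with a recurrent network over the photo-stream. The sketch below shows that general pattern under illustrative layer sizes; it is not the authors' exact model, and the tiny convolutional stack stands in for a pretrained CNN backbone.

```python
import torch
import torch.nn as nn

class PhotoStreamClassifier(nn.Module):
    """Sketch: per-image CNN features -> LSTM over the stream -> per-image activity logits."""

    def __init__(self, feat_dim=512, hidden_dim=256, num_classes=21):
        super().__init__()
        self.cnn = nn.Sequential(                # stand-in for a pretrained image CNN
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, feat_dim),
        )
        self.rnn = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, streams):                  # streams: (batch, time, 3, H, W)
        b, t = streams.shape[:2]
        feats = self.cnn(streams.flatten(0, 1)).view(b, t, -1)
        out, _ = self.rnn(feats)                 # temporal evolution of high-level features
        return self.head(out)                    # (batch, time, num_classes)

logits = PhotoStreamClassifier()(torch.randn(2, 8, 3, 64, 64))
```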
Temporally-Reweighted Chinese Restaurant Process Mixtures for Clustering, Imputing, and Forecasting Multivariate Time Series
Title | Temporally-Reweighted Chinese Restaurant Process Mixtures for Clustering, Imputing, and Forecasting Multivariate Time Series |
Authors | Feras A. Saad, Vikash K. Mansinghka |
Abstract | This article proposes a Bayesian nonparametric method for forecasting, imputation, and clustering in sparsely observed, multivariate time series data. The method is appropriate for jointly modeling hundreds of time series with widely varying, non-stationary dynamics. Given a collection of $N$ time series, the Bayesian model first partitions them into independent clusters using a Chinese restaurant process prior. Within a cluster, all time series are modeled jointly using a novel “temporally-reweighted” extension of the Chinese restaurant process mixture. Markov chain Monte Carlo techniques are used to obtain samples from the posterior distribution, which are then used to form predictive inferences. We apply the technique to challenging forecasting and imputation tasks using seasonal flu data from the US Centers for Disease Control and Prevention, demonstrating superior forecasting accuracy and competitive imputation accuracy as compared to multiple widely used baselines. We further show that the model discovers interpretable clusters in datasets with hundreds of time series, using macroeconomic data from the Gapminder Foundation. |
Tasks | Imputation, Time Series |
Published | 2017-10-18 |
URL | http://arxiv.org/abs/1710.06900v2 |
PDF | http://arxiv.org/pdf/1710.06900v2.pdf
PWC | https://paperswithcode.com/paper/temporally-reweighted-chinese-restaurant |
Repo | https://github.com/probcomp/trcrpm |
Framework | none |
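For context, the paper's “temporally-reweighted” construction extends the standard Chinese restaurant process prior, which assigns a new series to an existing cluster in proportion to its size, or to a new cluster in proportion to a concentration parameter $\alpha$ (the temporal reweighting itself is specific to the paper and not reproduced here):

```latex
P(z_n = k \mid z_{1:n-1}) =
\begin{cases}
\dfrac{n_k}{\,n - 1 + \alpha\,} & \text{for an existing cluster } k \text{ with } n_k \text{ members},\\[6pt]
\dfrac{\alpha}{\,n - 1 + \alpha\,} & \text{for a new cluster}.
\end{cases}
```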
Generating Multi-label Discrete Patient Records using Generative Adversarial Networks
Title | Generating Multi-label Discrete Patient Records using Generative Adversarial Networks |
Authors | Edward Choi, Siddharth Biswal, Bradley Malin, Jon Duke, Walter F. Stewart, Jimeng Sun |
Abstract | Access to electronic health record (EHR) data has motivated computational advances in medical research. However, various concerns, particularly over privacy, can limit access to and collaborative use of EHR data. Sharing synthetic EHR data could mitigate risk. In this paper, we propose a new approach, medical Generative Adversarial Network (medGAN), to generate realistic synthetic patient records. Given real patient records as input, medGAN can generate high-dimensional discrete variables (e.g., binary and count features) via a combination of an autoencoder and generative adversarial networks. We also propose minibatch averaging to efficiently avoid mode collapse, and increase the learning efficiency with batch normalization and shortcut connections. To demonstrate feasibility, we show that medGAN generates synthetic patient records that achieve comparable performance to real data on many experiments, including distribution statistics, predictive modeling tasks and a medical expert review. We also empirically observe a limited privacy risk in both identity and attribute disclosure using medGAN. |
Tasks | |
Published | 2017-03-19 |
URL | http://arxiv.org/abs/1703.06490v3 |
PDF | http://arxiv.org/pdf/1703.06490v3.pdf
PWC | https://paperswithcode.com/paper/generating-multi-label-discrete-patient |
Repo | https://github.com/astorfi/cor-gan |
Framework | pytorch |
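The minibatch averaging mentioned in the abstract can be illustrated by letting the discriminator see each record concatenated with the minibatch mean, so it can detect the degenerate batch statistics produced by mode collapse. The sketch below is a hedged illustration of that idea in PyTorch, not the released medGAN code; layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

class MinibatchAveragingDiscriminator(nn.Module):
    """Discriminator that scores each record concatenated with the minibatch mean."""

    def __init__(self, input_dim, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, x):                                    # x: (batch, input_dim)
        batch_mean = x.mean(dim=0, keepdim=True).expand_as(x)
        return self.net(torch.cat([x, batch_mean], dim=1))   # (batch, 1) real/fake logits

scores = MinibatchAveragingDiscriminator(input_dim=100)(torch.rand(32, 100))
```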
Adaptive Cardinality Estimation
Title | Adaptive Cardinality Estimation |
Authors | Oleg Ivanov, Sergey Bartunov |
Abstract | In this paper we address the cardinality estimation problem, an important subproblem of query optimization. Query optimization is the part of every relational DBMS responsible for finding the best way to execute a given query; these ways are called plans. The execution times of different plans may differ by several orders of magnitude, so the query optimizer has a great influence on overall DBMS performance. We consider cost-based query optimization, the most popular approach. It has been observed that the quality of cost-based optimization depends heavily on the quality of cardinality estimation. The cardinality of a plan node is the number of tuples it returns. In this paper we propose a novel cardinality estimation approach that uses machine learning methods. The main point of the approach is to use execution statistics of previously executed queries to improve cardinality estimates; we call the approach adaptive cardinality estimation to reflect this. The approach is general, flexible, and easy to implement. The experimental evaluation shows that it significantly increases the quality of cardinality estimation, and therefore increases DBMS performance for some queries by several times or even by several dozens of times. |
Tasks | |
Published | 2017-11-22 |
URL | http://arxiv.org/abs/1711.08330v1 |
PDF | http://arxiv.org/pdf/1711.08330v1.pdf
PWC | https://paperswithcode.com/paper/adaptive-cardinality-estimation |
Repo | https://github.com/tigvarts/aqo |
Framework | none |
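The core idea — regressing cardinalities from statistics of previously executed queries — can be sketched with an off-the-shelf learner. The features (per-clause log-selectivities) and the k-nearest-neighbors model below are illustrative assumptions, not necessarily what the aqo extension implements.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Toy training data: log-selectivities of a plan node's clauses -> observed log-cardinality.
X_train = rng.uniform(-6, 0, size=(500, 4))                 # 4 hypothetical predicate selectivities (log scale)
y_train = X_train.sum(axis=1) + np.log(1e6) + rng.normal(0, 0.3, 500)

model = KNeighborsRegressor(n_neighbors=5).fit(X_train, y_train)

# Replace the planner's static estimate with the learned one for a new, similar node.
new_node = rng.uniform(-6, 0, size=(1, 4))
predicted_rows = np.exp(model.predict(new_node))[0]
print(f"learned cardinality estimate: {predicted_rows:.0f} rows")
```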
Comparative Study of CNN and RNN for Natural Language Processing
Title | Comparative Study of CNN and RNN for Natural Language Processing |
Authors | Wenpeng Yin, Katharina Kann, Mo Yu, Hinrich Schütze |
Abstract | Deep neural networks (DNNs) have revolutionized the field of natural language processing (NLP). Convolutional neural networks (CNNs) and recurrent neural networks (RNNs), the two main types of DNN architectures, are widely explored to handle various NLP tasks. CNNs are supposed to be good at extracting position-invariant features and RNNs at modeling units in sequence. The state of the art on many NLP tasks often switches due to the battle between CNNs and RNNs. This work is the first systematic comparison of CNNs and RNNs on a wide range of representative NLP tasks, aiming to give basic guidance for DNN selection. |
Tasks | |
Published | 2017-02-07 |
URL | http://arxiv.org/abs/1702.01923v1 |
PDF | http://arxiv.org/pdf/1702.01923v1.pdf
PWC | https://paperswithcode.com/paper/comparative-study-of-cnn-and-rnn-for-natural |
Repo | https://github.com/Msundarv/TwiLoc |
Framework | none |
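The contrast drawn in the abstract — CNNs extracting position-invariant n-gram features versus RNNs modeling a sequence — can be made concrete with two minimal sentence encoders; all dimensions below are illustrative.

```python
import torch
import torch.nn as nn

emb_dim, hidden, vocab = 50, 64, 1000
tokens = torch.randint(0, vocab, (8, 20))            # batch of 8 sentences, 20 tokens each
embed = nn.Embedding(vocab, emb_dim)
x = embed(tokens)                                    # (8, 20, 50)

# CNN encoder: position-invariant n-gram features via convolution + max-pooling over time.
conv = nn.Conv1d(emb_dim, hidden, kernel_size=3, padding=1)
cnn_repr = conv(x.transpose(1, 2)).relu().max(dim=2).values   # (8, 64)

# RNN encoder: order-sensitive representation from the final hidden state.
rnn = nn.LSTM(emb_dim, hidden, batch_first=True)
_, (h_n, _) = rnn(x)
rnn_repr = h_n[-1]                                   # (8, 64)
```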
Semantic Structure and Interpretability of Word Embeddings
Title | Semantic Structure and Interpretability of Word Embeddings |
Authors | Lutfi Kerem Senel, Ihsan Utlu, Veysel Yucesoy, Aykut Koc, Tolga Cukur |
Abstract | Dense word embeddings, which encode the semantic meanings of words in low-dimensional vector spaces, have become very popular in natural language processing (NLP) research due to their state-of-the-art performance on many NLP tasks. Word embeddings are substantially successful in capturing semantic relations among words, so a meaningful semantic structure must be present in the respective vector spaces. However, in many cases this semantic structure is broadly and heterogeneously distributed across the embedding dimensions, which makes interpretation a significant challenge. In this study, we propose a statistical method to uncover the latent semantic structure in dense word embeddings. To perform our analysis we introduce a new dataset (SEMCAT) that contains more than 6,500 words semantically grouped under 110 categories. We further propose a method to quantify the interpretability of word embeddings; the proposed method is a practical alternative to the classical word intrusion test, which requires human intervention. |
Tasks | Word Embeddings |
Published | 2017-11-01 |
URL | http://arxiv.org/abs/1711.00331v3 |
PDF | http://arxiv.org/pdf/1711.00331v3.pdf
PWC | https://paperswithcode.com/paper/semantic-structure-and-interpretability-of |
Repo | https://github.com/avaapm/SEMCATdataset2018 |
Framework | none |
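One simple way to score how well a single embedding dimension aligns with a semantic category (an illustrative metric, not necessarily the paper's) is to check how many of the dimension's top-ranked words fall inside the category:

```python
import numpy as np

def dimension_category_score(embeddings, vocab, category_words, dim, k=50):
    """Fraction of the top-k words on one embedding dimension that belong to a category."""
    order = np.argsort(embeddings[:, dim])[::-1][:k]   # words with the largest value on this dimension
    top_words = {vocab[i] for i in order}
    return len(top_words & set(category_words)) / k

# Toy example with random embeddings and a hypothetical category of three words.
rng = np.random.default_rng(1)
vocab = [f"word{i}" for i in range(1000)]
embeddings = rng.normal(size=(1000, 300))
print(dimension_category_score(embeddings, vocab, ["word1", "word2", "word3"], dim=7))
```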
On human motion prediction using recurrent neural networks
Title | On human motion prediction using recurrent neural networks |
Authors | Julieta Martinez, Michael J. Black, Javier Romero |
Abstract | Human motion modelling is a classical problem at the intersection of graphics and computer vision, with applications spanning human-computer interaction, motion synthesis, and motion prediction for virtual and augmented reality. Following the success of deep learning methods in several computer vision tasks, recent work has focused on using deep recurrent neural networks (RNNs) to model human motion, with the goal of learning time-dependent representations that perform tasks such as short-term motion prediction and long-term human motion synthesis. We examine recent work, with a focus on the evaluation methodologies commonly used in the literature, and show that, surprisingly, state-of-the-art performance can be achieved by a simple baseline that does not attempt to model motion at all. We investigate this result, and analyze recent RNN methods by looking at the architectures, loss functions, and training procedures used in state-of-the-art approaches. We propose three changes to the standard RNN models typically used for human motion, which result in a simple and scalable RNN architecture that obtains state-of-the-art performance on human motion prediction. |
Tasks | Motion Estimation, motion prediction |
Published | 2017-05-06 |
URL | http://arxiv.org/abs/1705.02445v1 |
PDF | http://arxiv.org/pdf/1705.02445v1.pdf
PWC | https://paperswithcode.com/paper/on-human-motion-prediction-using-recurrent |
Repo | https://github.com/una-dinosauria/human-motion-prediction |
Framework | tf |
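In this line of work the “simple baseline that does not attempt to model motion at all” is typically a zero-velocity baseline that just repeats the last observed pose; a sketch:

```python
import numpy as np

def zero_velocity_baseline(observed_poses, horizon):
    """Predict future poses by repeating the last observed pose.

    observed_poses: array of shape (T, D) with D pose parameters per frame.
    Returns an array of shape (horizon, D).
    """
    return np.repeat(observed_poses[-1:, :], horizon, axis=0)

past = np.random.randn(50, 54)          # e.g. 50 observed frames of 54 joint-angle parameters
future = zero_velocity_baseline(past, horizon=25)
```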
AlignedReID: Surpassing Human-Level Performance in Person Re-Identification
Title | AlignedReID: Surpassing Human-Level Performance in Person Re-Identification |
Authors | Xuan Zhang, Hao Luo, Xing Fan, Weilai Xiang, Yixiao Sun, Qiqi Xiao, Wei Jiang, Chi Zhang, Jian Sun |
Abstract | In this paper, we propose a novel method called AlignedReID that extracts a global feature which is jointly learned with local features. Global feature learning benefits greatly from local feature learning, which performs an alignment/matching by calculating the shortest path between two sets of local features, without requiring extra supervision. After the joint learning, we only keep the global feature to compute the similarities between images. Our method achieves rank-1 accuracy of 94.4% on Market1501 and 97.8% on CUHK03, outperforming state-of-the-art methods by a large margin. We also evaluate human-level performance and demonstrate that our method is the first to surpass human-level performance on Market1501 and CUHK03, two widely used Person ReID datasets. |
Tasks | Person Re-Identification |
Published | 2017-11-22 |
URL | http://arxiv.org/abs/1711.08184v2 |
PDF | http://arxiv.org/pdf/1711.08184v2.pdf
PWC | https://paperswithcode.com/paper/alignedreid-surpassing-human-level |
Repo | https://github.com/Proxim123/aligned-reID-No-2- |
Framework | pytorch |
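The alignment step described above — a shortest path through the matrix of local (stripe-to-stripe) distances — can be sketched as a small dynamic program. The plain Euclidean distances below are a simplification; the paper applies an additional distance normalization that is omitted here.

```python
import numpy as np

def aligned_local_distance(local_a, local_b):
    """Shortest-path alignment cost between two sets of local (stripe) features.

    local_a: (m, d) and local_b: (n, d) stripe descriptors for two images.
    """
    d = np.linalg.norm(local_a[:, None, :] - local_b[None, :, :], axis=2)  # (m, n) pairwise distances
    m, n = d.shape
    cost = np.full((m, n), np.inf)
    cost[0, 0] = d[0, 0]
    for i in range(m):                         # monotone path moving only right or down
        for j in range(n):
            if i == 0 and j == 0:
                continue
            best_prev = min(cost[i - 1, j] if i > 0 else np.inf,
                            cost[i, j - 1] if j > 0 else np.inf)
            cost[i, j] = d[i, j] + best_prev
    return cost[-1, -1]

print(aligned_local_distance(np.random.rand(7, 128), np.random.rand(7, 128)))
```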
On the Diversity of Realistic Image Synthesis
Title | On the Diversity of Realistic Image Synthesis |
Authors | Zichen Yang, Haifeng Liu, Deng Cai |
Abstract | Many image processing tasks can be formulated as translating images between two image domains, such as colorization, super-resolution and conditional image synthesis. In most of these tasks, an input image may correspond to multiple outputs. However, existing approaches show only very minor diversity in their outputs. In this paper, we present a novel approach to synthesize diverse realistic images corresponding to a semantic layout. We introduce a diversity loss objective, which maximizes the distance between synthesized image pairs and links the input noise to the semantic segments in the synthesized images. Thus, our approach can not only produce diverse images, but also allow users to manipulate the output images by adjusting the noise manually. Experimental results show that images synthesized by our approach are significantly more diverse than those of existing works, and that adding our diversity loss does not degrade the realism of the base networks. |
Tasks | Colorization, Image Generation, Super-Resolution |
Published | 2017-12-20 |
URL | http://arxiv.org/abs/1712.07329v1 |
PDF | http://arxiv.org/pdf/1712.07329v1.pdf
PWC | https://paperswithcode.com/paper/on-the-diversity-of-realistic-image-synthesis |
Repo | https://github.com/ZJULearning/diverse_image_synthesis |
Framework | pytorch |
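One common way to instantiate a diversity objective of this kind (a hedged sketch, not necessarily the paper's exact loss) is to penalize small distances between images synthesized from different noise draws for the same layout:

```python
import torch

def pairwise_diversity_penalty(outputs):
    """Negative mean L1 distance between outputs generated from different noise draws.

    outputs: (k, C, H, W) images synthesized for the same semantic layout.
    Adding this term to the generator loss pushes the k outputs apart.
    """
    k = outputs.shape[0]
    dists = [(outputs[i] - outputs[j]).abs().mean()
             for i in range(k) for j in range(i + 1, k)]
    return -torch.stack(dists).mean()

penalty = pairwise_diversity_penalty(torch.rand(4, 3, 64, 64))
```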
Residual Attention Network for Image Classification
Title | Residual Attention Network for Image Classification |
Authors | Fei Wang, Mengqing Jiang, Chen Qian, Shuo Yang, Cheng Li, Honggang Zhang, Xiaogang Wang, Xiaoou Tang |
Abstract | In this work, we propose the “Residual Attention Network”, a convolutional neural network with an attention mechanism that can be incorporated into state-of-the-art feed-forward network architectures in an end-to-end training fashion. Our Residual Attention Network is built by stacking Attention Modules which generate attention-aware features. The attention-aware features from different modules change adaptively as layers go deeper. Inside each Attention Module, a bottom-up top-down feedforward structure is used to unfold the feedforward and feedback attention process into a single feedforward process. Importantly, we propose attention residual learning to train very deep Residual Attention Networks which can be easily scaled up to hundreds of layers. Extensive analyses are conducted on the CIFAR-10 and CIFAR-100 datasets to verify the effectiveness of every module mentioned above. Our Residual Attention Network achieves state-of-the-art object recognition performance on three benchmark datasets: CIFAR-10 (3.90% error), CIFAR-100 (20.45% error) and ImageNet (4.8% single-model, single-crop top-5 error). Notably, our method achieves a 0.6% top-1 accuracy improvement with 46% of the trunk depth and 69% of the forward FLOPs compared to ResNet-200. The experiments also demonstrate that our network is robust against noisy labels. |
Tasks | Image Classification, Object Recognition |
Published | 2017-04-23 |
URL | http://arxiv.org/abs/1704.06904v1 |
PDF | http://arxiv.org/pdf/1704.06904v1.pdf
PWC | https://paperswithcode.com/paper/residual-attention-network-for-image |
Repo | https://github.com/PistonY/ResidualAttentionNetwork |
Framework | tf |
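The attention residual learning rule from the abstract, output = (1 + M(x)) * F(x), keeps the trunk features usable even when the soft mask is near zero. The module below is a simplified sketch: the single-convolution mask branch stands in for the paper's bottom-up top-down structure.

```python
import torch
import torch.nn as nn

class AttentionResidual(nn.Module):
    """Attention residual learning: output = (1 + M(x)) * F(x)."""

    def __init__(self, channels):
        super().__init__()
        self.trunk = nn.Sequential(                         # F(x): feature (trunk) branch
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.mask = nn.Sequential(                           # M(x): simplified soft mask branch
            nn.Conv2d(channels, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        f = self.trunk(x)
        m = self.mask(x)                     # values in (0, 1)
        return (1 + m) * f                   # mask modulates features, identity-like when m is near 0

y = AttentionResidual(16)(torch.randn(1, 16, 32, 32))
```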
Learning Important Features Through Propagating Activation Differences
Title | Learning Important Features Through Propagating Activation Differences |
Authors | Avanti Shrikumar, Peyton Greenside, Anshul Kundaje |
Abstract | The purported “black box” nature of neural networks is a barrier to adoption in applications where interpretability is essential. Here we present DeepLIFT (Deep Learning Important FeaTures), a method for decomposing the output prediction of a neural network on a specific input by backpropagating the contributions of all neurons in the network to every feature of the input. DeepLIFT compares the activation of each neuron to its ‘reference activation’ and assigns contribution scores according to the difference. By optionally giving separate consideration to positive and negative contributions, DeepLIFT can also reveal dependencies which are missed by other approaches. Scores can be computed efficiently in a single backward pass. We apply DeepLIFT to models trained on MNIST and simulated genomic data, and show significant advantages over gradient-based methods. Video tutorial: http://goo.gl/qKb7pL, ICML slides: bit.ly/deeplifticmlslides, ICML talk: https://vimeo.com/238275076, code: http://goo.gl/RM8jvH. |
Tasks | Interpretable Machine Learning |
Published | 2017-04-10 |
URL | https://arxiv.org/abs/1704.02685v2 |
PDF | https://arxiv.org/pdf/1704.02685v2.pdf
PWC | https://paperswithcode.com/paper/learning-important-features-through |
Repo | https://github.com/saivarunr/xshap |
Framework | tf |
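For a purely linear unit the difference-from-reference decomposition is exact and easy to state; the sketch below shows that flavor of attribution. It is not the full DeepLIFT implementation, which propagates contributions through nonlinearities with its Rescale and RevealCancel rules.

```python
import numpy as np

def linear_contributions(w, x, x_ref):
    """Per-feature contributions to the change in a linear unit's output.

    For y = w @ x, the contributions w * (x - x_ref) sum exactly to y(x) - y(x_ref).
    """
    return w * (x - x_ref)

w = np.array([0.5, -1.0, 2.0])
x, x_ref = np.array([1.0, 2.0, 0.5]), np.zeros(3)
contribs = linear_contributions(w, x, x_ref)
assert np.isclose(contribs.sum(), w @ x - w @ x_ref)   # contributions account for the full output change
```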
Online Learning Sensing Matrix and Sparsifying Dictionary Simultaneously for Compressive Sensing
Title | Online Learning Sensing Matrix and Sparsifying Dictionary Simultaneously for Compressive Sensing |
Authors | Tao Hong, Zhihui Zhu |
Abstract | This paper considers the problem of simultaneously learning the Sensing Matrix and Sparsifying Dictionary (SMSD) on a large training dataset. To address the formulated joint learning problem, we propose an online algorithm that consists of a closed-form solution for optimizing the sensing matrix with a fixed sparsifying dictionary and a stochastic method for learning the sparsifying dictionary on a large dataset when the sensing matrix is given. Benefiting from training on a large dataset, the obtained compressive sensing (CS) system by the proposed algorithm yields a much better performance in terms of signal recovery accuracy than the existing ones. The simulation results on natural images demonstrate the effectiveness of the suggested online algorithm compared with the existing methods. |
Tasks | Compressive Sensing |
Published | 2017-01-04 |
URL | http://arxiv.org/abs/1701.01000v4 |
PDF | http://arxiv.org/pdf/1701.01000v4.pdf
PWC | https://paperswithcode.com/paper/online-learning-sensing-matrix-and |
Repo | https://github.com/happyhongt/Online-Learning-SMSD-Large-Dataset |
Framework | none |
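The stochastic dictionary-learning half of the pipeline can be illustrated with scikit-learn's mini-batch dictionary learner — an off-the-shelf stand-in rather than the authors' algorithm, and without the closed-form sensing-matrix update:

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
patches = rng.normal(size=(5000, 64))          # e.g. 8x8 image patches, flattened

# Learn an overcomplete sparsifying dictionary from mini-batches of training patches.
dico = MiniBatchDictionaryLearning(n_components=128, alpha=1.0, batch_size=256,
                                   transform_algorithm="omp",
                                   transform_n_nonzero_coefs=5, random_state=0)
D = dico.fit(patches).components_              # (128, 64) dictionary atoms
codes = dico.transform(patches[:10])           # sparse codes with at most 5 nonzeros each
```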
Look, Listen and Learn
Title | Look, Listen and Learn |
Authors | Relja Arandjelović, Andrew Zisserman |
Abstract | We consider the question: what can be learnt by looking at and listening to a large number of unlabelled videos? There is a valuable, but so far untapped, source of information contained in the video itself – the correspondence between the visual and the audio streams, and we introduce a novel “Audio-Visual Correspondence” learning task that makes use of this. Training visual and audio networks from scratch, without any additional supervision other than the raw unconstrained videos themselves, is shown to successfully solve this task, and, more interestingly, result in good visual and audio representations. These features set the new state-of-the-art on two sound classification benchmarks, and perform on par with the state-of-the-art self-supervised approaches on ImageNet classification. We also demonstrate that the network is able to localize objects in both modalities, as well as perform fine-grained recognition tasks. |
Tasks | |
Published | 2017-05-23 |
URL | http://arxiv.org/abs/1705.08168v2 |
PDF | http://arxiv.org/pdf/1705.08168v2.pdf
PWC | https://paperswithcode.com/paper/look-listen-and-learn |
Repo | https://github.com/marl/l3embedding |
Framework | tf |
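The Audio-Visual Correspondence task reduces to binary classification over (frame, audio) pairs. The sketch below shows how such training pairs are typically constructed — an illustration of the task setup, not the authors' data pipeline:

```python
import random

def make_avc_pairs(videos, num_pairs):
    """Build (frame, audio, label) pairs for audio-visual correspondence training.

    videos: list of (frames, audio_clips) tuples with frames[i] temporally aligned to audio_clips[i].
    Label 1: the frame and the audio clip come from the same position of the same video.
    Label 0: the audio clip comes from a different video. Assumes at least two videos.
    """
    pairs = []
    for _ in range(num_pairs):
        vi = random.randrange(len(videos))
        frames, audio = videos[vi]
        t = random.randrange(len(frames))
        if random.random() < 0.5:                            # positive: aligned frame and audio
            pairs.append((frames[t], audio[t], 1))
        else:                                                # negative: audio from another video
            vj = random.choice([j for j in range(len(videos)) if j != vi])
            pairs.append((frames[t], random.choice(videos[vj][1]), 0))
    return pairs
```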
Generating Memorable Mnemonic Encodings of Numbers
Title | Generating Memorable Mnemonic Encodings of Numbers |
Authors | Vincent Fiorentini, Megan Shao, Julie Medero |
Abstract | The major system is a mnemonic system that can be used to memorize sequences of numbers. In this work, we present a method to automatically generate sentences that encode a given number. We propose several encoding models and compare the most promising ones in a password memorability study. The results of the study show that a model combining part-of-speech sentence templates with an $n$-gram language model produces the most memorable password representations. |
Tasks | Language Modelling |
Published | 2017-05-07 |
URL | http://arxiv.org/abs/1705.02700v1 |
PDF | http://arxiv.org/pdf/1705.02700v1.pdf
PWC | https://paperswithcode.com/paper/generating-memorable-mnemonic-encodings-of |
Repo | https://github.com/VinceFior/major-system |
Framework | none |
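The major system maps each digit to a set of consonant sounds. The sketch below is a simplified, letter-based version (the real system works on phonemes, and the paper builds sentence templates and an $n$-gram language model on top of it):

```python
# Simplified, letter-based version of the major system's digit -> consonant mapping.
DIGIT_TO_LETTERS = {
    "0": "sz", "1": "td", "2": "n", "3": "m", "4": "r",
    "5": "l", "6": "jg", "7": "kc", "8": "fv", "9": "pb",
}
LETTER_TO_DIGIT = {c: d for d, letters in DIGIT_TO_LETTERS.items() for c in letters}

def encodes(word, number):
    """True if the word's mapped consonants spell out the digits of the number (other letters are free)."""
    digits = "".join(LETTER_TO_DIGIT.get(c, "") for c in word.lower())
    return digits == str(number)

print(encodes("rock", 47))    # r=4, c=7, k=7 -> "477": False under this crude letter mapping
print(encodes("rain", 42))    # r=4, n=2 -> "42": True
```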