July 30, 2019

Paper Group AWR 19

DeepCorrect: Correcting DNN models against Image Distortions


Title	DeepCorrect: Correcting DNN models against Image Distortions
Authors	Tejas Borkar, Lina Karam
Abstract	In recent years, the widespread use of deep neural networks (DNNs) has facilitated great improvements in performance for computer vision tasks like image classification and object recognition. In most realistic computer vision applications, an input image undergoes some form of image distortion such as blur and additive noise during image acquisition or transmission. Deep networks trained on pristine images perform poorly when tested on such distortions. In this paper, we evaluate the effect of image distortions like Gaussian blur and additive noise on the activations of pre-trained convolutional filters. We propose a metric to identify the most noise-susceptible convolutional filters and rank them in order of the highest gain in classification accuracy upon correction. In our proposed approach called DeepCorrect, we apply small stacks of convolutional layers with residual connections at the output of these ranked filters and train them to correct the worst distortion-affected filter activations, whilst leaving the rest of the pre-trained filter outputs in the network unchanged. Performance results show that applying DeepCorrect models for common vision tasks like image classification (ImageNet), object recognition (Caltech-101, Caltech-256) and scene classification (SUN-397) significantly improves the robustness of DNNs against distorted images and outperforms other alternative approaches.
Tasks	Image Classification, Object Recognition, Scene Classification
Published	2017-05-05
URL	http://arxiv.org/abs/1705.02406v5
PDF	http://arxiv.org/pdf/1705.02406v5.pdf
PWC	https://paperswithcode.com/paper/deepcorrect-correcting-dnn-models-against
Repo	https://github.com/tsborkar/DeepCorrect
Framework	none

TensorLayer: A Versatile Library for Efficient Deep Learning Development


Title	TensorLayer: A Versatile Library for Efficient Deep Learning Development
Authors	Hao Dong, Akara Supratak, Luo Mai, Fangde Liu, Axel Oehmichen, Simiao Yu, Yike Guo
Abstract	Deep learning has enabled major advances in the fields of computer vision, natural language processing, and multimedia among many others. Developing a deep learning system is arduous and complex, as it involves constructing neural network architectures, managing training/trained models, tuning optimization process, preprocessing and organizing data, etc. TensorLayer is a versatile Python library that aims at helping researchers and engineers efficiently develop deep learning systems. It offers rich abstractions for neural networks, model and data management, and parallel workflow mechanism. While boosting efficiency, TensorLayer maintains both performance and scalability. TensorLayer was released in September 2016 on GitHub, and has helped people from academia and industry develop real-world applications of deep learning.
Tasks
Published	2017-07-26
URL	http://arxiv.org/abs/1707.08551v3
PDF	http://arxiv.org/pdf/1707.08551v3.pdf
PWC	https://paperswithcode.com/paper/tensorlayer-a-versatile-library-for-efficient
Repo	https://github.com/akaraspt/tl_paper
Framework	tf

First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations


Title	First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations
Authors	Guillermo Garcia-Hernando, Shanxin Yuan, Seungryul Baek, Tae-Kyun Kim
Abstract	In this work we study the use of 3D hand poses to recognize first-person dynamic hand actions interacting with 3D objects. Towards this goal, we collected RGB-D video sequences comprised of more than 100K frames of 45 daily hand action categories, involving 26 different objects in several hand configurations. To obtain hand pose annotations, we used our own mo-cap system that automatically infers the 3D location of each of the 21 joints of a hand model via 6 magnetic sensors and inverse kinematics. Additionally, we recorded the 6D object poses and provide 3D object models for a subset of hand-object interaction sequences. To the best of our knowledge, this is the first benchmark that enables the study of first-person hand actions with the use of 3D hand poses. We present an extensive experimental evaluation of RGB-D and pose-based action recognition by 18 baselines/state-of-the-art approaches. The impact of using appearance features, poses, and their combinations are measured, and the different training/testing protocols are evaluated. Finally, we assess how ready the 3D hand pose estimation field is when hands are severely occluded by objects in egocentric views and its influence on action recognition. From the results, we see clear benefits of using hand pose as a cue for action recognition compared to other data modalities. Our dataset and experiments can be of interest to communities of 3D hand pose estimation, 6D object pose, and robotics as well as action recognition.
Tasks	Egocentric Activity Recognition, Hand Gesture Recognition, Hand Pose Estimation, Pose Estimation, Temporal Action Localization
Published	2017-04-08
URL	http://arxiv.org/abs/1704.02463v2
PDF	http://arxiv.org/pdf/1704.02463v2.pdf
PWC	https://paperswithcode.com/paper/first-person-hand-action-benchmark-with-rgb-d
Repo	https://github.com/guiggh/hand_pose_action
Framework	none

PixColor: Pixel Recursive Colorization


Title	PixColor: Pixel Recursive Colorization
Authors	Sergio Guadarrama, Ryan Dahl, David Bieber, Mohammad Norouzi, Jonathon Shlens, Kevin Murphy
Abstract	We propose a novel approach to automatically produce multiple colorized versions of a grayscale image. Our method results from the observation that the task of automated colorization is relatively easy given a low-resolution version of the color image. We first train a conditional PixelCNN to generate a low resolution color for a given grayscale image. Then, given the generated low-resolution color image and the original grayscale image as inputs, we train a second CNN to generate a high-resolution colorization of an image. We demonstrate that our approach produces more diverse and plausible colorizations than existing methods, as judged by human raters in a “Visual Turing Test”.
Tasks	Colorization
Published	2017-05-19
URL	http://arxiv.org/abs/1705.07208v2
PDF	http://arxiv.org/pdf/1705.07208v2.pdf
PWC	https://paperswithcode.com/paper/pixcolor-pixel-recursive-colorization
Repo	https://github.com/demul/auto_colorization_project
Framework	none

Adversarial Synthesis Learning Enables Segmentation Without Target Modality Ground Truth


Title	Adversarial Synthesis Learning Enables Segmentation Without Target Modality Ground Truth
Authors	Yuankai Huo, Zhoubing Xu, Shunxing Bao, Albert Assad, Richard G. Abramson, Bennett A. Landman
Abstract	A lack of generalizability is one key limitation of deep learning based segmentation. Typically, one manually labels new training images when segmenting organs in different imaging modalities or segmenting abnormal organs from distinct disease cohorts. The manual efforts can be alleviated if one is able to reuse manual labels from one modality (e.g., MRI) to train a segmentation network for a new modality (e.g., CT). Previously, two stage methods have been proposed to use cycle generative adversarial networks (CycleGAN) to synthesize training images for a target modality. Then, these efforts trained a segmentation network independently using synthetic images. However, these two independent stages did not use the complementary information between synthesis and segmentation. Herein, we proposed a novel end-to-end synthesis and segmentation network (EssNet) to achieve the unpaired MRI to CT image synthesis and CT splenomegaly segmentation simultaneously without using manual labels on CT. The end-to-end EssNet achieved significantly higher median Dice similarity coefficient (0.9188) than the two stages strategy (0.8801), and even higher than canonical multi-atlas segmentation (0.9125) and ResNet method (0.9107), which used the CT manual labels.
Tasks	Image Generation, Image-to-Image Translation, Medical Image Segmentation, Splenomegaly Segmentation On Multi-Modal Mri
Published	2017-12-20
URL	http://arxiv.org/abs/1712.07695v1
PDF	http://arxiv.org/pdf/1712.07695v1.pdf
PWC	https://paperswithcode.com/paper/adversarial-synthesis-learning-enables
Repo	https://github.com/MASILab/SynSeg-Net
Framework	caffe2
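
The abstract above reports results as median Dice similarity coefficients. As a minimal sketch of how that metric is computed for one pair of binary masks (the function name and the flat 0/1 list representation are assumptions for illustration, not taken from the EssNet code):

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary masks (flat sequences of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Number of positions marked as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground count across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


pred = [1, 1, 0, 0, 1]
target = [1, 0, 0, 1, 1]
score = dice_coefficient(pred, target)
print(score)
```

A score of 1.0 means the predicted and reference masks overlap exactly; 0.0 means they share no foreground voxels.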

How to evaluate word embeddings? On importance of data efficiency and simple supervised tasks


Title	How to evaluate word embeddings? On importance of data efficiency and simple supervised tasks
Authors	Stanisław Jastrzebski, Damian Leśniak, Wojciech Marian Czarnecki
Abstract	Maybe the single most important goal of representation learning is making subsequent learning faster. Surprisingly, this fact is not well reflected in the way embeddings are evaluated. In addition, recent practice in word embeddings points towards importance of learning specialized representations. We argue that focus of word representation evaluation should reflect those trends and shift towards evaluating what useful information is easily accessible. Specifically, we propose that evaluation should focus on data efficiency and simple supervised tasks, where the amount of available data is varied and scores of a supervised model are reported for each subset (as commonly done in transfer learning). In order to illustrate significance of such analysis, a comprehensive evaluation of selected word embeddings is presented. Proposed approach yields a more complete picture and brings new insight into performance characteristics, for instance information about word similarity or analogy tends to be non–linearly encoded in the embedding space, which questions the cosine-based, unsupervised, evaluation methods. All results and analysis scripts are available online.
Tasks	Representation Learning, Transfer Learning, Word Embeddings
Published	2017-02-07
URL	http://arxiv.org/abs/1702.02170v1
PDF	http://arxiv.org/pdf/1702.02170v1.pdf
PWC	https://paperswithcode.com/paper/how-to-evaluate-word-embeddings-on-importance
Repo	https://github.com/PyENE/meta-word-embedding
Framework	none

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering


Title	Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Authors	Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, Lei Zhang
Abstract	Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning. In this work, we propose a combined bottom-up and top-down attention mechanism that enables attention to be calculated at the level of objects and other salient image regions. This is the natural basis for attention to be considered. Within our approach, the bottom-up mechanism (based on Faster R-CNN) proposes image regions, each with an associated feature vector, while the top-down mechanism determines feature weightings. Applying this approach to image captioning, our results on the MSCOCO test server establish a new state-of-the-art for the task, achieving CIDEr / SPICE / BLEU-4 scores of 117.9, 21.5 and 36.9, respectively. Demonstrating the broad applicability of the method, applying the same approach to VQA we obtain first place in the 2017 VQA Challenge.
Tasks	Image Captioning, Visual Question Answering
Published	2017-07-25
URL	http://arxiv.org/abs/1707.07998v3
PDF	http://arxiv.org/pdf/1707.07998v3.pdf
PWC	https://paperswithcode.com/paper/bottom-up-and-top-down-attention-for-image
Repo	https://github.com/feifengwhu/question_attention
Framework	pytorch
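
The abstract describes a top-down mechanism that assigns weights to region features proposed bottom-up. The sketch below shows only the generic weighting step: a softmax over per-region scores followed by a weighted sum. The dot-product scoring and the names `attend`, `region_features`, and `query` are illustrative assumptions; the paper's model learns its scoring function.

```python
import math


def attend(region_features, query):
    """Softmax-weighted sum of region feature vectors.

    Scores here are plain dot products with `query` (an assumption made
    for illustration, not the paper's learned scoring function).
    """
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in region_features]
    # Subtract the maximum score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(region_features[0])
    attended = [
        sum(w * feat[d] for w, feat in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, attended


weights, attended = attend([[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0])
print(weights, attended)
```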

Learning to cluster in order to transfer across domains and tasks


Title	Learning to cluster in order to transfer across domains and tasks
Authors	Yen-Chang Hsu, Zhaoyang Lv, Zsolt Kira
Abstract	This paper introduces a novel method to perform transfer learning across domains and tasks, formulating it as a problem of learning to cluster. The key insight is that, in addition to features, we can transfer similarity information and this is sufficient to learn a similarity function and clustering network to perform both domain adaptation and cross-task transfer learning. We begin by reducing categorical information to pairwise constraints, which only considers whether two instances belong to the same class or not. This similarity is category-agnostic and can be learned from data in the source domain using a similarity network. We then present two novel approaches for performing transfer learning using this similarity function. First, for unsupervised domain adaptation, we design a new loss function to regularize classification with a constrained clustering loss, hence learning a clustering network with the transferred similarity metric generating the training inputs. Second, for cross-task learning (i.e., unsupervised clustering with unseen categories), we propose a framework to reconstruct and estimate the number of semantic clusters, again using the clustering network. Since the similarity network is noisy, the key is to use a robust clustering algorithm, and we show that our formulation is more robust than the alternative constrained and unconstrained clustering approaches. Using this method, we first show state of the art results for the challenging cross-task problem, applied on Omniglot and ImageNet. Our results show that we can reconstruct semantic clusters with high accuracy. We then evaluate the performance of cross-domain transfer using images from the Office-31 and SVHN-MNIST tasks and present top accuracy on both datasets. Our approach doesn’t explicitly deal with domain discrepancy. If we combine with a domain adaptation loss, it shows further improvement.
Tasks	Domain Adaptation, Omniglot, Transfer Learning, Unsupervised Domain Adaptation
Published	2017-11-28
URL	http://arxiv.org/abs/1711.10125v3
PDF	http://arxiv.org/pdf/1711.10125v3.pdf
PWC	https://paperswithcode.com/paper/learning-to-cluster-in-order-to-transfer
Repo	https://github.com/GT-RIPL/L2C
Framework	pytorch
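
The first step in the abstract, reducing categorical labels to pairwise constraints, can be sketched as follows. This is a minimal illustration under the assumption that labels arrive as a plain list; the function name is hypothetical and this is not the L2C repository's code.

```python
from itertools import combinations


def labels_to_pairwise(labels):
    """Map each index pair (i, j), i < j, to 1 if the labels match, else 0."""
    return {
        (i, j): int(labels[i] == labels[j])
        for i, j in combinations(range(len(labels)), 2)
    }


pairs = labels_to_pairwise(["cat", "dog", "cat"])
print(pairs)
```

The resulting targets record only whether two instances share a class, not which class it is, which is what makes the similarity signal category-agnostic.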

Paying Attention to Descriptions Generated by Image Captioning Models


Title	Paying Attention to Descriptions Generated by Image Captioning Models
Authors	Hamed R. Tavakoli, Rakshith Shetty, Ali Borji, Jorma Laaksonen
Abstract	To bridge the gap between humans and machines in image understanding and describing, we need further insight into how people describe a perceived scene. In this paper, we study the agreement between bottom-up saliency-based visual attention and object referrals in scene description constructs. We investigate the properties of human-written descriptions and machine-generated ones. We then propose a saliency-boosted image captioning model in order to investigate benefits from low-level cues in language models. We learn that (1) humans mention more salient objects earlier than less salient ones in their descriptions, (2) the better a captioning model performs, the better attention agreement it has with human descriptions, (3) the proposed saliency-boosted model, compared to its baseline form, does not improve significantly on the MS COCO database, indicating explicit bottom-up boosting does not help when the task is well learnt and tuned on a data, (4) a better generalization is, however, observed for the saliency-boosted model on unseen data.
Tasks	Image Captioning
Published	2017-04-24
URL	http://arxiv.org/abs/1704.07434v3
PDF	http://arxiv.org/pdf/1704.07434v3.pdf
PWC	https://paperswithcode.com/paper/paying-attention-to-descriptions-generated-by
Repo	https://github.com/rakshithShetty/captionGAN
Framework	none

FCSS: Fully Convolutional Self-Similarity for Dense Semantic Correspondence


Title	FCSS: Fully Convolutional Self-Similarity for Dense Semantic Correspondence
Authors	Seungryong Kim, Dongbo Min, Bumsub Ham, Sangryul Jeon, Stephen Lin, Kwanghoon Sohn
Abstract	We present a descriptor, called fully convolutional self-similarity (FCSS), for dense semantic correspondence. To robustly match points among different instances within the same object class, we formulate FCSS using local self-similarity (LSS) within a fully convolutional network. In contrast to existing CNN-based descriptors, FCSS is inherently insensitive to intra-class appearance variations because of its LSS-based structure, while maintaining the precise localization ability of deep neural networks. The sampling patterns of local structure and the self-similarity measure are jointly learned within the proposed network in an end-to-end and multi-scale manner. As training data for semantic correspondence is rather limited, we propose to leverage object candidate priors provided in existing image datasets and also correspondence consistency between object pairs to enable weakly-supervised learning. Experiments demonstrate that FCSS outperforms conventional handcrafted descriptors and CNN-based descriptors on various benchmarks.
Tasks
Published	2017-02-03
URL	http://arxiv.org/abs/1702.00926v1
PDF	http://arxiv.org/pdf/1702.00926v1.pdf
PWC	https://paperswithcode.com/paper/fcss-fully-convolutional-self-similarity-for
Repo	https://github.com/seungryong/FCSS
Framework	none

Extractive Summarization using Deep Learning


Title	Extractive Summarization using Deep Learning
Authors	Sukriti Verma, Vagisha Nidhi
Abstract	This paper proposes a text summarization approach for factual reports using a deep learning model. This approach consists of three phases: feature extraction, feature enhancement, and summary generation, which work together to assimilate core information and generate a coherent, understandable summary. We are exploring various features to improve the set of sentences selected for the summary, and are using a Restricted Boltzmann Machine to enhance and abstract those features to improve resultant accuracy without losing any important information. The sentences are scored based on those enhanced features and an extractive summary is constructed. Experimentation carried out on several articles demonstrates the effectiveness of the proposed approach. Source code available at: https://github.com/vagisha-nidhi/TextSummarizer
Tasks	Text Summarization
Published	2017-08-15
URL	http://arxiv.org/abs/1708.04439v2
PDF	http://arxiv.org/pdf/1708.04439v2.pdf
PWC	https://paperswithcode.com/paper/extractive-summarization-using-deep-learning
Repo	https://github.com/vagisha-nidhi/TextSummarizer
Framework	none

Automatically Annotated Turkish Corpus for Named Entity Recognition and Text Categorization using Large-Scale Gazetteers


Title	Automatically Annotated Turkish Corpus for Named Entity Recognition and Text Categorization using Large-Scale Gazetteers
Authors	H. Bahadir Sahin, Caglar Tirkaz, Eray Yildiz, Mustafa Tolga Eren, Ozan Sonmez
Abstract	Turkish Wikipedia Named-Entity Recognition and Text Categorization (TWNERTC) dataset is a collection of automatically categorized and annotated sentences obtained from Wikipedia. We constructed large-scale gazetteers by using a graph crawler algorithm to extract relevant entity and domain information from a semantic knowledge base, Freebase. The constructed gazetteers contain approximately 300K entities with thousands of fine-grained entity types under 77 different domains. Since automated processes are prone to ambiguity, we also introduce two new content-specific noise reduction methodologies. Moreover, we map fine-grained entity types to the equivalent four coarse-grained types: person, loc, org, misc. Eventually, we construct six different dataset versions and evaluate the quality of annotations by comparing ground truths from human annotators. We make these datasets publicly available to support studies on Turkish named-entity recognition (NER) and text categorization (TC).
Tasks	Named Entity Recognition, Text Categorization
Published	2017-02-08
URL	http://arxiv.org/abs/1702.02363v2
PDF	http://arxiv.org/pdf/1702.02363v2.pdf
PWC	https://paperswithcode.com/paper/automatically-annotated-turkish-corpus-for
Repo	https://github.com/juand-r/entity-recognition-datasets
Framework	torch

Pyndri: a Python Interface to the Indri Search Engine


Title	Pyndri: a Python Interface to the Indri Search Engine
Authors	Christophe Van Gysel, Evangelos Kanoulas, Maarten de Rijke
Abstract	We introduce pyndri, a Python interface to the Indri search engine. Pyndri allows access to Indri indexes from Python at two levels: (1) dictionary and tokenized document collection, (2) evaluating queries on the index. We hope that with the release of pyndri, we will stimulate reproducible, open and fast-paced IR research.
Tasks
Published	2017-01-03
URL	http://arxiv.org/abs/1701.00749v1
PDF	http://arxiv.org/pdf/1701.00749v1.pdf
PWC	https://paperswithcode.com/paper/pyndri-a-python-interface-to-the-indri-search
Repo	https://github.com/cvangysel/pyndri
Framework	none

SSD-6D: Making RGB-based 3D detection and 6D pose estimation great again


Title	SSD-6D: Making RGB-based 3D detection and 6D pose estimation great again
Authors	Wadim Kehl, Fabian Manhardt, Federico Tombari, Slobodan Ilic, Nassir Navab
Abstract	We present a novel method for detecting 3D model instances and estimating their 6D poses from RGB data in a single shot. To this end, we extend the popular SSD paradigm to cover the full 6D pose space and train on synthetic model data only. Our approach competes or surpasses current state-of-the-art methods that leverage RGB-D data on multiple challenging datasets. Furthermore, our method produces these results at around 10Hz, which is many times faster than the related methods. For the sake of reproducibility, we make our trained networks and detection code publicly available.
Tasks	6D Pose Estimation, 6D Pose Estimation using RGB, Pose Estimation
Published	2017-11-27
URL	http://arxiv.org/abs/1711.10006v1
PDF	http://arxiv.org/pdf/1711.10006v1.pdf
PWC	https://paperswithcode.com/paper/ssd-6d-making-rgb-based-3d-detection-and-6d
Repo	https://github.com/wadimkehl/ssd-6d
Framework	tf

Variance Reduced Value Iteration and Faster Algorithms for Solving Markov Decision Processes


Title	Variance Reduced Value Iteration and Faster Algorithms for Solving Markov Decision Processes
Authors	Aaron Sidford, Mengdi Wang, Xian Wu, Yinyu Ye
Abstract	In this paper we provide faster algorithms for approximately solving discounted Markov Decision Processes in multiple parameter regimes. Given a discounted Markov Decision Process (DMDP) with $S$ states, $A$ actions, discount factor $\gamma\in(0,1)$, and rewards in the range $[-M, M]$, we show how to compute an $\epsilon$-optimal policy, with probability $1 - \delta$ in time $$ \tilde{O}\left( \left(S^2 A + \frac{S A}{(1 - \gamma)^3} \right) \log\left( \frac{M}{\epsilon} \right) \log\left( \frac{1}{\delta} \right) \right). $$ This contribution reflects the first nearly linear time, nearly linearly convergent algorithm for solving DMDPs for intermediate values of $\gamma$. We also show how to obtain improved sublinear time algorithms provided we can sample from the transition function in $O(1)$ time. Under this assumption we provide an algorithm which computes an $\epsilon$-optimal policy with probability $1 - \delta$ in time $$ \tilde{O} \left(\frac{S A M^2}{(1 - \gamma)^4 \epsilon^2} \log \left(\frac{1}{\delta}\right) \right). $$ Lastly, we extend both these algorithms to solve finite horizon MDPs. Our algorithms improve upon the previous best for approximately computing optimal policies for fixed-horizon MDPs in multiple parameter regimes. Interestingly, we obtain our results by a careful modification of approximate value iteration. We show how to combine classic approximate value iteration analysis with new techniques in variance reduction. Our fastest algorithms leverage further insights to ensure that our algorithms make monotonic progress towards the optimal value. This paper is one of few instances in using sampling to obtain a linearly convergent linear programming algorithm and we hope that the analysis may be useful more broadly.
Tasks
Published	2017-10-27
URL	http://arxiv.org/abs/1710.09988v2
PDF	http://arxiv.org/pdf/1710.09988v2.pdf
PWC	https://paperswithcode.com/paper/variance-reduced-value-iteration-and-faster
Repo	https://github.com/uclaopt/AsyncQVI
Framework	none
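
For context, the paper builds on classic value iteration. The sketch below is the textbook (exact, non-sampled) version for a small discounted MDP, not the variance-reduced algorithm from the paper; the data layout (`P[s][a]` as a list of `(next_state, probability)` pairs, `R[s][a]` as a scalar reward) and the toy MDP are assumptions chosen for illustration.

```python
def value_iteration(P, R, gamma, eps=1e-8, max_iters=100_000):
    """Classic value iteration for a discounted MDP.

    P[s][a] is a list of (next_state, probability) pairs and R[s][a] is
    the immediate reward (both layouts are assumptions of this sketch).
    """
    n = len(P)

    def q_value(s, a, v):
        # One-step lookahead: reward plus discounted expected next value.
        return R[s][a] + gamma * sum(p * v[t] for t, p in P[s][a])

    v = [0.0] * n
    for _ in range(max_iters):
        new_v = [max(q_value(s, a, v) for a in range(len(P[s]))) for s in range(n)]
        delta = max(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        if delta < eps:  # stop once the Bellman update barely changes v
            break
    # Greedy policy with respect to the final value estimate.
    policy = [max(range(len(P[s])), key=lambda a: q_value(s, a, v)) for s in range(n)]
    return v, policy


# Toy 2-state, 2-action MDP with deterministic transitions.
P = [
    [[(0, 1.0)], [(1, 1.0)]],  # state 0: stay, or move to state 1
    [[(1, 1.0)], [(0, 1.0)]],  # state 1: stay, or move to state 0
]
R = [
    [0.0, 1.0],
    [2.0, 0.0],
]
values, policy = value_iteration(P, R, gamma=0.9)
print(values, policy)
```

Each sweep of this exact version touches every transition, which is the per-iteration cost that sampling-based variants such as the one in the abstract aim to reduce.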