Paper Group AWR 19
DeepCorrect: Correcting DNN models against Image Distortions
Title | DeepCorrect: Correcting DNN models against Image Distortions |
Authors | Tejas Borkar, Lina Karam |
Abstract | In recent years, the widespread use of deep neural networks (DNNs) has facilitated great improvements in performance for computer vision tasks like image classification and object recognition. In most realistic computer vision applications, an input image undergoes some form of image distortion such as blur and additive noise during image acquisition or transmission. Deep networks trained on pristine images perform poorly when tested on such distortions. In this paper, we evaluate the effect of image distortions like Gaussian blur and additive noise on the activations of pre-trained convolutional filters. We propose a metric to identify the most noise susceptible convolutional filters and rank them in order of the highest gain in classification accuracy upon correction. In our proposed approach called DeepCorrect, we apply small stacks of convolutional layers with residual connections, at the output of these ranked filters and train them to correct the worst distortion affected filter activations, whilst leaving the rest of the pre-trained filter outputs in the network unchanged. Performance results show that applying DeepCorrect models for common vision tasks like image classification (ImageNet), object recognition (Caltech-101, Caltech-256) and scene classification (SUN-397), significantly improves the robustness of DNNs against distorted images and outperforms other alternative approaches. |
Tasks | Image Classification, Object Recognition, Scene Classification |
Published | 2017-05-05 |
URL | http://arxiv.org/abs/1705.02406v5, http://arxiv.org/pdf/1705.02406v5.pdf |
PWC | https://paperswithcode.com/paper/deepcorrect-correcting-dnn-models-against |
Repo | https://github.com/tsborkar/DeepCorrect |
Framework | none |
TensorLayer: A Versatile Library for Efficient Deep Learning Development
Title | TensorLayer: A Versatile Library for Efficient Deep Learning Development |
Authors | Hao Dong, Akara Supratak, Luo Mai, Fangde Liu, Axel Oehmichen, Simiao Yu, Yike Guo |
Abstract | Deep learning has enabled major advances in the fields of computer vision, natural language processing, and multimedia among many others. Developing a deep learning system is arduous and complex, as it involves constructing neural network architectures, managing training/trained models, tuning optimization process, preprocessing and organizing data, etc. TensorLayer is a versatile Python library that aims at helping researchers and engineers efficiently develop deep learning systems. It offers rich abstractions for neural networks, model and data management, and parallel workflow mechanism. While boosting efficiency, TensorLayer maintains both performance and scalability. TensorLayer was released in September 2016 on GitHub, and has helped people from academia and industry develop real-world applications of deep learning. |
Tasks | |
Published | 2017-07-26 |
URL | http://arxiv.org/abs/1707.08551v3, http://arxiv.org/pdf/1707.08551v3.pdf |
PWC | https://paperswithcode.com/paper/tensorlayer-a-versatile-library-for-efficient |
Repo | https://github.com/akaraspt/tl_paper |
Framework | tf |
First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations
Title | First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations |
Authors | Guillermo Garcia-Hernando, Shanxin Yuan, Seungryul Baek, Tae-Kyun Kim |
Abstract | In this work we study the use of 3D hand poses to recognize first-person dynamic hand actions interacting with 3D objects. Towards this goal, we collected RGB-D video sequences comprised of more than 100K frames of 45 daily hand action categories, involving 26 different objects in several hand configurations. To obtain hand pose annotations, we used our own mo-cap system that automatically infers the 3D location of each of the 21 joints of a hand model via 6 magnetic sensors and inverse kinematics. Additionally, we recorded the 6D object poses and provide 3D object models for a subset of hand-object interaction sequences. To the best of our knowledge, this is the first benchmark that enables the study of first-person hand actions with the use of 3D hand poses. We present an extensive experimental evaluation of RGB-D and pose-based action recognition by 18 baselines/state-of-the-art approaches. The impact of using appearance features, poses, and their combinations are measured, and the different training/testing protocols are evaluated. Finally, we assess how ready the 3D hand pose estimation field is when hands are severely occluded by objects in egocentric views and its influence on action recognition. From the results, we see clear benefits of using hand pose as a cue for action recognition compared to other data modalities. Our dataset and experiments can be of interest to communities of 3D hand pose estimation, 6D object pose, and robotics as well as action recognition. |
Tasks | Egocentric Activity Recognition, Hand Gesture Recognition, Hand Pose Estimation, Pose Estimation, Temporal Action Localization |
Published | 2017-04-08 |
URL | http://arxiv.org/abs/1704.02463v2, http://arxiv.org/pdf/1704.02463v2.pdf |
PWC | https://paperswithcode.com/paper/first-person-hand-action-benchmark-with-rgb-d |
Repo | https://github.com/guiggh/hand_pose_action |
Framework | none |
PixColor: Pixel Recursive Colorization
Title | PixColor: Pixel Recursive Colorization |
Authors | Sergio Guadarrama, Ryan Dahl, David Bieber, Mohammad Norouzi, Jonathon Shlens, Kevin Murphy |
Abstract | We propose a novel approach to automatically produce multiple colorized versions of a grayscale image. Our method results from the observation that the task of automated colorization is relatively easy given a low-resolution version of the color image. We first train a conditional PixelCNN to generate a low resolution color for a given grayscale image. Then, given the generated low-resolution color image and the original grayscale image as inputs, we train a second CNN to generate a high-resolution colorization of an image. We demonstrate that our approach produces more diverse and plausible colorizations than existing methods, as judged by human raters in a “Visual Turing Test”. |
Tasks | Colorization |
Published | 2017-05-19 |
URL | http://arxiv.org/abs/1705.07208v2, http://arxiv.org/pdf/1705.07208v2.pdf |
PWC | https://paperswithcode.com/paper/pixcolor-pixel-recursive-colorization |
Repo | https://github.com/demul/auto_colorization_project |
Framework | none |
Adversarial Synthesis Learning Enables Segmentation Without Target Modality Ground Truth
Title | Adversarial Synthesis Learning Enables Segmentation Without Target Modality Ground Truth |
Authors | Yuankai Huo, Zhoubing Xu, Shunxing Bao, Albert Assad, Richard G. Abramson, Bennett A. Landman |
Abstract | A lack of generalizability is one key limitation of deep learning based segmentation. Typically, one manually labels new training images when segmenting organs in different imaging modalities or segmenting abnormal organs from distinct disease cohorts. The manual efforts can be alleviated if one is able to reuse manual labels from one modality (e.g., MRI) to train a segmentation network for a new modality (e.g., CT). Previously, two stage methods have been proposed to use cycle generative adversarial networks (CycleGAN) to synthesize training images for a target modality. Then, these efforts trained a segmentation network independently using synthetic images. However, these two independent stages did not use the complementary information between synthesis and segmentation. Herein, we proposed a novel end-to-end synthesis and segmentation network (EssNet) to achieve the unpaired MRI to CT image synthesis and CT splenomegaly segmentation simultaneously without using manual labels on CT. The end-to-end EssNet achieved significantly higher median Dice similarity coefficient (0.9188) than the two stages strategy (0.8801), and even higher than canonical multi-atlas segmentation (0.9125) and ResNet method (0.9107), which used the CT manual labels. |
Tasks | Image Generation, Image-to-Image Translation, Medical Image Segmentation, Splenomegaly Segmentation On Multi-Modal Mri |
Published | 2017-12-20 |
URL | http://arxiv.org/abs/1712.07695v1, http://arxiv.org/pdf/1712.07695v1.pdf |
PWC | https://paperswithcode.com/paper/adversarial-synthesis-learning-enables |
Repo | https://github.com/MASILab/SynSeg-Net |
Framework | caffe2 |
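The abstract above compares methods by their median Dice similarity coefficient. As a reminder of what that score measures, here is a minimal sketch of the standard definition, 2|A ∩ B| / (|A| + |B|), for two binary masks. The function name `dice_coefficient`, the flattened-list mask representation, and the empty-mask convention are assumptions made for illustration; this is not code from the paper or its repository.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient between two flattened binary masks.

    pred and target are equal-length sequences of 0/1 (or False/True) values.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")

    # Count voxels labelled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground count across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)

    if total == 0:
        # Assumed convention: two empty masks are treated as a perfect match.
        return 1.0
    return 2.0 * intersection / total


predicted = [1, 1, 0, 0, 1, 0]
reference = [1, 0, 0, 0, 1, 1]
score = dice_coefficient(predicted, reference)
```

In this toy case the masks share two foreground elements out of three each, so the score is 2·2 / (3 + 3) = 2/3.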
How to evaluate word embeddings? On importance of data efficiency and simple supervised tasks
Title | How to evaluate word embeddings? On importance of data efficiency and simple supervised tasks |
Authors | Stanisław Jastrzebski, Damian Leśniak, Wojciech Marian Czarnecki |
Abstract | Maybe the single most important goal of representation learning is making subsequent learning faster. Surprisingly, this fact is not well reflected in the way embeddings are evaluated. In addition, recent practice in word embeddings points towards importance of learning specialized representations. We argue that focus of word representation evaluation should reflect those trends and shift towards evaluating what useful information is easily accessible. Specifically, we propose that evaluation should focus on data efficiency and simple supervised tasks, where the amount of available data is varied and scores of a supervised model are reported for each subset (as commonly done in transfer learning). In order to illustrate significance of such analysis, a comprehensive evaluation of selected word embeddings is presented. Proposed approach yields a more complete picture and brings new insight into performance characteristics, for instance information about word similarity or analogy tends to be non-linearly encoded in the embedding space, which questions the cosine-based, unsupervised, evaluation methods. All results and analysis scripts are available online. |
Tasks | Representation Learning, Transfer Learning, Word Embeddings |
Published | 2017-02-07 |
URL | http://arxiv.org/abs/1702.02170v1, http://arxiv.org/pdf/1702.02170v1.pdf |
PWC | https://paperswithcode.com/paper/how-to-evaluate-word-embeddings-on-importance |
Repo | https://github.com/PyENE/meta-word-embedding |
Framework | none |
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Title | Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering |
Authors | Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, Lei Zhang |
Abstract | Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning. In this work, we propose a combined bottom-up and top-down attention mechanism that enables attention to be calculated at the level of objects and other salient image regions. This is the natural basis for attention to be considered. Within our approach, the bottom-up mechanism (based on Faster R-CNN) proposes image regions, each with an associated feature vector, while the top-down mechanism determines feature weightings. Applying this approach to image captioning, our results on the MSCOCO test server establish a new state-of-the-art for the task, achieving CIDEr / SPICE / BLEU-4 scores of 117.9, 21.5 and 36.9, respectively. Demonstrating the broad applicability of the method, applying the same approach to VQA we obtain first place in the 2017 VQA Challenge. |
Tasks | Image Captioning, Visual Question Answering |
Published | 2017-07-25 |
URL | http://arxiv.org/abs/1707.07998v3, http://arxiv.org/pdf/1707.07998v3.pdf |
PWC | https://paperswithcode.com/paper/bottom-up-and-top-down-attention-for-image |
Repo | https://github.com/feifengwhu/question_attention |
Framework | pytorch |
Learning to cluster in order to transfer across domains and tasks
Title | Learning to cluster in order to transfer across domains and tasks |
Authors | Yen-Chang Hsu, Zhaoyang Lv, Zsolt Kira |
Abstract | This paper introduces a novel method to perform transfer learning across domains and tasks, formulating it as a problem of learning to cluster. The key insight is that, in addition to features, we can transfer similarity information and this is sufficient to learn a similarity function and clustering network to perform both domain adaptation and cross-task transfer learning. We begin by reducing categorical information to pairwise constraints, which only considers whether two instances belong to the same class or not. This similarity is category-agnostic and can be learned from data in the source domain using a similarity network. We then present two novel approaches for performing transfer learning using this similarity function. First, for unsupervised domain adaptation, we design a new loss function to regularize classification with a constrained clustering loss, hence learning a clustering network with the transferred similarity metric generating the training inputs. Second, for cross-task learning (i.e., unsupervised clustering with unseen categories), we propose a framework to reconstruct and estimate the number of semantic clusters, again using the clustering network. Since the similarity network is noisy, the key is to use a robust clustering algorithm, and we show that our formulation is more robust than the alternative constrained and unconstrained clustering approaches. Using this method, we first show state of the art results for the challenging cross-task problem, applied on Omniglot and ImageNet. Our results show that we can reconstruct semantic clusters with high accuracy. We then evaluate the performance of cross-domain transfer using images from the Office-31 and SVHN-MNIST tasks and present top accuracy on both datasets. Our approach doesn’t explicitly deal with domain discrepancy. If we combine with a domain adaptation loss, it shows further improvement. |
Tasks | Domain Adaptation, Omniglot, Transfer Learning, Unsupervised Domain Adaptation |
Published | 2017-11-28 |
URL | http://arxiv.org/abs/1711.10125v3, http://arxiv.org/pdf/1711.10125v3.pdf |
PWC | https://paperswithcode.com/paper/learning-to-cluster-in-order-to-transfer |
Repo | https://github.com/GT-RIPL/L2C |
Framework | pytorch |
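The abstract above describes reducing categorical information to pairwise constraints that record only whether two instances share a class. A minimal sketch of that reduction is shown below. The function name `labels_to_pairwise` and the dictionary output format are hypothetical choices for illustration and are not taken from the L2C repository.

```python
from itertools import combinations


def labels_to_pairwise(labels):
    """Map class labels to pairwise similarity constraints.

    Returns a dict keyed by index pairs (i, j) with i < j, whose value is
    1 when the two instances share a class and 0 otherwise.
    """
    return {
        (i, j): int(labels[i] == labels[j])
        for i, j in combinations(range(len(labels)), 2)
    }


constraints = labels_to_pairwise(["cat", "dog", "cat"])
# constraints == {(0, 1): 0, (0, 2): 1, (1, 2): 0}
```

The resulting constraints no longer mention the class names, which is what makes the similarity signal category-agnostic in the sense the abstract uses.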
Paying Attention to Descriptions Generated by Image Captioning Models
Title | Paying Attention to Descriptions Generated by Image Captioning Models |
Authors | Hamed R. Tavakoli, Rakshith Shetty, Ali Borji, Jorma Laaksonen |
Abstract | To bridge the gap between humans and machines in image understanding and describing, we need further insight into how people describe a perceived scene. In this paper, we study the agreement between bottom-up saliency-based visual attention and object referrals in scene description constructs. We investigate the properties of human-written descriptions and machine-generated ones. We then propose a saliency-boosted image captioning model in order to investigate benefits from low-level cues in language models. We learn that (1) humans mention more salient objects earlier than less salient ones in their descriptions, (2) the better a captioning model performs, the better attention agreement it has with human descriptions, (3) the proposed saliency-boosted model, compared to its baseline form, does not improve significantly on the MS COCO database, indicating explicit bottom-up boosting does not help when the task is well learnt and tuned on a data, (4) a better generalization is, however, observed for the saliency-boosted model on unseen data. |
Tasks | Image Captioning |
Published | 2017-04-24 |
URL | http://arxiv.org/abs/1704.07434v3, http://arxiv.org/pdf/1704.07434v3.pdf |
PWC | https://paperswithcode.com/paper/paying-attention-to-descriptions-generated-by |
Repo | https://github.com/rakshithShetty/captionGAN |
Framework | none |
FCSS: Fully Convolutional Self-Similarity for Dense Semantic Correspondence
Title | FCSS: Fully Convolutional Self-Similarity for Dense Semantic Correspondence |
Authors | Seungryong Kim, Dongbo Min, Bumsub Ham, Sangryul Jeon, Stephen Lin, Kwanghoon Sohn |
Abstract | We present a descriptor, called fully convolutional self-similarity (FCSS), for dense semantic correspondence. To robustly match points among different instances within the same object class, we formulate FCSS using local self-similarity (LSS) within a fully convolutional network. In contrast to existing CNN-based descriptors, FCSS is inherently insensitive to intra-class appearance variations because of its LSS-based structure, while maintaining the precise localization ability of deep neural networks. The sampling patterns of local structure and the self-similarity measure are jointly learned within the proposed network in an end-to-end and multi-scale manner. As training data for semantic correspondence is rather limited, we propose to leverage object candidate priors provided in existing image datasets and also correspondence consistency between object pairs to enable weakly-supervised learning. Experiments demonstrate that FCSS outperforms conventional handcrafted descriptors and CNN-based descriptors on various benchmarks. |
Tasks | |
Published | 2017-02-03 |
URL | http://arxiv.org/abs/1702.00926v1, http://arxiv.org/pdf/1702.00926v1.pdf |
PWC | https://paperswithcode.com/paper/fcss-fully-convolutional-self-similarity-for |
Repo | https://github.com/seungryong/FCSS |
Framework | none |
Extractive Summarization using Deep Learning
Title | Extractive Summarization using Deep Learning |
Authors | Sukriti Verma, Vagisha Nidhi |
Abstract | This paper proposes a text summarization approach for factual reports using a deep learning model. This approach consists of three phases: feature extraction, feature enhancement, and summary generation, which work together to assimilate core information and generate a coherent, understandable summary. We are exploring various features to improve the set of sentences selected for the summary, and are using a Restricted Boltzmann Machine to enhance and abstract those features to improve resultant accuracy without losing any important information. The sentences are scored based on those enhanced features and an extractive summary is constructed. Experimentation carried out on several articles demonstrates the effectiveness of the proposed approach. Source code available at: https://github.com/vagisha-nidhi/TextSummarizer |
Tasks | Text Summarization |
Published | 2017-08-15 |
URL | http://arxiv.org/abs/1708.04439v2, http://arxiv.org/pdf/1708.04439v2.pdf |
PWC | https://paperswithcode.com/paper/extractive-summarization-using-deep-learning |
Repo | https://github.com/vagisha-nidhi/TextSummarizer |
Framework | none |
Automatically Annotated Turkish Corpus for Named Entity Recognition and Text Categorization using Large-Scale Gazetteers
Title | Automatically Annotated Turkish Corpus for Named Entity Recognition and Text Categorization using Large-Scale Gazetteers |
Authors | H. Bahadir Sahin, Caglar Tirkaz, Eray Yildiz, Mustafa Tolga Eren, Ozan Sonmez |
Abstract | Turkish Wikipedia Named-Entity Recognition and Text Categorization (TWNERTC) dataset is a collection of automatically categorized and annotated sentences obtained from Wikipedia. We constructed large-scale gazetteers by using a graph crawler algorithm to extract relevant entity and domain information from a semantic knowledge base, Freebase. The constructed gazetteers contain approximately 300K entities with thousands of fine-grained entity types under 77 different domains. Since automated processes are prone to ambiguity, we also introduce two new content-specific noise reduction methodologies. Moreover, we map fine-grained entity types to the equivalent four coarse-grained types: person, loc, org, misc. Eventually, we construct six different dataset versions and evaluate the quality of annotations by comparing ground truths from human annotators. We make these datasets publicly available to support studies on Turkish named-entity recognition (NER) and text categorization (TC). |
Tasks | Named Entity Recognition, Text Categorization |
Published | 2017-02-08 |
URL | http://arxiv.org/abs/1702.02363v2, http://arxiv.org/pdf/1702.02363v2.pdf |
PWC | https://paperswithcode.com/paper/automatically-annotated-turkish-corpus-for |
Repo | https://github.com/juand-r/entity-recognition-datasets |
Framework | torch |
Pyndri: a Python Interface to the Indri Search Engine
Title | Pyndri: a Python Interface to the Indri Search Engine |
Authors | Christophe Van Gysel, Evangelos Kanoulas, Maarten de Rijke |
Abstract | We introduce pyndri, a Python interface to the Indri search engine. Pyndri allows access to Indri indexes from Python at two levels: (1) dictionary and tokenized document collection, (2) evaluating queries on the index. We hope that with the release of pyndri, we will stimulate reproducible, open and fast-paced IR research. |
Tasks | |
Published | 2017-01-03 |
URL | http://arxiv.org/abs/1701.00749v1, http://arxiv.org/pdf/1701.00749v1.pdf |
PWC | https://paperswithcode.com/paper/pyndri-a-python-interface-to-the-indri-search |
Repo | https://github.com/cvangysel/pyndri |
Framework | none |
SSD-6D: Making RGB-based 3D detection and 6D pose estimation great again
Title | SSD-6D: Making RGB-based 3D detection and 6D pose estimation great again |
Authors | Wadim Kehl, Fabian Manhardt, Federico Tombari, Slobodan Ilic, Nassir Navab |
Abstract | We present a novel method for detecting 3D model instances and estimating their 6D poses from RGB data in a single shot. To this end, we extend the popular SSD paradigm to cover the full 6D pose space and train on synthetic model data only. Our approach competes or surpasses current state-of-the-art methods that leverage RGB-D data on multiple challenging datasets. Furthermore, our method produces these results at around 10Hz, which is many times faster than the related methods. For the sake of reproducibility, we make our trained networks and detection code publicly available. |
Tasks | 6D Pose Estimation, 6D Pose Estimation using RGB, Pose Estimation |
Published | 2017-11-27 |
URL | http://arxiv.org/abs/1711.10006v1, http://arxiv.org/pdf/1711.10006v1.pdf |
PWC | https://paperswithcode.com/paper/ssd-6d-making-rgb-based-3d-detection-and-6d |
Repo | https://github.com/wadimkehl/ssd-6d |
Framework | tf |
Variance Reduced Value Iteration and Faster Algorithms for Solving Markov Decision Processes
Title | Variance Reduced Value Iteration and Faster Algorithms for Solving Markov Decision Processes |
Authors | Aaron Sidford, Mengdi Wang, Xian Wu, Yinyu Ye |
Abstract | In this paper we provide faster algorithms for approximately solving discounted Markov Decision Processes in multiple parameter regimes. Given a discounted Markov Decision Process (DMDP) with $S$ states, $A$ actions, discount factor $\gamma\in(0,1)$, and rewards in the range $[-M, M]$, we show how to compute an $\epsilon$-optimal policy, with probability $1 - \delta$ in time $$\tilde{O}\left( \left(S^2 A + \frac{S A}{(1 - \gamma)^3} \right) \log\left( \frac{M}{\epsilon} \right) \log\left( \frac{1}{\delta} \right) \right).$$ This contribution reflects the first nearly linear time, nearly linearly convergent algorithm for solving DMDPs for intermediate values of $\gamma$. We also show how to obtain improved sublinear time algorithms provided we can sample from the transition function in $O(1)$ time. Under this assumption we provide an algorithm which computes an $\epsilon$-optimal policy with probability $1 - \delta$ in time $$\tilde{O} \left(\frac{S A M^2}{(1 - \gamma)^4 \epsilon^2} \log \left(\frac{1}{\delta}\right) \right).$$ Lastly, we extend both these algorithms to solve finite horizon MDPs. Our algorithms improve upon the previous best for approximately computing optimal policies for fixed-horizon MDPs in multiple parameter regimes. Interestingly, we obtain our results by a careful modification of approximate value iteration. We show how to combine classic approximate value iteration analysis with new techniques in variance reduction. Our fastest algorithms leverage further insights to ensure that our algorithms make monotonic progress towards the optimal value. This paper is one of few instances in using sampling to obtain a linearly convergent linear programming algorithm and we hope that the analysis may be useful more broadly. |
Tasks | |
Published | 2017-10-27 |
URL | http://arxiv.org/abs/1710.09988v2, http://arxiv.org/pdf/1710.09988v2.pdf |
PWC | https://paperswithcode.com/paper/variance-reduced-value-iteration-and-faster |
Repo | https://github.com/uclaopt/AsyncQVI |
Framework | none |
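The abstract above says its results come from modifying approximate value iteration. For orientation, the sketch below shows plain, textbook value iteration on a small tabular discounted MDP. It is not the paper's variance-reduced or sampling-based algorithm, and it does not achieve the running times quoted above. The names `value_iteration`, `P`, `R`, `tol`, and `max_iters`, and the nested-list layout of the model, are assumptions chosen for illustration.

```python
def value_iteration(P, R, gamma, tol=1e-8, max_iters=10_000):
    """Classic value iteration for a tabular discounted MDP.

    P[s][a] is a list of transition probabilities over next states.
    R[s][a] is the immediate reward for taking action a in state s.
    Returns (values, greedy_policy).
    """
    n_states = len(P)

    def q_value(s, a, v):
        # One-step lookahead: reward plus discounted expected next value.
        return R[s][a] + gamma * sum(p * v[s2] for s2, p in enumerate(P[s][a]))

    v = [0.0] * n_states
    for _ in range(max_iters):
        # Bellman optimality backup applied to every state.
        new_v = [
            max(q_value(s, a, v) for a in range(len(P[s])))
            for s in range(n_states)
        ]
        delta = max(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        if delta < tol:
            break

    # Greedy policy with respect to the final value estimate.
    policy = [
        max(range(len(P[s])), key=lambda a: q_value(s, a, v))
        for s in range(n_states)
    ]
    return v, policy


# Toy two-state MDP (hypothetical numbers): action 0 stays, action 1 switches state.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # from state 0
    [[0.0, 1.0], [1.0, 0.0]],  # from state 1
]
R = [
    [0.0, 1.0],  # state 0: staying pays 0, switching pays 1
    [2.0, 0.0],  # state 1: staying pays 2, switching pays 0
]
values, policy = value_iteration(P, R, gamma=0.5)
```

In this toy model, staying in state 1 is worth 2 / (1 - 0.5) = 4, and moving there from state 0 is worth 1 + 0.5 · 4 = 3, so the greedy policy switches in state 0 and stays in state 1.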