Paper Group AWR 401
Fully Automatic Liver Attenuation Estimation Combing CNN Segmentation and Morphological Operations. Very Long Natural Scenery Image Prediction by Outpainting. Inspirational Adversarial Image Generation. Attention Guided Network for Retinal Image Segmentation. Symmetry Detection and Classification in Drawings of Graphs. A Better Way to Attend: Atten …
Fully Automatic Liver Attenuation Estimation Combing CNN Segmentation and Morphological Operations
Title | Fully Automatic Liver Attenuation Estimation Combing CNN Segmentation and Morphological Operations |
Authors | Yuankai Huo, James G. Terry, Jiachen Wang, Sangeeta Nair, Thomas A. Lasko, Barry I. Freedman, J. Jeffery Carr, Bennett A. Landman |
Abstract | Manually tracing regions of interest (ROIs) within the liver is the de facto standard method for measuring liver attenuation on computed tomography (CT) in diagnosing nonalcoholic fatty liver disease (NAFLD). However, manual tracing is resource intensive. To address these limitations and to expand the availability of a quantitative CT measure of hepatic steatosis, we propose the automatic liver attenuation ROI-based measurement (ALARM) method for automated liver attenuation estimation. The ALARM method consists of two major stages: (1) deep convolutional neural network (DCNN)-based liver segmentation and (2) automated ROI extraction. First, liver segmentation was achieved using our previously developed SS-Net. Then, a single central ROI (center-ROI) and three circles ROI (periphery-ROI) were computed based on liver segmentation and morphological operations. The ALARM method is available as an open source Docker container (https://github.com/MASILab/ALARM). 246 subjects with 738 abdomen CT scans from the African American-Diabetes Heart Study (AA-DHS) were used for external validation (testing), independent from the training and validation cohort (100 clinically acquired CT abdominal scans). |
Tasks | Computed Tomography (CT), Liver Segmentation |
Published | 2019-06-23 |
URL | https://arxiv.org/abs/1906.09549v2 |
https://arxiv.org/pdf/1906.09549v2.pdf | |
PWC | https://paperswithcode.com/paper/fully-automatic-liver-attenuation-estimation |
Repo | https://github.com/MASILab/ALARM |
Framework | none |
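The second ALARM stage, ROI extraction from a liver mask via morphological operations, is easy to illustrate. The snippet below is a minimal 2D sketch of that general idea, not the authors' released pipeline; the erosion depth, ROI radius, and single-ROI placement are arbitrary assumptions for illustration.

```python
# Minimal sketch of ROI-based attenuation estimation from a liver mask.
# Illustration of the general idea only, not the authors' ALARM code;
# the erosion depth and ROI radius below are arbitrary assumptions.
import numpy as np
from scipy import ndimage

def center_roi_attenuation(ct_slice, liver_mask, erode_iter=5, roi_radius=10):
    """Mean HU inside a circular ROI placed deep in the liver mask (2D sketch)."""
    # Erode the mask so the ROI stays away from the liver boundary.
    core = ndimage.binary_erosion(liver_mask, iterations=erode_iter)
    if not core.any():
        core = liver_mask.astype(bool)
    # Pick the most interior pixel via the distance transform.
    dist = ndimage.distance_transform_edt(core)
    cy, cx = np.unravel_index(np.argmax(dist), dist.shape)
    # Build a circular ROI around that point, clipped to the liver mask.
    yy, xx = np.ogrid[:ct_slice.shape[0], :ct_slice.shape[1]]
    roi = (yy - cy) ** 2 + (xx - cx) ** 2 <= roi_radius ** 2
    roi &= liver_mask.astype(bool)
    return float(ct_slice[roi].mean())
```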
Very Long Natural Scenery Image Prediction by Outpainting
Title | Very Long Natural Scenery Image Prediction by Outpainting |
Authors | Zongxin Yang, Jian Dong, Ping Liu, Yi Yang, Shuicheng Yan |
Abstract | Compared with image inpainting, image outpainting has received less attention because of two challenges. The first is how to keep spatial and content consistency between the generated images and the original input. The second is how to maintain high quality in the generated results, especially for multi-step generation in which generated regions are spatially far away from the initial input. To address these two problems, we devise two modules, Skip Horizontal Connection and Recurrent Content Transfer, and integrate them into our encoder-decoder structure. With this design, our network can generate highly realistic outpainting predictions effectively and efficiently. Beyond that, our method can generate very long images while keeping the same style and semantic content as the given input. To test the effectiveness of the proposed architecture, we collect a new scenery dataset with diverse, complicated natural scenes. The experimental results on this dataset demonstrate the efficacy of our proposed network. The code and dataset are available at https://github.com/z-x-yang/NS-Outpainting. |
Tasks | Image Inpainting, Image Outpainting |
Published | 2019-12-29 |
URL | https://arxiv.org/abs/1912.12688v1 |
https://arxiv.org/pdf/1912.12688v1.pdf | |
PWC | https://paperswithcode.com/paper/very-long-natural-scenery-image-prediction-by-1 |
Repo | https://github.com/z-x-yang/NS-Outpainting |
Framework | tf |
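The Recurrent Content Transfer idea, as described in the abstract, amounts to carrying content recurrently from the known region toward the region to be generated. Below is a heavily simplified, PyTorch-flavoured sketch of that idea (the released code is TensorFlow); the column-wise LSTM, the shapes, and the class name are illustrative assumptions, not the authors' module.

```python
# Sketch of a "recurrent content transfer" idea: sweep an LSTM across the
# columns of an encoder feature map so content flows from the known region
# into the region to be generated. Shapes and names are illustrative only.
import torch
import torch.nn as nn

class RecurrentContentTransfer(nn.Module):
    def __init__(self, channels, height):
        super().__init__()
        self.lstm = nn.LSTM(input_size=channels * height,
                            hidden_size=channels * height, batch_first=True)

    def forward(self, feat, extra_cols):
        # feat: (B, C, H, W) features of the known region.
        b, c, h, w = feat.shape
        cols = feat.permute(0, 3, 1, 2).reshape(b, w, c * h)  # one LSTM step per column
        out, state = self.lstm(cols)
        # Roll the LSTM forward to hallucinate columns for the unseen region.
        step = out[:, -1:, :]
        new_cols = []
        for _ in range(extra_cols):
            step, state = self.lstm(step, state)
            new_cols.append(step)
        new = torch.cat(new_cols, dim=1).reshape(b, extra_cols, c, h)
        return new.permute(0, 2, 3, 1)  # (B, C, H, extra_cols)

# e.g. feat = torch.randn(2, 64, 8, 8); RecurrentContentTransfer(64, 8)(feat, 4)
```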
Inspirational Adversarial Image Generation
Title | Inspirational Adversarial Image Generation |
Authors | Morgane Riviere, Olivier Teytaud, Jérémy Rapin, Yann LeCun, Camille Couprie |
Abstract | Image generation has started to receive attention from artists and designers as a source of inspiration for new creations. However, exploiting the results of deep generative models such as Generative Adversarial Networks can be long and tedious given the lack of existing tools. In this work, we propose a simple strategy to inspire creators with new generations learned from a dataset of their choice, while providing some control over them. We design a simple optimization method to find the latent parameters corresponding to the generation closest to any input inspirational image. Specifically, we steer generation toward an inspirational image of the user's choice by performing several optimization steps to recover optimal parameters in the model's latent space. We tested several exploration methods, ranging from classic gradient descent to gradient-free optimizers. Many gradient-free optimizers need only comparisons (better/worse than another image), so they can even be used without a numerical criterion or an inspirational image, relying only on human preference. Thus, by iterating on one's preferences we could build robust facial composite or fashion generation algorithms. High-resolution design generations are obtained using progressive growing of GANs. Our results on four datasets of faces, fashion images, and textures show that satisfactory images are effectively retrieved in most cases. |
Tasks | Image Generation |
Published | 2019-06-17 |
URL | https://arxiv.org/abs/1906.11661v1 |
https://arxiv.org/pdf/1906.11661v1.pdf | |
PWC | https://paperswithcode.com/paper/inspirational-adversarial-image-generation |
Repo | https://github.com/facebookresearch/pytorch_GAN_zoo |
Framework | pytorch |
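The core optimization described above, recovering the latent parameters whose generation is closest to an inspirational image, can be sketched in a few lines. This is a minimal gradient-based version with an L2 criterion and a placeholder `generator`; the paper also explores gradient-free, comparison-based optimizers, which the sketch omits.

```python
# Minimal sketch of "inspirational" generation: find the latent vector whose
# generation is closest to a target image, by gradient descent on z.
# `generator` is any pretrained G: z -> image with the same output size as
# `target`; the L2 criterion, step count, and learning rate are assumptions.
import torch

def invert_image(generator, target, latent_dim=512, steps=500, lr=0.05):
    z = torch.randn(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(generator(z), target)
        loss.backward()
        opt.step()
    return z.detach()  # latent code whose generation approximates the target
```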
Attention Guided Network for Retinal Image Segmentation
Title | Attention Guided Network for Retinal Image Segmentation |
Authors | Shihao Zhang, Huazhu Fu, Yuguang Yan, Yubing Zhang, Qingyao Wu, Ming Yang, Mingkui Tan, Yanwu Xu |
Abstract | Learning structural information is critical for producing an ideal result in retinal image segmentation. Recently, convolutional neural networks have shown a powerful ability to extract effective representations. However, convolutional and pooling operations filter out some useful structural information. In this paper, we propose an Attention Guided Network (AG-Net) to preserve the structural information and guide the expanding operation. In our AG-Net, the guided filter is exploited as a structure sensitive expanding path to transfer structural information from previous feature maps, and an attention block is introduced to exclude the noise and reduce the negative influence of background further. The extensive experiments on two retinal image segmentation tasks (i.e., blood vessel segmentation, optic disc and cup segmentation) demonstrate the effectiveness of our proposed method. |
Tasks | Semantic Segmentation |
Published | 2019-07-25 |
URL | https://arxiv.org/abs/1907.12930v3 |
https://arxiv.org/pdf/1907.12930v3.pdf | |
PWC | https://paperswithcode.com/paper/attention-guided-network-for-retinal-image |
Repo | https://github.com/HzFu/AGNet |
Framework | pytorch |
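For intuition, here is a generic attention-gate sketch in the spirit of the attention block described above: it re-weights a skip feature map to suppress background responses. It is not the AG-Net implementation and omits the guided-filter expanding path; channel sizes and the gating form are assumptions.

```python
# Generic attention-gate sketch: combine a skip feature map with a gating
# feature map and damp background activations. Not the AG-Net code.
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    def __init__(self, skip_ch, gate_ch, inter_ch):
        super().__init__()
        self.theta = nn.Conv2d(skip_ch, inter_ch, kernel_size=1)
        self.phi = nn.Conv2d(gate_ch, inter_ch, kernel_size=1)
        self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1)

    def forward(self, skip, gate):
        # skip and gate are assumed to share spatial size here for brevity.
        attn = torch.sigmoid(self.psi(torch.relu(self.theta(skip) + self.phi(gate))))
        return skip * attn  # re-weighted skip features with background damped
```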
Symmetry Detection and Classification in Drawings of Graphs
Title | Symmetry Detection and Classification in Drawings of Graphs |
Authors | Felice De Luca, Md Iqbal Hossain, Stephen Kobourov |
Abstract | Symmetry is a key feature observed in nature (from flowers and leaves, to butterflies and birds) and in human-made objects (from paintings and sculptures, to manufactured objects and architectural design). Rotational, translational, and especially reflectional symmetries are also important in drawings of graphs. Detecting and classifying symmetries can be very useful in algorithms that aim to create symmetric graph drawings, and in this paper we present a machine learning approach for these tasks. Specifically, we show that deep neural networks can be used to detect reflectional symmetries with 92% accuracy. We also build a multi-class classifier to distinguish between reflectional horizontal, reflectional vertical, rotational, and translational symmetries. Finally, we make available a collection of images of graph drawings with specific symmetric features that can be used in machine learning systems for training, testing, and validation purposes. Our datasets, best trained ML models, and source code are available online. |
Tasks | |
Published | 2019-07-01 |
URL | https://arxiv.org/abs/1907.01004v3 |
https://arxiv.org/pdf/1907.01004v3.pdf | |
PWC | https://paperswithcode.com/paper/symmetry-detection-and-classification-in |
Repo | https://github.com/enggiqbal/mlsymmetric |
Framework | none |
A Better Way to Attend: Attention with Trees for Video Question Answering
Title | A Better Way to Attend: Attention with Trees for Video Question Answering |
Authors | Hongyang Xue, Wenqing Chu, Zhou Zhao, Deng Cai |
Abstract | We propose a new attention model for video question answering. The main idea of attention models is to focus on the most informative parts of the visual data, and attention mechanisms are quite popular these days. However, most existing visual attention mechanisms regard the question as a whole. They ignore word-level semantics, where each word can receive different attention and some words need no attention at all. Neither do they consider the semantic structure of the sentences. Although the Extended Soft Attention (E-SA) model for video question answering leverages word-level attention, it performs poorly on long question sentences. In this paper, we propose the heterogeneous tree-structured memory network (HTreeMN) for video question answering. Our approach is built upon the syntax parse trees of the question sentences. HTreeMN treats words differently: "visual" words are processed with an attention module, while "verbal" words are not. It also utilizes the semantic structure of the sentences by combining neighbors based on the recursive structure of the parse trees. The understandings of the words and the videos are propagated and merged from the leaves to the root. Furthermore, we build a hierarchical attention mechanism to distill the attended features. We evaluate our approach on two datasets. The experimental results show the superiority of our HTreeMN model over other attention models, especially on complex questions. Our code is available at https://github.com/ZJULearning/TreeAttention. |
Tasks | Question Answering, Video Question Answering |
Published | 2019-09-05 |
URL | https://arxiv.org/abs/1909.02218v1 |
https://arxiv.org/pdf/1909.02218v1.pdf | |
PWC | https://paperswithcode.com/paper/a-better-way-to-attend-attention-with-trees |
Repo | https://github.com/xuehy/TreeAttention |
Framework | pytorch |
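The tree-structured processing described above can be caricatured as a recursion over the parse tree in which only "visual" leaf words attend over video features and internal nodes merge their children bottom-up. The sketch below shows just that recursion; the node representation, the GRU-cell merge, and the attention form are assumptions, not the HTreeMN architecture (which also adds hierarchical attention over the tree).

```python
# Heavily simplified sketch of tree-structured question processing:
# "visual" leaf words attend over video features, "verbal" leaves pass their
# embedding through, and internal nodes fold their children into one state.
import torch
import torch.nn as nn

class TreeCombiner(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.merge = nn.GRUCell(dim, dim)   # child representation folds into parent state
        self.attn = nn.Linear(dim, dim)

    def attend(self, word_vec, video_feats):
        # video_feats: (T, dim); soft attention guided by the word embedding.
        scores = torch.softmax(video_feats @ self.attn(word_vec), dim=0)
        return (scores.unsqueeze(1) * video_feats).sum(dim=0)

    def forward(self, node, video_feats):
        # node: {'word': (dim,) tensor, 'visual': bool} or {'children': [subtrees]}
        if 'word' in node:
            return self.attend(node['word'], video_feats) if node['visual'] else node['word']
        state = torch.zeros_like(video_feats[0])
        for child in node['children']:
            state = self.merge(self.forward(child, video_feats).unsqueeze(0),
                               state.unsqueeze(0)).squeeze(0)
        return state  # merged representation propagated toward the root
```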
Attribute Guided Unpaired Image-to-Image Translation with Semi-supervised Learning
Title | Attribute Guided Unpaired Image-to-Image Translation with Semi-supervised Learning |
Authors | Xinyang Li, Jie Hu, Shengchuan Zhang, Xiaopeng Hong, Qixiang Ye, Chenglin Wu, Rongrong Ji |
Abstract | Unpaired Image-to-Image Translation (UIT) focuses on translating images among different domains using unpaired data, and has received increasing research attention due to its practical usage. However, existing UIT schemes are hampered by the need for supervised training and by the lack of encoded domain information. In this paper, we propose an Attribute Guided UIT model, termed AGUIT, to tackle these two challenges. AGUIT addresses the multi-modal and multi-domain tasks of UIT jointly in a novel semi-supervised setting, which also benefits representation disentanglement and fine-grained control of outputs. In particular, AGUIT's benefits are two-fold: (1) It adopts a novel semi-supervised learning process by translating attributes of labeled data to unlabeled data, and then reconstructing the unlabeled data with a cycle-consistency operation. (2) It decomposes the image representation into a domain-invariant content code and a domain-specific style code. The redesigned style code embeds image style into two variables, drawn from a standard Gaussian distribution and from the distribution of the domain label, which facilitates fine control of the translation thanks to the continuity of both variables. Finally, we introduce a new challenge for UIT models, disentangled transfer, which uses the disentangled representation to translate data less related to the training set. Extensive experiments demonstrate the capacity of AGUIT over existing state-of-the-art models. |
Tasks | Image-to-Image Translation |
Published | 2019-04-29 |
URL | http://arxiv.org/abs/1904.12428v1 |
http://arxiv.org/pdf/1904.12428v1.pdf | |
PWC | https://paperswithcode.com/paper/attribute-guided-unpaired-image-to-image |
Repo | https://github.com/imlixinyang/AGUIT |
Framework | pytorch |
SDIT: Scalable and Diverse Cross-domain Image Translation
Title | SDIT: Scalable and Diverse Cross-domain Image Translation |
Authors | Yaxing Wang, Abel Gonzalez-Garcia, Joost van de Weijer, Luis Herranz |
Abstract | Recently, image-to-image translation research has witnessed remarkable progress. Although current approaches successfully generate diverse outputs or perform scalable image transfer, these properties have not been combined into a single method. To address this limitation, we propose SDIT: Scalable and Diverse image-to-image translation, which combines both properties within a single generator. Diversity is provided by a latent variable randomly sampled from a normal distribution, while scalability is obtained by conditioning the network on the domain attributes. Additionally, we exploit an attention mechanism that lets the generator focus on the domain-specific attribute. We empirically demonstrate the performance of the proposed method on face mapping and other datasets beyond faces. |
Tasks | Image-to-Image Translation |
Published | 2019-08-19 |
URL | https://arxiv.org/abs/1908.06881v1 |
https://arxiv.org/pdf/1908.06881v1.pdf | |
PWC | https://paperswithcode.com/paper/sdit-scalable-and-diverse-cross-domain-image |
Repo | https://github.com/taki0112/SDIT-Tensorflow |
Framework | tf |
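To make the "scalable and diverse" combination concrete, the sketch below conditions a toy generator on a domain-attribute vector (scalability across domains) and a random normal latent (diversity). Layer sizes, the broadcasting scheme, and the omission of the paper's attention mechanism are all simplifying assumptions, not the SDIT model.

```python
# Toy sketch: one generator conditioned on a domain one-hot (scalable) and a
# normal latent (diverse). Illustrative only; not the SDIT architecture.
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    def __init__(self, img_ch=3, n_domains=5, z_dim=8, base=64):
        super().__init__()
        self.z_dim = z_dim
        self.net = nn.Sequential(
            nn.Conv2d(img_ch + n_domains + z_dim, base, 3, padding=1), nn.ReLU(),
            nn.Conv2d(base, img_ch, 3, padding=1), nn.Tanh())

    def forward(self, img, domain_onehot, z=None):
        b, _, h, w = img.shape
        if z is None:
            z = torch.randn(b, self.z_dim, device=img.device)  # source of diversity
        cond = torch.cat([domain_onehot, z], dim=1)             # (B, n_domains + z_dim)
        cond = cond[:, :, None, None].expand(-1, -1, h, w)      # broadcast spatially
        return self.net(torch.cat([img, cond], dim=1))          # translated image
```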
Seeing without Looking: Contextual Rescoring of Object Detections for AP Maximization
Title | Seeing without Looking: Contextual Rescoring of Object Detections for AP Maximization |
Authors | Lourenço V. Pato, Renato Negrinho, Pedro M. Q. Aguiar |
Abstract | The majority of current object detectors lack context: class predictions are made independently from other detections. We propose to incorporate context in object detection by post-processing the output of an arbitrary detector to rescore the confidences of its detections. Rescoring is done by conditioning on contextual information from the entire set of detections: their confidences, predicted classes, and positions. We show that AP can be improved by simply reassigning the detection confidence values such that true positives that survive longer (i.e., those with the correct class and large IoU) are scored higher than false positives or detections with small IoU. In this setting, we use a bidirectional RNN with attention for contextual rescoring and introduce a training target that uses the IoU with ground truth to maximize AP for the given set of detections. The fact that our approach does not require access to visual features makes it computationally inexpensive and agnostic to the detection architecture. In spite of this simplicity, our model consistently improves AP over strong pre-trained baselines (Cascade R-CNN and Faster R-CNN with several backbones), particularly by reducing the confidence of duplicate detections (a learned form of non-maximum suppression) and removing out-of-context objects by conditioning on the confidences, classes, positions, and sizes of the co-occurrent detections. Code is available at https://github.com/LourencoVazPato/seeing-without-looking/ |
Tasks | Object Detection |
Published | 2019-12-27 |
URL | https://arxiv.org/abs/1912.12290v2 |
https://arxiv.org/pdf/1912.12290v2.pdf | |
PWC | https://paperswithcode.com/paper/seeing-without-looking-contextual-rescoring |
Repo | https://github.com/LourencoVazPato/seeing-without-looking |
Framework | pytorch |
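The rescoring setup described above is architecture-agnostic: each detection is summarized by its confidence, class, and box, and the whole set is re-read by a bidirectional RNN. Below is a minimal sketch of such a rescorer; the feature layout, hidden size, and omission of the attention component are assumptions rather than the paper's exact model.

```python
# Sketch of contextual rescoring: encode the set of detections
# (confidence, class one-hot, normalized box) with a bidirectional GRU
# and predict a new confidence per detection.
import torch
import torch.nn as nn

class ContextualRescorer(nn.Module):
    def __init__(self, num_classes, hidden=128):
        super().__init__()
        in_dim = 1 + num_classes + 4          # score, class one-hot, box (x, y, w, h)
        self.rnn = nn.GRU(in_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, detections):
        # detections: (B, N, 1 + num_classes + 4) for N detections per image.
        ctx, _ = self.rnn(detections)
        return torch.sigmoid(self.head(ctx)).squeeze(-1)  # rescored confidences (B, N)
```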
Triplet Distillation for Deep Face Recognition
Title | Triplet Distillation for Deep Face Recognition |
Authors | Yushu Feng, Huan Wang, Daniel T. Yi, Roland Hu |
Abstract | Convolutional neural networks (CNNs) have achieved a great success in face recognition, which unfortunately comes at the cost of massive computation and storage consumption. Many compact face recognition networks are thus proposed to resolve this problem. Triplet loss is effective to further improve the performance of those compact models. However, it normally employs a fixed margin to all the samples, which neglects the informative similarity structures between different identities. In this paper, we propose an enhanced version of triplet loss, named triplet distillation, which exploits the capability of a teacher model to transfer the similarity information to a small model by adaptively varying the margin between positive and negative pairs. Experiments on LFW, AgeDB, and CPLFW datasets show the merits of our method compared to the original triplet loss. |
Tasks | Face Recognition |
Published | 2019-05-11 |
URL | https://arxiv.org/abs/1905.04457v2 |
https://arxiv.org/pdf/1905.04457v2.pdf | |
PWC | https://paperswithcode.com/paper/triplet-distillation-for-deep-face |
Repo | https://github.com/david-svitov/margindistillation |
Framework | none |
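The central idea, an adaptive margin driven by teacher similarity, fits in a short loss function. The sketch below scales the margin by how far apart the teacher places the anchor and the negative; the specific margin schedule and its bounds are assumptions, not the paper's exact formula.

```python
# Sketch of a distilled triplet loss: the margin is not fixed but grows with
# the teacher's anchor-negative dissimilarity. Margin schedule is assumed.
import torch
import torch.nn.functional as F

def triplet_distillation_loss(s_a, s_p, s_n, t_a, t_n, m_min=0.2, m_max=0.5):
    # Student embeddings (s_*) and teacher embeddings (t_*), all L2-normalized, shape (N, D).
    teacher_gap = 1.0 - F.cosine_similarity(t_a, t_n)       # in [0, 2]
    margin = m_min + (m_max - m_min) * teacher_gap / 2.0     # adaptive per-triplet margin
    d_pos = (s_a - s_p).pow(2).sum(dim=1)
    d_neg = (s_a - s_n).pow(2).sum(dim=1)
    return F.relu(d_pos - d_neg + margin).mean()
```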
NoRML: No-Reward Meta Learning
Title | NoRML: No-Reward Meta Learning |
Authors | Yuxiang Yang, Ken Caluwaerts, Atil Iscen, Jie Tan, Chelsea Finn |
Abstract | Efficiently adapting to new environments and changes in dynamics is critical for agents to successfully operate in the real world. Reinforcement learning (RL) based approaches typically rely on external reward feedback for adaptation. However, in many scenarios this reward signal might not be readily available for the target task, or the difference between the environments can be implicit and only observable from the dynamics. To this end, we introduce a method that allows for self-adaptation of learned policies: No-Reward Meta Learning (NoRML). NoRML extends Model Agnostic Meta Learning (MAML) for RL and uses observable dynamics of the environment instead of an explicit reward function in MAML’s finetune step. Our method has a more expressive update step than MAML, while maintaining MAML’s gradient based foundation. Additionally, in order to allow more targeted exploration, we implement an extension to MAML that effectively disconnects the meta-policy parameters from the fine-tuned policies’ parameters. We first study our method on a number of synthetic control problems and then validate our method on common benchmark environments, showing that NoRML outperforms MAML when the dynamics change between tasks. |
Tasks | Meta-Learning |
Published | 2019-03-04 |
URL | http://arxiv.org/abs/1903.01063v1 |
http://arxiv.org/pdf/1903.01063v1.pdf | |
PWC | https://paperswithcode.com/paper/norml-no-reward-meta-learning |
Repo | https://github.com/google-research/google-research |
Framework | tf |
Three-dimensional Backbone Network for 3D Object Detection in Traffic Scenes
Title | Three-dimensional Backbone Network for 3D Object Detection in Traffic Scenes |
Authors | Xuesong Li, Jose Guivant, Ngaiming Kwok, Yongzhi Xu, Ruowei Li, Hongkun Wu |
Abstract | The task of detecting 3D objects in traffic scenes has a pivotal role in many real-world applications. However, the performance of 3D object detection is lower than that of 2D object detection due to the lack of powerful 3D feature extraction methods. To address this issue, this study proposes a 3D backbone network to acquire comprehensive 3D feature maps for 3D object detection. It primarily consists of sparse 3D convolutional neural network operations in the point cloud. The 3D backbone network can inherently learn 3D features from the raw data without compressing the point cloud into multiple 2D images. The sparse 3D convolutional neural network takes full advantage of the sparsity in the 3D point cloud to accelerate computation and save memory, which makes the 3D backbone network feasible in a real-world application. Empirical experiments were conducted on the KITTI benchmark and comparable results were obtained with respect to the state-of-the-art performance for 3D object detection. |
Tasks | 3D Object Detection, Object Detection |
Published | 2019-01-24 |
URL | https://arxiv.org/abs/1901.08373v2 |
https://arxiv.org/pdf/1901.08373v2.pdf | |
PWC | https://paperswithcode.com/paper/3d-backbone-network-for-3d-object-detection |
Repo | https://github.com/Benzlxs/tDBN |
Framework | pytorch |
Inherent Weight Normalization in Stochastic Neural Networks
Title | Inherent Weight Normalization in Stochastic Neural Networks |
Authors | Georgios Detorakis, Sourav Dutta, Abhishek Khanna, Matthew Jerry, Suman Datta, Emre Neftci |
Abstract | Multiplicative stochasticity such as Dropout improves the robustness and generalizability of deep neural networks. Here, we further demonstrate that always-on multiplicative stochasticity combined with simple threshold neurons are sufficient operations for deep neural networks. We call such models Neural Sampling Machines (NSM). We find that the probability of activation of the NSM exhibits a self-normalizing property that mirrors Weight Normalization, a previously studied mechanism that fulfills many of the features of Batch Normalization in an online fashion. The normalization of activities during training speeds up convergence by preventing internal covariate shift caused by changes in the input distribution. The always-on stochasticity of the NSM confers the following advantages: the network is identical in the inference and learning phases, making the NSM suitable for online learning, it can exploit stochasticity inherent to a physical substrate such as analog non-volatile memories for in-memory computing, and it is suitable for Monte Carlo sampling, while requiring almost exclusively addition and comparison operations. We demonstrate NSMs on standard classification benchmarks (MNIST and CIFAR) and event-based classification benchmarks (N-MNIST and DVS Gestures). Our results show that NSMs perform comparably or better than conventional artificial neural networks with the same architecture. |
Tasks | |
Published | 2019-10-27 |
URL | https://arxiv.org/abs/1910.12316v1 |
https://arxiv.org/pdf/1910.12316v1.pdf | |
PWC | https://paperswithcode.com/paper/inherent-weight-normalization-in-stochastic |
Repo | https://github.com/nmi-lab/neural_sampling_machines |
Framework | pytorch |
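The two ingredients highlighted in the abstract, always-on multiplicative stochasticity and simple threshold neurons, can be sketched as a single layer. The snippet below only illustrates those ingredients; training such hard-threshold units requires a surrogate gradient or a probabilistic relaxation, which the sketch omits, and the layer sizes and Bernoulli rate are assumptions.

```python
# Tiny sketch of the described ingredients: always-on multiplicative
# (Bernoulli) noise on the pre-activations plus a hard threshold unit.
import torch
import torch.nn as nn

class StochasticThresholdLayer(nn.Module):
    def __init__(self, in_features, out_features, p=0.5):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.p = p  # probability that a multiplicative noise sample is 1

    def forward(self, x):
        pre = self.linear(x)
        noise = torch.bernoulli(torch.full_like(pre, self.p))  # noise stays on at test time
        return (pre * noise > 0).float()                        # threshold (0/1) activation
```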
A Stable Variational Autoencoder for Text Modelling
Title | A Stable Variational Autoencoder for Text Modelling |
Authors | Ruizhe Li, Xiao Li, Chenghua Lin, Matthew Collinson, Rui Mao |
Abstract | The Variational Autoencoder (VAE) is a powerful method for learning representations of high-dimensional data. However, VAEs can suffer from an issue known as latent variable collapse (or KL loss vanishing), where the posterior collapses to the prior and the model ignores the latent codes in generative tasks. This issue is particularly prevalent when employing VAE-RNN architectures for text modelling (Bowman et al., 2016). In this paper, we present a simple architecture called the holistic regularisation VAE (HR-VAE), which can effectively avoid latent variable collapse. Compared to existing VAE-RNN architectures, we show that our model achieves a much more stable training process and generates text of significantly better quality. |
Tasks | |
Published | 2019-11-13 |
URL | https://arxiv.org/abs/1911.05343v1 |
https://arxiv.org/pdf/1911.05343v1.pdf | |
PWC | https://paperswithcode.com/paper/a-stable-variational-autoencoder-for-text |
Repo | https://github.com/ruizheliUOA/HR-VAE |
Framework | pytorch |
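As a rough illustration of "holistic regularisation", the sketch below derives a Gaussian posterior from every encoder timestep and averages the KL terms, rather than regularising only the final state. This follows one reading of the abstract and is not the released HR-VAE implementation; the layer choices and the use of the last-step posterior for sampling are assumptions.

```python
# Heavily simplified sketch: compute a Gaussian posterior from every encoder
# hidden state and average the KL terms over timesteps. Illustrative only.
import torch
import torch.nn as nn

class HolisticKLEncoder(nn.Module):
    def __init__(self, emb_dim, hid_dim, z_dim):
        super().__init__()
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.mu = nn.Linear(hid_dim, z_dim)
        self.logvar = nn.Linear(hid_dim, z_dim)

    def forward(self, embedded):                 # embedded: (B, T, emb_dim)
        states, _ = self.rnn(embedded)           # (B, T, hid_dim)
        mu, logvar = self.mu(states), self.logvar(states)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1)   # KL per timestep (B, T)
        z = mu[:, -1] + torch.randn_like(mu[:, -1]) * (0.5 * logvar[:, -1]).exp()
        return z, kl.mean()                      # latent code and averaged KL regulariser
```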
Zero-Resource Cross-Lingual Named Entity Recognition
Title | Zero-Resource Cross-Lingual Named Entity Recognition |
Authors | M Saiful Bari, Shafiq Joty, Prathyusha Jwalapuram |
Abstract | Recently, neural methods have achieved state-of-the-art (SOTA) results in Named Entity Recognition (NER) tasks for many languages without the need for manually crafted features. However, these models still require manually annotated training data, which is not available for many languages. In this paper, we propose an unsupervised cross-lingual NER model that can transfer NER knowledge from one language to another in a completely unsupervised way without relying on any bilingual dictionary or parallel data. Our model achieves this through word-level adversarial learning and augmented fine-tuning with parameter sharing and feature augmentation. Experiments on five different languages demonstrate the effectiveness of our approach, outperforming existing models by a good margin and setting a new SOTA for each language pair. |
Tasks | Named Entity Recognition |
Published | 2019-11-22 |
URL | https://arxiv.org/abs/1911.09812v1 |
https://arxiv.org/pdf/1911.09812v1.pdf | |
PWC | https://paperswithcode.com/paper/zero-resource-cross-lingual-named-entity |
Repo | https://github.com/ntunlp/Zero-Shot-Cross-Lingual-NER |
Framework | none |
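Word-level adversarial learning, as named in the abstract, is commonly realized with a language discriminator trained through a gradient-reversal layer so that word representations from the two languages become indistinguishable. The sketch below shows that generic construction; it is not the authors' model, and all names and sizes are assumptions.

```python
# Generic word-level adversarial alignment: a discriminator classifies which
# language a word representation came from, while a gradient-reversal layer
# pushes the encoder to make the two languages indistinguishable.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None   # reversed (and scaled) gradient to the encoder

class LanguageDiscriminator(nn.Module):
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, word_repr, lam=1.0):
        # word_repr: (N, dim) word representations from either language.
        return self.net(GradReverse.apply(word_repr, lam))  # logits: source vs. target
```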