January 31, 2020

3186 words 15 mins read

Paper Group AWR 401

Fully Automatic Liver Attenuation Estimation Combing CNN Segmentation and Morphological Operations

Title Fully Automatic Liver Attenuation Estimation Combing CNN Segmentation and Morphological Operations
Authors Yuankai Huo, James G. Terry, Jiachen Wang, Sangeeta Nair, Thomas A. Lasko, Barry I. Freedman, J. Jeffery Carr, Bennett A. Landman
Abstract Manually tracing regions of interest (ROIs) within the liver is the de facto standard method for measuring liver attenuation on computed tomography (CT) in diagnosing nonalcoholic fatty liver disease (NAFLD). However, manual tracing is resource intensive. To address these limitations and to expand the availability of a quantitative CT measure of hepatic steatosis, we propose the automatic liver attenuation ROI-based measurement (ALARM) method for automated liver attenuation estimation. The ALARM method consists of two major stages: (1) deep convolutional neural network (DCNN)-based liver segmentation and (2) automated ROI extraction. First, liver segmentation was achieved using our previously developed SS-Net. Then, a single central ROI (center-ROI) and three circular ROIs (periphery-ROI) were computed from the liver segmentation using morphological operations. The ALARM method is available as an open-source Docker container (https://github.com/MASILab/ALARM). 246 subjects with 738 abdominal CT scans from the African American-Diabetes Heart Study (AA-DHS) were used for external validation (testing), independent of the training and validation cohort (100 clinically acquired abdominal CT scans).
Tasks Computed Tomography (CT), Liver Segmentation
Published 2019-06-23
URL https://arxiv.org/abs/1906.09549v2
PDF https://arxiv.org/pdf/1906.09549v2.pdf
PWC https://paperswithcode.com/paper/fully-automatic-liver-attenuation-estimation
Repo https://github.com/MASILab/ALARM
Framework none
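
A minimal sketch of the ROI-extraction stage, assuming a 2D CT slice and a binary liver mask as inputs: erode the mask so the ROI stays well inside the organ, place a circular ROI at the mask centroid, and average the attenuation. The radius and erosion settings here are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from scipy import ndimage

def liver_attenuation(ct_slice, liver_mask, roi_radius=10, erode_iter=5):
    """Mean liver attenuation (HU) from a circular center-ROI (sketch)."""
    # Erode the mask so the ROI cannot touch the liver boundary.
    inner = ndimage.binary_erosion(liver_mask, iterations=erode_iter)
    if not inner.any():
        raise ValueError("mask vanished after erosion")
    cy, cx = ndimage.center_of_mass(inner)
    yy, xx = np.ogrid[:ct_slice.shape[0], :ct_slice.shape[1]]
    roi = (yy - cy) ** 2 + (xx - cx) ** 2 <= roi_radius ** 2
    roi &= inner  # keep the ROI strictly inside the liver
    return float(ct_slice[roi].mean())
```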

Very Long Natural Scenery Image Prediction by Outpainting

Title Very Long Natural Scenery Image Prediction by Outpainting
Authors Zongxin Yang, Jian Dong, Ping Liu, Yi Yang, Shuicheng Yan
Abstract Compared to image inpainting, image outpainting has received less attention because of two challenges. The first is how to keep spatial and content consistency between the generated images and the original input. The second is how to maintain high quality in the generated results, especially for multi-step generation, in which the generated regions are spatially far from the initial input. To solve these two problems, we devise two novel modules, named Skip Horizontal Connection and Recurrent Content Transfer, and integrate them into our encoder-decoder architecture. With this design, our network can generate highly realistic outpainting predictions effectively and efficiently. Moreover, our method can generate new images of very large horizontal extent while keeping the same style and semantic content as the given input. To test the effectiveness of the proposed architecture, we collect a new scenery dataset with diverse, complicated natural scenes. The experimental results on this dataset demonstrate the efficacy of our proposed network. The code and dataset are available at https://github.com/z-x-yang/NS-Outpainting.
Tasks Image Inpainting, Image Outpainting
Published 2019-12-29
URL https://arxiv.org/abs/1912.12688v1
PDF https://arxiv.org/pdf/1912.12688v1.pdf
PWC https://paperswithcode.com/paper/very-long-natural-scenery-image-prediction-by-1
Repo https://github.com/z-x-yang/NS-Outpainting
Framework tf
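
The Recurrent Content Transfer idea (treat encoder feature-map columns as a sequence and autoregressively predict the columns of the unseen region) can be sketched as follows. This is a toy PyTorch rendering, not the released TensorFlow implementation, and all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RecurrentContentTransfer(nn.Module):
    """Toy version: scan known feature columns with an LSTM, then
    autoregressively predict the columns of the region to outpaint."""
    def __init__(self, channels=64, height=16, n_new_cols=8):
        super().__init__()
        self.n_new_cols = n_new_cols
        dim = channels * height
        self.lstm = nn.LSTM(dim, dim, batch_first=True)

    def forward(self, feats):                 # feats: (B, C, H, W)
        b, c, h, w = feats.shape
        seq = feats.permute(0, 3, 1, 2).reshape(b, w, c * h)
        out, state = self.lstm(seq)           # encode the known columns
        col, new_cols = out[:, -1:], []
        for _ in range(self.n_new_cols):      # extend one column at a time
            col, state = self.lstm(col, state)
            new_cols.append(col)
        new = torch.cat(new_cols, 1).reshape(b, self.n_new_cols, c, h)
        return new.permute(0, 2, 3, 1)        # (B, C, H, n_new_cols)

new_feats = RecurrentContentTransfer()(torch.randn(2, 64, 16, 32))
```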

Inspirational Adversarial Image Generation

Title Inspirational Adversarial Image Generation
Authors Morgane Riviere, Olivier Teytaud, Jérémy Rapin, Yann LeCun, Camille Couprie
Abstract The task of image generation has started to receive attention from artists and designers seeking inspiration for new creations. However, exploiting the results of deep generative models such as Generative Adversarial Networks can be long and tedious given the lack of existing tools. In this work, we propose a simple strategy to inspire creators with new generations learned from a dataset of their choice, while providing some control over them. We design a simple optimization method to find the latent parameters corresponding to the generation closest to any input inspirational image. Specifically, given an inspirational image of the user’s choice, we perform several optimization steps to recover the optimal parameters from the model’s latent space. We tested several exploration methods, ranging from classic gradient descent to gradient-free optimizers. Many gradient-free optimizers need only comparisons (better/worse than another image), so they can be used even without a numerical criterion or an inspirational image, relying only on human preference. Thus, by iterating on one’s preferences, we could build robust facial composite or fashion generation algorithms. High-resolution versions of the produced designs are obtained using progressive growing of GANs. Our results on four datasets of faces, fashion images, and textures show that satisfactory images are effectively retrieved in most cases.
Tasks Image Generation
Published 2019-06-17
URL https://arxiv.org/abs/1906.11661v1
PDF https://arxiv.org/pdf/1906.11661v1.pdf
PWC https://paperswithcode.com/paper/inspirational-adversarial-image-generation
Repo https://github.com/facebookresearch/pytorch_GAN_zoo
Framework pytorch
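
The core optimization is easy to sketch: starting from a random latent vector, descend on the distance between the generation and the inspirational image. A minimal version with plain pixel MSE as the criterion (the paper explores richer criteria and gradient-free optimizers) might look like this; `G`, the latent size, and the hyperparameters are assumptions.

```python
import torch
import torch.nn.functional as F

def invert_generator(G, target, latent_dim=512, steps=500, lr=0.05):
    """Gradient descent on z to approximate the inspirational image."""
    z = torch.randn(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.mse_loss(G(z), target)  # pixel MSE as a stand-in criterion
        loss.backward()
        opt.step()
    return z.detach()
```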

Attention Guided Network for Retinal Image Segmentation

Title Attention Guided Network for Retinal Image Segmentation
Authors Shihao Zhang, Huazhu Fu, Yuguang Yan, Yubing Zhang, Qingyao Wu, Ming Yang, Mingkui Tan, Yanwu Xu
Abstract Learning structural information is critical for producing an ideal result in retinal image segmentation. Recently, convolutional neural networks have shown a powerful ability to extract effective representations. However, convolutional and pooling operations filter out some useful structural information. In this paper, we propose an Attention Guided Network (AG-Net) to preserve structural information and guide the expanding operation. In our AG-Net, the guided filter is exploited as a structure-sensitive expanding path to transfer structural information from previous feature maps, and an attention block is introduced to exclude noise and further reduce the negative influence of the background. Extensive experiments on two retinal image segmentation tasks (i.e., blood vessel segmentation, and optic disc and cup segmentation) demonstrate the effectiveness of our proposed method.
Tasks Semantic Segmentation
Published 2019-07-25
URL https://arxiv.org/abs/1907.12930v3
PDF https://arxiv.org/pdf/1907.12930v3.pdf
PWC https://paperswithcode.com/paper/attention-guided-network-for-retinal-image
Repo https://github.com/HzFu/AGNet
Framework pytorch
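
A hedged sketch of an attention block of this flavor: a 1x1-convolution gate computes a spatial attention map from the skip features and a guidance signal, and the map re-weights the skip features to suppress background. Channel sizes and the exact gating form are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class AttentionBlock(nn.Module):
    """Gate skip features with a spatial attention map (sketch)."""
    def __init__(self, skip_ch, guide_ch, inter_ch=32):
        super().__init__()
        self.w_skip = nn.Conv2d(skip_ch, inter_ch, 1)
        self.w_guide = nn.Conv2d(guide_ch, inter_ch, 1)
        self.psi = nn.Conv2d(inter_ch, 1, 1)

    def forward(self, skip, guide):          # assumes equal spatial sizes
        a = torch.relu(self.w_skip(skip) + self.w_guide(guide))
        a = torch.sigmoid(self.psi(a))       # (B, 1, H, W) attention map
        return skip * a                      # suppress background responses

out = AttentionBlock(64, 64)(torch.randn(1, 64, 32, 32),
                             torch.randn(1, 64, 32, 32))
```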

Symmetry Detection and Classification in Drawings of Graphs

Title Symmetry Detection and Classification in Drawings of Graphs
Authors Felice De Luca, Md Iqbal Hossain, Stephen Kobourov
Abstract Symmetry is a key feature observed in nature (from flowers and leaves to butterflies and birds) and in human-made objects (from paintings and sculptures to manufactured objects and architectural design). Rotational, translational, and especially reflectional symmetries are also important in drawings of graphs. Detecting and classifying symmetries can be very useful in algorithms that aim to create symmetric graph drawings, and in this paper we present a machine learning approach for these tasks. Specifically, we show that deep neural networks can detect reflectional symmetries with 92% accuracy. We also build a multi-class classifier to distinguish between reflectional horizontal, reflectional vertical, rotational, and translational symmetries. Finally, we make available a collection of images of graph drawings with specific symmetric features that can be used in machine learning systems for training, testing, and validation purposes. Our datasets, best-performing trained ML models, and source code are available online.
Tasks
Published 2019-07-01
URL https://arxiv.org/abs/1907.01004v3
PDF https://arxiv.org/pdf/1907.01004v3.pdf
PWC https://paperswithcode.com/paper/symmetry-detection-and-classification-in
Repo https://github.com/enggiqbal/mlsymmetric
Framework none
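
As a sketch of the detection task, a small CNN binary classifier over rendered drawings is enough to convey the setup. The architecture below is an assumption for illustration, not the paper's model.

```python
import torch
import torch.nn as nn

# Binary classifier: does a 64x64 grayscale graph drawing look
# reflectionally symmetric or not?
model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 64), nn.ReLU(),
    nn.Linear(64, 2),                    # symmetric vs. not symmetric
)
logits = model(torch.randn(8, 1, 64, 64))  # a batch of 8 drawings
```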

A Better Way to Attend: Attention with Trees for Video Question Answering

Title A Better Way to Attend: Attention with Trees for Video Question Answering
Authors Hongyang Xue, Wenqing Chu, Zhou Zhao, Deng Cai
Abstract We propose a new attention model for video question answering. The main idea of attention models is to focus on the most informative parts of the visual data, and attention mechanisms are widely used. However, most existing visual attention mechanisms regard the question as a whole. They ignore word-level semantics, where each word can warrant a different amount of attention and some words need none, and they do not consider the semantic structure of the sentences. Although the Extended Soft Attention (E-SA) model for video question answering leverages word-level attention, it performs poorly on long question sentences. In this paper, we propose the heterogeneous tree-structured memory network (HTreeMN) for video question answering. Our approach is built upon the syntax parse trees of the question sentences. HTreeMN treats words differently: \textit{visual} words are processed with an attention module while \textit{verbal} ones are not. It also utilizes the semantic structure of the sentences by combining neighbors according to the recursive structure of the parse trees. The understanding of the words and the video is propagated and merged from the leaves to the root. Furthermore, we build a hierarchical attention mechanism to distill the attended features. We evaluate our approach on two datasets. The experimental results show the superiority of our HTreeMN model over other attention models, especially on complex questions. Our code is available at https://github.com/ZJULearning/TreeAttention
Tasks Question Answering, Video Question Answering
Published 2019-09-05
URL https://arxiv.org/abs/1909.02218v1
PDF https://arxiv.org/pdf/1909.02218v1.pdf
PWC https://paperswithcode.com/paper/a-better-way-to-attend-attention-with-trees
Repo https://github.com/xuehy/TreeAttention
Framework pytorch
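
A toy sketch of the tree-structured computation, with hypothetical stand-ins for the embedding, attention, and merge modules: visual words receive video attention, verbal words do not, and children are folded bottom-up along the parse tree. Everything below is simplified for illustration.

```python
import torch
import torch.nn as nn

class TreeNode:
    def __init__(self, word=None, children=()):
        self.word, self.children = word, children

class TreeCombiner(nn.Module):
    """Bottom-up merge over a parse tree; visual leaves attend to video."""
    def __init__(self, dim=64):
        super().__init__()
        self.embed = nn.EmbeddingBag(1000, dim)  # hashed-vocab stand-in
        self.attend = nn.Linear(dim, dim)        # stand-in attention module
        self.merge = nn.Linear(2 * dim, dim)

    def forward(self, node, video_feat, visual_words):
        if node.word is not None:                # leaf word
            idx = torch.tensor([[hash(node.word) % 1000]])
            h = self.embed(idx).squeeze(0)
            if node.word in visual_words:        # visual word: use attention
                h = h + torch.tanh(self.attend(video_feat))
            return h
        hs = [self(c, video_feat, visual_words) for c in node.children]
        out = hs[0]
        for h in hs[1:]:                         # fold children pairwise
            out = torch.tanh(self.merge(torch.cat([out, h], -1)))
        return out

tree = TreeNode(children=(TreeNode("red"), TreeNode("car")))
root = TreeCombiner()(tree, torch.randn(64), visual_words={"car"})
```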

Attribute Guided Unpaired Image-to-Image Translation with Semi-supervised Learning

Title Attribute Guided Unpaired Image-to-Image Translation with Semi-supervised Learning
Authors Xinyang Li, Jie Hu, Shengchuan Zhang, Xiaopeng Hong, Qixiang Ye, Chenglin Wu, Rongrong Ji
Abstract Unpaired Image-to-Image Translation (UIT) focuses on translating images among different domains using unpaired data, and has received increasing research attention due to its practical utility. However, existing UIT schemes suffer from the need for supervised training, as well as from a lack of domain-information encoding. In this paper, we propose an Attribute Guided UIT model, termed AGUIT, to tackle these two challenges. AGUIT handles multi-modal and multi-domain UIT tasks jointly with a novel semi-supervised setting, which also benefits representation disentanglement and fine control of outputs. Specifically, the benefits of AGUIT are two-fold: (1) It adopts a novel semi-supervised learning process by translating attributes of labeled data to unlabeled data and then reconstructing the unlabeled data with a cycle-consistency operation. (2) It decomposes the image representation into a domain-invariant content code and a domain-specific style code. The redesigned style code embeds image style into two variables, drawn from a standard Gaussian distribution and from the distribution of domain labels, which facilitates fine control of translation due to the continuity of both variables. Finally, we introduce a new challenge for UIT models, disentangled transfer, which uses the disentangled representation to translate data less related to the training set. Extensive experiments demonstrate the capacity of AGUIT over existing state-of-the-art models.
Tasks Image-to-Image Translation
Published 2019-04-29
URL http://arxiv.org/abs/1904.12428v1
PDF http://arxiv.org/pdf/1904.12428v1.pdf
PWC https://paperswithcode.com/paper/attribute-guided-unpaired-image-to-image
Repo https://github.com/imlixinyang/AGUIT
Framework pytorch
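
The redesigned style code can be sketched directly: draw one part from a standard Gaussian and take the other from the (relaxed) domain label, so both parts vary continuously at translation time. Dimensions below are illustrative assumptions.

```python
import torch

def sample_style_code(domain_label, noise_dim=8):
    """Style code = Gaussian noise part + (relaxed) domain-label part."""
    noise = torch.randn(domain_label.size(0), noise_dim)
    return torch.cat([noise, domain_label.float()], dim=1)

# Two samples, five binary domain attributes each.
style = sample_style_code(torch.tensor([[1, 0, 0, 1, 0],
                                        [0, 1, 1, 0, 0]]))
```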

SDIT: Scalable and Diverse Cross-domain Image Translation

Title SDIT: Scalable and Diverse Cross-domain Image Translation
Authors Yaxing Wang, Abel Gonzalez-Garcia, Joost van de Weijer, Luis Herranz
Abstract Recently, image-to-image translation research has witnessed remarkable progress. Although current approaches successfully generate diverse outputs or perform scalable image transfer, these properties have not been combined into a single method. To address this limitation, we propose SDIT: Scalable and Diverse image-to-image translation. These properties are combined into a single generator. The diversity is determined by a latent variable which is randomly sampled from a normal distribution. The scalability is obtained by conditioning the network on the domain attributes. Additionally, we also exploit an attention mechanism that permits the generator to focus on the domain-specific attribute. We empirically demonstrate the performance of the proposed method on face mapping and other datasets beyond faces.
Tasks Image-to-Image Translation
Published 2019-08-19
URL https://arxiv.org/abs/1908.06881v1
PDF https://arxiv.org/pdf/1908.06881v1.pdf
PWC https://paperswithcode.com/paper/sdit-scalable-and-diverse-cross-domain-image
Repo https://github.com/taki0112/SDIT-Tensorflow
Framework tf
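
A minimal sketch of the single-generator design: condition on a domain-attribute vector for scalability and on a random latent for diversity, by broadcasting both over the image and concatenating them as input channels. The tiny network and sizes are assumptions; the paper's generator (with its attention mechanism) is more elaborate.

```python
import torch
import torch.nn as nn

class CondGen(nn.Module):
    """One generator, conditioned on domain attributes and a latent."""
    def __init__(self, img_ch=3, n_domains=5, z_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(img_ch + n_domains + z_dim, 32, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, img_ch, 3, padding=1), nn.Tanh())

    def forward(self, x, domain, z):
        b, _, h, w = x.shape
        # Broadcast the conditioning vector over the spatial grid.
        cond = torch.cat([domain, z], 1)[:, :, None, None].expand(b, -1, h, w)
        return self.net(torch.cat([x, cond], 1))

y = CondGen()(torch.randn(2, 3, 64, 64), torch.randn(2, 5), torch.randn(2, 8))
```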

Seeing without Looking: Contextual Rescoring of Object Detections for AP Maximization

Title Seeing without Looking: Contextual Rescoring of Object Detections for AP Maximization
Authors Lourenço V. Pato, Renato Negrinho, Pedro M. Q. Aguiar
Abstract The majority of current object detectors lack context: class predictions are made independently from other detections. We propose to incorporate context in object detection by post-processing the output of an arbitrary detector to rescore the confidences of its detections. Rescoring is done by conditioning on contextual information from the entire set of detections: their confidences, predicted classes, and positions. We show that AP can be improved by simply reassigning the detection confidence values such that true positives that survive longer (i.e., those with the correct class and large IoU) are scored higher than false positives or detections with small IoU. In this setting, we use a bidirectional RNN with attention for contextual rescoring and introduce a training target that uses the IoU with ground truth to maximize AP for the given set of detections. The fact that our approach does not require access to visual features makes it computationally inexpensive and agnostic to the detection architecture. In spite of this simplicity, our model consistently improves AP over strong pre-trained baselines (Cascade R-CNN and Faster R-CNN with several backbones), particularly by reducing the confidence of duplicate detections (a learned form of non-maximum suppression) and removing out-of-context objects by conditioning on the confidences, classes, positions, and sizes of the co-occurrent detections. Code is available at https://github.com/LourencoVazPato/seeing-without-looking/
Tasks Object Detection
Published 2019-12-27
URL https://arxiv.org/abs/1912.12290v2
PDF https://arxiv.org/pdf/1912.12290v2.pdf
PWC https://paperswithcode.com/paper/seeing-without-looking-contextual-rescoring
Repo https://github.com/LourencoVazPato/seeing-without-looking
Framework pytorch
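
Since the rescorer never sees visual features, it is cheap to sketch end to end: encode the set of detections (confidence, class one-hot, box geometry) with a bidirectional GRU and regress a new confidence per detection. The feature layout and sizes are assumptions; the paper's model additionally uses attention.

```python
import torch
import torch.nn as nn

class DetectionRescorer(nn.Module):
    """Rescore each detection by conditioning on all other detections."""
    def __init__(self, n_classes=80, hidden=128):
        super().__init__()
        in_dim = 1 + n_classes + 4       # confidence + class one-hot + box
        self.rnn = nn.GRU(in_dim, hidden, batch_first=True,
                          bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, dets):             # dets: (B, N, in_dim)
        ctx, _ = self.rnn(dets)          # every detection sees the others
        return torch.sigmoid(self.head(ctx)).squeeze(-1)  # new confidences

scores = DetectionRescorer()(torch.randn(2, 100, 85))  # (2, 100)
```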

Triplet Distillation for Deep Face Recognition

Title Triplet Distillation for Deep Face Recognition
Authors Yushu Feng, Huan Wang, Daniel T. Yi, Roland Hu
Abstract Convolutional neural networks (CNNs) have achieved a great success in face recognition, which unfortunately comes at the cost of massive computation and storage consumption. Many compact face recognition networks are thus proposed to resolve this problem. Triplet loss is effective to further improve the performance of those compact models. However, it normally employs a fixed margin to all the samples, which neglects the informative similarity structures between different identities. In this paper, we propose an enhanced version of triplet loss, named triplet distillation, which exploits the capability of a teacher model to transfer the similarity information to a small model by adaptively varying the margin between positive and negative pairs. Experiments on LFW, AgeDB, and CPLFW datasets show the merits of our method compared to the original triplet loss.
Tasks Face Recognition
Published 2019-05-11
URL https://arxiv.org/abs/1905.04457v2
PDF https://arxiv.org/pdf/1905.04457v2.pdf
PWC https://paperswithcode.com/paper/triplet-distillation-for-deep-face
Repo https://github.com/david-svitov/margindistillation
Framework none
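
The adaptive-margin idea can be sketched in a few lines: widen the margin when the teacher embeds the positive and negative far apart, narrow it when they are close. The linear mapping from teacher distance to margin below is a stand-in for the paper's scheme, and the margin bounds are assumptions.

```python
import torch
import torch.nn.functional as F

def triplet_distillation_loss(anchor, pos, neg, t_pos, t_neg,
                              m_min=0.2, m_max=0.5):
    """Triplet loss whose margin follows the teacher's pos/neg distance."""
    # Teacher's view of how dissimilar the positive and negative are.
    d_pn = 1 - F.cosine_similarity(t_pos, t_neg)    # in [0, 2]
    margin = m_min + (m_max - m_min) * d_pn / 2     # adaptive margin
    d_ap = (anchor - pos).pow(2).sum(dim=1)
    d_an = (anchor - neg).pow(2).sum(dim=1)
    return F.relu(d_ap - d_an + margin).mean()
```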

NoRML: No-Reward Meta Learning

Title NoRML: No-Reward Meta Learning
Authors Yuxiang Yang, Ken Caluwaerts, Atil Iscen, Jie Tan, Chelsea Finn
Abstract Efficiently adapting to new environments and changes in dynamics is critical for agents to successfully operate in the real world. Reinforcement learning (RL) based approaches typically rely on external reward feedback for adaptation. However, in many scenarios this reward signal might not be readily available for the target task, or the difference between the environments can be implicit and only observable from the dynamics. To this end, we introduce a method that allows for self-adaptation of learned policies: No-Reward Meta Learning (NoRML). NoRML extends Model Agnostic Meta Learning (MAML) for RL and uses observable dynamics of the environment instead of an explicit reward function in MAML’s finetune step. Our method has a more expressive update step than MAML, while maintaining MAML’s gradient-based foundation. Additionally, in order to allow more targeted exploration, we implement an extension to MAML that effectively disconnects the meta-policy parameters from the fine-tuned policies’ parameters. We first study our method on a number of synthetic control problems and then validate our method on common benchmark environments, showing that NoRML outperforms MAML when the dynamics change between tasks.
Tasks Meta-Learning
Published 2019-03-04
URL http://arxiv.org/abs/1903.01063v1
PDF http://arxiv.org/pdf/1903.01063v1.pdf
PWC https://paperswithcode.com/paper/norml-no-reward-meta-learning
Repo https://github.com/google-research/google-research
Framework tf
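
A rough sketch of the no-reward inner update, with hypothetical policy and advantage networks: the inner-loop objective replaces environment rewards with a learned advantage computed from observed transitions (s, a, s'). The policy-gradient form and interfaces below are simplified assumptions, not the paper's exact update.

```python
import torch

def norml_adapt(policy, advantage_net, states, actions, next_states, lr=0.1):
    """One no-reward inner step: a learned advantage replaces rewards."""
    logp = policy(states).log_softmax(-1)            # action log-probs
    logp = logp.gather(1, actions[:, None]).squeeze(1)
    adv = advantage_net(torch.cat([states, next_states], 1)).squeeze(1)
    inner_loss = -(logp * adv).mean()                # surrogate objective
    grads = torch.autograd.grad(inner_loss, list(policy.parameters()),
                                create_graph=True)
    # Functional-style MAML inner step: return adapted parameters.
    return [p - lr * g for p, g in zip(policy.parameters(), grads)]
```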

Three-dimensional Backbone Network for 3D Object Detection in Traffic Scenes

Title Three-dimensional Backbone Network for 3D Object Detection in Traffic Scenes
Authors Xuesong Li, Jose Guivant, Ngaiming Kwok, Yongzhi Xu, Ruowei Li, Hongkun Wu
Abstract The task of detecting 3D objects in traffic scenes plays a pivotal role in many real-world applications. However, the performance of 3D object detection lags behind that of 2D object detection due to the lack of powerful 3D feature extraction methods. To address this issue, this study proposes a 3D backbone network to acquire comprehensive 3D feature maps for 3D object detection. It primarily consists of sparse 3D convolutional neural network operations on the point cloud. The 3D backbone network can inherently learn 3D features from the raw data without compressing the point cloud into multiple 2D images. The sparse 3D convolutional neural network takes full advantage of the sparsity in the 3D point cloud to accelerate computation and save memory, which makes the 3D backbone network feasible for real-world applications. Empirical experiments were conducted on the KITTI benchmark, and results comparable to the state of the art in 3D object detection were obtained.
Tasks 3D Object Detection, Object Detection
Published 2019-01-24
URL https://arxiv.org/abs/1901.08373v2
PDF https://arxiv.org/pdf/1901.08373v2.pdf
PWC https://paperswithcode.com/paper/3d-backbone-network-for-3d-object-detection
Repo https://github.com/Benzlxs/tDBN
Framework pytorch
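
A dense stand-in conveys the pipeline: voxelize the point cloud into an occupancy grid and extract 3D feature maps with Conv3d. The paper's contribution is doing this with sparse 3D convolutions for speed and memory; the toy below ignores sparsity and uses illustrative sizes.

```python
import torch
import torch.nn as nn

def voxelize(points, grid=(32, 32, 32)):
    """Scatter points (N, 3) with coords in [0, 1) into an occupancy grid."""
    vox = torch.zeros(1, 1, *grid)
    idx = (points.clamp(0, 0.999) * torch.tensor(grid)).long()
    vox[0, 0, idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return vox

backbone = nn.Sequential(
    nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU())
feats = backbone(voxelize(torch.rand(500, 3)))   # (1, 32, 16, 16, 16)
```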

Inherent Weight Normalization in Stochastic Neural Networks

Title Inherent Weight Normalization in Stochastic Neural Networks
Authors Georgios Detorakis, Sourav Dutta, Abhishek Khanna, Matthew Jerry, Suman Datta, Emre Neftci
Abstract Multiplicative stochasticity such as Dropout improves the robustness and generalizability of deep neural networks. Here, we further demonstrate that always-on multiplicative stochasticity combined with simple threshold neurons is a sufficient set of operations for deep neural networks. We call such models Neural Sampling Machines (NSM). We find that the probability of activation of the NSM exhibits a self-normalizing property that mirrors Weight Normalization, a previously studied mechanism that fulfills many of the features of Batch Normalization in an online fashion. The normalization of activities during training speeds up convergence by preventing internal covariate shift caused by changes in the input distribution. The always-on stochasticity of the NSM confers the following advantages: the network is identical in the inference and learning phases, making the NSM suitable for online learning; it can exploit stochasticity inherent to a physical substrate, such as analog non-volatile memories, for in-memory computing; and it is suitable for Monte Carlo sampling, while requiring almost exclusively addition and comparison operations. We demonstrate NSMs on standard classification benchmarks (MNIST and CIFAR) and event-based classification benchmarks (N-MNIST and DVS Gestures). Our results show that NSMs perform comparably to or better than conventional artificial neural networks with the same architecture.
Tasks
Published 2019-10-27
URL https://arxiv.org/abs/1910.12316v1
PDF https://arxiv.org/pdf/1910.12316v1.pdf
PWC https://paperswithcode.com/paper/inherent-weight-normalization-in-stochastic
Repo https://github.com/nmi-lab/neural_sampling_machines
Framework pytorch
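
The mechanism itself is compact enough to sketch: always-on multiplicative Bernoulli noise on the inputs of a linear projection, followed by a threshold neuron. This is a toy rendering of the idea, not the paper's exact formulation.

```python
import torch

def nsm_layer(x, weight, p=0.5):
    """Always-on multiplicative noise, then a threshold neuron (sketch)."""
    mask = torch.bernoulli(torch.full_like(x, p))  # multiplicative noise
    pre = (x * mask) @ weight.t()                  # noisy linear projection
    return (pre > 0).float()                       # simple threshold unit

out = nsm_layer(torch.randn(4, 100), torch.randn(10, 100))  # (4, 10)
```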

A Stable Variational Autoencoder for Text Modelling

Title A Stable Variational Autoencoder for Text Modelling
Authors Ruizhe Li, Xiao Li, Chenghua Lin, Matthew Collinson, Rui Mao
Abstract The Variational Autoencoder (VAE) is a powerful method for learning representations of high-dimensional data. However, VAEs can suffer from an issue known as latent variable collapse (or KL loss vanishing), where the posterior collapses to the prior and the model ignores the latent codes in generative tasks. This issue is particularly prevalent when employing VAE-RNN architectures for text modelling (Bowman et al., 2016). In this paper, we present a simple architecture called the holistic regularisation VAE (HR-VAE), which can effectively avoid latent variable collapse. Compared to existing VAE-RNN architectures, we show that our model achieves a much more stable training process and can generate text with significantly better quality.
Tasks
Published 2019-11-13
URL https://arxiv.org/abs/1911.05343v1
PDF https://arxiv.org/pdf/1911.05343v1.pdf
PWC https://paperswithcode.com/paper/a-stable-variational-autoencoder-for-text
Repo https://github.com/ruizheliUOA/HR-VAE
Framework pytorch
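
A hedged sketch of the holistic regularisation term, assuming the encoder emits a mean and log-variance at every timestep: impose the KL penalty against the standard normal prior at each step and average, rather than regularising only the final encoder state.

```python
import torch

def holistic_kl(mus, logvars):
    """KL(q || N(0, I)) at every timestep, averaged. mus/logvars: (B, T, D)."""
    kl = -0.5 * (1 + logvars - mus.pow(2) - logvars.exp())
    return kl.sum(-1).mean()  # sum over latent dim, mean over batch and time

loss_kl = holistic_kl(torch.randn(8, 20, 32), torch.randn(8, 20, 32))
```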

Zero-Resource Cross-Lingual Named Entity Recognition

Title Zero-Resource Cross-Lingual Named Entity Recognition
Authors M Saiful Bari, Shafiq Joty, Prathyusha Jwalapuram
Abstract Recently, neural methods have achieved state-of-the-art (SOTA) results in Named Entity Recognition (NER) tasks for many languages without the need for manually crafted features. However, these models still require manually annotated training data, which is not available for many languages. In this paper, we propose an unsupervised cross-lingual NER model that can transfer NER knowledge from one language to another in a completely unsupervised way without relying on any bilingual dictionary or parallel data. Our model achieves this through word-level adversarial learning and augmented fine-tuning with parameter sharing and feature augmentation. Experiments on five different languages demonstrate the effectiveness of our approach, outperforming existing models by a good margin and setting a new SOTA for each language pair.
Tasks Named Entity Recognition
Published 2019-11-22
URL https://arxiv.org/abs/1911.09812v1
PDF https://arxiv.org/pdf/1911.09812v1.pdf
PWC https://paperswithcode.com/paper/zero-resource-cross-lingual-named-entity
Repo https://github.com/ntunlp/Zero-Shot-Cross-Lingual-NER
Framework none
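
The word-level adversarial step can be sketched as a small embedding-alignment GAN: a discriminator learns to tell source-language embeddings from mapped target-language embeddings, while a linear mapper learns to fool it. This is a simplified stand-in for the paper's full pipeline; dimensions and optimizers are assumptions.

```python
import torch
import torch.nn as nn

mapper = nn.Linear(300, 300, bias=False)  # maps target embeddings to source space
disc = nn.Sequential(nn.Linear(300, 128), nn.ReLU(), nn.Linear(128, 1))
opt_m = torch.optim.Adam(mapper.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

src, tgt = torch.randn(64, 300), torch.randn(64, 300)  # embedding batches

# Discriminator step: label source embeddings 1, mapped target embeddings 0.
d_loss = (bce(disc(src), torch.ones(64, 1)) +
          bce(disc(mapper(tgt).detach()), torch.zeros(64, 1)))
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# Mapper step: make mapped target embeddings look like source embeddings.
m_loss = bce(disc(mapper(tgt)), torch.ones(64, 1))
opt_m.zero_grad()
m_loss.backward()
opt_m.step()
```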