April 2, 2020


Paper Group ANR 89

Edge-Gated CNNs for Volumetric Semantic Segmentation of Medical Images. Deep Convolutional Neural Networks with Spatial Regularization, Volume and Star-shape Priori for Image Segmentation. Segmentação de imagens utilizando competição e cooperação entre partículas. Near-Optimal Hardware Design for Convolutional Neural Networks. Distributed Learning …

Edge-Gated CNNs for Volumetric Semantic Segmentation of Medical Images


Title	Edge-Gated CNNs for Volumetric Semantic Segmentation of Medical Images
Authors	Ali Hatamizadeh, Demetri Terzopoulos, Andriy Myronenko
Abstract	Textures and edges contribute different information to image recognition. Edges and boundaries encode shape information, while textures manifest the appearance of regions. Despite the success of Convolutional Neural Networks (CNNs) in computer vision and medical image analysis applications, predominantly only texture abstractions are learned, which often leads to imprecise boundary delineations. In medical imaging, expert manual segmentation often relies on organ boundaries; for example, to manually segment a liver, a medical practitioner usually identifies edges first and subsequently fills in the segmentation mask. Motivated by these observations, we propose a plug-and-play module, dubbed Edge-Gated CNNs (EG-CNNs), that can be used with existing encoder-decoder architectures to process both edge and texture information. The EG-CNN learns to emphasize the edges in the encoder, to predict crisp boundaries by an auxiliary edge supervision, and to fuse its output with the original CNN output. We evaluate the effectiveness of the EG-CNN with various mainstream CNNs on two publicly available datasets, BraTS 19 and KiTS 19 for brain tumor and kidney semantic segmentation. We demonstrate how the addition of EG-CNN consistently improves segmentation accuracy and generalization performance.
Tasks	Semantic Segmentation
Published	2020-02-11
URL	https://arxiv.org/abs/2002.04207v1
PDF	https://arxiv.org/pdf/2002.04207v1.pdf
PWC	https://paperswithcode.com/paper/edge-gated-cnns-for-volumetric-semantic
Repo
Framework

Deep Convolutional Neural Networks with Spatial Regularization, Volume and Star-shape Priori for Image Segmentation


Title	Deep Convolutional Neural Networks with Spatial Regularization, Volume and Star-shape Priori for Image Segmentation
Authors	Jun Liu, Xiangyue Wang, Xue-cheng Tai
Abstract	We use Deep Convolutional Neural Networks (DCNNs) for image segmentation problems. DCNNs extract features from natural images well. However, the classification functions in existing CNN architectures are simple and lack the capability to handle important spatial information in the way that many well-known traditional variational models do. Priors such as spatial regularity, volume priors, and object shapes cannot be handled well by existing DCNNs. We propose a novel Soft Threshold Dynamics (STD) framework which can easily integrate many spatial priors of the classical variational models into DCNNs for image segmentation. The novelty of our method is to interpret the softmax activation function as a dual variable in a variational problem, and thus many spatial priors can be imposed in the dual space. From this viewpoint, we can build an STD-based framework which enables the outputs of DCNNs to have many special priors such as spatial regularity, volume constraints and star-shape priors. The proposed method is a general mathematical framework and it can be applied to any semantic segmentation DCNN. To show the efficiency and accuracy of our method, we applied it to the popular DeepLabV3+ image segmentation network, and the experimental results show that our method can work efficiently on data-driven image segmentation DCNNs.
Tasks	Semantic Segmentation
Published	2020-02-10
URL	https://arxiv.org/abs/2002.03989v1
PDF	https://arxiv.org/pdf/2002.03989v1.pdf
PWC	https://paperswithcode.com/paper/deep-convolutional-neural-networks-with
Repo
Framework

Segmentação de imagens utilizando competição e cooperação entre partículas


Title	Segmentação de imagens utilizando competição e cooperação entre partículas
Authors	Bárbara Ribeiro da Silva, Fabricio Aparecido Breve
Abstract	This paper presents an extension proposal of the semi-supervised learning method known as Particle Competition and Cooperation for carrying out tasks of image segmentation. Preliminary results show that this is a promising approach.
Tasks	Semantic Segmentation
Published	2020-02-08
URL	https://arxiv.org/abs/2002.05521v1
PDF	https://arxiv.org/pdf/2002.05521v1.pdf
PWC	https://paperswithcode.com/paper/segmentacao-de-imagens-utilizando-competicao
Repo
Framework

Near-Optimal Hardware Design for Convolutional Neural Networks


Title	Near-Optimal Hardware Design for Convolutional Neural Networks
Authors	Byungik Ahn
Abstract	Recently, the demand for low-power deep-learning hardware for industrial applications has been increasing. Most existing artificial intelligence (AI) chips have evolved to rely on new chip technologies rather than on radically new hardware architectures, to maintain their generality. This study proposes a novel, special-purpose, and high-efficiency hardware architecture for convolutional neural networks. The proposed architecture maximizes the utilization of multipliers by designing the computational circuit with the same structure as that of the computational flow of the model, rather than mapping computations to fixed hardware. In addition, a specially designed filter circuit simultaneously provides all the data of the receptive field, using only one memory read operation during each clock cycle; this allows the computation circuit to operate seamlessly without idle cycles. Our reference system based on the proposed architecture uses 97% of the peak-multiplication capability in actual computations required by the computation model throughout the computation period. In addition, overhead components are minimized so that the proportion of the resources constituting the non-multiplier components is smaller than that constituting the multiplier components, which are indispensable for the computational model. The efficiency of the proposed architecture is close to an ideally efficient system that cannot be improved further in terms of the performance-to-resource ratio. An implementation based on the proposed hardware architecture has been applied in commercial AI products.
Tasks
Published	2020-02-06
URL	https://arxiv.org/abs/2002.05526v1
PDF	https://arxiv.org/pdf/2002.05526v1.pdf
PWC	https://paperswithcode.com/paper/near-optimal-hardware-design-for
Repo
Framework

Distributed Learning in the Non-Convex World: From Batch to Streaming Data, and Beyond


Title	Distributed Learning in the Non-Convex World: From Batch to Streaming Data, and Beyond
Authors	Tsung-Hui Chang, Mingyi Hong, Hoi-To Wai, Xinwei Zhang, Songtao Lu
Abstract	Distributed learning has become a critical enabler of the massively connected world envisioned by many. This article discusses four key elements of scalable distributed processing and real-time intelligence: problems, data, communication and computation. Our aim is to provide a fresh and unique perspective about how these elements should work together in an effective and coherent manner. In particular, we provide a selective review of the recent techniques developed for optimizing non-convex models (i.e., problem classes), processing batch and streaming data (i.e., data types), over the networks in a distributed manner (i.e., communication and computation paradigm). We describe the intuitions and connections behind a core set of popular distributed algorithms, emphasizing how to trade off between computation and communication costs. Practical issues and future research directions will also be discussed.
Tasks
Published	2020-01-14
URL	https://arxiv.org/abs/2001.04786v1
PDF	https://arxiv.org/pdf/2001.04786v1.pdf
PWC	https://paperswithcode.com/paper/distributed-learning-in-the-non-convex-world
Repo
Framework

Augmenting Visual Question Answering with Semantic Frame Information in a Multitask Learning Approach


Title	Augmenting Visual Question Answering with Semantic Frame Information in a Multitask Learning Approach
Authors	Mehrdad Alizadeh, Barbara Di Eugenio
Abstract	Visual Question Answering (VQA) concerns providing answers to Natural Language questions about images. Several deep neural network approaches have been proposed to model the task in an end-to-end fashion. Whereas the task is grounded in visual processing, if the question focuses on events described by verbs, the language understanding component becomes crucial. Our hypothesis is that models should be aware of verb semantics, as expressed via semantic role labels, argument types, and/or frame elements. Unfortunately, no VQA dataset exists that includes verb semantic information. Our first contribution is a new VQA dataset (imSituVQA) that we built by taking advantage of the imSitu annotations. The imSitu dataset consists of images manually labeled with semantic frame elements, mostly taken from FrameNet. Second, we propose a multitask CNN-LSTM VQA model that learns to classify the answers as well as the semantic frame elements. Our experiments show that semantic frame element classification helps the VQA system avoid inconsistent responses and improves performance.
Tasks	Question Answering, Visual Question Answering
Published	2020-01-31
URL	https://arxiv.org/abs/2001.11673v1
PDF	https://arxiv.org/pdf/2001.11673v1.pdf
PWC	https://paperswithcode.com/paper/augmenting-visual-question-answering-with
Repo
Framework

SQuINTing at VQA Models: Interrogating VQA Models with Sub-Questions


Title	SQuINTing at VQA Models: Interrogating VQA Models with Sub-Questions
Authors	Ramprasaath R. Selvaraju, Purva Tendulkar, Devi Parikh, Eric Horvitz, Marco Ribeiro, Besmira Nushi, Ece Kamar
Abstract	Existing VQA datasets contain questions with varying levels of complexity. While the majority of questions in these datasets require perception for recognizing existence, properties, and spatial relationships of entities, a significant portion of questions pose challenges that correspond to reasoning tasks – tasks that can only be answered through a synthesis of perception and knowledge about the world, logic and/or reasoning. This distinction allows us to notice when existing VQA models have consistency issues – they answer the reasoning question correctly but fail on associated low-level perception questions. For example, models answer the complex reasoning question “Is the banana ripe enough to eat?” correctly, but fail on the associated perception question “Are the bananas mostly green or yellow?” indicating that the model likely answered the reasoning question correctly but for the wrong reason. We quantify the extent to which this phenomenon occurs by creating a new Reasoning split of the VQA dataset and collecting Sub-VQA, a new dataset consisting of 200K new perception questions which serve as sub-questions corresponding to the set of perceptual tasks needed to effectively answer the complex reasoning questions in the Reasoning split. Additionally, we propose an approach called Sub-Question Importance-aware Network Tuning (SQuINT), which encourages the model to attend to the same parts of the image when answering the reasoning question and the perception sub-questions. We show that SQuINT improves model consistency by 7.8%, also marginally improving its performance on the Reasoning questions in VQA, while also displaying qualitatively better attention maps.
Tasks	Visual Question Answering
Published	2020-01-20
URL	https://arxiv.org/abs/2001.06927v1
PDF	https://arxiv.org/pdf/2001.06927v1.pdf
PWC	https://paperswithcode.com/paper/squinting-at-vqa-models-interrogating-vqa
Repo
Framework

MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding


Title	MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding
Authors	Geondo Park, Chihye Han, Wonjun Yoon, Daeshik Kim
Abstract	Visual-semantic embedding enables various tasks such as image-text retrieval, image captioning, and visual question answering. The key to successful visual-semantic embedding is to express visual and textual data properly by accounting for their intricate relationship. While previous studies have made considerable progress by encoding the visual and textual data into a joint space where similar concepts are closely located, they often represent data by a single vector, ignoring the presence of multiple important components in an image or text. Thus, in addition to the joint embedding space, we propose a novel multi-head self-attention network to capture various components of visual and textual data by attending to important parts in data. Our approach achieves new state-of-the-art results in image-text retrieval tasks on the MS-COCO and Flickr30K datasets. Through the visualization of the attention maps that capture distinct semantic components at multiple positions in the image and the text, we demonstrate that our method achieves an effective and interpretable visual-semantic joint space.
Tasks	Image Captioning, Question Answering, Visual Question Answering
Published	2020-01-11
URL	https://arxiv.org/abs/2001.03712v1
PDF	https://arxiv.org/pdf/2001.03712v1.pdf
PWC	https://paperswithcode.com/paper/mhsan-multi-head-self-attention-network-for
Repo
Framework

Formal Controller Synthesis for Continuous-Space MDPs via Model-Free Reinforcement Learning


Title	Formal Controller Synthesis for Continuous-Space MDPs via Model-Free Reinforcement Learning
Authors	Abolfazl Lavaei, Fabio Somenzi, Sadegh Soudjani, Ashutosh Trivedi, Majid Zamani
Abstract	A novel reinforcement learning scheme to synthesize policies for continuous-space Markov decision processes (MDPs) is proposed. This scheme enables one to apply model-free, off-the-shelf reinforcement learning algorithms for finite MDPs to compute optimal strategies for the corresponding continuous-space MDPs without explicitly constructing the finite-state abstraction. The proposed approach is based on abstracting the system with a finite MDP (without constructing it explicitly) with unknown transition probabilities, synthesizing strategies over the abstract MDP, and then mapping the results back over the concrete continuous-space MDP with approximate optimality guarantees. The properties of interest for the system belong to a fragment of linear temporal logic, known as syntactically co-safe linear temporal logic (scLTL), and the synthesis requirement is to maximize the probability of satisfaction within a given bounded time horizon. A key contribution of the paper is to leverage the classical convergence results for reinforcement learning on finite MDPs and provide control strategies maximizing the probability of satisfaction over unknown, continuous-space MDPs while providing probabilistic closeness guarantees. Automata-based reward functions are often sparse; we present a novel potential-based reward shaping technique to produce dense rewards to speed up learning. The effectiveness of the proposed approach is demonstrated by applying it to three physical benchmarks concerning the regulation of a room’s temperature, control of a road traffic cell, and of a 7-dimensional nonlinear model of a BMW 320i car.
Tasks
Published	2020-03-02
URL	https://arxiv.org/abs/2003.00712v1
PDF	https://arxiv.org/pdf/2003.00712v1.pdf
PWC	https://paperswithcode.com/paper/formal-controller-synthesis-for-continuous
Repo
Framework
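
The abstract above mentions turning sparse automata-based rewards into dense ones through potential-based reward shaping. The paper's own shaping function is not given here, so the following is only a minimal sketch of the classical potential-based form, r + γΦ(s') − Φ(s). The four-state automaton, the `DISTANCE_TO_ACCEPT` table, and the choice of potential are hypothetical and serve purely as illustration.

```python
from typing import Callable, List

def shaped_reward(
    reward: float,
    state: int,
    next_state: int,
    potential: Callable[[int], float],
    gamma: float = 1.0,
) -> float:
    """Classical potential-based shaping: r + gamma * phi(s') - phi(s)."""
    return reward + gamma * potential(next_state) - potential(state)

# Hypothetical automaton with states 0..3, where state 3 is accepting.
# Assumption: the potential is the negated number of steps left to acceptance.
DISTANCE_TO_ACCEPT = {0: 3, 1: 2, 2: 1, 3: 0}

def potential(q: int) -> float:
    return -DISTANCE_TO_ACCEPT[q]

def sparse_reward(q_next: int) -> float:
    """Sparse automaton reward: paid only on reaching the accepting state."""
    return 1.0 if q_next == 3 else 0.0

trajectory: List[int] = [0, 1, 2, 3]
dense = [
    shaped_reward(sparse_reward(b), a, b, potential)
    for a, b in zip(trajectory, trajectory[1:])
]
print(dense)
```

With `gamma = 1`, the shaping terms telescope, so the shaped return differs from the sparse return only by Φ(last) − Φ(first); every step now carries a signal, while the ranking of trajectories that share the same start and end states is unchanged.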

Face Hallucination with Finishing Touches


Title	Face Hallucination with Finishing Touches
Authors	Yang Zhang, Ivor W. Tsang, Jun Li, Ping Liu, Xiaobo Lu, Xin Yu
Abstract	Obtaining a high-quality frontal face image from a low-resolution (LR) non-frontal face image is important for many facial analysis applications. However, mainstream methods either focus on super-resolving near-frontal LR faces or frontalizing non-frontal high-resolution (HR) faces. It is desirable to perform both tasks seamlessly for daily-life unconstrained face images. In this paper, we present a novel Vivid Face Hallucination Generative Adversarial Network (VividGAN) devised for simultaneously super-resolving and frontalizing tiny non-frontal face images. VividGAN consists of a Vivid Face Hallucination Network (Vivid-FHnet) and two discriminators, i.e., Coarse-D and Fine-D. The Vivid-FHnet first generates a coarse frontal HR face and then makes use of the structure prior, i.e., fine-grained facial components, to achieve a fine frontal HR face image. Specifically, we propose a facial component-aware module, which adopts the facial geometry guidance as clues to accurately align and merge the coarse frontal HR face and prior information. Meanwhile, the two-level discriminators are designed to capture both the global outline of the face and detailed facial characteristics. The Coarse-D enforces the coarse hallucinated faces to be upright and complete, while the Fine-D focuses on the fine hallucinated ones for sharper details. Extensive experiments demonstrate that our VividGAN achieves photo-realistic frontal HR faces, reaching superior performance in downstream tasks, i.e., face recognition and expression classification, compared with other state-of-the-art methods.
Tasks	Face Hallucination, Face Recognition
Published	2020-02-09
URL	https://arxiv.org/abs/2002.03308v1
PDF	https://arxiv.org/pdf/2002.03308v1.pdf
PWC	https://paperswithcode.com/paper/face-hallucination-with-finishing-touches
Repo
Framework

Towards a Human-like Open-Domain Chatbot


Title	Towards a Human-like Open-Domain Chatbot
Authors	Daniel Adiwardana, Minh-Thang Luong, David R. So, Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, Quoc V. Le
Abstract	We present Meena, a multi-turn open-domain chatbot trained end-to-end on data mined and filtered from public domain social media conversations. This 2.6B parameter neural network is simply trained to minimize perplexity of the next token. We also propose a human evaluation metric called Sensibleness and Specificity Average (SSA), which captures key elements of a human-like multi-turn conversation. Our experiments show strong correlation between perplexity and SSA. The fact that the best perplexity end-to-end trained Meena scores high on SSA (72% on multi-turn evaluation) suggests that a human-level SSA of 86% is potentially within reach if we can better optimize perplexity. Additionally, the full version of Meena (with a filtering mechanism and tuned decoding) scores 79% SSA, 23% higher in absolute SSA than the existing chatbots we evaluated.
Tasks	Chatbot
Published	2020-01-27
URL	https://arxiv.org/abs/2001.09977v3
PDF	https://arxiv.org/pdf/2001.09977v3.pdf
PWC	https://paperswithcode.com/paper/towards-a-human-like-open-domain-chatbot
Repo
Framework
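
The Sensibleness and Specificity Average (SSA) named in the abstract above is, as the name suggests, the mean of two per-response rates. The sketch below assumes each response carries two boolean human labels, and that a response judged not sensible is also counted as not specific. The function name and the label layout are illustrative assumptions, not the authors' evaluation code.

```python
from typing import Iterable, Tuple

def ssa(labels: Iterable[Tuple[bool, bool]]) -> float:
    """Average of the sensibleness rate and the specificity rate.

    Each label is (sensible, specific) for one chatbot response.
    Assumption: a response that is not sensible never counts as specific.
    """
    rows = list(labels)
    if not rows:
        raise ValueError("at least one labeled response is required")
    n = len(rows)
    sensibleness = sum(1 for sensible, _ in rows if sensible) / n
    specificity = sum(1 for sensible, specific in rows if sensible and specific) / n
    return (sensibleness + specificity) / 2

# Four hypothetical labeled responses.
example = [(True, True), (True, False), (False, False), (True, True)]
print(ssa(example))
```

In this toy example three of four responses are sensible (0.75) and two of four are both sensible and specific (0.5), so the SSA is 0.625.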

Instance Separation Emerges from Inpainting


Title	Instance Separation Emerges from Inpainting
Authors	Steffen Wolf, Fred A. Hamprecht, Jan Funke
Abstract	Deep neural networks trained to inpaint partially occluded images show a deep understanding of image composition and have even been shown to remove objects from images convincingly. In this work, we investigate how this implicit knowledge of image composition can be leveraged for fully self-supervised instance separation. We propose a measure for the independence of two image regions given a fully self-supervised inpainting network and separate objects by maximizing this independence. We evaluate our method on two microscopy image datasets and show that it reaches similar segmentation performance to fully supervised methods.
Tasks
Published	2020-02-28
URL	https://arxiv.org/abs/2003.00891v1
PDF	https://arxiv.org/pdf/2003.00891v1.pdf
PWC	https://paperswithcode.com/paper/instance-separation-emerges-from-inpainting
Repo
Framework

CLCNet: Deep learning-based Noise Reduction for Hearing Aids using Complex Linear Coding


Title	CLCNet: Deep learning-based Noise Reduction for Hearing Aids using Complex Linear Coding
Authors	Hendrik Schröter, Tobias Rosenkranz, Alberto N. Escalante B., Marc Aubreville, Andreas Maier
Abstract	Noise reduction is an important part of modern hearing aids and is included in most commercially available devices. Deep learning-based state-of-the-art algorithms, however, either do not consider real-time and frequency resolution constraints or result in poor quality under very noisy conditions. To improve monaural speech enhancement in noisy environments, we propose CLCNet, a framework based on complex-valued linear coding. First, we define complex linear coding (CLC) motivated by linear predictive coding (LPC) that is applied in the complex frequency domain. Second, we propose a framework that incorporates complex spectrogram input and coefficient output. Third, we define a parametric normalization for complex-valued spectrograms that complies with low-latency and on-line processing. Our CLCNet was evaluated on a mixture of the EUROM database and a real-world noise dataset recorded with hearing aids and compared to traditional real-valued Wiener-Filter gains.
Tasks	Speech Enhancement
Published	2020-01-28
URL	https://arxiv.org/abs/2001.10218v1
PDF	https://arxiv.org/pdf/2001.10218v1.pdf
PWC	https://paperswithcode.com/paper/clcnet-deep-learning-based-noise-reduction
Repo
Framework

Augmenting Visual Place Recognition with Structural Cues


Title	Augmenting Visual Place Recognition with Structural Cues
Authors	Amadeus Oertel, Titus Cieslewski, Davide Scaramuzza
Abstract	In this paper, we propose to augment image-based place recognition with structural cues. Specifically, these structural cues are obtained using structure-from-motion, such that no additional sensors are needed for place recognition. This is achieved by augmenting the 2D convolutional neural network (CNN) typically used for image-based place recognition with a 3D CNN that takes as input a voxel grid derived from the structure-from-motion point cloud. We evaluate different methods for fusing the 2D and 3D features and obtain best performance with global average pooling and simple concatenation. The resulting descriptor exhibits superior recognition performance compared to descriptors extracted from only one of the input modalities, including state-of-the-art image-based descriptors. Especially at low descriptor dimensionalities, we outperform state-of-the-art descriptors by up to 90%.
Tasks	Visual Place Recognition
Published	2020-02-29
URL	https://arxiv.org/abs/2003.00278v1
PDF	https://arxiv.org/pdf/2003.00278v1.pdf
PWC	https://paperswithcode.com/paper/augmenting-visual-place-recognition-with
Repo
Framework
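
The abstract above reports the best fusion results with global average pooling followed by simple concatenation. A dependency-free sketch of that idea follows: each channel of a 2D feature map and of a 3D (voxel) feature map is averaged over its spatial positions, and the two pooled vectors are joined. The nested-list layout, the function names, and the toy values are assumptions for illustration; the paper's actual networks and descriptor sizes are not reproduced here.

```python
from typing import Iterator, List, Sequence, Union

Nested = Union[float, Sequence["Nested"]]

def _flatten(values: Nested) -> Iterator[float]:
    """Yield every scalar in an arbitrarily nested list."""
    if isinstance(values, (int, float)):
        yield float(values)
    else:
        for item in values:
            yield from _flatten(item)

def global_average_pool(feature_map: Sequence[Nested]) -> List[float]:
    """Average each channel over all of its spatial positions."""
    pooled = []
    for channel in feature_map:
        cells = list(_flatten(channel))
        pooled.append(sum(cells) / len(cells))
    return pooled

def fuse(image_features: Sequence[Nested], voxel_features: Sequence[Nested]) -> List[float]:
    """Concatenate the pooled 2D and 3D features into one descriptor."""
    return global_average_pool(image_features) + global_average_pool(voxel_features)

# Toy inputs: two 2x2 image channels and one 2x2x2 voxel channel.
image_features = [[[1, 2], [3, 4]], [[0, 0], [0, 8]]]
voxel_features = [[[[1, 1], [1, 1]], [[3, 3], [3, 3]]]]
descriptor = fuse(image_features, voxel_features)
print(descriptor)
```

The descriptor length equals the number of 2D channels plus the number of 3D channels, independent of the spatial resolution of either input.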

Solving inverse-PDE problems with physics-aware neural networks


Title	Solving inverse-PDE problems with physics-aware neural networks
Authors	Samira Pakravan, Pouria A. Mistani, Miguel Angel Aragon-Calvo, Frederic Gibou
Abstract	We propose a novel composite framework that enables finding unknown fields in the context of inverse problems for partial differential equations (PDEs). We blend the high expressibility of deep neural networks as universal function estimators with the accuracy and reliability of existing numerical algorithms for partial differential equations. Our design brings together techniques of computational mathematics, machine learning and pattern recognition under one umbrella to seamlessly incorporate any domain-specific knowledge and insights through modeling. The network is explicitly aware of the governing physics through a hard-coded PDE solver stage; this subsequently focuses the computational load on only the discovery of the hidden fields. In addition, techniques of pattern recognition and surface reconstruction are used to further represent the unknown fields in a straightforward fashion. Most importantly, our inverse-PDE solver allows effortless integration of domain-specific knowledge about the physics of underlying fields, such as symmetries and proper basis functions. We call this approach Blended Inverse-PDE Networks (BIPDE-Nets) and demonstrate its applicability on recovering the variable diffusion coefficient in Poisson problems in one and two spatial dimensions. We also show that this approach is robust to noise.
Tasks
Published	2020-01-10
URL	https://arxiv.org/abs/2001.03608v1
PDF	https://arxiv.org/pdf/2001.03608v1.pdf
PWC	https://paperswithcode.com/paper/solving-inverse-pde-problems-with-physics
Repo
Framework