Paper Group AWR 113
Answering Visual-Relational Queries in Web-Extracted Knowledge Graphs. Intraoperative margin assessment of human breast tissue in optical coherence tomography images using deep neural networks. Synthesizing Normalized Faces from Facial Identity Features. ComboGAN: Unrestrained Scalability for Image Domain Translation. Efficient Sparse Subspace Clustering by Nearest Neighbour Filtering. A Sub-Character Architecture for Korean Language Processing. An End-to-End Compression Framework Based on Convolutional Neural Networks. Learn&Fuzz: Machine Learning for Input Fuzzing. Toward Goal-Driven Neural Network Models for the Rodent Whisker-Trigeminal System. Fast and robust curve skeletonization for real-world elongated objects. AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions. Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments. Reading Wikipedia to Answer Open-Domain Questions. Frustum PointNets for 3D Object Detection from RGB-D Data. Adversarial examples for generative models.
Answering Visual-Relational Queries in Web-Extracted Knowledge Graphs
Title | Answering Visual-Relational Queries in Web-Extracted Knowledge Graphs |
Authors | Daniel Oñoro-Rubio, Mathias Niepert, Alberto García-Durán, Roberto González, Roberto J. López-Sastre |
Abstract | A visual-relational knowledge graph (KG) is a multi-relational graph whose entities are associated with images. We explore novel machine learning approaches for answering visual-relational queries in web-extracted knowledge graphs. To this end, we have created ImageGraph, a KG with 1,330 relation types, 14,870 entities, and 829,931 images crawled from the web. With visual-relational KGs such as ImageGraph one can introduce novel probabilistic query types in which images are treated as first-class citizens. Both the prediction of relations between unseen images as well as multi-relational image retrieval can be expressed with specific families of visual-relational queries. We introduce novel combinations of convolutional networks and knowledge graph embedding methods to answer such queries. We also explore a zero-shot learning scenario where an image of an entirely new entity is linked with multiple relations to entities of an existing KG. The resulting multi-relational grounding of unseen entity images into a knowledge graph serves as a semantic entity representation. We conduct experiments to demonstrate that the proposed methods can answer these visual-relational queries efficiently and accurately. |
Tasks | Graph Embedding, Image Retrieval, Knowledge Graph Embedding, Knowledge Graph Embeddings, Knowledge Graphs, Representation Learning, Zero-Shot Learning |
Published | 2017-09-07 |
URL | https://arxiv.org/abs/1709.02314v6 |
https://arxiv.org/pdf/1709.02314v6.pdf | |
PWC | https://paperswithcode.com/paper/representation-learning-for-visual-relational |
Repo | https://github.com/nle-ml/mmkb |
Framework | none |
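A minimal numpy sketch of the general recipe the abstract describes: images are mapped to embeddings by a (here, faked) CNN encoder and scored against relation embeddings with a bilinear, DistMult-style function, so a visual-relational query (head image, ?, tail image) becomes a ranking over relation types. The random projection encoder, the embedding dimension, and the scoring function are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM = 128
N_RELATIONS = 1330          # relation vocabulary size reported for ImageGraph
IMG_PIXELS = 64 * 64 * 3    # toy image size for the sketch

# Stand-in for the CNN image encoder: the paper maps each entity image to a
# d-dimensional embedding with a convolutional network; here a fixed random
# projection plays that role.
W_img = rng.standard_normal((EMB_DIM, IMG_PIXELS)) / np.sqrt(IMG_PIXELS)
relation_emb = rng.standard_normal((N_RELATIONS, EMB_DIM))  # learned in practice

def encode_image(image: np.ndarray) -> np.ndarray:
    return W_img @ image.reshape(-1)

def score_triple(head_img, relation_id, tail_img) -> float:
    """DistMult-style bilinear score <h, r, t>; higher means more plausible."""
    h, t = encode_image(head_img), encode_image(tail_img)
    return float(np.sum(h * relation_emb[relation_id] * t))

def rank_relations(head_img, tail_img, top_k=5):
    """Answer the query (head image, ?, tail image) by ranking all relations."""
    h, t = encode_image(head_img), encode_image(tail_img)
    scores = relation_emb @ (h * t)
    return np.argsort(-scores)[:top_k]

if __name__ == "__main__":
    img_a, img_b = rng.random((64, 64, 3)), rng.random((64, 64, 3))
    print("score for relation 0:", score_triple(img_a, 0, img_b))
    print("top relations:", rank_relations(img_a, img_b))
```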
Intraoperative margin assessment of human breast tissue in optical coherence tomography images using deep neural networks
Title | Intraoperative margin assessment of human breast tissue in optical coherence tomography images using deep neural networks |
Authors | Amal Rannen Triki, Matthew B. Blaschko, Yoon Mo Jung, Seungri Song, Hyun Ju Han, Seung Il Kim, Chulmin Joo |
Abstract | Objective: In this work, we perform margin assessment of human breast tissue from optical coherence tomography (OCT) images using deep neural networks (DNNs). This work simulates an intraoperative setting for breast cancer lumpectomy. Methods: To train the DNNs, we use both the state-of-the-art methods (Weight Decay and DropOut) and a newly introduced regularization method based on function norms. Commonly used methods can fail when only a small database is available. The use of a function norm introduces a direct control over the complexity of the function with the aim of diminishing the risk of overfitting. Results: As neither the code nor the data of previous results are publicly available, the obtained results are compared with reported results in the literature for a conservative comparison. Moreover, our method is applied to locally collected data on several data configurations. The reported results are the average over the different trials. Conclusion: The experimental results show that the use of DNNs yields significantly better results than other techniques when evaluated in terms of sensitivity, specificity, F1 score, G-mean and Matthews correlation coefficient. Function norm regularization yielded higher and more robust results than competing methods. Significance: We have demonstrated a system that shows high promise for (partially) automated margin assessment of human breast tissue. The equal error rate (EER) is reduced from approximately 12% (the lowest reported in the literature) to 5%, a 58% reduction. The method is computationally feasible for intraoperative application (less than 2 seconds per image). |
Tasks | |
Published | 2017-03-31 |
URL | http://arxiv.org/abs/1703.10827v1 |
http://arxiv.org/pdf/1703.10827v1.pdf | |
PWC | https://paperswithcode.com/paper/intraoperative-margin-assessment-of-human |
Repo | https://github.com/AmalRT/DNN_Reg |
Framework | none |
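The regularization idea can be illustrated with a toy loss: a standard cross-entropy term plus a Monte-Carlo estimate of the network's function norm over a reference input distribution. This is only a sketch of the concept; the paper's regularizer, network, and sampling scheme differ, and everything below (the tiny MLP, the standard-normal reference distribution, lam = 0.1) is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(params, x):
    """Tiny two-layer network producing a malignancy probability per input."""
    W1, b1, W2, b2 = params
    h = np.tanh(x @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))

def function_norm_penalty(params, n_samples=256, dim=32):
    """Monte-Carlo estimate of an L2-type function norm, E[f(x)^2], over inputs
    drawn from a reference distribution (standard normal here). The paper's
    regularizer is more refined; this only shows the idea of directly
    penalising the size of the learned function."""
    x = rng.standard_normal((n_samples, dim))
    return float(np.mean(mlp_forward(params, x) ** 2))

def regularized_loss(params, x, y, lam=0.1):
    p = np.clip(mlp_forward(params, x), 1e-7, 1 - 1e-7)
    bce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    return bce + lam * function_norm_penalty(params, dim=x.shape[1])

if __name__ == "__main__":
    dim, hidden = 32, 16
    params = (rng.standard_normal((dim, hidden)) * 0.1, np.zeros(hidden),
              rng.standard_normal(hidden) * 0.1, 0.0)
    x = rng.standard_normal((8, dim))
    y = rng.integers(0, 2, size=8)
    print(regularized_loss(params, x, y))
```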
Synthesizing Normalized Faces from Facial Identity Features
Title | Synthesizing Normalized Faces from Facial Identity Features |
Authors | Forrester Cole, David Belanger, Dilip Krishnan, Aaron Sarna, Inbar Mosseri, William T. Freeman |
Abstract | We present a method for synthesizing a frontal, neutral-expression image of a person’s face given an input face photograph. This is achieved by learning to generate facial landmarks and textures from features extracted from a facial-recognition network. Unlike previous approaches, our encoding feature vector is largely invariant to lighting, pose, and facial expression. Exploiting this invariance, we train our decoder network using only frontal, neutral-expression photographs. Since these photographs are well aligned, we can decompose them into a sparse set of landmark points and aligned texture maps. The decoder then predicts landmarks and textures independently and combines them using a differentiable image warping operation. The resulting images can be used for a number of applications, such as analyzing facial attributes, exposure and white balance adjustment, or creating a 3-D avatar. |
Tasks | |
Published | 2017-01-17 |
URL | http://arxiv.org/abs/1701.04851v4 |
http://arxiv.org/pdf/1701.04851v4.pdf | |
PWC | https://paperswithcode.com/paper/synthesizing-normalized-faces-from-facial |
Repo | https://github.com/nabeel3133/3D-texture-fitting |
Framework | none |
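A rough sketch of the decoder structure described above: two independent heads predict landmarks and a texture map from identity features, and a warping step recombines them into an image. The linear heads, the 68-landmark count, and the nearest-neighbour warp below are placeholders; the paper uses learned convolutional decoders and a differentiable warp driven by the predicted landmarks.

```python
import numpy as np

rng = np.random.default_rng(0)

FEAT_DIM = 128      # facial-identity feature size (illustrative)
N_LANDMARKS = 68    # a common landmark count; the paper's set may differ
TEX_H = TEX_W = 32  # toy texture resolution

# Two independent linear "decoder heads" standing in for the paper's
# landmark and texture decoders.
W_lmk = rng.standard_normal((FEAT_DIM, N_LANDMARKS * 2)) * 0.01
W_tex = rng.standard_normal((FEAT_DIM, TEX_H * TEX_W * 3)) * 0.01

def decode(identity_features: np.ndarray):
    landmarks = (identity_features @ W_lmk).reshape(N_LANDMARKS, 2)
    texture = (identity_features @ W_tex).reshape(TEX_H, TEX_W, 3)
    return landmarks, texture

def warp_texture(texture, flow):
    """Stand-in for the differentiable warping step: resample the texture
    with a dense flow field (nearest-neighbour here)."""
    H, W, _ = texture.shape
    ys, xs = np.mgrid[0:H, 0:W]
    src_y = np.clip((ys + flow[..., 0]).round().astype(int), 0, H - 1)
    src_x = np.clip((xs + flow[..., 1]).round().astype(int), 0, W - 1)
    return texture[src_y, src_x]

if __name__ == "__main__":
    feats = rng.standard_normal(FEAT_DIM)
    lmk, tex = decode(feats)
    out = warp_texture(tex, np.zeros((TEX_H, TEX_W, 2)))
    print(lmk.shape, out.shape)
```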
ComboGAN: Unrestrained Scalability for Image Domain Translation
Title | ComboGAN: Unrestrained Scalability for Image Domain Translation |
Authors | Asha Anoosheh, Eirikur Agustsson, Radu Timofte, Luc Van Gool |
Abstract | This year alone has seen unprecedented leaps in the area of learning-based image translation, namely CycleGAN, by Zhu et al. But experiments so far have been tailored to merely two domains at a time, and scaling them to more would require a quadratic number of models to be trained. And with two-domain models taking days to train on current hardware, the number of domains quickly becomes limited by the time and resources required to process them. In this paper, we propose a multi-component image translation model and training scheme which scales linearly - both in resource consumption and time required - with the number of domains. We demonstrate its capabilities on a dataset of paintings by 14 different artists and on images of the four different seasons in the Alps. Note that 14 data groups would need (14 choose 2) = 91 different CycleGAN models: a total of 182 generator/discriminator pairs; whereas our model requires only 14 generator/discriminator pairs. |
Tasks | Image-to-Image Translation |
Published | 2017-12-19 |
URL | http://arxiv.org/abs/1712.06909v1 |
http://arxiv.org/pdf/1712.06909v1.pdf | |
PWC | https://paperswithcode.com/paper/combogan-unrestrained-scalability-for-image |
Repo | https://github.com/AAnoosheh/ComboGAN |
Framework | pytorch |
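The scalability argument is easy to make concrete: with one encoder/decoder pair per domain, translating from domain i to j routes an image through encoder i and decoder j via a shared latent space, so the model count grows linearly rather than quadratically. The linear maps below stand in for the paper's convolutional generators; only the 14-domain arithmetic matches the abstract.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(0)

N_DOMAINS = 14
LATENT = 64
IMG = 32 * 32 * 3

# One encoder/decoder pair per domain (random linear maps as stand-ins for
# the learned generators).
encoders = [rng.standard_normal((LATENT, IMG)) / np.sqrt(IMG) for _ in range(N_DOMAINS)]
decoders = [rng.standard_normal((IMG, LATENT)) / np.sqrt(LATENT) for _ in range(N_DOMAINS)]

def translate(x, src, dst):
    """Translate image x from domain `src` to domain `dst` by routing it
    through src's encoder and dst's decoder via the shared latent space."""
    z = encoders[src] @ x.reshape(-1)
    return (decoders[dst] @ z).reshape(32, 32, 3)

if __name__ == "__main__":
    # Pairwise CycleGANs vs. the per-domain decomposition for 14 domains.
    print("two-domain CycleGAN models:", comb(N_DOMAINS, 2))                 # 91
    print("their generator/discriminator pairs:", 2 * comb(N_DOMAINS, 2))    # 182
    print("ComboGAN generator/discriminator pairs:", N_DOMAINS)              # 14
    x = rng.random((32, 32, 3))
    print(translate(x, src=2, dst=7).shape)
```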
Efficient Sparse Subspace Clustering by Nearest Neighbour Filtering
Title | Efficient Sparse Subspace Clustering by Nearest Neighbour Filtering |
Authors | Stephen Tierney, Yi Guo, Junbin Gao |
Abstract | Sparse Subspace Clustering (SSC) has been used extensively for subspace identification tasks due to its theoretical guarantees and relative ease of implementation. However, SSC has quadratic computation and memory requirements with respect to the number of input data points. This burden has prohibited SSC's use for all but the smallest datasets. To overcome this we propose a new method, k-SSC, that screens out a large number of data points to reduce SSC to linear memory and computational requirements. We provide theoretical analysis for the bounds of success for k-SSC. Our experiments show that k-SSC exceeds theoretical expectations and outperforms existing SSC approximations by maintaining the classification performance of SSC. Furthermore, in the spirit of reproducible research, we have publicly released the source code for k-SSC. |
Tasks | |
Published | 2017-04-13 |
URL | http://arxiv.org/abs/1704.03958v1 |
http://arxiv.org/pdf/1704.03958v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-sparse-subspace-clustering-by |
Repo | https://github.com/sjtrny/kssc |
Framework | none |
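A hedged sketch of the nearest-neighbour-filtering idea: each point's sparse self-representation is solved only over its k nearest neighbours (here with plain ISTA) instead of over all N-1 other points, which is what removes the quadratic cost. The neighbour selection rule, solver, and parameters below are illustrative and are not the paper's exact k-SSC procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def knn_sparse_coeffs(X, k=10, lam=0.1, n_iter=200):
    """For each column x_i of X (D x N), solve a lasso self-representation
    restricted to x_i's k nearest neighbours. Returns an N x N coefficient
    matrix whose affinity graph would then be clustered spectrally."""
    D, N = X.shape
    C = np.zeros((N, N))
    dists = np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0)
    for i in range(N):
        nbrs = np.argsort(dists[:, i])[1:k + 1]      # skip the point itself
        A = X[:, nbrs]                               # D x k local dictionary
        step = 1.0 / (np.linalg.norm(A, 2) ** 2 + 1e-12)
        c = np.zeros(k)
        for _ in range(n_iter):                      # ISTA iterations
            grad = A.T @ (A @ c - X[:, i])
            c = soft_threshold(c - step * grad, step * lam)
        C[nbrs, i] = c
    return C

if __name__ == "__main__":
    # Two toy 1-D subspaces embedded in R^3.
    X = np.hstack([np.outer([1, 0, 0], rng.standard_normal(20)),
                   np.outer([0, 1, 1], rng.standard_normal(20))])
    C = knn_sparse_coeffs(X, k=5)
    W = np.abs(C) + np.abs(C).T                      # symmetric affinity matrix
    print(W.shape, "nonzeros:", int((W > 0).sum()))
```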
A Sub-Character Architecture for Korean Language Processing
Title | A Sub-Character Architecture for Korean Language Processing |
Authors | Karl Stratos |
Abstract | We introduce a novel sub-character architecture that exploits a unique compositional structure of the Korean language. Our method decomposes each character into a small set of primitive phonetic units called jamo letters from which character- and word-level representations are induced. The jamo letters divulge syntactic and semantic information that is difficult to access with conventional character-level units. They greatly alleviate the data sparsity problem, reducing the observation space to 1.6% of the original while increasing accuracy in our experiments. We apply our architecture to dependency parsing and achieve dramatic improvement over strong lexical baselines. |
Tasks | Dependency Parsing |
Published | 2017-07-20 |
URL | http://arxiv.org/abs/1707.06341v2 |
http://arxiv.org/pdf/1707.06341v2.pdf | |
PWC | https://paperswithcode.com/paper/a-sub-character-architecture-for-korean |
Repo | https://github.com/karlstratos/koreannet |
Framework | none |
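Jamo decomposition itself is standard Unicode arithmetic, which makes the sub-character units easy to reproduce; the sketch below splits composed Hangul syllables into lead/vowel/tail jamo. The paper's actual preprocessing and embedding layers are not shown, and its jamo handling may differ in details.

```python
# Decompose Hangul syllables into jamo letters using the standard Unicode
# arithmetic for the Hangul Syllables block (U+AC00..U+D7A3).

S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant

def to_jamo(ch: str):
    code = ord(ch)
    if not (S_BASE <= code <= 0xD7A3):      # not a composed Hangul syllable
        return [ch]
    idx = code - S_BASE
    lead = chr(L_BASE + idx // N_COUNT)
    vowel = chr(V_BASE + (idx % N_COUNT) // T_COUNT)
    tail_idx = idx % T_COUNT
    jamos = [lead, vowel]
    if tail_idx:                            # trailing consonant is optional
        jamos.append(chr(T_BASE + tail_idx))
    return jamos

if __name__ == "__main__":
    word = "한국어"
    print([to_jamo(ch) for ch in word])
    # [['ᄒ', 'ᅡ', 'ᆫ'], ['ᄀ', 'ᅮ', 'ᆨ'], ['ᄋ', 'ᅥ']]
```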
An End-to-End Compression Framework Based on Convolutional Neural Networks
Title | An End-to-End Compression Framework Based on Convolutional Neural Networks |
Authors | Feng Jiang, Wen Tao, Shaohui Liu, Jie Ren, Xun Guo, Debin Zhao |
Abstract | Deep learning, e.g., convolutional neural networks (CNNs), has achieved great success in image processing and computer vision especially in high level vision applications such as recognition and understanding. However, it is rarely used to solve low-level vision problems such as image compression studied in this paper. Here, we move forward a step and propose a novel compression framework based on CNNs. To achieve high-quality image compression at low bit rates, two CNNs are seamlessly integrated into an end-to-end compression framework. The first CNN, named compact convolutional neural network (ComCNN), learns an optimal compact representation from an input image, which preserves the structural information and is then encoded using an image codec (e.g., JPEG, JPEG2000 or BPG). The second CNN, named reconstruction convolutional neural network (RecCNN), is used to reconstruct the decoded image with high quality at the decoding end. To make the two CNNs collaborate effectively, we develop a unified end-to-end learning algorithm to simultaneously learn ComCNN and RecCNN, which facilitates the accurate reconstruction of the decoded image using RecCNN. Such a design also makes the proposed compression framework compatible with existing image coding standards. Experimental results validate that the proposed compression framework greatly outperforms several compression frameworks that use existing image coding standards with state-of-the-art deblocking or denoising post-processing methods. |
Tasks | Denoising, Image Compression |
Published | 2017-08-02 |
URL | http://arxiv.org/abs/1708.00838v1 |
http://arxiv.org/pdf/1708.00838v1.pdf | |
PWC | https://paperswithcode.com/paper/an-end-to-end-compression-framework-based-on |
Repo | https://github.com/kunalrdeshmukh/End-to-end-compression |
Framework | pytorch |
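A toy version of the two-network pipeline: a stand-in ComCNN produces a compact image, a stand-in codec quantises it, and a stand-in RecCNN upsamples the decoded result. Average pooling, uniform quantisation, and nearest-neighbour upsampling replace the learned CNNs and the real JPEG/JPEG2000/BPG codec, so this only shows the data flow, not the learned behaviour.

```python
import numpy as np

rng = np.random.default_rng(0)

def com_cnn(image: np.ndarray) -> np.ndarray:
    """Stand-in for ComCNN: produce a compact representation
    (2x2 average pooling; the paper learns this mapping)."""
    H, W, C = image.shape
    return image.reshape(H // 2, 2, W // 2, 2, C).mean(axis=(1, 3))

def codec_roundtrip(compact: np.ndarray, levels: int = 32) -> np.ndarray:
    """Stand-in for the standard codec between the two networks:
    uniform quantisation to `levels` levels per channel."""
    return np.round(compact * (levels - 1)) / (levels - 1)

def rec_cnn(decoded: np.ndarray, out_shape) -> np.ndarray:
    """Stand-in for RecCNN: upsample the decoded compact image back to the
    original resolution (the paper learns a CNN that also restores detail)."""
    H, W, _ = out_shape
    return decoded.repeat(2, axis=0).repeat(2, axis=1)[:H, :W]

if __name__ == "__main__":
    x = rng.random((64, 64, 3))
    recon = rec_cnn(codec_roundtrip(com_cnn(x)), x.shape)
    print("PSNR:", 10 * np.log10(1.0 / np.mean((x - recon) ** 2)))
```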
Learn&Fuzz: Machine Learning for Input Fuzzing
Title | Learn&Fuzz: Machine Learning for Input Fuzzing |
Authors | Patrice Godefroid, Hila Peleg, Rishabh Singh |
Abstract | Fuzzing consists of repeatedly testing an application with modified, or fuzzed, inputs with the goal of finding security vulnerabilities in input-parsing code. In this paper, we show how to automate the generation of an input grammar suitable for input fuzzing using sample inputs and neural-network-based statistical machine-learning techniques. We present a detailed case study with a complex input format, namely PDF, and a large complex security-critical parser for this format, namely, the PDF parser embedded in Microsoft’s new Edge browser. We discuss (and measure) the tension between conflicting learning and fuzzing goals: learning wants to capture the structure of well-formed inputs, while fuzzing wants to break that structure in order to cover unexpected code paths and find bugs. We also present a new algorithm for this learn&fuzz challenge which uses a learnt input probability distribution to intelligently guide where to fuzz inputs. |
Tasks | |
Published | 2017-01-25 |
URL | http://arxiv.org/abs/1701.07232v1 |
http://arxiv.org/pdf/1701.07232v1.pdf | |
PWC | https://paperswithcode.com/paper/learnfuzz-machine-learning-for-input-fuzzing |
Repo | https://github.com/m-zakeri/iust_deep_fuzz |
Framework | tf |
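The learn-then-fuzz tension can be shown with a much simpler model than the paper's neural network: learn a character-level distribution from well-formed samples, then generate from it while occasionally emitting a deliberately unlikely character. The bigram model, the p_fuzz parameter, and the toy PDF-object strings are all assumptions; the paper's SampleFuzz algorithm is more involved.

```python
import random
from collections import defaultdict, Counter

random.seed(0)

def train_char_model(samples):
    """Character-bigram stand-in for the paper's learned model: estimate
    p(next_char | prev_char) from well-formed sample inputs."""
    counts = defaultdict(Counter)
    for s in samples:
        for a, b in zip(s, s[1:]):
            counts[a][b] += 1
    return counts

def sample_fuzz(model, seed_char, length=40, p_fuzz=0.1):
    """Generate mostly well-formed output, but with probability p_fuzz emit
    the least likely continuation instead of a sampled one: staying
    in-grammar most of the time while still breaking structure."""
    out = [seed_char]
    for _ in range(length):
        dist = model.get(out[-1])
        if not dist:
            break
        if random.random() < p_fuzz:
            nxt = min(dist, key=dist.get)          # deliberately unlikely char
        else:
            chars, weights = zip(*dist.items())
            nxt = random.choices(chars, weights=weights)[0]
        out.append(nxt)
    return "".join(out)

if __name__ == "__main__":
    pdf_like = ["1 0 obj << /Type /Page >> endobj",
                "2 0 obj << /Length 44 >> stream endstream endobj"]
    model = train_char_model(pdf_like)
    print(sample_fuzz(model, "1"))
```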
Toward Goal-Driven Neural Network Models for the Rodent Whisker-Trigeminal System
Title | Toward Goal-Driven Neural Network Models for the Rodent Whisker-Trigeminal System |
Authors | Chengxu Zhuang, Jonas Kubilius, Mitra Hartmann, Daniel Yamins |
Abstract | In large part, rodents see the world through their whiskers, a powerful tactile sense enabled by a series of brain areas that form the whisker-trigeminal system. Raw sensory data arrives in the form of mechanical input to the exquisitely sensitive, actively-controllable whisker array, and is processed through a sequence of neural circuits, eventually arriving in cortical regions that communicate with decision-making and memory areas. Although a long history of experimental studies has characterized many aspects of these processing stages, the computational operations of the whisker-trigeminal system remain largely unknown. In the present work, we take a goal-driven deep neural network (DNN) approach to modeling these computations. First, we construct a biophysically-realistic model of the rat whisker array. We then generate a large dataset of whisker sweeps across a wide variety of 3D objects in highly-varying poses, angles, and speeds. Next, we train DNNs from several distinct architectural families to solve a shape recognition task in this dataset. Each architectural family represents a structurally-distinct hypothesis for processing in the whisker-trigeminal system, corresponding to different ways in which spatial and temporal information can be integrated. We find that most networks perform poorly on the challenging shape recognition task, but that specific architectures from several families can achieve reasonable performance levels. Finally, we show that Representational Dissimilarity Matrices (RDMs), a tool for comparing population codes between neural systems, can separate these higher-performing networks with data of a type that could plausibly be collected in a neurophysiological or imaging experiment. Our results are a proof-of-concept that goal-driven DNN networks of the whisker-trigeminal system are potentially within reach. |
Tasks | Decision Making |
Published | 2017-06-23 |
URL | http://arxiv.org/abs/1706.07555v1 |
http://arxiv.org/pdf/1706.07555v1.pdf | |
PWC | https://paperswithcode.com/paper/toward-goal-driven-neural-network-models-for |
Repo | https://github.com/neuroailab/whisker_model |
Framework | tf |
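The RDM comparison mentioned at the end of the abstract is straightforward to write down: an RDM entry is 1 minus the correlation between population responses to a pair of stimuli, and two systems are compared by correlating the upper triangles of their RDMs. The fake "network responses" below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def rdm(responses: np.ndarray) -> np.ndarray:
    """Representational Dissimilarity Matrix: entry (i, j) is 1 - Pearson
    correlation between population responses to stimuli i and j.
    `responses` has shape (n_stimuli, n_units)."""
    return 1.0 - np.corrcoef(responses)

def rdm_similarity(rdm_a: np.ndarray, rdm_b: np.ndarray) -> float:
    """Compare two models (or a model and neural data) by correlating the
    upper triangles of their RDMs, as in standard representational
    similarity analysis."""
    iu = np.triu_indices_from(rdm_a, k=1)
    return float(np.corrcoef(rdm_a[iu], rdm_b[iu])[0, 1])

if __name__ == "__main__":
    # Fake responses: 20 whisker-sweep stimuli x 100 model units, from two
    # hypothetical architectures.
    net1 = rng.standard_normal((20, 100))
    net2 = net1 + 0.5 * rng.standard_normal((20, 100))
    print(rdm_similarity(rdm(net1), rdm(net2)))
```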
Fast and robust curve skeletonization for real-world elongated objects
Title | Fast and robust curve skeletonization for real-world elongated objects |
Authors | Amy Tabb, Henry Medeiros |
Abstract | We consider the problem of extracting curve skeletons of three-dimensional, elongated objects given a noisy surface, which has applications in agricultural contexts such as extracting the branching structure of plants. We describe an efficient and robust method based on breadth-first search that can determine curve skeletons in these contexts. Our approach is capable of automatically detecting junction points as well as spurious segments and loops. All of that is accomplished with only one user-adjustable parameter. The run time of our method ranges from hundreds of milliseconds to less than four seconds on large, challenging datasets, which makes it appropriate for situations where real-time decision making is needed. Experiments on synthetic models as well as on data from real world objects, some of which were collected in challenging field conditions, show that our approach compares favorably to classical thinning algorithms as well as to recent contributions to the field. |
Tasks | Decision Making |
Published | 2017-02-24 |
URL | http://arxiv.org/abs/1702.07619v4 |
http://arxiv.org/pdf/1702.07619v4.pdf | |
PWC | https://paperswithcode.com/paper/fast-and-robust-curve-skeletonization-for |
Repo | https://github.com/amy-tabb/CurveSkel-Tabb-Medeiros-2018 |
Framework | none |
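The breadth-first-search primitive at the heart of the method can be sketched on a voxel set: BFS from a root gives distances and parent pointers, and the path back from the farthest voxel is one curve-skeleton segment. The junction, spur, and loop handling the abstract mentions is not reproduced here; this is just the core traversal, with 26-connectivity assumed.

```python
from collections import deque

def bfs_farthest_path(voxels, root):
    """BFS over a set of occupied voxels (26-connectivity), returning the
    path from `root` to the farthest reachable voxel."""
    nbrs = [(dx, dy, dz) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            for dz in (-1, 0, 1) if (dx, dy, dz) != (0, 0, 0)]
    parent = {root: None}
    queue = deque([root])
    last = root
    while queue:
        cur = queue.popleft()
        last = cur
        x, y, z = cur
        for dx, dy, dz in nbrs:
            v = (x + dx, y + dy, z + dz)
            if v in voxels and v not in parent:
                parent[v] = cur
                queue.append(v)
    path, v = [], last
    while v is not None:          # walk parent pointers back to the root
        path.append(v)
        v = parent[v]
    return path[::-1]

if __name__ == "__main__":
    # A thin tube of occupied voxels along the x axis.
    tube = {(x, y, z) for x in range(30) for y in (0, 1) for z in (0, 1)}
    path = bfs_farthest_path(tube, (0, 0, 0))
    print("skeleton segment length:", len(path), "ends at", path[-1])
```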
AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions
Title | AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions |
Authors | Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik |
Abstract | This paper introduces a video dataset of spatio-temporally localized Atomic Visual Actions (AVA). The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently. The key characteristics of our dataset are: (1) the definition of atomic visual actions, rather than composite actions; (2) precise spatio-temporal annotations with possibly multiple annotations for each person; (3) exhaustive annotation of these atomic actions over 15-minute video clips; (4) people temporally linked across consecutive segments; and (5) using movies to gather a varied set of action representations. This departs from existing datasets for spatio-temporal action recognition, which typically provide sparse annotations for composite actions in short video clips. We will release the dataset publicly. AVA, with its realistic scene and action complexity, exposes the intrinsic difficulty of action recognition. To benchmark this, we present a novel approach for action localization that builds upon the current state-of-the-art methods, and demonstrates better performance on JHMDB and UCF101-24 categories. While setting a new state of the art on existing datasets, the overall results on AVA are low at 15.6% mAP, underscoring the need for developing new approaches for video understanding. |
Tasks | Action Localization, Temporal Action Localization, Video Understanding |
Published | 2017-05-23 |
URL | http://arxiv.org/abs/1705.08421v4 |
http://arxiv.org/pdf/1705.08421v4.pdf | |
PWC | https://paperswithcode.com/paper/ava-a-video-dataset-of-spatio-temporally |
Repo | https://github.com/tensorflow/models/tree/master/research/object_detection |
Framework | tf |
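A small loader sketch for AVA-style annotations: each row ties a person box at one keyframe to a single action label, so the multi-label structure appears once rows are grouped by (video, timestamp, person). The CSV field order shown is the commonly documented layout for the released label files and should be treated as an assumption here; the sample rows are illustrative.

```python
import csv
from collections import defaultdict
from io import StringIO

# Assumed row layout for the released AVA label CSVs:
#   video_id, middle_frame_timestamp, x1, y1, x2, y2, action_id, person_id
# with box coordinates normalised to [0, 1].
SAMPLE = """-5KQ66BBWC4,0902,0.077,0.151,0.283,0.811,80,1
-5KQ66BBWC4,0902,0.077,0.151,0.283,0.811,12,1
"""

def load_ava(csv_text):
    """Group rows so each (video, timestamp, person) keeps one box and the
    full multi-label action set; multiple labels per person are frequent."""
    boxes, labels = {}, defaultdict(set)
    for row in csv.reader(StringIO(csv_text)):
        vid, ts, x1, y1, x2, y2, action, person = row
        key = (vid, int(ts), int(person))
        boxes[key] = tuple(map(float, (x1, y1, x2, y2)))
        labels[key].add(int(action))
    return boxes, labels

if __name__ == "__main__":
    boxes, labels = load_ava(SAMPLE)
    for key in boxes:
        print(key, boxes[key], sorted(labels[key]))
```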
Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments
Title | Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments |
Authors | Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen Gould, Anton van den Hengel |
Abstract | A robot that can carry out a natural-language instruction has been a dream since before the Jetsons cartoon series imagined a life of leisure mediated by a fleet of attentive robot helpers. It is a dream that remains stubbornly distant. However, recent advances in vision and language methods have made incredible progress in closely related areas. This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision and language process that is similar to Visual Question Answering. Both tasks can be interpreted as visually grounded sequence-to-sequence translation problems, and many of the same methods are applicable. To enable and encourage the application of vision and language methods to the problem of interpreting visually-grounded navigation instructions, we present the Matterport3D Simulator – a large-scale reinforcement learning environment based on real imagery. Using this simulator, which can in future support a range of embodied vision and language tasks, we provide the first benchmark dataset for visually-grounded natural language navigation in real buildings – the Room-to-Room (R2R) dataset. |
Tasks | Visual Question Answering |
Published | 2017-11-20 |
URL | http://arxiv.org/abs/1711.07280v3 |
http://arxiv.org/pdf/1711.07280v3.pdf | |
PWC | https://paperswithcode.com/paper/vision-and-language-navigation-interpreting |
Repo | https://github.com/batra-mlp-lab/vln-chasing-ghosts |
Framework | none |
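The benchmark's evaluation protocol is compact enough to sketch: an episode counts as a success if the agent stops within a fixed radius of the goal (3 m in R2R), reported alongside navigation error and trajectory length. The position tuples and function names below are illustrative, not the simulator's API.

```python
import numpy as np

SUCCESS_RADIUS_M = 3.0  # standard R2R success threshold

def evaluate_episode(trajectory, goal):
    """trajectory: list of (x, y, z) agent positions; goal: (x, y, z) of the
    target viewpoint. Returns the usual R2R-style episode metrics."""
    final = np.asarray(trajectory[-1], dtype=float)
    goal = np.asarray(goal, dtype=float)
    nav_error = float(np.linalg.norm(final - goal))
    path_len = float(sum(np.linalg.norm(np.subtract(b, a))
                         for a, b in zip(trajectory, trajectory[1:])))
    return {"navigation_error_m": nav_error,
            "success": nav_error <= SUCCESS_RADIUS_M,
            "trajectory_length_m": path_len}

if __name__ == "__main__":
    traj = [(0, 0, 0), (2.1, 0, 0), (4.0, 1.0, 0)]
    print(evaluate_episode(traj, goal=(4.5, 2.5, 0)))
```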
Reading Wikipedia to Answer Open-Domain Questions
Title | Reading Wikipedia to Answer Open-Domain Questions |
Authors | Danqi Chen, Adam Fisch, Jason Weston, Antoine Bordes |
Abstract | This paper proposes to tackle open-domain question answering using Wikipedia as the unique knowledge source: the answer to any factoid question is a text span in a Wikipedia article. This task of machine reading at scale combines the challenges of document retrieval (finding the relevant articles) with that of machine comprehension of text (identifying the answer spans from those articles). Our approach combines a search component based on bigram hashing and TF-IDF matching with a multi-layer recurrent neural network model trained to detect answers in Wikipedia paragraphs. Our experiments on multiple existing QA datasets indicate that (1) both modules are highly competitive with respect to existing counterparts and (2) multitask learning using distant supervision on their combination is an effective complete system on this challenging task. |
Tasks | Open-Domain Question Answering, Question Answering, Reading Comprehension |
Published | 2017-03-31 |
URL | http://arxiv.org/abs/1704.00051v2 |
http://arxiv.org/pdf/1704.00051v2.pdf | |
PWC | https://paperswithcode.com/paper/reading-wikipedia-to-answer-open-domain |
Repo | https://github.com/facebookresearch/DrQA |
Framework | pytorch
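The retrieval half of the system is easy to approximate: hash unigrams and bigrams into a fixed number of buckets, weight them with TF-IDF, and rank articles by cosine similarity against the question. Python's salted built-in hash stands in for the stable hash used in practice, the bucket count and toy corpus are assumptions, and the neural reader is not sketched.

```python
import math
import re
from collections import Counter

N_BUCKETS = 2 ** 20  # hash n-grams into a fixed number of bins

def ngram_buckets(text):
    """Map text to hashed unigram+bigram buckets, the trick that keeps the
    TF-IDF vocabulary bounded. hash() is stable within one process."""
    toks = re.findall(r"\w+", text.lower())
    grams = toks + [" ".join(p) for p in zip(toks, toks[1:])]
    return Counter(hash(g) % N_BUCKETS for g in grams)

def tfidf_vectors(docs):
    counts = [ngram_buckets(d) for d in docs]
    df = Counter(b for c in counts for b in c)
    N = len(docs)
    idf = {b: math.log((N + 1) / (df[b] + 1)) + 1 for b in df}
    return [{b: tf * idf[b] for b, tf in c.items()} for c in counts], idf

def cosine(a, b):
    num = sum(v * b.get(k, 0.0) for k, v in a.items())
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(question, docs, k=1):
    vecs, idf = tfidf_vectors(docs)
    q = {b: tf * idf.get(b, 0.0) for b, tf in ngram_buckets(question).items()}
    ranked = sorted(range(len(docs)), key=lambda i: cosine(q, vecs[i]), reverse=True)
    return ranked[:k]

if __name__ == "__main__":
    wiki = ["Paris is the capital and most populous city of France.",
            "The mitochondrion is the powerhouse of the cell."]
    print(retrieve("What is the capital of France?", wiki))  # -> [0]
```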
Frustum PointNets for 3D Object Detection from RGB-D Data
Title | Frustum PointNets for 3D Object Detection from RGB-D Data |
Authors | Charles R. Qi, Wei Liu, Chenxia Wu, Hao Su, Leonidas J. Guibas |
Abstract | In this work, we study 3D object detection from RGB-D data in both indoor and outdoor scenes. While previous methods focus on images or 3D voxels, often obscuring natural 3D patterns and invariances of 3D data, we directly operate on raw point clouds by popping up RGB-D scans. However, a key challenge of this approach is how to efficiently localize objects in point clouds of large-scale scenes (region proposal). Instead of solely relying on 3D proposals, our method leverages both mature 2D object detectors and advanced 3D deep learning for object localization, achieving efficiency as well as high recall for even small objects. Benefiting from learning directly in raw point clouds, our method is also able to precisely estimate 3D bounding boxes even under strong occlusion or with very sparse points. Evaluated on KITTI and SUN RGB-D 3D detection benchmarks, our method outperforms the state of the art by remarkable margins while having real-time capability. |
Tasks | 3D Object Detection, Object Detection, Object Localization |
Published | 2017-11-22 |
URL | http://arxiv.org/abs/1711.08488v2 |
http://arxiv.org/pdf/1711.08488v2.pdf | |
PWC | https://paperswithcode.com/paper/frustum-pointnets-for-3d-object-detection |
Repo | https://github.com/voidrank/Geo-CNN |
Framework | tf |
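The frustum-proposal step is the geometric core: a 2-D detection box is extruded into 3-D by keeping only the points whose projection through the camera intrinsics lands inside the box, and only those points are handed to the PointNet stages. The intrinsics, box, and point cloud below are toy values, and the subsequent segmentation and box-estimation networks are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def points_in_frustum(points, box2d, K):
    """Keep the 3-D points (camera coordinates, N x 3) whose pinhole
    projection through intrinsics K falls inside the 2-D detection box
    (xmin, ymin, xmax, ymax): the frustum-proposal step."""
    xmin, ymin, xmax, ymax = box2d
    in_front = points[:, 2] > 0                     # in front of the camera
    uvw = points @ K.T                              # homogeneous projection
    u = uvw[:, 0] / np.clip(uvw[:, 2], 1e-9, None)
    v = uvw[:, 1] / np.clip(uvw[:, 2], 1e-9, None)
    mask = in_front & (u >= xmin) & (u <= xmax) & (v >= ymin) & (v <= ymax)
    return points[mask]

if __name__ == "__main__":
    # Toy intrinsics and a random point cloud in camera coordinates.
    K = np.array([[700.0, 0.0, 320.0],
                  [0.0, 700.0, 240.0],
                  [0.0, 0.0, 1.0]])
    pts = rng.uniform([-10, -2, 0.5], [10, 2, 40], size=(5000, 3))
    frustum_pts = points_in_frustum(pts, box2d=(300, 200, 360, 280), K=K)
    print(frustum_pts.shape)  # only these points go to the 3-D box estimator
```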
Adversarial examples for generative models
Title | Adversarial examples for generative models |
Authors | Jernej Kos, Ian Fischer, Dawn Song |
Abstract | We explore methods of producing adversarial examples on deep generative models such as the variational autoencoder (VAE) and the VAE-GAN. Deep learning architectures are known to be vulnerable to adversarial examples, but previous work has focused on the application of adversarial examples to classification tasks. Deep generative models have recently become popular due to their ability to model input data distributions and generate realistic examples from those distributions. We present three classes of attacks on the VAE and VAE-GAN architectures and demonstrate them against networks trained on MNIST, SVHN and CelebA. Our first attack leverages classification-based adversaries by attaching a classifier to the trained encoder of the target generative model, which can then be used to indirectly manipulate the latent representation. Our second attack directly uses the VAE loss function to generate a target reconstruction image from the adversarial example. Our third attack moves beyond relying on classification or the standard loss for the gradient and directly optimizes against differences in source and target latent representations. We also motivate why an attacker might be interested in deploying such techniques against a target generative network. |
Tasks | |
Published | 2017-02-22 |
URL | http://arxiv.org/abs/1702.06832v1 |
http://arxiv.org/pdf/1702.06832v1.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-examples-for-generative-models |
Repo | https://github.com/rohban-lab/Salehi_submitted_2020 |
Framework | pytorch |
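The third attack class in the abstract (matching latent representations) is the easiest to write out: optimise a small perturbation of the source image so that its latent code approaches the target's. A linear encoder stands in for the trained VAE encoder so the gradient is analytic; the objective, step size, and lam are illustrative, and a real attack would differentiate through the actual network and keep pixels in a valid range.

```python
import numpy as np

rng = np.random.default_rng(0)

D_IMG, D_LATENT = 784, 20

# Linear stand-in for the VAE encoder mean network.
W_enc = rng.standard_normal((D_LATENT, D_IMG)) / np.sqrt(D_IMG)

def encode(x):
    return W_enc @ x

def latent_attack(x_src, x_tgt, lam=0.05, lr=0.1, steps=200):
    """Minimise ||enc(x_src + delta) - enc(x_tgt)||^2 + lam * ||delta||^2 by
    gradient descent, so the decoder would reconstruct something
    target-like from the perturbed source image."""
    z_tgt = encode(x_tgt)
    delta = np.zeros_like(x_src)
    for _ in range(steps):
        residual = encode(x_src + delta) - z_tgt
        grad = 2 * W_enc.T @ residual + 2 * lam * delta
        delta -= lr * grad
    return x_src + delta

if __name__ == "__main__":
    x_src, x_tgt = rng.random(D_IMG), rng.random(D_IMG)
    x_adv = latent_attack(x_src, x_tgt)
    print("latent gap before:", np.linalg.norm(encode(x_src) - encode(x_tgt)))
    print("latent gap after: ", np.linalg.norm(encode(x_adv) - encode(x_tgt)))
```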