January 27, 2020

3165 words 15 mins read

Paper Group ANR 1152


The (Non-)Utility of Structural Features in BiLSTM-based Dependency Parsers

Title The (Non-)Utility of Structural Features in BiLSTM-based Dependency Parsers
Authors Agnieszka Falenska, Jonas Kuhn
Abstract Classical non-neural dependency parsers put considerable effort into the design of feature functions. In particular, they benefit from information coming from structural features, such as features drawn from neighboring tokens in the dependency tree. In contrast, their BiLSTM-based successors achieve state-of-the-art performance without explicit information about the structural context. In this paper we aim to answer the question: how much structural context are the BiLSTM representations able to capture implicitly? We show that features drawn from partial subtrees become redundant when BiLSTMs are used. We provide deep insight into the information flow in transition- and graph-based neural architectures to demonstrate where the implicit information comes from when the parsers make their decisions. Finally, with model ablations we demonstrate that the structural context is not only present in the models, but significantly influences their performance.
Tasks
Published 2019-05-29
URL https://arxiv.org/abs/1905.12676v2
PDF https://arxiv.org/pdf/1905.12676v2.pdf
PWC https://paperswithcode.com/paper/the-non-utility-of-structural-features-in
Repo
Framework
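As a rough illustration of the kind of scorer such parsers use on top of BiLSTM representations, the sketch below scores head-dependent arcs with a biaffine function over contextual token vectors. The vectors, weights, and sentence length are made up; in the paper's setting the vectors would come from a trained BiLSTM.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical contextual vectors for a 4-token sentence (BiLSTM outputs in
# the real model; random placeholders here).
n_tokens, dim = 4, 8
h = rng.normal(size=(n_tokens, dim))

# Biaffine arc scorer: score(head i, dependent j) = h_i^T W h_j + b^T h_i.
W = rng.normal(size=(dim, dim))
b = rng.normal(size=dim)
scores = h @ W @ h.T + (h @ b)[:, None]  # scores[i, j]: token i as head of j

# Greedy head selection per dependent (ignoring tree well-formedness).
np.fill_diagonal(scores, -np.inf)        # a token cannot head itself
heads = scores.argmax(axis=0)
print(heads)
```

A real parser would decode a well-formed tree (e.g. with a maximum spanning tree algorithm) instead of taking the per-token argmax.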

An Image Segmentation Model Based on a Variational Formulation

Title An Image Segmentation Model Based on a Variational Formulation
Authors Carlos M. Paniagua Mejia
Abstract Starting from a variational formulation, we present a model for image segmentation that employs both region statistics and edge information. This combination allows for improved flexibility, making the proposed model suitable to process a wider class of images than purely region-based and edge-based models. We perform several simulations with real images that attest to the versatility of the model. We also show another set of experiments on images with certain pathologies that suggest opportunities for improvement.
Tasks Semantic Segmentation
Published 2019-10-13
URL https://arxiv.org/abs/1910.05678v1
PDF https://arxiv.org/pdf/1910.05678v1.pdf
PWC https://paperswithcode.com/paper/an-image-segmentation-model-based-on-a
Repo
Framework
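A minimal sketch of the two ingredients the abstract combines, on a toy image: a Chan-Vese-style region force built from inside/outside means, weighted by a gradient-based edge stopping function. The image, initialization, and update rule are illustrative stand-ins, not the paper's actual model.

```python
import numpy as np

# Toy grayscale image: a bright square on a dark background.
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0

# Region term (Chan-Vese style): inside/outside means under the current mask.
mask = img > 0.5                       # crude initialization
c_in, c_out = img[mask].mean(), img[~mask].mean()
region_force = (img - c_out) ** 2 - (img - c_in) ** 2

# Edge term: gradient-magnitude stopping function g = 1 / (1 + |grad I|^2).
gy, gx = np.gradient(img)
g = 1.0 / (1.0 + gx**2 + gy**2)

# Combined update direction, mixing region statistics and edge information.
update = g * region_force
new_mask = update > 0
print(new_mask.sum())
```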

Neighborhood Growth Determines Geometric Priors for Relational Representation Learning

Title Neighborhood Growth Determines Geometric Priors for Relational Representation Learning
Authors Melanie Weber
Abstract The problem of identifying geometric structure in heterogeneous, high-dimensional data is a cornerstone of representation learning. While there exists a large body of literature on the embeddability of canonical graphs, such as lattices or trees, the heterogeneity of the relational data typically encountered in practice limits the applicability of these classical methods. In this paper, we propose a combinatorial approach to evaluating embeddability, i.e., to decide whether a data set is best represented in Euclidean, Hyperbolic or Spherical space. Our method analyzes nearest-neighbor structures and local neighborhood growth rates to identify the geometric priors of suitable embedding spaces. For canonical graphs, the algorithm’s prediction provably matches classical results. As for large, heterogeneous graphs, we introduce an efficiently computable statistic that approximates the algorithm’s decision rule. We validate our method over a range of benchmark data sets and compare with recently published optimization-based embeddability methods.
Tasks Representation Learning
Published 2019-10-12
URL https://arxiv.org/abs/1910.05565v1
PDF https://arxiv.org/pdf/1910.05565v1.pdf
PWC https://paperswithcode.com/paper/neighborhood-growth-determines-geometric
Repo
Framework
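The core quantity the abstract describes, neighborhood growth, can be sketched by comparing ball sizes at increasing radii: trees grow exponentially (suggesting a hyperbolic prior), while cycles grow linearly. The graphs and the radius-2-over-radius-1 ratio below are illustrative, not the paper's exact statistic.

```python
from collections import deque

def ball_size(adj, v, r):
    """Number of nodes within graph distance r of v (BFS)."""
    seen = {v}
    frontier = deque([(v, 0)])
    while frontier:
        u, d = frontier.popleft()
        if d == r:
            continue
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                frontier.append((w, d + 1))
    return len(seen)

# Binary tree vs. cycle: neighborhoods in a tree grow much faster.
tree = {0: [1, 2], 1: [0, 3, 4], 2: [0, 5, 6], 3: [1], 4: [1], 5: [2], 6: [2]}
cycle = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}

growth_tree = ball_size(tree, 0, 2) / ball_size(tree, 0, 1)
growth_cycle = ball_size(cycle, 0, 2) / ball_size(cycle, 0, 1)
print(growth_tree, growth_cycle)
```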

Modeling Noisiness to Recognize Named Entities using Multitask Neural Networks on Social Media

Title Modeling Noisiness to Recognize Named Entities using Multitask Neural Networks on Social Media
Authors Gustavo Aguilar, A. Pastor López-Monroy, Fabio A. González, Thamar Solorio
Abstract Recognizing named entities in a document is a key task in many NLP applications. Although current state-of-the-art approaches to this task reach high performance on clean text (e.g. newswire genres), those algorithms degrade dramatically when moved to noisy environments such as social media domains. We present two systems that address the challenges of processing social media data using character-level phonetics and phonology, word embeddings, and part-of-speech tags as features. The first model is a multitask end-to-end Bidirectional Long Short-Term Memory (BLSTM)-Conditional Random Field (CRF) network whose output layer contains two CRF classifiers. The second model uses a multitask BLSTM network as a feature extractor that transfers the learning to a CRF classifier for the final prediction. Our systems outperform the current state-of-the-art F1 scores on the Workshop on Noisy User-generated Text 2017 dataset by 2.45% and 3.69%, establishing a more suitable approach for social media environments.
Tasks Word Embeddings
Published 2019-06-10
URL https://arxiv.org/abs/1906.04129v1
PDF https://arxiv.org/pdf/1906.04129v1.pdf
PWC https://paperswithcode.com/paper/modeling-noisiness-to-recognize-named-1
Repo
Framework
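The final CRF layer of BLSTM-CRF taggers like these decodes the best tag sequence with the Viterbi algorithm. A self-contained sketch with made-up emission and transition scores (the real scores would come from the trained network):

```python
import numpy as np

# Viterbi decoding for a linear-chain CRF over BIO tags. All scores below
# are illustrative, not learned.
tags = ["O", "B-PER", "I-PER"]
emissions = np.array([          # per-token tag scores (from a BLSTM in practice)
    [2.0, 1.0, 0.0],
    [0.0, 3.0, 1.0],
    [0.0, 1.0, 3.0],
])
transitions = np.array([        # transitions[i, j]: score of tag i -> tag j
    [1.0, 1.0, -5.0],           # O -> I-PER is strongly penalized
    [0.0, 0.0, 2.0],
    [0.0, 0.0, 1.0],
])

n, k = emissions.shape
score = emissions[0].copy()
back = np.zeros((n, k), dtype=int)
for t in range(1, n):
    cand = score[:, None] + transitions + emissions[t]  # cand[i, j]
    back[t] = cand.argmax(axis=0)
    score = cand.max(axis=0)

# Backtrack the best tag sequence.
path = [int(score.argmax())]
for t in range(n - 1, 0, -1):
    path.append(int(back[t, path[-1]]))
path.reverse()
decoded = [tags[i] for i in path]
print(decoded)
```

Note how the transition penalty keeps the decoder from emitting `I-PER` directly after `O`, which is exactly the kind of constraint a CRF output layer adds over per-token classification.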

MMGAN: Generative Adversarial Networks for Multi-Modal Distributions

Title MMGAN: Generative Adversarial Networks for Multi-Modal Distributions
Authors Teodora Pandeva, Matthias Schubert
Abstract Over the past years, Generative Adversarial Networks (GANs) have shown remarkable generation performance, especially in image synthesis. Unfortunately, they are also known for having an unstable training process and may lose parts of the data distribution for heterogeneous input data. In this paper, we propose a novel GAN extension for multi-modal distribution learning (MMGAN). In our approach, we model the latent space as a Gaussian mixture model whose number of clusters corresponds to the number of disconnected data manifolds in the observation space, and include a clustering network that relates each data manifold to one Gaussian cluster. Thus, the training becomes more stable. Moreover, MMGAN allows for clustering real data according to the learned data manifolds in the latent space. Through a series of benchmark experiments, we illustrate that MMGAN outperforms competitive state-of-the-art models in terms of clustering performance.
Tasks Image Generation
Published 2019-11-15
URL https://arxiv.org/abs/1911.06663v1
PDF https://arxiv.org/pdf/1911.06663v1.pdf
PWC https://paperswithcode.com/paper/mmgan-generative-adversarial-networks-for
Repo
Framework
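The latent structure the abstract describes can be sketched as sampling codes from a Gaussian mixture, with one component per assumed data manifold. The component count, means, and nearest-mean "clustering network" below are toy stand-ins for MMGAN's learned components.

```python
import numpy as np

rng = np.random.default_rng(1)

# Gaussian mixture latent prior: one component per assumed data manifold.
n_clusters, dim = 3, 2
means = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])

def sample_latent(n):
    """Draw latent codes z and their cluster labels from the mixture prior."""
    labels = rng.integers(0, n_clusters, size=n)
    z = means[labels] + rng.normal(scale=0.5, size=(n, dim))
    return z, labels

z, labels = sample_latent(1000)
# A clustering network would map samples back to their component; here a
# nearest-mean assignment stands in for it.
recovered = np.argmin(((z[:, None, :] - means[None]) ** 2).sum(-1), axis=1)
accuracy = (recovered == labels).mean()
print(accuracy)
```

With well-separated components the assignment is essentially perfect, which is the property that lets MMGAN cluster real data through the latent space.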

DistanceNet: Estimating Traveled Distance from Monocular Images using a Recurrent Convolutional Neural Network

Title DistanceNet: Estimating Traveled Distance from Monocular Images using a Recurrent Convolutional Neural Network
Authors Robin Kreuzig, Matthias Ochs, Rudolf Mester
Abstract Classical monocular vSLAM/VO methods suffer from the scale ambiguity problem. Hybrid approaches solve this problem by adding deep learning methods, for example by using depth maps which are predicted by a CNN. We suggest that it is better to base scale estimation on estimating the traveled distance for a set of subsequent images. In this paper, we propose a novel end-to-end many-to-one traveled distance estimator. By using a deep recurrent convolutional neural network (RCNN), the traveled distance between the first and last image of a set of consecutive frames is estimated by our DistanceNet. Geometric features are learned in the CNN part of our model, which are subsequently used by the RNN to learn dynamics and temporal information. Moreover, we exploit the natural order of distances by using ordinal regression to predict the distance. The evaluation on the KITTI dataset shows that our approach outperforms current state-of-the-art deep learning pose estimators and classical mono vSLAM/VO methods in terms of distance prediction. Thus, our DistanceNet can be used as a component to solve the scale problem and help improve current and future classical mono vSLAM/VO methods.
Tasks
Published 2019-04-17
URL http://arxiv.org/abs/1904.08105v1
PDF http://arxiv.org/pdf/1904.08105v1.pdf
PWC https://paperswithcode.com/paper/190408105
Repo
Framework
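The ordinal-regression idea the abstract mentions can be sketched as encoding a distance as a sequence of binary "greater than threshold" targets, so the natural ordering of distances is built into the labels. The bin edges below are illustrative, not taken from the paper.

```python
import numpy as np

# Ordinal encoding: a distance is represented by one binary target per
# threshold, answering "is the distance greater than edge i?".
bin_edges = np.array([1.0, 2.0, 4.0, 8.0])  # metres (made-up edges)

def to_ordinal(distance):
    return (distance > bin_edges).astype(float)

def from_ordinal(probs, threshold=0.5):
    """Decode: predicted bin = number of confident 'greater than' answers."""
    return int((probs > threshold).sum())

target = to_ordinal(3.0)        # 3 m exceeds 1 m and 2 m, but not 4 m or 8 m
print(target, from_ordinal(target))
```

Compared with plain classification over bins, mislabeling an adjacent threshold costs only one bit, so errors between nearby distances are penalized less than errors between far-apart ones.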

Two Body Problem: Collaborative Visual Task Completion

Title Two Body Problem: Collaborative Visual Task Completion
Authors Unnat Jain, Luca Weihs, Eric Kolve, Mohammad Rastegari, Svetlana Lazebnik, Ali Farhadi, Alexander Schwing, Aniruddha Kembhavi
Abstract Collaboration is a necessary skill to perform tasks that are beyond one agent’s capabilities. Addressed extensively in both conventional and modern AI, multi-agent collaboration has often been studied in the context of simple grid worlds. We argue that there are inherently visual aspects to collaboration which should be studied in visually rich environments. A key element in collaboration is communication that can be either explicit, through messages, or implicit, through perception of the other agents and the visual world. Learning to collaborate in a visual environment entails learning (1) to perform the task, (2) when and what to communicate, and (3) how to act based on these communications and the perception of the visual world. In this paper we study the problem of learning to collaborate directly from pixels in AI2-THOR and demonstrate the benefits of explicit and implicit modes of communication to perform visual tasks. Refer to our project page for more details: https://prior.allenai.org/projects/two-body-problem
Tasks
Published 2019-04-11
URL http://arxiv.org/abs/1904.05879v1
PDF http://arxiv.org/pdf/1904.05879v1.pdf
PWC https://paperswithcode.com/paper/two-body-problem-collaborative-visual-task
Repo
Framework

Combining Physical Simulators and Object-Based Networks for Control

Title Combining Physical Simulators and Object-Based Networks for Control
Authors Anurag Ajay, Maria Bauza, Jiajun Wu, Nima Fazeli, Joshua B. Tenenbaum, Alberto Rodriguez, Leslie P. Kaelbling
Abstract Physics engines play an important role in robot planning and control; however, many real-world control problems involve complex contact dynamics that cannot be characterized analytically. Most physics engines therefore employ approximations that lead to a loss in precision. In this paper, we propose a hybrid dynamics model, simulator-augmented interaction networks (SAIN), combining a physics engine with an object-based neural network for dynamics modeling. Compared with existing models that are purely analytical or purely data-driven, our hybrid model captures the dynamics of interacting objects in a more accurate and data-efficient manner. Experiments both in simulation and on a real robot suggest that it also leads to better performance when used in complex control tasks. Finally, we show that our model generalizes to novel environments with varying object shapes and materials.
Tasks
Published 2019-04-13
URL http://arxiv.org/abs/1904.06580v1
PDF http://arxiv.org/pdf/1904.06580v1.pdf
PWC https://paperswithcode.com/paper/combining-physical-simulators-and-object
Repo
Framework
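The hybrid idea can be sketched in one dimension: an analytical simulator makes a coarse prediction, and a learned model corrects its residual against the true dynamics. The frictionless "engine", the frictional "true" dynamics, and the linear corrector below are toy stand-ins for SAIN's physics engine and interaction network.

```python
import numpy as np

def simulator(x, v, dt=0.1):
    """Idealized frictionless step; the 'real world' below has friction."""
    return x + v * dt, v

def true_step(x, v, dt=0.1, friction=0.5):
    v_new = v * (1.0 - friction * dt)
    return x + v_new * dt, v_new

# Fit a linear residual model from state (x, v) to the simulator's error.
rng = np.random.default_rng(2)
states = rng.uniform(-1, 1, size=(200, 2))
sim = np.array([simulator(x, v) for x, v in states])
real = np.array([true_step(x, v) for x, v in states])
coef, *_ = np.linalg.lstsq(states, real - sim, rcond=None)

def hybrid_step(x, v):
    s = np.array([x, v])
    return tuple(np.array(simulator(x, v)) + s @ coef)

x1, v1 = hybrid_step(0.0, 1.0)
xt, vt = true_step(0.0, 1.0)
print(abs(x1 - xt), abs(v1 - vt))
```

Because the residual here happens to be linear in the state, the corrected step matches the true dynamics almost exactly; in SAIN the corrector is an object-based neural network rather than a least-squares fit.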

Generating large labeled data sets for laparoscopic image processing tasks using unpaired image-to-image translation

Title Generating large labeled data sets for laparoscopic image processing tasks using unpaired image-to-image translation
Authors Micha Pfeiffer, Isabel Funke, Maria R. Robu, Sebastian Bodenstedt, Leon Strenger, Sandy Engelhardt, Tobias Roß, Matthew J. Clarkson, Kurinchi Gurusamy, Brian R. Davidson, Lena Maier-Hein, Carina Riediger, Thilo Welsch, Jürgen Weitz, Stefanie Speidel
Abstract In the medical domain, the lack of large training data sets and benchmarks is often a limiting factor for training deep neural networks. In contrast to expensive manual labeling, computer simulations can generate large and fully labeled data sets with a minimum of manual effort. However, models that are trained on simulated data usually do not translate well to real scenarios. To bridge the domain gap between simulated and real laparoscopic images, we exploit recent advances in unpaired image-to-image translation. We extend an image-to-image translation method to generate a diverse multitude of realistic-looking synthetic images based on images from a simple laparoscopy simulation. By incorporating means to ensure that the image content is preserved during the translation process, we ensure that the labels given for the simulated images remain valid for their realistic-looking translations. This way, we are able to generate a large, fully labeled synthetic data set of laparoscopic images with realistic appearance. We show that this data set can be used to train models for the task of liver segmentation of laparoscopic images. We achieve average dice scores of up to 0.89 in some patients without manually labeling a single laparoscopic image and show that using our synthetic data to pre-train models can greatly improve their performance. The synthetic data set will be made publicly available, fully labeled with segmentation maps, depth maps, normal maps, and positions of tools and camera (http://opencas.dkfz.de/image2image).
Tasks Image-to-Image Translation, Liver Segmentation
Published 2019-07-05
URL https://arxiv.org/abs/1907.02882v1
PDF https://arxiv.org/pdf/1907.02882v1.pdf
PWC https://paperswithcode.com/paper/generating-large-labeled-data-sets-for
Repo
Framework

Parameter-Conditioned Sequential Generative Modeling of Fluid Flows

Title Parameter-Conditioned Sequential Generative Modeling of Fluid Flows
Authors Jeremy Morton, Freddie D. Witherden, Mykel J. Kochenderfer
Abstract The computational cost associated with simulating fluid flows can make it infeasible to run many simulations across multiple flow conditions. Building upon concepts from generative modeling, we introduce a new method for learning neural network models capable of performing efficient parameterized simulations of fluid flows. Evaluated on their ability to simulate both two-dimensional and three-dimensional fluid flows, trained models are shown to capture local and global properties of the flow fields at a wide array of flow conditions. Furthermore, flow simulations generated by the trained models are shown to be orders of magnitude faster than the corresponding computational fluid dynamics simulations.
Tasks
Published 2019-12-14
URL https://arxiv.org/abs/1912.06752v1
PDF https://arxiv.org/pdf/1912.06752v1.pdf
PWC https://paperswithcode.com/paper/parameter-conditioned-sequential-generative
Repo
Framework

Transformation Consistent Self-ensembling Model for Semi-supervised Medical Image Segmentation

Title Transformation Consistent Self-ensembling Model for Semi-supervised Medical Image Segmentation
Authors Xiaomeng Li, Lequan Yu, Hao Chen, Chi-Wing Fu, Pheng-Ann Heng
Abstract Deep convolutional neural networks have achieved remarkable progress on a variety of medical image computing tasks. A common problem when applying supervised deep learning methods to medical images is the lack of labeled data, which is very expensive and time-consuming to collect. In this paper, we present a novel semi-supervised method for medical image segmentation, where the network is optimized by the weighted combination of a common supervised loss for labeled inputs only and a regularization loss for both labeled and unlabeled data. To utilize the unlabeled data, our method encourages consistent predictions of the network-in-training for the same input under different regularizations. Aiming at the semi-supervised segmentation problem, we introduce a transformation-consistent scheme, including rotation and flipping, in our self-ensembling model to enhance the regularization effect for pixel-level predictions. We have extensively validated the proposed semi-supervised method on three typical yet challenging medical image segmentation tasks: (i) skin lesion segmentation from dermoscopy images on the International Skin Imaging Collaboration (ISIC) 2017 dataset, (ii) optic disc segmentation from fundus images on the Retinal Fundus Glaucoma Challenge (REFUGE) dataset, and (iii) liver segmentation from volumetric CT scans on the Liver Tumor Segmentation Challenge (LiTS) dataset. Compared to the state of the art, our proposed method shows superior segmentation performance on challenging 2D/3D medical images, demonstrating the effectiveness of our semi-supervised method for medical image segmentation.
Tasks Lesion Segmentation, Liver Segmentation, Medical Image Segmentation, Semantic Segmentation
Published 2019-02-28
URL http://arxiv.org/abs/1903.00348v2
PDF http://arxiv.org/pdf/1903.00348v2.pdf
PWC https://paperswithcode.com/paper/transformation-consistent-self-ensembling
Repo
Framework
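The transformation-consistency regularizer can be sketched directly: for a segmentation function f and a transformation T, unlabeled inputs are penalized by the discrepancy between f(T(x)) and T(f(x)). The pointwise threshold "network" below is a toy stand-in; being exactly equivariant, its penalty is zero, which is the property the regularizer pushes a real CNN towards.

```python
import numpy as np

def f(x):
    """Stand-in for a segmentation model (pointwise threshold)."""
    return (x > 0.5).astype(float)

def T(x):
    """One of the paper's transformations: a 90-degree rotation."""
    return np.rot90(x)

x = np.random.default_rng(3).uniform(size=(8, 8))

# Consistency penalty on an unlabeled input: ||f(T(x)) - T(f(x))||^2.
consistency_loss = ((f(T(x)) - T(f(x))) ** 2).mean()
print(consistency_loss)
```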

Few-Features Attack to Fool Machine Learning Models through Mask-Based GAN

Title Few-Features Attack to Fool Machine Learning Models through Mask-Based GAN
Authors Feng Chen, Yunkai Shang, Bo Xu, Jincheng Hu
Abstract GAN is a deep-learning-based generative approach to generating content such as images, language, and speech. Recent studies have shown that GANs can also be applied to generating adversarial examples that fool machine-learning models. Compared with earlier non-learning adversarial attack approaches, GAN-based approaches can generate adversarial samples quickly for each new sample after training, but they perturb a large number of features, which limits their practicality. To address this issue, we propose a new approach named Few-Feature-Attack-GAN (FFA-GAN). FFA-GAN offers a significant speed advantage over non-learning adversarial sample approaches and perturbs fewer features than previous GAN-based adversarial sample approaches. It automatically generates attack samples in the black-box setting through the GAN architecture instead of evolutionary algorithms or other non-learning approaches. In addition, we introduce a mask mechanism into the generator network of the GAN architecture to handle the sparsity constraint on the important features. During training, the losses of the generator are weighted differently in different training phases to ensure the divergence of the generator’s two parallel networks. Experiments are conducted on the structured data sets KDD-Cup 1999 and CIC-IDS 2017, where the data dimensionality is relatively low, and on the unstructured data sets MNIST and CIFAR-10 with relatively high-dimensional data. The results demonstrate the effectiveness and robustness of the proposed approach.
Tasks Adversarial Attack
Published 2019-11-12
URL https://arxiv.org/abs/1911.06269v1
PDF https://arxiv.org/pdf/1911.06269v1.pdf
PWC https://paperswithcode.com/paper/few-features-attack-to-fool-machine-learning
Repo
Framework
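The mask mechanism's effect, changing only a few features, can be sketched by keeping just the top-k entries of a mask score vector and zeroing the perturbation elsewhere. The top-k rule, k value, and random inputs below are illustrative stand-ins for FFA-GAN's learned generator outputs.

```python
import numpy as np

rng = np.random.default_rng(4)

def masked_perturbation(x, delta, mask_logits, k=3):
    """Apply perturbation delta only at the k highest-scoring features."""
    keep = np.argsort(mask_logits)[-k:]      # indices of the k largest logits
    mask = np.zeros_like(x)
    mask[keep] = 1.0
    return x + mask * delta, int(mask.sum())

x = rng.uniform(size=10)
delta = rng.normal(scale=0.1, size=10)
mask_logits = rng.normal(size=10)
x_adv, n_changed = masked_perturbation(x, delta, mask_logits, k=3)
print(n_changed, int((x_adv != x).sum()))
```

The hard top-k here is a simplification: in a trained model the mask would be produced differentiably so the generator can learn which features matter.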

Optimization Models for Machine Learning: A Survey

Title Optimization Models for Machine Learning: A Survey
Authors Claudio Gambella, Bissan Ghaddar, Joe Naoum-Sawaya
Abstract This paper surveys the machine learning literature and presents in an optimization framework several commonly used machine learning approaches. In particular, mathematical optimization models are presented for regression, classification, clustering, deep learning, and adversarial learning, as well as new emerging applications in machine teaching, empirical model learning, and Bayesian network structure learning. Such models can benefit from the advancement of numerical optimization techniques, which have already played a distinctive role in several machine learning settings. The strengths and the shortcomings of these models are discussed, and potential research directions and open problems are highlighted.
Tasks
Published 2019-01-16
URL https://arxiv.org/abs/1901.05331v3
PDF https://arxiv.org/pdf/1901.05331v3.pdf
PWC https://paperswithcode.com/paper/optimization-models-for-machine-learning-a
Repo
Framework
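As a concrete instance of casting a learning task as an optimization model, the survey's simplest case, least-squares regression, minimizes ||Xw - y||^2 over w; the sketch below solves it via the normal equations on synthetic data.

```python
import numpy as np

# Regression as optimization: minimize ||Xw - y||^2 over w.
rng = np.random.default_rng(5)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(scale=0.01, size=50)

# Closed-form minimizer via the normal equations: (X^T X) w = X^T y.
w = np.linalg.solve(X.T @ X, X.T @ y)
print(w)
```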

Embedding Structured Contour and Location Prior in Siamesed Fully Convolutional Networks for Road Detection

Title Embedding Structured Contour and Location Prior in Siamesed Fully Convolutional Networks for Road Detection
Authors Qi Wang, Junyu Gao, Yuan Yuan
Abstract Road detection from the perspective of moving vehicles is a challenging issue in autonomous driving. Recently, many deep learning methods have emerged for this task because they can extract high-level local features to find road regions from raw RGB data, such as Convolutional Neural Networks (CNNs) and Fully Convolutional Networks (FCNs). However, accurately detecting road boundaries remains an intractable problem. In this paper, we propose a siamesed fully convolutional network (named “s-FCN-loc”), which is able to consider RGB-channel images, semantic contours, and location priors simultaneously to segment the road region elaborately. To be specific, the s-FCN-loc has two streams to process the original RGB images and contour maps respectively. At the same time, the location prior is directly appended to the siamesed FCN to promote the final detection performance. Our contributions are threefold: (1) an s-FCN-loc is proposed that learns more discriminative features of road boundaries than the original FCN to detect more accurate road regions; (2) the location prior is viewed as a type of feature map and directly appended to the final feature map in s-FCN-loc to promote detection performance effectively, which is simpler than traditional methods that require different priors for different inputs (image patches); (3) the s-FCN-loc model converges 30% faster in training than the original FCN because of the guidance of highly structured contours. The proposed approach is evaluated on the KITTI Road Detection Benchmark and the One-Class Road Detection Dataset, and achieves competitive results with the state of the art.
Tasks Autonomous Driving
Published 2019-05-05
URL https://arxiv.org/abs/1905.01575v1
PDF https://arxiv.org/pdf/1905.01575v1.pdf
PWC https://paperswithcode.com/paper/embedding-structured-contour-and-location
Repo
Framework
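Treating the location prior as a feature map, as contribution (2) describes, amounts to stacking a per-pixel prior channel onto the input (or an intermediate) tensor. The sketch below uses a made-up prior, "road pixels tend to sit lower in the image", and tiny dimensions purely for illustration.

```python
import numpy as np

# Location prior as an extra feature map, stacked onto the RGB channels.
h, w = 4, 6
rgb = np.random.default_rng(6).uniform(size=(3, h, w))

# Made-up prior: 0 at the top row, rising to 1 at the bottom row.
rows = np.linspace(0.0, 1.0, h)[:, None]
location_prior = np.broadcast_to(rows, (h, w))

x = np.concatenate([rgb, location_prior[None]], axis=0)  # (C+1, H, W)
print(x.shape)
```

Because the prior is just another channel, the network can weigh it against appearance features everywhere, without per-input prior engineering.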

A Hybrid Framework for Action Recognition in Low-Quality Video Sequences

Title A Hybrid Framework for Action Recognition in Low-Quality Video Sequences
Authors Tej Singh, Dinesh Kumar Vishwakarma
Abstract Vision-based activity recognition is essential for security, monitoring, and surveillance applications. Moreover, real-time analysis must often cope with low-quality video that carries little information about the surroundings due to poor illumination and occlusions, and therefore needs a more robust and integrated model for low-quality and night-time security operations. In this context, we propose a hybrid model for illumination-invariant human activity recognition based on sub-image histogram-equalization enhancement and k-key-pose human silhouettes. This feature vector gives good average recognition accuracy on three low-exposure video subsets of the original action video datasets. Finally, the performance of the proposed approach is tested on three manually downgraded low-quality datasets: Weizmann actions, KTH, and Ballet Movements. The model outperforms existing techniques on low-exposure videos and achieves classification accuracy comparable to similar state-of-the-art methods.
Tasks Activity Recognition, Human Activity Recognition, Temporal Action Localization
Published 2019-03-11
URL http://arxiv.org/abs/1903.04090v1
PDF http://arxiv.org/pdf/1903.04090v1.pdf
PWC https://paperswithcode.com/paper/a-hybrid-framework-for-action-recognition-in
Repo
Framework
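The enhancement step the abstract relies on, histogram equalization, maps intensities through the normalized cumulative histogram so a squeezed intensity range spreads over the full scale. A minimal sketch on a synthetic dark frame (the frame size and intensity range are made up):

```python
import numpy as np

def hist_equalize(img, levels=256):
    """Histogram equalization via the normalized cumulative histogram."""
    hist = np.bincount(img.ravel(), minlength=levels)
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())   # normalize to [0, 1]
    return (cdf[img] * (levels - 1)).astype(np.uint8)

# A dark, low-contrast frame: intensities squeezed into [10, 50).
dark = np.random.default_rng(7).integers(10, 50, size=(16, 16), dtype=np.uint8)
eq = hist_equalize(dark)
print(int(dark.max()) - int(dark.min()), int(eq.max()) - int(eq.min()))
```

The paper applies this per sub-image rather than globally, which adapts the mapping to local illumination.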