Paper Group ANR 1046
Towards Open Intent Discovery for Conversational Text. Deep Parametric Shape Predictions using Distance Fields. Cooperative Cross-Stream Network for Discriminative Action Representation. Improved Inference via Deep Input Transfer. Scalable Graph Algorithms. A Mutual Bootstrapping Model for Automated Skin Lesion Segmentation and Classification. Deep …
Towards Open Intent Discovery for Conversational Text
Title | Towards Open Intent Discovery for Conversational Text |
Authors | Nikhita Vedula, Nedim Lipka, Pranav Maneriker, Srinivasan Parthasarathy |
Abstract | Detecting and identifying user intent from text, both written and spoken, plays an important role in modelling and understanding dialogs. Existing research on intent discovery models it as a classification task with a predefined set of known categories. To generalize beyond these preexisting classes, we define a new task of open intent discovery. We investigate how intent can be generalized to those not seen during training. To this end, we propose a two-stage approach to this task: predicting whether an utterance contains an intent, and then tagging the intent in the input utterance. Our model consists of a bidirectional LSTM with a CRF on top to capture contextual semantics, subject to some constraints. Self-attention is used to learn long distance dependencies. Further, we adapt an adversarial training approach to improve robustness and performance across domains. We also present a dataset of 25k real-life utterances that have been labelled via crowd sourcing. Our experiments across different domains and real-world datasets show the effectiveness of our approach, with less than 100 annotated examples needed per unique domain to recognize diverse intents. The approach outperforms state-of-the-art baselines by 5-15% F1 score points. |
Tasks | |
Published | 2019-04-17 |
URL | http://arxiv.org/abs/1904.08524v1 |
http://arxiv.org/pdf/1904.08524v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-open-intent-discovery-for |
Repo | |
Framework | |
Deep Parametric Shape Predictions using Distance Fields
Title | Deep Parametric Shape Predictions using Distance Fields |
Authors | Dmitriy Smirnov, Matthew Fisher, Vladimir G. Kim, Richard Zhang, Justin Solomon |
Abstract | Many tasks in graphics and vision demand machinery for converting shapes into consistent representations with sparse sets of parameters; these representations facilitate rendering, editing, and storage. When the source data is noisy or ambiguous, however, artists and engineers often manually construct such representations, a tedious and potentially time-consuming process. While advances in deep learning have been successfully applied to noisy geometric data, the task of generating parametric shapes has so far been difficult for these methods. Hence, we propose a new framework for predicting parametric shape primitives using deep learning. We use distance fields to transition between shape parameters like control points and input data on a pixel grid. We demonstrate efficacy on 2D and 3D tasks, including font vectorization and surface abstraction. |
Tasks | |
Published | 2019-04-18 |
URL | https://arxiv.org/abs/1904.08921v2 |
https://arxiv.org/pdf/1904.08921v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-parametric-shape-predictions-using |
Repo | |
Framework | |
Cooperative Cross-Stream Network for Discriminative Action Representation
Title | Cooperative Cross-Stream Network for Discriminative Action Representation |
Authors | Jingran Zhang, Fumin Shen, Xing Xu, Heng Tao Shen |
Abstract | Two-stream spatial and temporal models have achieved great success in video action recognition. Most existing works focus on designing effective feature fusion methods and train the two streams separately. However, it is hard to ensure discriminability and to explore complementary information between different streams in existing works. In this work, we propose a novel cooperative cross-stream network that investigates the conjoint information in multiple different modalities. Feature extraction in the spatial and temporal stream networks is performed jointly in an end-to-end manner. The network extracts the complementary information of different modalities from a connection block, which aims at exploring correlations between the features of different streams. Furthermore, different from the conventional ConvNet that learns the deep separable features with only one cross-entropy loss, our proposed model enhances the discriminative power of the deeply learned features and reduces the undesired modality discrepancy by jointly optimizing a modality ranking constraint and a cross-entropy loss for both homogeneous and heterogeneous modalities. The modality ranking constraint consists of an intra-modality discriminative embedding and an inter-modality triplet constraint, and it reduces both the intra-modality and cross-modality feature variations. Experiments on three benchmark datasets demonstrate that, through cooperative appearance and motion feature extraction, our method can achieve state-of-the-art or competitive performance compared with existing results. |
Tasks | Action Recognition In Videos, Temporal Action Localization |
Published | 2019-08-27 |
URL | https://arxiv.org/abs/1908.10136v1 |
https://arxiv.org/pdf/1908.10136v1.pdf | |
PWC | https://paperswithcode.com/paper/cooperative-cross-stream-network-for |
Repo | |
Framework | |
Improved Inference via Deep Input Transfer
Title | Improved Inference via Deep Input Transfer |
Authors | Saied Asgari Taghanaki, Kumar Abhishek, Ghassan Hamarneh |
Abstract | Although numerous improvements have been made in the field of image segmentation using convolutional neural networks, the majority of these improvements rely on training with larger datasets, model architecture modifications, novel loss functions, and better optimizers. In this paper, we propose a new segmentation performance boosting paradigm that relies on optimally modifying the network’s input instead of the network itself. In particular, we leverage the gradients of a trained segmentation network with respect to the input to transfer it to a space where the segmentation accuracy improves. We test the proposed method on three publicly available medical image segmentation datasets: the ISIC 2017 Skin Lesion Segmentation dataset, the Shenzhen Chest X-Ray dataset, and the CVC-ColonDB dataset, for which our method achieves improvements of 5.8%, 0.5%, and 4.8% in the average Dice scores, respectively. |
Tasks | Lesion Segmentation, Medical Image Segmentation, Semantic Segmentation |
Published | 2019-04-04 |
URL | https://arxiv.org/abs/1904.02307v4 |
https://arxiv.org/pdf/1904.02307v4.pdf | |
PWC | https://paperswithcode.com/paper/improved-inference-via-deep-input-transfer |
Repo | |
Framework | |
Scalable Graph Algorithms
Title | Scalable Graph Algorithms |
Authors | Christian Schulz |
Abstract | Processing large complex networks has recently attracted considerable interest. Complex graphs are useful in a wide range of applications from technological networks to biological systems like the human brain. Sometimes these networks are composed of billions of entities that give rise to emerging properties and structures. Analyzing these structures aids us in gaining new insights about our surroundings. As huge networks become abundant, there is a need for scalable algorithms to perform analysis. A prominent example is the PageRank algorithm, which is one of the measures used by web search engines such as Google to rank web pages displayed to the user. In order to find these patterns, massive amounts of data have to be acquired and processed. Designing and evaluating scalable graph algorithms to handle these data sets is a crucial task on the road to understanding the underlying systems. This habilitation thesis is a summary of a broad spectrum of scalable graph algorithms that I developed over the last six years with many coauthors. In general, this research is based on four highly interconnected pillars: multilevel algorithms, practical kernelization, parallelization and memetic algorithms. The experiments conducted indicate that our algorithms find better solutions and/or are much more scalable than the previous state-of-the-art. |
Tasks | |
Published | 2019-11-30 |
URL | https://arxiv.org/abs/1912.00245v1 |
https://arxiv.org/pdf/1912.00245v1.pdf | |
PWC | https://paperswithcode.com/paper/scalable-graph-algorithms |
Repo | |
Framework | |
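The abstract above names PageRank as a prominent example of graph analysis at scale. As a point of reference only, here is a minimal power-iteration sketch of PageRank in plain Python. It is not one of the thesis's algorithms; the function name `pagerank`, the damping factor 0.85, the fixed iteration count, and the toy graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank.

    links maps each node to the list of nodes it links to.
    The damping factor and iteration count are illustrative defaults.
    """
    nodes = sorted(set(links) | {v for targets in links.values() for v in targets})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            targets = links.get(u, [])
            for v in targets:
                # Each node splits its rank evenly among its out-links.
                new_rank[v] += damping * rank[u] / len(targets)
        rank = new_rank
    return rank


# Hypothetical four-page web graph.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_graph)
```

Each iteration touches every edge once, so the cost per iteration is linear in the size of the graph. That is the property that makes this kind of computation feasible on very large networks.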
A Mutual Bootstrapping Model for Automated Skin Lesion Segmentation and Classification
Title | A Mutual Bootstrapping Model for Automated Skin Lesion Segmentation and Classification |
Authors | Yutong Xie, Jianpeng Zhang, Yong Xia, Chunhua Shen |
Abstract | Automated skin lesion segmentation and classification are two essential and related tasks in the computer-aided diagnosis of skin cancer. Despite their prevalence, deep learning models are usually designed for only one task, ignoring the potential benefits of jointly performing both tasks. In this paper, we propose the mutual bootstrapping deep convolutional neural networks (MB-DCNN) model for simultaneous skin lesion segmentation and classification. This model consists of a coarse segmentation network (coarse-SN), a mask-guided classification network (mask-CN), and an enhanced segmentation network (enhanced-SN). On one hand, the coarse-SN generates coarse lesion masks that provide a prior bootstrapping for mask-CN to help it locate and classify skin lesions accurately. On the other hand, the lesion localization maps produced by mask-CN are then fed into enhanced-SN, aiming to transfer the localization information learned by mask-CN to enhanced-SN for accurate lesion segmentation. In this way, the segmentation and classification networks transfer knowledge to each other and facilitate each other in a bootstrapping way. Meanwhile, we also design a novel rank loss and jointly use it with the Dice loss in segmentation networks to address the issues caused by class imbalance and hard-easy pixel imbalance. We evaluate the proposed MB-DCNN model on the ISIC-2017 and PH2 datasets, and achieve a Jaccard index of 80.4% and 89.4% in skin lesion segmentation and an average AUC of 93.8% and 97.7% in skin lesion classification, which are superior to the performance of representative state-of-the-art skin lesion segmentation and classification methods. Our results suggest that it is possible to boost the performance of skin lesion segmentation and classification simultaneously by training a unified model to perform both tasks in a mutual bootstrapping way. |
Tasks | Lesion Segmentation, Skin Lesion Classification |
Published | 2019-03-08 |
URL | https://arxiv.org/abs/1903.03313v4 |
https://arxiv.org/pdf/1903.03313v4.pdf | |
PWC | https://paperswithcode.com/paper/semi-and-weakly-supervised-directional |
Repo | |
Framework | |
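The abstract above reports segmentation quality as a Jaccard index and trains with a Dice loss. As a quick reference for how the two overlap measures relate, here is a minimal sketch on flattened binary masks. The function name `dice_and_jaccard`, the convention of returning 1.0 when both masks are empty, and the toy masks are assumptions for illustration; none of it is taken from the paper's code.

```python
def dice_and_jaccard(pred, target):
    """Return (Dice, Jaccard) for two equally sized flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_area + target_area)
    jaccard = intersection / union
    return dice, jaccard


# Hypothetical 6-pixel masks: 2 pixels overlap, each mask has 3 foreground pixels.
predicted = [1, 1, 1, 0, 0, 0]
reference = [0, 1, 1, 1, 0, 0]
dice, jaccard = dice_and_jaccard(predicted, reference)
```

For these masks the Dice coefficient is 2/3 and the Jaccard index is 1/2, consistent with the identity J = D / (2 - D). A Dice loss is commonly taken as 1 minus a differentiable (soft) version of the Dice coefficient; the exact form used in the paper is not given in the abstract.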
Deep adaptive dynamic programming for nonaffine nonlinear optimal control problem with state constraints
Title | Deep adaptive dynamic programming for nonaffine nonlinear optimal control problem with state constraints |
Authors | Jingliang Duan, Zhengyu Liu, Shengbo Eben Li, Qi Sun, Zhenzhong Jia, Bo Cheng |
Abstract | This paper presents a constrained deep adaptive dynamic programming (CDADP) algorithm to solve general nonlinear optimal control problems with known dynamics. Unlike previous ADP algorithms, it can directly deal with problems with state constraints. Both the policy and value function are approximated by deep neural networks (NNs), which directly map the system state to action and value function respectively without needing to use hand-crafted basis functions. The proposed algorithm considers the state constraints by transforming the policy improvement process to a constrained optimization problem. Meanwhile, a trust region constraint is added to prevent excessive policy updates. We first linearize this constrained optimization problem locally into a quadratically-constrained quadratic programming problem, and then obtain the optimal update of policy network parameters by solving its dual problem. We also propose a series of recovery rules to update the policy in case the primal problem is infeasible. In addition, parallel learners are employed to explore different state spaces and thereby stabilize and accelerate learning. A vehicle control problem in a path-tracking task is used to demonstrate the effectiveness of the proposed method. |
Tasks | |
Published | 2019-11-26 |
URL | https://arxiv.org/abs/1911.11397v1 |
https://arxiv.org/pdf/1911.11397v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-adaptive-dynamic-programming-for |
Repo | |
Framework | |
Visual Rationalizations in Deep Reinforcement Learning for Atari Games
Title | Visual Rationalizations in Deep Reinforcement Learning for Atari Games |
Authors | Laurens Weitkamp, Elise van der Pol, Zeynep Akata |
Abstract | Due to the capability of deep learning to perform well in high dimensional problems, deep reinforcement learning agents perform well in challenging tasks such as Atari 2600 games. However, clearly explaining why a certain action is taken by the agent can be as important as the decision itself. Deep reinforcement learning models, as other deep learning models, tend to be opaque in their decision-making process. In this work, we propose to make deep reinforcement learning more transparent by visualizing the evidence on which the agent bases its decision. In this work, we emphasize the importance of producing a justification for an observed action, which could be applied to a black-box decision agent. |
Tasks | Atari Games, Decision Making |
Published | 2019-02-01 |
URL | http://arxiv.org/abs/1902.00566v1 |
http://arxiv.org/pdf/1902.00566v1.pdf | |
PWC | https://paperswithcode.com/paper/visual-rationalizations-in-deep-reinforcement |
Repo | |
Framework | |
FocusNet: An attention-based Fully Convolutional Network for Medical Image Segmentation
Title | FocusNet: An attention-based Fully Convolutional Network for Medical Image Segmentation |
Authors | Chaitanya Kaul, Suresh Manandhar, Nick Pears |
Abstract | We propose a novel technique to incorporate attention within convolutional neural networks using feature maps generated by a separate convolutional autoencoder. Our attention architecture is well suited for incorporation with deep convolutional networks. We evaluate our model on benchmark segmentation datasets in skin cancer segmentation and lung lesion segmentation. Results show highly competitive performance when compared with U-Net and its residual variant. |
Tasks | Lesion Segmentation, Medical Image Segmentation, Semantic Segmentation, Skin Cancer Segmentation |
Published | 2019-02-08 |
URL | http://arxiv.org/abs/1902.03091v1 |
http://arxiv.org/pdf/1902.03091v1.pdf | |
PWC | https://paperswithcode.com/paper/focusnet-an-attention-based-fully |
Repo | |
Framework | |
Pose estimator and tracker using temporal flow maps for limbs
Title | Pose estimator and tracker using temporal flow maps for limbs |
Authors | Jihye Hwang, Jieun Lee, Sungheon Park, Nojun Kwak |
Abstract | For human pose estimation in videos, how temporal information between frames is used is important. In this paper, we propose temporal flow maps for limbs (TML) and a multi-stride method to estimate and track human poses. The proposed temporal flow maps are unit vectors describing the limbs’ movements. We constructed a network to learn both spatial information and temporal information end-to-end. Spatial information such as joint heatmaps and part affinity fields is regressed in the spatial network part, and the TML is regressed in the temporal network part. We also propose a data augmentation method to learn various types of TML better. The proposed multi-stride method expands the data by randomly selecting two frames within a defined range. We demonstrate that the proposed method efficiently estimates and tracks human poses on the PoseTrack 2017 and 2018 datasets. |
Tasks | Data Augmentation, Pose Estimation |
Published | 2019-05-23 |
URL | https://arxiv.org/abs/1905.09500v1 |
https://arxiv.org/pdf/1905.09500v1.pdf | |
PWC | https://paperswithcode.com/paper/pose-estimator-and-tracker-using-temporal |
Repo | |
Framework | |
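The multi-stride augmentation in the abstract above is described only as randomly selecting two frames within a defined range. One way to read that is sketched below: draw a stride between 1 and a maximum, then draw a starting frame so that both frames fall inside the clip. The function name `sample_frame_pair`, the `max_stride` parameter, and the uniform sampling are assumptions, not details from the paper.

```python
import random


def sample_frame_pair(num_frames, max_stride, rng=random):
    """Pick two frame indices (first, second) with 1 <= second - first <= max_stride."""
    if num_frames < 2:
        raise ValueError("need at least two frames")
    if max_stride < 1:
        raise ValueError("max_stride must be at least 1")
    # Assumption: the stride is drawn uniformly from the allowed range.
    stride = rng.randint(1, min(max_stride, num_frames - 1))
    first = rng.randrange(0, num_frames - stride)
    return first, first + stride


# Hypothetical clip of 30 frames, pairing frames at most 5 apart.
rng = random.Random(0)
pairs = [sample_frame_pair(30, 5, rng) for _ in range(1000)]
```

Sampling pairs at several strides exposes a temporal network to both small and large limb displacements from the same clip, which appears to be the purpose of the augmentation.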
WITCHcraft: Efficient PGD attacks with random step size
Title | WITCHcraft: Efficient PGD attacks with random step size |
Authors | Ping-Yeh Chiang, Jonas Geiping, Micah Goldblum, Tom Goldstein, Renkun Ni, Steven Reich, Ali Shafahi |
Abstract | State-of-the-art adversarial attacks on neural networks use expensive iterative methods and numerous random restarts from different initial points. Iterative FGSM-based methods without restarts trade off performance for computational efficiency because they do not adequately explore the image space and are highly sensitive to the choice of step size. We propose a variant of Projected Gradient Descent (PGD) that uses a random step size to improve performance without resorting to expensive random restarts. Our method, Wide Iterative Stochastic crafting (WITCHcraft), achieves results superior to the classical PGD attack on the CIFAR-10 and MNIST data sets but without additional computational cost. This simple modification of PGD makes crafting attacks more economical, which is important in situations like adversarial training where attacks need to be crafted in real time. |
Tasks | |
Published | 2019-11-18 |
URL | https://arxiv.org/abs/1911.07989v1 |
https://arxiv.org/pdf/1911.07989v1.pdf | |
PWC | https://paperswithcode.com/paper/witchcraft-efficient-pgd-attacks-with-random |
Repo | |
Framework | |
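To make the idea in the abstract above concrete, here is a minimal sketch of an L-infinity PGD attack whose step size is redrawn at every iteration, run against a toy linear classifier with a logistic loss. The abstract does not say how the step size is distributed, so the uniform draw on [0, max_step] is an assumption, as are the function names, the toy weights, and the [0, 1] input range.

```python
import math
import random


def loss_gradient(w, b, x, y):
    """Gradient w.r.t. x of the logistic loss log(1 + exp(-y * (w.x + b))), y in {-1, +1}."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    coeff = -y / (1.0 + math.exp(margin))
    return [coeff * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def pgd_random_step(w, b, x0, y, eps, max_step, steps, rng):
    """Signed-gradient ascent on the loss with a random step size per iteration."""
    x = list(x0)
    for _ in range(steps):
        grad = loss_gradient(w, b, x, y)
        # Assumption: step size drawn uniformly; the paper's distribution may differ.
        step = rng.uniform(0.0, max_step)
        x = [xi + step * sign(g) for xi, g in zip(x, grad)]
        # Project back onto the L-infinity ball of radius eps around x0.
        x = [min(max(xi, x0i - eps), x0i + eps) for xi, x0i in zip(x, x0)]
        # Keep inputs in the assumed valid range [0, 1].
        x = [min(max(xi, 0.0), 1.0) for xi in x]
    return x


def margin(w, b, x, y):
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical toy classifier and input.
w, b = [1.0, -2.0, 0.5], 0.1
x0, y = [0.5, 0.2, 0.7], 1
x_adv = pgd_random_step(w, b, x0, y, eps=0.1, max_step=0.05, steps=20,
                        rng=random.Random(0))
```

The loop has the same per-iteration cost as a fixed-step PGD, with one gradient evaluation per step. Only the step size changes, which matches the abstract's claim of no additional computational cost.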
Soft labeling by Distilling Anatomical knowledge for Improved MS Lesion Segmentation
Title | Soft labeling by Distilling Anatomical knowledge for Improved MS Lesion Segmentation |
Authors | Eytan Kats, Jacob Goldberger, Hayit Greenspan |
Abstract | This paper explores the use of a soft ground-truth mask (“soft mask”) to train a Fully Convolutional Neural Network (FCNN) for segmentation of Multiple Sclerosis (MS) lesions. Detection and segmentation of MS lesions is a complex task largely due to the extremely unbalanced data, with a very small number of lesion pixels that can be used for training. Utilizing the anatomical knowledge that the pixels surrounding a lesion may also include some lesion-level information, we propose enlarging the data set of the lesion class with neighboring pixel data, assigned a reduced confidence weight. A soft mask is constructed by morphological dilation of the binary segmentation mask provided by a given expert, where expert-marked voxels receive label 1 and voxels of the dilated region are assigned a soft label. In the methodology proposed, the FCNN is trained using the soft mask. On the ISBI 2015 challenge dataset, this is shown to provide a better precision-recall tradeoff and to achieve a higher average Dice similarity coefficient. We also show that by using this soft mask scheme we can improve the network segmentation performance when compared to a second independent expert. |
Tasks | Lesion Segmentation |
Published | 2019-01-26 |
URL | http://arxiv.org/abs/1901.09263v1 |
http://arxiv.org/pdf/1901.09263v1.pdf | |
PWC | https://paperswithcode.com/paper/soft-labeling-by-distilling-anatomical |
Repo | |
Framework | |
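The soft-mask construction in the abstract above (dilate the expert's binary mask, keep label 1 on expert-marked voxels, give the newly covered voxels a soft label) can be sketched in a few lines. The version below works on a 2D mask with a 4-connected structuring element; the soft label value 0.5, the neighbourhood, the number of dilation steps, and the function name `soft_mask` are assumptions, since the abstract does not specify them.

```python
def soft_mask(binary_mask, soft_label=0.5, iterations=1):
    """Build a soft mask from a 2D binary mask given as a list of rows.

    Expert-marked pixels keep label 1.0, pixels added by dilation get
    soft_label, and all remaining pixels get 0.0.
    """
    rows, cols = len(binary_mask), len(binary_mask[0])
    dilated = [list(row) for row in binary_mask]
    for _ in range(iterations):
        previous = [row[:] for row in dilated]
        for r in range(rows):
            for c in range(cols):
                if previous[r][c]:
                    continue
                # 4-connected neighbourhood (assumed structuring element).
                neighbours = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if any(0 <= nr < rows and 0 <= nc < cols and previous[nr][nc]
                       for nr, nc in neighbours):
                    dilated[r][c] = 1
    return [
        [1.0 if binary_mask[r][c] else (soft_label if dilated[r][c] else 0.0)
         for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical 5x5 expert mask with a single lesion pixel in the centre.
expert = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
soft = soft_mask(expert)
```

In this toy case the centre pixel keeps label 1.0 and its four direct neighbours receive 0.5, so the lesion class gains training signal from the surrounding ring at reduced confidence.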
SIMMC: Situated Interactive Multi-Modal Conversational Data Collection And Evaluation Platform
Title | SIMMC: Situated Interactive Multi-Modal Conversational Data Collection And Evaluation Platform |
Authors | Paul A. Crook, Shivani Poddar, Ankita De, Semir Shafi, David Whitney, Alborz Geramifard, Rajen Subba |
Abstract | As digital virtual assistants become ubiquitous, it becomes increasingly important to understand the situated behaviour of users as they interact with these assistants. To this end, we introduce SIMMC, an extension to ParlAI for multi-modal conversational data collection and system evaluation. SIMMC simulates an immersive setup, where crowd workers are able to interact with environments constructed in AI Habitat or Unity while engaging in a conversation. The assistant in SIMMC can be a crowd worker or an Artificial Intelligence (AI) agent. This enables both (i) a multi-player / Wizard of Oz setting for data collection, and (ii) a single player mode for model / system evaluation. We plan to open-source a situated conversational data-set collected on this platform for the Conversational AI research community. |
Tasks | |
Published | 2019-11-07 |
URL | https://arxiv.org/abs/1911.02690v2 |
https://arxiv.org/pdf/1911.02690v2.pdf | |
PWC | https://paperswithcode.com/paper/simmc-situated-interactive-multi-modal |
Repo | |
Framework | |
Deep Convolutional Encoder-Decoders with Aggregated Multi-Resolution Skip Connections for Skin Lesion Segmentation
Title | Deep Convolutional Encoder-Decoders with Aggregated Multi-Resolution Skip Connections for Skin Lesion Segmentation |
Authors | Ahmed H. Shahin, Karim Amer, Mustafa A. Elattar |
Abstract | The prevalence of skin melanoma is rapidly increasing, as is the number of recorded deaths among its patients. Automatic image segmentation tools play an important role in providing standardized computer-assisted analysis for skin melanoma patients. Current state-of-the-art segmentation methods are based on fully convolutional neural networks, which utilize an encoder-decoder approach. However, these methods produce coarse segmentation masks due to the loss of location information during the encoding layers. Inspired by Pyramid Scene Parsing Network (PSP-Net), we propose an encoder-decoder model that utilizes pyramid pooling modules in the deep skip connections which aggregate the global context and compensate for the lost spatial information. We trained and validated our approach using the ISIC 2018: Skin Lesion Analysis Towards Melanoma Detection grand challenge dataset. Our approach achieved a validation Jaccard index of 0.837, which outperforms U-Net. We believe that with this reported reliable accuracy, this method can be introduced for clinical practice. |
Tasks | Lesion Segmentation, Scene Parsing, Semantic Segmentation |
Published | 2019-01-26 |
URL | http://arxiv.org/abs/1901.09197v2 |
http://arxiv.org/pdf/1901.09197v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-convolutional-encoder-decoders-with |
Repo | |
Framework | |
Multi-Task Learning for Argumentation Mining
Title | Multi-Task Learning for Argumentation Mining |
Authors | Tobias Kahse |
Abstract | Multi-task learning has recently become a very active field in deep learning research. In contrast to learning a single task in isolation, multiple tasks are learned at the same time, thereby utilizing the training signal of related tasks to improve the performance on the respective machine learning tasks. Related work shows various successes in different domains when applying this paradigm and this thesis extends the existing empirical results by evaluating multi-task learning in four different scenarios: argumentation mining, epistemic segmentation, argumentation component segmentation, and grapheme-to-phoneme conversion. We show that multi-task learning can, indeed, improve the performance compared to single-task learning in all these scenarios, but may also hurt the performance. Therefore, we investigate the reasons for successful and less successful applications of this paradigm and find that dataset properties such as entropy or the size of the label inventory are good indicators for a potential multi-task learning success and that multi-task learning is particularly useful if the task at hand suffers from data sparsity, i.e. a lack of training data. Moreover, multi-task learning is particularly effective for long input sequences in our experiments. We have observed this trend in all evaluated scenarios. Finally, we develop a highly configurable and extensible sequence tagging framework which supports multi-task learning to conduct our empirical experiments and to aid future research regarding the multi-task learning paradigm and natural language processing. |
Tasks | Multi-Task Learning |
Published | 2019-04-23 |
URL | http://arxiv.org/abs/1904.10162v1 |
http://arxiv.org/pdf/1904.10162v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-task-learning-for-argumentation-mining-2 |
Repo | |
Framework | |