October 16, 2019

Paper Group ANR 982

Features, Projections, and Representation Change for Generalized Planning


Title	Features, Projections, and Representation Change for Generalized Planning
Authors	Blai Bonet, Hector Geffner
Abstract	Generalized planning is concerned with the characterization and computation of plans that solve many instances at once. In the standard formulation, a generalized plan is a mapping from feature or observation histories into actions, assuming that the instances share a common pool of features and actions. This assumption, however, excludes the standard relational planning domains where actions and objects change across instances. In this work, we extend the standard formulation of generalized planning to such domains. This is achieved by projecting the actions over the features, resulting in a common set of abstract actions which can be tested for soundness and completeness, and which can be used for generating general policies such as “if the gripper is empty, pick the clear block above x and place it on the table” that achieve the goal clear(x) in any Blocksworld instance. In this policy, “pick the clear block above x” is an abstract action that may represent the action Unstack(a, b) in one situation and the action Unstack(b, c) in another. Transformations are also introduced for computing such policies by means of fully observable non-deterministic (FOND) planners. The value of generalized representations for learning general policies is also discussed.
Tasks
Published	2018-01-30
URL	http://arxiv.org/abs/1801.10055v4
PDF	http://arxiv.org/pdf/1801.10055v4.pdf
PWC	https://paperswithcode.com/paper/features-projections-and-representation
Repo
Framework

Distributed Self-Paced Learning in Alternating Direction Method of Multipliers


Title	Distributed Self-Paced Learning in Alternating Direction Method of Multipliers
Authors	Xuchao Zhang, Liang Zhao, Zhiqian Chen, Chang-Tien Lu
Abstract	Self-paced learning (SPL) mimics the cognitive process of humans, who generally learn from easy samples to hard ones. One key issue in SPL is that the training process required for each instance weight depends on the other samples and thus cannot easily be run in a distributed manner on a large-scale dataset. In this paper, we reformulate the self-paced learning problem into a distributed setting and propose a novel Distributed Self-Paced Learning method (DSPL) to handle large-scale datasets. Specifically, both the model and instance weights can be optimized in parallel for each batch based on a consensus alternating direction method of multipliers. We also prove the convergence of our algorithm under mild conditions. Extensive experiments on both synthetic and real datasets demonstrate that our approach is superior to existing methods.
Tasks
Published	2018-07-06
URL	http://arxiv.org/abs/1807.02234v1
PDF	http://arxiv.org/pdf/1807.02234v1.pdf
PWC	https://paperswithcode.com/paper/distributed-self-paced-learning-in
Repo
Framework

3D-DETNet: a Single Stage Video-Based Vehicle Detector


Title	3D-DETNet: a Single Stage Video-Based Vehicle Detector
Authors	Suichan Li
Abstract	Video-based vehicle detection has received considerable attention over the last ten years, and many deep learning based detection methods can be applied to it. However, these methods are devised for still images, and applying them directly to video vehicle detection generally yields poor performance. In this work, we propose a new single-stage video-based vehicle detector integrated with 3DConvNet and focal loss, called 3D-DETNet. Drawing support from a 3D convolution network and focal loss, our method is able to capture motion information and is more suitable for detecting vehicles in video than other single-stage methods devised for static images. Multiple video frames are first fed to 3D-DETNet to generate multiple spatial feature maps; the 3DConvNet sub-model then takes these spatial feature maps as input to capture temporal information, which is fed to a final fully convolutional model that predicts the locations of vehicles in the video frames. We evaluate our method on the UA-DETRAC vehicle detection dataset, where 3D-DETNet yields the best performance and a higher detection speed (26 fps) than the competing methods.
Tasks
Published	2018-01-05
URL	http://arxiv.org/abs/1801.01769v2
PDF	http://arxiv.org/pdf/1801.01769v2.pdf
PWC	https://paperswithcode.com/paper/3d-detnet-a-single-stage-video-based-vehicle
Repo
Framework
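
The abstract above names focal loss without defining it. As a minimal sketch, assuming the standard binary form FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t) and the commonly used defaults gamma = 2 and alpha = 0.25 (the settings used by 3D-DETNet are not stated here, and the function name is hypothetical):

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for a single prediction.

    p: predicted probability of the positive class (0..1)
    y: ground-truth label, 0 or 1
    gamma, alpha: assumed defaults, not taken from the 3D-DETNet paper
    """
    # p_t is the probability the model assigns to the true class.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Clamp to avoid log(0) for fully confident wrong predictions.
    p_t = min(max(p_t, eps), 1.0)
    # (1 - p_t)^gamma down-weights examples that are already well classified.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With gamma = 0 the modulating factor disappears and the loss reduces to alpha-weighted cross-entropy, which is a quick way to sanity-check an implementation.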

Memory-augmented Dialogue Management for Task-oriented Dialogue Systems


Title	Memory-augmented Dialogue Management for Task-oriented Dialogue Systems
Authors	Zheng Zhang, Minlie Huang, Zhongzhou Zhao, Feng Ji, Haiqing Chen, Xiaoyan Zhu
Abstract	Dialogue management (DM) decides the next action of a dialogue system according to the current dialogue state, and thus plays a central role in task-oriented dialogue systems. Since dialogue management requires access not only to local utterances but also to the global semantics of the entire dialogue session, modeling the long-range history information is a critical issue. To this end, we propose a novel Memory-Augmented Dialogue management model (MAD) which employs a memory controller and two additional memory structures, i.e., a slot-value memory and an external memory. The slot-value memory tracks the dialogue state by memorizing and updating the values of semantic slots (for instance, cuisine, price, and location), and the external memory augments the representation of hidden states of traditional recurrent neural networks by storing more context information. To update the dialogue state efficiently, we also propose slot-level attention on user utterances to extract specific semantic information for each slot. Experiments show that our model can obtain state-of-the-art performance and outperforms existing baselines.
Tasks	Dialogue Management, Task-Oriented Dialogue Systems
Published	2018-05-01
URL	http://arxiv.org/abs/1805.00150v1
PDF	http://arxiv.org/pdf/1805.00150v1.pdf
PWC	https://paperswithcode.com/paper/memory-augmented-dialogue-management-for-task
Repo
Framework

Evaluating Merging Strategies for Sampling-based Uncertainty Techniques in Object Detection


Title	Evaluating Merging Strategies for Sampling-based Uncertainty Techniques in Object Detection
Authors	Dimity Miller, Feras Dayoub, Michael Milford, Niko Sünderhauf
Abstract	There has been a recent emergence of sampling-based techniques for estimating epistemic uncertainty in deep neural networks. While these methods can be applied to classification or semantic segmentation tasks by simply averaging samples, this is not the case for object detection, where detection sample bounding boxes must be accurately associated and merged. A weak merging strategy can significantly degrade the performance of the detector and yield an unreliable uncertainty measure. This paper provides the first in-depth investigation of the effect of different association and merging strategies. We compare different combinations of three spatial and two semantic affinity measures with four clustering methods for MC Dropout with a Single Shot Multi-Box Detector. Our results show that the correct choice of affinity-clustering combination can greatly improve the effectiveness of the classification and spatial uncertainty estimation and the resulting object detection performance. We base our evaluation on a new mix of datasets that emulate near open-set conditions (semantically similar unknown classes), distant open-set conditions (semantically dissimilar unknown classes) and the common closed-set conditions (only known classes).
Tasks	Object Detection, Semantic Segmentation
Published	2018-09-17
URL	http://arxiv.org/abs/1809.06006v3
PDF	http://arxiv.org/pdf/1809.06006v3.pdf
PWC	https://paperswithcode.com/paper/evaluating-merging-strategies-for-sampling
Repo
Framework
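
The abstract above does not name its three spatial affinity measures. Intersection over union (IoU) is one common way to decide whether two sampled detections refer to the same object, and it is used here purely as an assumption. The sketch below computes IoU for axis-aligned boxes given as (x1, y1, x2, y2) and merges a group of associated samples by averaging their coordinates; both function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) form."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Width and height of the overlap are zero when the boxes are disjoint.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_box(boxes):
    """Merge associated detection samples by averaging each coordinate."""
    n = len(boxes)
    return tuple(sum(b[i] for b in boxes) / n for i in range(4))
```

The spread of the coordinates within a merged group could then serve as a rough spatial uncertainty estimate, although the measure used in the paper is not specified in this listing.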

Liver segmentation in CT images using three dimensional to two dimensional fully convolutional network


Title	Liver segmentation in CT images using three dimensional to two dimensional fully convolutional network
Authors	Shima Rafiei, Ebrahim Nasr-Esfahani, S. M. Reza Soroushmehr, Nader Karimi, Shadrokh Samavi, Kayvan Najarian
Abstract	The need for CT scan analysis is growing for pre-diagnosis and therapy of abdominal organs. Automatic organ segmentation of abdominal CT scans can help radiologists analyze the scans faster and segment organ images with fewer errors. However, existing methods are not efficient enough to perform the segmentation process for victims of accidents and in emergency situations. In this paper we propose an efficient liver segmentation method based on our 3D to 2D fully convolutional network (3D-2D-FCN). The segmented mask is enhanced by means of a conditional random field on the organ’s border. Consequently, we segment a target liver in less than a minute with a Dice score of 93.52.
Tasks	Liver Segmentation
Published	2018-02-21
URL	http://arxiv.org/abs/1802.07800v2
PDF	http://arxiv.org/pdf/1802.07800v2.pdf
PWC	https://paperswithcode.com/paper/liver-segmentation-in-ct-images-using-three
Repo
Framework
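
The Dice score reported above measures overlap between a predicted mask and a ground-truth mask: Dice = 2|A ∩ B| / (|A| + |B|). The value 93.52 appears to be on a percentage scale; the sketch below returns a fraction in [0, 1]. It is a minimal illustration on flattened binary masks, with a hypothetical function name and the assumed convention that two empty masks score 1.0.

```python
def dice_score(pred, truth):
    """Dice coefficient of two flattened binary masks (sequences of 0/1)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of elements")
    # Count positions where both masks mark the organ.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total
```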

Abstractive Summarization Improved by WordNet-based Extractive Sentences


Title	Abstractive Summarization Improved by WordNet-based Extractive Sentences
Authors	Niantao Xie, Sujian Li, Huiling Ren, Qibin Zhai
Abstract	Recently, seq2seq abstractive summarization models have achieved good results on the CNN/Daily Mail dataset. Still, how to improve abstractive methods with extractive methods is a good research direction, since extractive methods have the potential to exploit various efficient features for extracting important sentences from a text. In this paper, in order to improve the semantic relevance of abstractive summaries, we adopt a WordNet-based sentence ranking algorithm to extract the sentences which are most semantically relevant to the text. Then, we design a dual attentional seq2seq framework to generate summaries that take the extracted information into account. At the same time, we combine pointer-generator and coverage mechanisms to solve the problems of out-of-vocabulary (OOV) words and duplicate words which exist in abstractive models. Experiments on the CNN/Daily Mail dataset show that our models achieve ROUGE scores competitive with the state of the art. Human evaluations also show that the summaries generated by our models have high semantic relevance to the original text.
Tasks	Abstractive Text Summarization
Published	2018-08-04
URL	http://arxiv.org/abs/1808.01426v1
PDF	http://arxiv.org/pdf/1808.01426v1.pdf
PWC	https://paperswithcode.com/paper/abstractive-summarization-improved-by-wordnet
Repo
Framework

Unsupervised and semi-supervised learning with Categorical Generative Adversarial Networks assisted by Wasserstein distance for dermoscopy image Classification


Title	Unsupervised and semi-supervised learning with Categorical Generative Adversarial Networks assisted by Wasserstein distance for dermoscopy image Classification
Authors	Xin Yi, Ekta Walia, Paul Babyn
Abstract	Melanoma is an aggressive skin cancer that is curable if detected early. Typically, the diagnosis involves initial screening with subsequent biopsy and histopathological examination if necessary. Computer aided diagnosis offers an objective score that is independent of clinical experience and has the potential to lower the workload of a dermatologist. In the recent past, the success of deep learning algorithms in the field of general computer vision has motivated successful application of supervised deep learning methods in computer aided melanoma recognition. However, large quantities of labeled images are required to make further improvements on the supervised method. A good annotation generally requires clinical and histological confirmation, which requires significant effort. In an attempt to alleviate this constraint, we propose to use a categorical generative adversarial network to automatically learn the feature representation of dermoscopy images in an unsupervised and semi-supervised manner. Thorough experiments on the ISIC 2016 skin lesion challenge demonstrate that the proposed feature learning method achieves an average precision score of 0.424 with only 140 labeled images. Moreover, the proposed method is also capable of generating real-world-like dermoscopy images.
Tasks	Image Classification
Published	2018-04-10
URL	http://arxiv.org/abs/1804.03700v1
PDF	http://arxiv.org/pdf/1804.03700v1.pdf
PWC	https://paperswithcode.com/paper/unsupervised-and-semi-supervised-learning-1
Repo
Framework

Mining Interpretable AOG Representations from Convolutional Networks via Active Question Answering


Title	Mining Interpretable AOG Representations from Convolutional Networks via Active Question Answering
Authors	Quanshi Zhang, Ruiming Cao, Ying Nian Wu, Song-Chun Zhu
Abstract	In this paper, we present a method to mine object-part patterns from conv-layers of a pre-trained convolutional neural network (CNN). The mined object-part patterns are organized by an And-Or graph (AOG). This interpretable AOG representation consists of a four-layer semantic hierarchy, i.e., semantic parts, part templates, latent patterns, and neural units. The AOG associates each object part with certain neural units in feature maps of conv-layers. The AOG is constructed in a weakly-supervised manner, i.e., very few annotations (e.g., 3-20) of object parts are used to guide the learning of AOGs. We develop a question-answering (QA) method that uses active human-computer communications to mine patterns from a pre-trained CNN, in order to incrementally explain more features in conv-layers. During the learning process, our QA method uses the current AOG for part localization. The QA method actively identifies objects, whose feature maps cannot be explained by the AOG. Then, our method asks people to annotate parts on the unexplained objects, and uses answers to discover CNN patterns corresponding to the newly labeled parts. In this way, our method gradually grows new branches and refines existing branches on the AOG to semanticize CNN representations. In experiments, our method exhibited a high learning efficiency. Our method used about 1/6-1/3 of the part annotations for training, but achieved similar or better part-localization performance than fast-RCNN methods.
Tasks	Question Answering
Published	2018-12-18
URL	http://arxiv.org/abs/1812.07996v1
PDF	http://arxiv.org/pdf/1812.07996v1.pdf
PWC	https://paperswithcode.com/paper/mining-interpretable-aog-representations-from
Repo
Framework

Efficient Online Multi-Person 2D Pose Tracking with Recurrent Spatio-Temporal Affinity Fields


Title	Efficient Online Multi-Person 2D Pose Tracking with Recurrent Spatio-Temporal Affinity Fields
Authors	Yaadhav Raaj, Haroon Idrees, Gines Hidalgo, Yaser Sheikh
Abstract	We present an online approach to efficiently and simultaneously detect and track the 2D pose of multiple people in a video sequence. We build upon Part Affinity Field (PAF) representation designed for static images, and propose an architecture that can encode and predict Spatio-Temporal Affinity Fields (STAF) across a video sequence. In particular, we propose a novel temporal topology cross-linked across limbs which can consistently handle body motions of a wide range of magnitudes. Additionally, we make the overall approach recurrent in nature, where the network ingests STAF heatmaps from previous frames and estimates those for the current frame. Our approach uses only online inference and tracking, and is currently the fastest and the most accurate bottom-up approach that is runtime invariant to the number of people in the scene and accuracy invariant to input frame rate of camera. Running at ~30 fps on a single GPU at single scale, it achieves highly competitive results on the PoseTrack benchmarks.
Tasks	Pose Tracking
Published	2018-11-29
URL	https://arxiv.org/abs/1811.11975v3
PDF	https://arxiv.org/pdf/1811.11975v3.pdf
PWC	https://paperswithcode.com/paper/efficient-online-multi-person-2d-pose
Repo
Framework

A Counter-Forensic Method for CNN-Based Camera Model Identification


Title	A Counter-Forensic Method for CNN-Based Camera Model Identification
Authors	David Güera, Yu Wang, Luca Bondi, Paolo Bestagini, Stefano Tubaro, Edward J. Delp
Abstract	An increasing number of digital images are being shared and accessed through websites, media, and social applications. Many of these images have been modified and are not authentic. Recent advances in the use of deep convolutional neural networks (CNNs) have facilitated the task of analyzing the veracity and authenticity of largely distributed image datasets. In this paper, we examine the problem of identifying the camera model or type that was used to take an image, and how that identification can be spoofed. Due to the linear nature of CNNs and the high dimensionality of images, neural networks are vulnerable to attacks with adversarial examples. These examples are imperceptibly different from correctly classified images but are misclassified with high confidence by CNNs. We describe a counter-forensic method capable of subtly altering images to change their estimated camera model when they are analyzed by any CNN-based camera model detector. Our method can use either the Fast Gradient Sign Method (FGSM) or the Jacobian-based Saliency Map Attack (JSMA) to craft these adversarial images and does not require direct access to the CNN. Our results show that even advanced deep learning architectures trained to analyze images and obtain camera model information are still vulnerable to our proposed method.
Tasks
Published	2018-05-06
URL	http://arxiv.org/abs/1805.02131v1
PDF	http://arxiv.org/pdf/1805.02131v1.pdf
PWC	https://paperswithcode.com/paper/a-counter-forensic-method-for-cnn-based
Repo
Framework
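
The Fast Gradient Sign Method mentioned above perturbs an input by a small step in the direction of the sign of the loss gradient: x_adv = x + eps * sign(grad_x loss). The sketch below illustrates that single step on a toy linear classifier with logistic loss, which stands in for a CNN only for illustration. It assumes white-box access to the gradient, whereas the abstract states that the paper's method does not require direct access to the CNN; the function names are hypothetical.

```python
import math


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm_linear(x, y, w, eps):
    """One FGSM step against a toy linear classifier (an assumed stand-in).

    x: input features, y: true label in {-1, +1}, w: classifier weights,
    eps: maximum per-feature perturbation.
    """
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    # Logistic loss is log(1 + exp(-margin)); its gradient with respect to
    # x_i is -y * sigmoid(-margin) * w_i.
    coeff = -y / (1.0 + math.exp(margin))
    grad = [coeff * wi for wi in w]
    # Move every feature by eps in the direction that increases the loss.
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]
```

In this toy setting, a correctly classified point with a small margin can be pushed across the decision boundary by a perturbation bounded by eps in every coordinate.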

Open-World Stereo Video Matching with Deep RNN


Title	Open-World Stereo Video Matching with Deep RNN
Authors	Yiran Zhong, Hongdong Li, Yuchao Dai
Abstract	Deep learning based stereo matching methods have shown great success and achieved top scores across different benchmarks. However, like most data-driven methods, existing deep stereo matching networks suffer from some well-known drawbacks, such as requiring large amounts of labeled training data and having performance that is fundamentally limited by their generalization ability. In this paper, we propose a novel Recurrent Neural Network (RNN) that takes a continuous (possibly previously unseen) stereo video as input, and directly predicts a depth-map at each frame without a pre-training process, and without the need for ground-truth depth-maps as supervision. Thanks to its recurrent nature (provided by two convolutional-LSTM blocks), our network is able to memorize and learn from its past experiences, and modify its inner parameters (network weights) to adapt to previously unseen or unfamiliar environments. This suggests a remarkable generalization ability of the net, making it applicable in an open world setting. Our method works robustly under changes in scene content, image statistics, lighting, season conditions, etc. Through extensive experiments, we demonstrate that the proposed method seamlessly adapts between different scenarios. Equally important, in terms of stereo matching accuracy, it outperforms state-of-the-art deep stereo approaches on standard benchmark datasets such as KITTI and Middlebury stereo.
Tasks	Stereo Matching, Stereo Matching Hand
Published	2018-08-12
URL	http://arxiv.org/abs/1808.03959v1
PDF	http://arxiv.org/pdf/1808.03959v1.pdf
PWC	https://paperswithcode.com/paper/open-world-stereo-video-matching-with-deep
Repo
Framework

Fairness Behind a Veil of Ignorance: A Welfare Analysis for Automated Decision Making


Title	Fairness Behind a Veil of Ignorance: A Welfare Analysis for Automated Decision Making
Authors	Hoda Heidari, Claudio Ferrari, Krishna P. Gummadi, Andreas Krause
Abstract	We draw attention to an important, yet largely overlooked aspect of evaluating fairness for automated decision making systems, namely risk and welfare considerations. Our proposed family of measures corresponds to the long-established formulations of cardinal social welfare in economics, and is justified by the Rawlsian conception of fairness behind a veil of ignorance. The convex formulation of our welfare-based measures of fairness allows us to integrate them as a constraint into any convex loss minimization pipeline. Our empirical analysis reveals interesting trade-offs between our proposal and (a) prediction accuracy, (b) group discrimination, and (c) Dwork et al.'s notion of individual fairness. Furthermore and perhaps most importantly, our work provides both heuristic justification and empirical evidence suggesting that a lower-bound on our measures often leads to bounded inequality in algorithmic outcomes; hence presenting the first computationally feasible mechanism for bounding individual-level inequality.
Tasks	Decision Making
Published	2018-06-13
URL	http://arxiv.org/abs/1806.04959v4
PDF	http://arxiv.org/pdf/1806.04959v4.pdf
PWC	https://paperswithcode.com/paper/fairness-behind-a-veil-of-ignorance-a-welfare
Repo
Framework

Flow Shape Design for Microfluidic Devices Using Deep Reinforcement Learning


Title	Flow Shape Design for Microfluidic Devices Using Deep Reinforcement Learning
Authors	Xian Yeow Lee, Aditya Balu, Daniel Stoecklein, Baskar Ganapathysubramanian, Soumik Sarkar
Abstract	Microfluidic devices are utilized to control and direct flow behavior in a wide variety of applications, particularly in medical diagnostics. A particularly popular form of microfluidics, called inertial microfluidic flow sculpting, involves placing a sequence of pillars to controllably deform an initial flow field into a desired one. Inertial flow sculpting can be formally defined as an inverse problem, where one identifies a sequence of pillars (chosen, with replacement, from a finite set of pillars, each of which produces a specific transformation) whose composite transformation results in a user-defined desired transformation. As with most such problems in engineering, inverse problems are usually computationally intractable, with most traditional approaches based on search and optimization strategies. In this paper, we pose this inverse problem as a Reinforcement Learning (RL) problem. We train a DoubleDQN agent to learn from this environment. The results suggest that learning is possible using a DoubleDQN model, with the success frequency reaching 90% in 200,000 episodes and the rewards converging. While most of the results are obtained by fixing a particular target flow shape to simplify the learning problem, we later demonstrate how to transfer the learning of an agent from one target shape to another, i.e., from one design to another, so that it can be useful for generic flow shape design.
Tasks
Published	2018-11-29
URL	http://arxiv.org/abs/1811.12444v1
PDF	http://arxiv.org/pdf/1811.12444v1.pdf
PWC	https://paperswithcode.com/paper/flow-shape-design-for-microfluidic-devices
Repo
Framework
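
The abstract above trains a DoubleDQN agent but does not spell out the update. As a minimal sketch of the standard Double DQN bootstrap target, the online network selects the next action and the target network evaluates it: y = r + gamma * Q_target(s', argmax_a Q_online(s', a)). The function name, the list-based Q-values, and gamma = 0.99 are assumptions for illustration; the paper's network, reward design, and hyperparameters are not given here.

```python
def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Bootstrap target for a single transition under Double DQN.

    q_online_next: Q-values of the next state from the online network
    q_target_next: Q-values of the next state from the target network
    """
    if done:
        # No bootstrapping past the end of an episode.
        return reward
    # Action selection uses the online network...
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    # ...while action evaluation uses the target network.
    return reward + gamma * q_target_next[best_action]
```

Decoupling selection from evaluation is what distinguishes this target from the plain DQN target, which takes the maximum over the target network's own Q-values.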

Unsupervised Object-Level Video Summarization with Online Motion Auto-Encoder


Title	Unsupervised Object-Level Video Summarization with Online Motion Auto-Encoder
Authors	Yujia Zhang, Xiaodan Liang, Dingwen Zhang, Min Tan, Eric P. Xing
Abstract	Unsupervised video summarization plays an important role in digesting, browsing, and searching the ever-growing volume of videos produced every day, yet the underlying fine-grained semantic and motion information (i.e., objects of interest and their key motions) in online videos has barely been touched. In this paper, we investigate a pioneering research direction towards fine-grained unsupervised object-level video summarization. It can be distinguished from existing pipelines in two aspects: extracting key motions of participating objects, and learning to summarize in an unsupervised and online manner. To achieve this goal, we propose a novel online motion Auto-Encoder (online motion-AE) framework that operates on super-segmented object motion clips. Comprehensive experiments on a newly collected surveillance dataset and public datasets have demonstrated the effectiveness of our proposed method.
Tasks	Unsupervised Video Summarization, Video Summarization
Published	2018-01-02
URL	http://arxiv.org/abs/1801.00543v2
PDF	http://arxiv.org/pdf/1801.00543v2.pdf
PWC	https://paperswithcode.com/paper/unsupervised-object-level-video-summarization
Repo
Framework