Paper Group ANR 481
Top-down Transformation Choice
MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features
Attention-based Information Fusion using Multi-Encoder-Decoder Recurrent Neural Networks
Dynamics Transfer GAN: Generating Video by Transferring Arbitrary Temporal Dynamics from a Source Video to a Single Target Image
…
Top-down Transformation Choice
Title | Top-down Transformation Choice |
Authors | Torsten Hothorn |
Abstract | Simple models are preferred over complex models, but overly simplistic models can lead to erroneous interpretations. The classical approach is to start with a simple model, whose shortcomings are assessed in residual-based model diagnostics. Eventually, one increases the complexity of this initial overly simple model and obtains a better-fitting model. I illustrate how transformation analysis can be used as an alternative approach to model choice. Instead of adding complexity to simple models, step-wise complexity reduction is used to help identify simpler and more interpretable models. As an example, body mass index distributions in Switzerland are modelled by means of transformation models to understand the impact of sex, age, smoking and other lifestyle factors on a person’s body mass index. In this process, I searched for a compromise between model fit and model interpretability. Special emphasis is given to the understanding of the connections between transformation models of increasing complexity. The models used in this analysis ranged from evergreens, such as the normal linear regression model with constant variance, to novel models with extremely flexible conditional distribution functions, such as transformation trees and transformation forests. |
Tasks | |
Published | 2017-06-26 |
URL | http://arxiv.org/abs/1706.08269v2 |
http://arxiv.org/pdf/1706.08269v2.pdf | |
PWC | https://paperswithcode.com/paper/top-down-transformation-choice |
Repo | |
Framework | |
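The family of models the abstract spans can be summarised by the standard conditional transformation-model form (a sketch in common notation; the paper's exact parameterisation may differ):

```latex
P(Y \le y \mid X = x) = F_Z\bigl(h(y \mid x)\bigr),
\qquad \text{e.g. } h(y \mid x) = \frac{y - x^\top \beta}{\sigma}, \quad F_Z = \Phi
```

With $h$ restricted as on the right, this recovers the normal linear regression model with constant variance; letting $h$ vary flexibly with $x$ yields the complex end of the spectrum (transformation trees and forests), and "top-down" choice means simplifying $h$ step by step.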
MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features
Title | MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features |
Authors | Liang-Chieh Chen, Alexander Hermans, George Papandreou, Florian Schroff, Peng Wang, Hartwig Adam |
Abstract | In this work, we tackle the problem of instance segmentation, the task of simultaneously solving object detection and semantic segmentation. Towards this goal, we present a model, called MaskLab, which produces three outputs: box detection, semantic segmentation, and direction prediction. Building on top of the Faster R-CNN object detector, the predicted boxes provide accurate localization of object instances. Within each region of interest, MaskLab performs foreground/background segmentation by combining semantic and direction prediction. Semantic segmentation assists the model in distinguishing between objects of different semantic classes including background, while the direction prediction, estimating each pixel’s direction towards its corresponding instance center, allows separating instances of the same semantic class. Moreover, we explore the effect of incorporating recent successful methods from both segmentation and detection (i.e., atrous convolution and hypercolumn). Our proposed model is evaluated on the COCO instance segmentation benchmark and shows comparable performance with other state-of-the-art models. |
Tasks | Instance Segmentation, Object Detection, Semantic Segmentation |
Published | 2017-12-13 |
URL | http://arxiv.org/abs/1712.04837v1 |
http://arxiv.org/pdf/1712.04837v1.pdf | |
PWC | https://paperswithcode.com/paper/masklab-instance-segmentation-by-refining |
Repo | |
Framework | |
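The direction-prediction target described above — each pixel's direction toward its instance centre, quantized into direction classes — can be sketched as follows. The centroid definition and binning scheme here are illustrative assumptions, not MaskLab's exact recipe:

```python
import math

def direction_bins(mask, num_bins=8):
    """For each foreground pixel of a binary instance mask, quantize the
    angle from the pixel toward the instance centroid into one of
    `num_bins` direction classes."""
    pts = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    cy = sum(r for r, _ in pts) / len(pts)
    cx = sum(c for _, c in pts) / len(pts)
    step = 2 * math.pi / num_bins
    bins = {}
    for r, c in pts:
        ang = math.atan2(cy - r, cx - c) % (2 * math.pi)  # pixel -> centre
        bins[(r, c)] = int(ang / step) % num_bins
    return bins
```

Two pixels of the same class but different instances get different direction labels around their respective centres, which is what lets the model split touching instances.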
Attention-based Information Fusion using Multi-Encoder-Decoder Recurrent Neural Networks
Title | Attention-based Information Fusion using Multi-Encoder-Decoder Recurrent Neural Networks |
Authors | Stephan Baier, Sigurd Spieckermann, Volker Tresp |
Abstract | With the rising number of interconnected devices and sensors, modeling distributed sensor networks is of increasing interest. Recurrent neural networks (RNN) are considered particularly well suited for modeling sensory and streaming data. When predicting future behavior, incorporating information from neighboring sensor stations is often beneficial. We propose a new RNN-based architecture for context-specific information fusion across multiple spatially distributed sensor stations. Hereby, latent representations of multiple local models, each modeling one sensor station, are joined and weighted according to their importance for the prediction. The importance of each station is assessed depending on the current context using a separate attention function. We demonstrate the effectiveness of our model on three different real-world sensor network datasets.
Tasks | |
Published | 2017-11-13 |
URL | http://arxiv.org/abs/1711.04679v1 |
http://arxiv.org/pdf/1711.04679v1.pdf | |
PWC | https://paperswithcode.com/paper/attention-based-information-fusion-using |
Repo | |
Framework | |
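The fusion step described in the abstract — latent vectors from per-station encoders, joined by context-dependent softmax attention weights — reduces to a weighted sum. A minimal sketch (the scoring function producing the raw scores is left abstract; in the paper it is a learned attention function of the context):

```python
import math

def attention_fuse(latents, scores):
    """Fuse per-station latent vectors into one representation using
    softmax attention weights derived from raw per-station scores."""
    m = max(scores)                              # stabilise the softmax
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    fused = [sum(w * v[i] for w, v in zip(weights, latents))
             for i in range(len(latents[0]))]
    return fused, weights
```

With equal scores every station contributes equally; as one station's score dominates, the fused vector approaches that station's latent representation.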
Dynamics Transfer GAN: Generating Video by Transferring Arbitrary Temporal Dynamics from a Source Video to a Single Target Image
Title | Dynamics Transfer GAN: Generating Video by Transferring Arbitrary Temporal Dynamics from a Source Video to a Single Target Image |
Authors | Wissam J. Baddar, Geonmo Gu, Sangmin Lee, Yong Man Ro |
Abstract | In this paper, we propose Dynamics Transfer GAN; a new method for generating video sequences based on generative adversarial learning. The spatial constructs of a generated video sequence are acquired from the target image. The dynamics of the generated video sequence are imported from a source video sequence, with arbitrary motion, and imposed onto the target image. To preserve the spatial construct of the target image, the appearance of the source video sequence is suppressed and only the dynamics are obtained before being imposed onto the target image. That is achieved using the proposed appearance suppressed dynamics feature. Moreover, the spatial and temporal consistencies of the generated video sequence are verified via two discriminator networks. One discriminator validates the fidelity of the generated frames appearance, while the other validates the dynamic consistency of the generated video sequence. Experiments have been conducted to verify the quality of the video sequences generated by the proposed method. The results verified that Dynamics Transfer GAN successfully transferred arbitrary dynamics of the source video sequence onto a target image when generating the output video sequence. The experimental results also showed that Dynamics Transfer GAN maintained the spatial constructs (appearance) of the target image while generating spatially and temporally consistent video sequences. |
Tasks | |
Published | 2017-12-10 |
URL | http://arxiv.org/abs/1712.03534v1 |
http://arxiv.org/pdf/1712.03534v1.pdf | |
PWC | https://paperswithcode.com/paper/dynamics-transfer-gan-generating-video-by |
Repo | |
Framework | |
Acquisition of Translation Lexicons for Historically Unwritten Languages via Bridging Loanwords
Title | Acquisition of Translation Lexicons for Historically Unwritten Languages via Bridging Loanwords |
Authors | Michael Bloodgood, Benjamin Strauss |
Abstract | With the advent of informal electronic communications such as social media, colloquial languages that were historically unwritten are being written for the first time in heavily code-switched environments. We present a method for inducing portions of translation lexicons through the use of expert knowledge in these settings where there are approximately zero resources available other than a language informant, potentially not even large amounts of monolingual data. We investigate inducing a Moroccan Darija-English translation lexicon via French loanwords bridging into English and find that a useful lexicon is induced for human-assisted translation and statistical machine translation. |
Tasks | Machine Translation |
Published | 2017-06-06 |
URL | http://arxiv.org/abs/1706.01570v2 |
http://arxiv.org/pdf/1706.01570v2.pdf | |
PWC | https://paperswithcode.com/paper/acquisition-of-translation-lexicons-for |
Repo | |
Framework | |
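The bridging idea reduces to composing two lexicons: a source-to-pivot loanword table (Darija words borrowed from French) with a pivot-to-target dictionary (French-English). A minimal sketch with hypothetical toy entries — the example words and the exact induction procedure are illustrative assumptions, not the paper's data:

```python
def bridge_lexicon(src_to_pivot, pivot_to_tgt):
    """Induce a source->target lexicon by composing a source->pivot
    loanword table with a pivot->target dictionary."""
    lex = {}
    for src, pivots in src_to_pivot.items():
        # Union of all target translations reachable through any pivot.
        lex[src] = sorted({t for p in pivots for t in pivot_to_tgt.get(p, [])})
    return lex
```

The paper's contribution is in building the source-to-pivot side with approximately zero resources (expert knowledge from a language informant); the composition step itself is this simple.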
Fast Barcode Retrieval for Consensus Contouring
Title | Fast Barcode Retrieval for Consensus Contouring |
Authors | H. R. Tizhoosh, G. J. Czarnota |
Abstract | Marking tumors and organs is a challenging task suffering from both inter- and intra-observer variability. The literature quantifies observer variability by generating consensus among multiple experts when they mark the same image. Automatically building consensus contours to establish quality assurance for image segmentation is presently absent from clinical practice. As \emph{big data} becomes more and more available, techniques to access a large number of existing segments from multiple experts become possible. Fast algorithms are, hence, required to facilitate the search for similar cases. The present work puts forward a potential framework that, tested with small datasets (both synthetic and real images), demonstrates the reliability of finding similar images. In this paper, the idea of content-based barcodes is used to retrieve similar cases in order to build consensus contours in medical image segmentation. This approach may be regarded as an extension of conventional atlas-based segmentation, which generally works with rather small atlases due to the required computational expense. The fast segment-retrieval process via barcodes makes it possible to create and use large atlases, something that directly contributes to the quality of the consensus building. Because the accuracy of experts’ contours must be measured, we first used 500 synthetic prostate images with their gold markers and delineations by 20 simulated users. The fast barcode-guided computed consensus delivered an average error of $8\% \pm 5\%$ compared against the gold-standard segments. Furthermore, we used magnetic resonance images of prostates from 15 patients delineated by 5 oncologists and selected the best delineations to serve as the gold-standard segments. The proposed barcode atlas achieved a Jaccard overlap of $87\% \pm 9\%$ with the contours of the gold-standard segments. |
Tasks | Medical Image Segmentation, Semantic Segmentation |
Published | 2017-09-28 |
URL | http://arxiv.org/abs/1709.10197v1 |
http://arxiv.org/pdf/1709.10197v1.pdf | |
PWC | https://paperswithcode.com/paper/fast-barcode-retrieval-for-consensus |
Repo | |
Framework | |
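The retrieval step the abstract relies on can be sketched as: binarize a projection of each image into a short barcode, then find the closest atlas case by Hamming distance. The row-sum projection and median threshold below are simplifying assumptions, not the paper's exact barcode scheme:

```python
def row_barcode(img):
    """Binarize each row-sum of a 2-d intensity grid against the median
    row-sum (a simplified projection barcode)."""
    sums = [sum(row) for row in img]
    med = sorted(sums)[len(sums) // 2]
    return [1 if s >= med else 0 for s in sums]

def hamming(a, b):
    """Number of differing bits between two equal-length barcodes."""
    return sum(x != y for x, y in zip(a, b))

def nearest(query, atlas):
    """Index of the atlas barcode closest to `query` in Hamming distance."""
    return min(range(len(atlas)), key=lambda i: hamming(query, atlas[i]))
```

Because barcode comparison is bitwise, a linear scan over a large atlas stays cheap — which is exactly what makes large consensus atlases practical.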
Memory-Efficient Deep Salient Object Segmentation Networks on Gridized Superpixels
Title | Memory-Efficient Deep Salient Object Segmentation Networks on Gridized Superpixels |
Authors | Caglar Aytekin, Xingyang Ni, Francesco Cricri, Lixin Fan, Emre Aksu |
Abstract | Computer vision algorithms with pixel-wise labeling tasks, such as semantic segmentation and salient object detection, have gone through a significant accuracy increase with the incorporation of deep learning. Deep segmentation methods slightly modify and fine-tune pre-trained networks that have hundreds of millions of parameters. In this work, we question the need for such memory-demanding networks for the specific task of salient object segmentation. To this end, we propose a way to learn a memory-efficient network from scratch by training it only on salient object detection datasets. Our method encodes images to gridized superpixels that preserve both the object boundaries and the connectivity rules of regular pixels. This representation allows us to use convolutional neural networks that operate on regular grids. By using these encoded images, we train a memory-efficient network using only 0.048% of the number of parameters that other deep salient object detection networks have. Our method shows comparable accuracy with the state-of-the-art deep salient object detection methods and provides a faster and much more memory-efficient alternative to them. Due to its easy deployment, such a network is preferable for applications in memory-limited devices such as mobile phones and IoT devices. |
Tasks | Object Detection, Salient Object Detection, Semantic Segmentation |
Published | 2017-12-27 |
URL | http://arxiv.org/abs/1712.09558v2 |
http://arxiv.org/pdf/1712.09558v2.pdf | |
PWC | https://paperswithcode.com/paper/memory-efficient-deep-salient-object |
Repo | |
Framework | |
Indexing the Event Calculus with Kd-trees to Monitor Diabetes
Title | Indexing the Event Calculus with Kd-trees to Monitor Diabetes |
Authors | Stefano Bromuri, Albert Brugues de la Torre, Fabien Duboisson, Michael Schumacher |
Abstract | Personal Health Systems (PHS) are mobile solutions tailored to monitoring patients affected by chronic non-communicable diseases. A patient affected by a chronic disease can generate large amounts of events. Type 1 diabetic patients generate several glucose events per day, ranging from at least 6 events per day (under normal monitoring) to 288 per day when wearing a continuous glucose monitor (CGM) that samples the blood every 5 minutes for several days. This is a large number of events for medical doctors to monitor, particularly considering that they may have to take decisions concerning adjustments to the treatment, which may impact the life of the patients for a long time. Given the need to analyse such a large stream of data, doctors need a simple approach towards physiological time series that allows them to promptly transfer their knowledge into queries to identify interesting patterns in the data. Achieving this with current technology is not an easy task: on one hand it cannot be expected that medical doctors have the technical knowledge to query databases, and on the other hand these time series include thousands of events, which requires re-thinking the way data is indexed. In order to tackle the knowledge representation and efficiency problems, this contribution presents the kd-tree cached event calculus (CEC-KD), an event calculus extension for knowledge engineering of temporal rules capable of handling the many thousands of events produced by a diabetic patient. CEC-KD is built as a support for a graphical interface to represent monitoring rules for type 1 diabetes. In addition, the paper evaluates CEC-KD with respect to the cached event calculus (CEC) to show how indexing events using kd-trees improves scalability with respect to the current state of the art. |
Tasks | Time Series |
Published | 2017-10-03 |
URL | http://arxiv.org/abs/1710.01275v1 |
http://arxiv.org/pdf/1710.01275v1.pdf | |
PWC | https://paperswithcode.com/paper/indexing-the-event-calculus-with-kd-trees-to |
Repo | |
Framework | |
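The indexing idea can be sketched with a minimal 2-d kd-tree over (timestamp, glucose) event points supporting orthogonal range queries — a generic kd-tree, not the paper's CEC-KD machinery, but it shows why such rules as "glucose above 150 mg/dl during the last 3 hours" become cheap to evaluate:

```python
def build_kd(points, depth=0):
    """Minimal 2-d kd-tree over (time, value) event points."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build_kd(points[:mid], depth + 1),
            "right": build_kd(points[mid + 1:], depth + 1)}

def range_query(node, lo, hi, out=None):
    """Collect all events with lo[k] <= point[k] <= hi[k] on both axes,
    pruning subtrees that cannot intersect the query rectangle."""
    if out is None:
        out = []
    if node is None:
        return out
    p, axis = node["point"], node["axis"]
    if all(lo[k] <= p[k] <= hi[k] for k in (0, 1)):
        out.append(p)
    if lo[axis] <= p[axis]:
        range_query(node["left"], lo, hi, out)
    if p[axis] <= hi[axis]:
        range_query(node["right"], lo, hi, out)
    return out
```

The pruning on both the time and value axes is what a flat event list (as in plain CEC) cannot offer, and is the source of the scalability gain the abstract reports.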
Kernelized Hashcode Representations for Relation Extraction
Title | Kernelized Hashcode Representations for Relation Extraction |
Authors | Sahil Garg, Aram Galstyan, Greg Ver Steeg, Irina Rish, Guillermo Cecchi, Shuyang Gao |
Abstract | Kernel methods have produced state-of-the-art results for a number of NLP tasks such as relation extraction, but suffer from poor scalability due to the high cost of computing kernel similarities between natural language structures. A recently proposed technique, kernelized locality-sensitive hashing (KLSH), can significantly reduce the computational cost, but is only applicable to classifiers operating on kNN graphs. Here we propose to use random subspaces of KLSH codes for efficiently constructing an explicit representation of NLP structures suitable for general classification methods. Further, we propose an approach for optimizing the KLSH model for classification problems by maximizing an approximation of mutual information between the KLSH codes (feature vectors) and the class labels. We evaluate the proposed approach on biomedical relation extraction datasets, and observe significant and robust improvements in accuracy w.r.t. state-of-the-art classifiers, along with drastic (orders-of-magnitude) speedup compared to conventional kernel methods. |
Tasks | Relation Extraction |
Published | 2017-11-10 |
URL | https://arxiv.org/abs/1711.04044v7 |
https://arxiv.org/pdf/1711.04044v7.pdf | |
PWC | https://paperswithcode.com/paper/kernelized-hashcode-representations-for |
Repo | |
Framework | |
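A toy sketch of the construction: KLSH bits derived from random signed combinations of kernel similarities to a small reference set, then random subspaces of those bits used as explicit categorical features. The hashing scheme below is a simplified illustration under these assumptions, not the paper's exact KLSH variant:

```python
import random

def klsh_codes(items, refs, kernel, num_bits, seed=0):
    """Toy kernelized LSH: bit b is the sign of a random signed
    combination of kernel similarities to reference items."""
    rng = random.Random(seed)
    weights = [[rng.choice((-1.0, 1.0)) for _ in refs] for _ in range(num_bits)]
    codes = []
    for x in items:
        sims = [kernel(x, r) for r in refs]
        codes.append([1 if sum(w * s for w, s in zip(wv, sims)) >= 0 else 0
                      for wv in weights])
    return codes

def random_subspaces(code, num_subs, sub_size, seed=0):
    """Random subspaces of a hashcode: tuples of bit positions usable as
    features by a general classifier, not just kNN on a Hamming graph."""
    rng = random.Random(seed)
    idx = [rng.sample(range(len(code)), sub_size) for _ in range(num_subs)]
    return [tuple(code[i] for i in ix) for ix in idx]
```

Each kernel evaluation against the (small) reference set replaces kernel evaluations against the whole training set, which is where the orders-of-magnitude speedup comes from.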
Adversarial Structured Prediction for Multivariate Measures
Title | Adversarial Structured Prediction for Multivariate Measures |
Authors | Hong Wang, Ashkan Rezaei, Brian D. Ziebart |
Abstract | Many predicted structured objects (e.g., sequences, matchings, trees) are evaluated using the F-score, alignment error rate (AER), or other multivariate performance measures. Since inductively optimizing these measures using training data is typically computationally difficult, empirical risk minimization of surrogate losses is employed, using, e.g., the hinge loss for (structured) support vector machines. These approximations often introduce a mismatch between the learner’s objective and the desired application performance, leading to inconsistency. We take a different approach: adversarially approximate training data while optimizing the exact F-score or AER. Structured predictions under this formulation result from solving zero-sum games between a predictor seeking the best performance and an adversary seeking the worst while required to (approximately) match certain structured properties of the training data. We explore this approach for word alignment (AER evaluation) and named entity recognition (F-score evaluation) with linear-chain constraints. |
Tasks | Named Entity Recognition, Structured Prediction, Word Alignment |
Published | 2017-12-20 |
URL | http://arxiv.org/abs/1712.07374v2 |
http://arxiv.org/pdf/1712.07374v2.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-structured-prediction-for |
Repo | |
Framework | |
A Convex Similarity Index for Sparse Recovery of Missing Image Samples
Title | A Convex Similarity Index for Sparse Recovery of Missing Image Samples |
Authors | Amirhossein Javaheri, Hadi Zayyani, Farokh Marvasti |
Abstract | This paper investigates the problem of recovering missing samples using methods based on sparse representation adapted especially for image signals. Instead of the $l_2$-norm or Mean Square Error (MSE), a new perceptual quality measure is used as the similarity criterion between the original and the reconstructed images. The proposed criterion, called the Convex SIMilarity (CSIM) index, is a modified version of the Structural SIMilarity (SSIM) index which, unlike its predecessor, is convex and unimodal. We derive mathematical properties for the proposed index and show how to optimally choose the parameters of the proposed criterion, investigating its Restricted Isometry Property (RIP) and error-sensitivity properties. We also propose an iterative sparse recovery method based on a constrained $l_1$-norm minimization problem, incorporating CSIM as the fidelity criterion. The resulting convex optimization problem is solved via an algorithm based on the Alternating Direction Method of Multipliers (ADMM). Taking advantage of the convexity of the CSIM index, we also prove the convergence of the algorithm to the globally optimal solution of the proposed optimization problem, starting from any arbitrary point. Simulation results confirm the performance of the new similarity index as well as the proposed algorithm for missing-sample recovery of image patch signals. |
Tasks | |
Published | 2017-01-25 |
URL | http://arxiv.org/abs/1701.07422v3 |
http://arxiv.org/pdf/1701.07422v3.pdf | |
PWC | https://paperswithcode.com/paper/a-convex-similarity-index-for-sparse-recovery |
Repo | |
Framework | |
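As a loose illustration of the missing-sample recovery setup — not the paper's algorithm: a plain MSE fidelity and an identity sparsity basis stand in for CSIM and a proper transform, and proximal gradient (soft thresholding) stands in for ADMM:

```python
def soft(v, t):
    """Soft-thresholding: the proximal operator of t*|.|_1."""
    return max(v - t, 0.0) - max(-v - t, 0.0)

def recover(y, mask, lam=0.01, step=1.0, iters=200):
    """Proximal-gradient sketch of missing-sample recovery:
    min_x lam*||x||_1 + 0.5*||mask * (x - y)||^2,
    where mask[i] = 1 marks an observed sample of y."""
    x = [0.0] * len(y)
    for _ in range(iters):
        grad = [m * (xi - yi) for m, xi, yi in zip(mask, x, y)]
        x = [soft(xi - step * g, step * lam) for xi, g in zip(x, grad)]
    return x
```

Swapping the quadratic fidelity for CSIM changes the gradient/proximal steps but, because CSIM is convex, the overall splitting scheme still converges to the global optimum, which is the point the abstract emphasises.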
Stretching Domain Adaptation: How far is too far?
Title | Stretching Domain Adaptation: How far is too far? |
Authors | Yunhan Zhao, Haider Ali, Rene Vidal |
Abstract | While deep learning has led to significant advances in visual recognition over the past few years, such advances often require a lot of annotated data. Unsupervised domain adaptation has emerged as an alternative approach that does not require as much annotated data; however, prior evaluations of domain adaptation approaches have been limited to relatively similar datasets, e.g., source and target domains are samples captured by different cameras. A new data suite is proposed that comprehensively evaluates cross-modality domain adaptation problems. This work pushes the limit of unsupervised domain adaptation through an in-depth evaluation of several state-of-the-art methods on benchmark datasets and the new dataset suite. We also propose a new domain adaptation network called “Deep MagNet” that effectively transfers knowledge for cross-modality domain adaptation problems. Deep MagNet achieves state-of-the-art performance on two benchmark datasets. More importantly, the proposed method shows consistent improvements in performance on the newly proposed dataset suite. |
Tasks | Domain Adaptation, Unsupervised Domain Adaptation |
Published | 2017-12-06 |
URL | http://arxiv.org/abs/1712.02286v2 |
http://arxiv.org/pdf/1712.02286v2.pdf | |
PWC | https://paperswithcode.com/paper/stretching-domain-adaptation-how-far-is-too |
Repo | |
Framework | |
Learning Markov Chain in Unordered Dataset
Title | Learning Markov Chain in Unordered Dataset |
Authors | Yao-Hung Hubert Tsai, Han Zhao, Ruslan Salakhutdinov, Nebojsa Jojic |
Abstract | The assumption that data samples are independently identically distributed is the backbone of many learning algorithms. Nevertheless, datasets often exhibit rich structure in practice, and we argue that there exists some unknown order within the data instances. In this technical report, we introduce OrderNet, which can be used to extract the order of data instances in an unsupervised way. By assuming that the instances are sampled from a Markov chain, our goal is to learn the transition operator of the underlying Markov chain, as well as the order, by maximizing the generation probability under all possible data permutations. Specifically, we use a neural network as a compact and soft lookup table to approximate the possibly huge, but discrete, transition matrix. This strategy allows us to amortize the space complexity with a single model. Furthermore, this simple and compact representation also provides a short description of the dataset and generalizes to unseen instances as well. To ensure that the learned Markov chain is ergodic, we propose a greedy batch-wise permutation scheme that allows fast training. Empirically, we show that OrderNet is able to discover an order among data instances. We also extend the proposed OrderNet to a one-shot recognition task and demonstrate favorable results. |
Tasks | |
Published | 2017-11-08 |
URL | http://arxiv.org/abs/1711.03167v3 |
http://arxiv.org/pdf/1711.03167v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-markov-chain-in-unordered-dataset |
Repo | |
Framework | |
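Once a transition table has been learned, one simple way to read off an ordering is to greedily chain each instance to its most probable unvisited successor — a sketch of the decoding idea only, not OrderNet's training procedure or its batch-wise permutation scheme:

```python
def greedy_order(trans, start):
    """Greedily recover an ordering from a learned transition table:
    from the current instance, move to the most probable unvisited one.
    `trans[i][j]` is the learned probability of instance j following i."""
    order = [start]
    seen = {start}
    cur = start
    while len(order) < len(trans):
        nxt = max((j for j in range(len(trans)) if j not in seen),
                  key=lambda j: trans[cur][j])
        order.append(nxt)
        seen.add(nxt)
        cur = nxt
    return order
```

Greedy decoding is not guaranteed to maximise the full generation probability over all permutations; it just illustrates how a learned transition operator induces an order.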
Face Detection, Bounding Box Aggregation and Pose Estimation for Robust Facial Landmark Localisation in the Wild
Title | Face Detection, Bounding Box Aggregation and Pose Estimation for Robust Facial Landmark Localisation in the Wild |
Authors | Zhen-Hua Feng, Josef Kittler, Muhammad Awais, Patrik Huber, Xiao-Jun Wu |
Abstract | We present a framework for robust face detection and landmark localisation of faces in the wild, which has been evaluated as part of ‘the 2nd Facial Landmark Localisation Competition’. The framework has four stages: face detection, bounding box aggregation, pose estimation and landmark localisation. To achieve a high detection rate, we use two publicly available CNN-based face detectors and two proprietary detectors. We aggregate the detected face bounding boxes of each input image to reduce false positives and improve face detection accuracy. A cascaded shape regressor, trained using faces with a variety of pose variations, is then employed for pose estimation and image pre-processing. Last, we train the final cascaded shape regressor for fine-grained landmark localisation, using a large number of training samples with limited pose variations. The experimental results obtained on the 300W and Menpo benchmarks demonstrate the superiority of our framework over state-of-the-art methods. |
Tasks | Face Alignment, Face Detection, Pose Estimation |
Published | 2017-05-05 |
URL | http://arxiv.org/abs/1705.02402v2 |
http://arxiv.org/pdf/1705.02402v2.pdf | |
PWC | https://paperswithcode.com/paper/face-detection-bounding-box-aggregation-and |
Repo | |
Framework | |
Deep Face Deblurring
Title | Deep Face Deblurring |
Authors | Grigorios G. Chrysos, Stefanos Zafeiriou |
Abstract | Blind deblurring is a long-studied task; however, the outcomes of generic methods are not effective on real-world blurred images. Domain-specific methods for deblurring targeted object categories, e.g. text or faces, frequently outperform their generic counterparts, hence they are attracting an increasing amount of attention. In this work, we develop such a domain-specific method to tackle deblurring of human faces, henceforth referred to as face deblurring. Studying faces is of tremendous significance in computer vision, however face deblurring has yet to demonstrate convincing results. This can be partly attributed to the combination of i) poor texture and ii) highly structured shape, which render the typically used contour/gradient priors sub-optimal. In our work, instead of making assumptions about the prior, we adopt a learning approach by inserting weak supervision that exploits the well-documented structure of the face. Namely, we utilise a deep network to perform the deblurring and employ a face alignment technique to pre-process each face. We additionally surpass the deep network’s requirement for thousands of training samples by introducing an efficient framework that allows the generation of a large dataset. We utilised this framework to create 2MF2, a dataset of over two million frames. We conducted experiments with real-world blurred facial images and report that our method returns a result close to the sharp natural latent image. |
Tasks | Deblurring, Face Alignment |
Published | 2017-04-27 |
URL | http://arxiv.org/abs/1704.08772v2 |
http://arxiv.org/pdf/1704.08772v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-face-deblurring |
Repo | |
Framework | |