July 27, 2019


Paper Group ANR 495



Defect detection for patterned fabric images based on GHOG and low-rank decomposition

Title Defect detection for patterned fabric images based on GHOG and low-rank decomposition
Authors Chunlei Li, Guangshuai Gao, Zhoufeng Liu, Di Huang, Sheng Liu, Miao Yu
Abstract In order to accurately detect defects in patterned fabric images, a novel detection algorithm based on Gabor-HOG (GHOG) and low-rank decomposition is proposed in this paper. Defect-free patterned fabric images exhibit a specific directional regularity, while defects disrupt that regularity. Therefore, a direction-aware descriptor, denoted GHOG, is designed by combining Gabor and HOG features; it is extremely valuable for localizing the defect region. Building on this directional descriptor, an efficient low-rank decomposition model is constructed to divide the matrix of directional features extracted from image blocks into a low-rank matrix (background information) and a sparse matrix (defect information). A nonconvex log det(.) function is also exploited as a smooth surrogate for the rank in place of the nuclear norm, improving the efficiency of the low-rank model. Moreover, the computational efficiency is further improved by utilizing the alternating direction method of multipliers (ADMM). Thereafter, the saliency map generated by the sparse matrix is segmented via an optimal threshold algorithm to locate the defect regions. Experimental results show that the proposed method can effectively detect patterned fabric defects and outperforms the state-of-the-art methods.
Tasks
Published 2017-02-18
URL http://arxiv.org/abs/1702.05555v1
PDF http://arxiv.org/pdf/1702.05555v1.pdf
PWC https://paperswithcode.com/paper/defect-detection-for-patterned-fabric-images
Repo
Framework
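The core of the entry above is a low-rank-plus-sparse decomposition of a block-wise feature matrix. As a rough illustration, here is a minimal NumPy sketch of the classical RPCA decomposition solved with ADMM. It uses the convex nuclear norm rather than the nonconvex log det surrogate the paper proposes, and the GHOG feature extraction is replaced by a random placeholder matrix, so it only conveys the general structure of the decomposition step.

```python
import numpy as np

def rpca_admm(D, lam=None, mu=1.0, n_iter=200):
    """Decompose D into low-rank L (background) + sparse S (defects).

    Convex relaxation: min ||L||_* + lam * ||S||_1  s.t.  D = L + S.
    The paper replaces the nuclear norm with a log-det surrogate; this
    sketch keeps the standard convex version for simplicity.
    """
    m, n = D.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))
    L = np.zeros_like(D)
    S = np.zeros_like(D)
    Y = np.zeros_like(D)
    for _ in range(n_iter):
        # L-update: singular value thresholding of (D - S + Y/mu)
        U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = U @ np.diag(np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # S-update: elementwise soft thresholding (shrinkage)
        T = D - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)
        # dual variable update
        Y = Y + mu * (D - L - S)
    return L, S

# Hypothetical usage: rows = image blocks, columns = directional features
# (in the paper these would be GHOG descriptors per block).
feature_matrix = np.random.rand(64, 36)
background, defects = rpca_admm(feature_matrix)
saliency = np.abs(defects).sum(axis=1)   # per-block defect saliency
```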

Variational Inference via Transformations on Distributions

Title Variational Inference via Transformations on Distributions
Authors Siddhartha Saxena, Shibhansh Dohare, Jaivardhan Kapoor
Abstract Variational inference methods often focus on the problem of efficient model optimization, with little emphasis on the choice of the approximating posterior. In this paper, we review and implement the various methods that enable us to develop a rich family of approximating posteriors. We show that one particular method, employing transformations on distributions, yields very rich and complex posterior approximations. We analyze its performance on the MNIST dataset by implementing it with a Variational Autoencoder and demonstrate its effectiveness in learning better posterior distributions.
Tasks
Published 2017-07-09
URL http://arxiv.org/abs/1707.02510v1
PDF http://arxiv.org/pdf/1707.02510v1.pdf
PWC https://paperswithcode.com/paper/variational-inference-via-transformations-on
Repo
Framework
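One standard family of transformation-based posteriors in this line of work is the planar normalizing flow of Rezende and Mohamed; it is an assumption that this is the transformation the entry above refers to. Below is a minimal NumPy sketch of a single planar flow step and the log-determinant correction that turns a simple Gaussian posterior into a richer one.

```python
import numpy as np

def planar_flow(z, u, w, b):
    """Apply one planar flow step f(z) = z + u * tanh(w.z + b).

    Returns the transformed samples and the log|det Jacobian| term that
    adjusts the base log-density (change of variables).
    """
    a = z @ w + b                                  # (batch,)
    f_z = z + np.outer(np.tanh(a), u)              # (batch, d)
    psi = np.outer(1.0 - np.tanh(a) ** 2, w)       # h'(a) * w
    log_det = np.log(np.abs(1.0 + psi @ u) + 1e-9)
    return f_z, log_det

# Hypothetical usage: push samples from a diagonal Gaussian posterior
# through a few flow steps to obtain a richer approximating posterior.
rng = np.random.default_rng(0)
d = 2
z = rng.normal(size=(128, d))
log_q = -0.5 * (z ** 2).sum(axis=1) - 0.5 * d * np.log(2 * np.pi)
for _ in range(4):
    u, w, b = rng.normal(size=d), rng.normal(size=d), rng.normal()
    z, log_det = planar_flow(z, u, w, b)
    log_q = log_q - log_det          # density of the transformed samples
```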

OmniArt: Multi-task Deep Learning for Artistic Data Analysis

Title OmniArt: Multi-task Deep Learning for Artistic Data Analysis
Authors Gjorgji Strezoski, Marcel Worring
Abstract Vast amounts of artistic data are scattered online, from both museums and art applications. Collecting, processing and studying these data with respect to all accompanying attributes is an expensive process. Motivated by speeding up and improving the quality of categorical analysis in the artistic domain, in this paper we propose an efficient and accurate method for multi-task learning with a shared representation applied in the artistic domain. We further show how different multi-task configurations of our method behave on artistic data and outperform handcrafted feature approaches as well as convolutional neural networks. In addition to the method and analysis, we propose a challenge-like setting for the newly aggregated dataset of almost half a million samples and structured metadata to encourage further research and societal engagement.
Tasks Multi-Task Learning
Published 2017-08-02
URL http://arxiv.org/abs/1708.00684v1
PDF http://arxiv.org/pdf/1708.00684v1.pdf
PWC https://paperswithcode.com/paper/omniart-multi-task-deep-learning-for-artistic
Repo
Framework
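The shared-representation idea in the entry above can be illustrated with a small multi-head network: one backbone feeding several task-specific heads whose losses are summed. The backbone, head names, and label counts below are placeholders for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn

class MultiTaskArtNet(nn.Module):
    """Shared backbone with one classification head per artistic attribute."""
    def __init__(self, n_artists=100, n_types=10, n_periods=20):
        super().__init__()
        self.backbone = nn.Sequential(           # stand-in feature extractor
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.heads = nn.ModuleDict({
            "artist": nn.Linear(64, n_artists),
            "type": nn.Linear(64, n_types),
            "period": nn.Linear(64, n_periods),
        })

    def forward(self, x):
        feats = self.backbone(x)                  # one shared representation
        return {name: head(feats) for name, head in self.heads.items()}

# Hypothetical training step: sum of per-task cross-entropy losses.
model = MultiTaskArtNet()
images = torch.randn(8, 3, 128, 128)
targets = {"artist": torch.randint(0, 100, (8,)),
           "type": torch.randint(0, 10, (8,)),
           "period": torch.randint(0, 20, (8,))}
outputs = model(images)
loss = sum(nn.functional.cross_entropy(outputs[k], targets[k]) for k in outputs)
loss.backward()
```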

One Size Fits Many: Column Bundle for Multi-X Learning

Title One Size Fits Many: Column Bundle for Multi-X Learning
Authors Trang Pham, Truyen Tran, Svetha Venkatesh
Abstract Much recent machine learning research has been directed towards leveraging shared statistics among labels, instances and data views, commonly referred to as multi-label, multi-instance and multi-view learning. The underlying premises are that there exist correlations among input parts and among output targets, and that predictive performance increases when these correlations are incorporated. In this paper, we propose Column Bundle (CLB), a novel deep neural network for capturing the shared statistics in data. CLB is generic in that the same architecture can be applied to various types of shared statistics by changing only the input and output handling. CLB is capable of scaling to thousands of input parts and output labels by avoiding explicit modeling of pairwise relations. We evaluate CLB on different types of data: (a) multi-label, (b) multi-view, (c) multi-view/multi-label and (d) multi-instance. CLB demonstrates competitive performance on all datasets against state-of-the-art methods designed specifically for each type.
Tasks Multi-View Learning
Published 2017-02-22
URL http://arxiv.org/abs/1702.07021v2
PDF http://arxiv.org/pdf/1702.07021v2.pdf
PWC https://paperswithcode.com/paper/one-size-fits-many-column-bundle-for-multi-x
Repo
Framework
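The abstract stays high-level, so the following PyTorch sketch only illustrates one plausible reading of "shared statistics without pairwise modeling": each input part (label view, instance, or data view) is encoded by a shared encoder and pooled, and per-label outputs are read off the pooled representation. This is a hypothetical illustration, not the authors' Column Bundle architecture.

```python
import torch
import torch.nn as nn

class SharedPartsMultiLabel(nn.Module):
    """Shared per-part encoder + mean pooling + per-label output layer.

    Scales linearly in the number of parts and labels because parts are
    encoded independently (no explicit pairwise relations are modeled).
    """
    def __init__(self, part_dim, hidden=64, n_labels=1000):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(part_dim, hidden), nn.ReLU())
        self.label_heads = nn.Linear(hidden, n_labels)   # one logit per label

    def forward(self, parts):            # parts: (batch, n_parts, part_dim)
        encoded = self.encoder(parts)    # same weights applied to every part
        pooled = encoded.mean(dim=1)     # order-invariant aggregation
        return self.label_heads(pooled)  # multi-label logits

# Hypothetical usage: 12 parts (instances/views) of 32 features each.
x = torch.randn(4, 12, 32)
logits = SharedPartsMultiLabel(part_dim=32, n_labels=50)(x)
```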

Can We Teach Computers to Understand Art? Domain Adaptation for Enhancing Deep Networks Capacity to De-Abstract Art

Title Can We Teach Computers to Understand Art? Domain Adaptation for Enhancing Deep Networks Capacity to De-Abstract Art
Authors Mihai Badea, Corneliu Florea, Laura Florea, Constantin Vertan
Abstract Humans comprehend a natural scene at a single glance; painters and other visual artists, through their abstract representations, stressed this capacity to the limit. The performance of computer vision solutions has matched that of humans in many problems of visual recognition. In this paper we address the problem of recognizing the genre (subject) in digitized paintings using Convolutional Neural Networks (CNN), as part of the more general problem of dealing with abstract and/or artistic representations of scenes. Initially we establish the state-of-the-art performance by training a CNN from scratch. In the next level of evaluation, we identify aspects that hinder the CNNs’ recognition, such as artistic abstraction. Further, we test various domain adaptation methods that could enhance the subject recognition capabilities of the CNNs. The evaluation is performed on a database of 80,000 annotated digitized paintings, which is tentatively extended with artistic photographs, either original or stylized, in order to emulate artistic representations. Surprisingly, the most efficient domain adaptation is not neural style transfer. Finally, the paper provides an experiment-based assessment of the abstraction level that CNNs are able to achieve.
Tasks Domain Adaptation, Style Transfer
Published 2017-12-11
URL http://arxiv.org/abs/1712.03727v1
PDF http://arxiv.org/pdf/1712.03727v1.pdf
PWC https://paperswithcode.com/paper/can-we-teach-computers-to-understand-art
Repo
Framework
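A simple baseline among the adaptation strategies the entry above evaluates is fine-tuning a photo-pretrained CNN on paintings. The sketch below shows that generic baseline with torchvision (assuming a recent torchvision with the weights API); the genre count and the choice to retrain only the classifier head are illustrative assumptions, not the paper's protocol.

```python
import torch
import torch.nn as nn
from torchvision import models

n_genres = 10                      # placeholder; the paper's label set differs
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze the photo-pretrained backbone and retrain only the classifier head,
# one common and simple transfer/adaptation baseline.
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, n_genres)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

paintings = torch.randn(8, 3, 224, 224)       # stand-in batch of paintings
genres = torch.randint(0, n_genres, (8,))
loss = criterion(model(paintings), genres)
loss.backward()
optimizer.step()
```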

Learning Model Predictive Control for Iterative Tasks: A Computationally Efficient Approach for Linear System

Title Learning Model Predictive Control for Iterative Tasks: A Computationally Efficient Approach for Linear System
Authors Ugo Rosolia, Francesco Borrelli
Abstract A Learning Model Predictive Controller (LMPC) for linear systems is presented. The proposed controller is an extension of the LMPC in [1] and aims to decrease the computational burden. The control scheme is reference-free and is able to improve its performance by learning from previous iterations. A convex safe set and a terminal cost function are used in order to guarantee recursive feasibility and non-increasing performance at each iteration. The paper presents the control design approach and shows how to recursively construct the convex terminal set and the terminal cost from state and input trajectories of previous iterations. Simulation results show the effectiveness of the proposed control logic.
Tasks
Published 2017-02-23
URL https://arxiv.org/abs/1702.07064v4
PDF https://arxiv.org/pdf/1702.07064v4.pdf
PWC https://paperswithcode.com/paper/learning-model-predictive-control-for
Repo
Framework
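The key construction in the entry above is a convex safe set built from states stored in previous iterations, together with a terminal cost obtained from their recorded costs-to-go. The cvxpy sketch below shows one common relaxed formulation of that idea: the terminal state is constrained to a convex combination of stored states, and the terminal cost is the same convex combination of stored costs. The dynamics, horizon, weights, and stored safe-set data are placeholders, and this is a sketch of the general mechanism rather than the paper's exact algorithm.

```python
import numpy as np
import cvxpy as cp

# Placeholder double-integrator dynamics and stage costs.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R, N = np.eye(2), 0.1 * np.eye(1), 10

# Safe set: states visited in previous iterations, with their costs-to-go.
SS = np.array([[0.0, 0.0], [0.5, -0.2], [1.0, -0.5]])   # stored states (rows)
J_SS = np.array([0.0, 3.0, 8.0])                          # recorded cost-to-go

def lmpc_step(x0):
    x = cp.Variable((2, N + 1))
    u = cp.Variable((1, N))
    lam = cp.Variable(len(SS), nonneg=True)   # convex-combination weights
    cost, cons = 0, [x[:, 0] == x0]
    for k in range(N):
        cost += cp.quad_form(x[:, k], Q) + cp.quad_form(u[:, k], R)
        cons += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k]]
    # Terminal state in the convex hull of the stored safe set;
    # terminal cost = the same convex combination of stored costs-to-go.
    cons += [x[:, N] == SS.T @ lam, cp.sum(lam) == 1]
    cost += J_SS @ lam
    cp.Problem(cp.Minimize(cost), cons).solve()
    return u.value[:, 0]

print(lmpc_step(np.array([1.0, 0.0])))   # first input of the planned sequence
```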

Granger Mediation Analysis of Multiple Time Series with an Application to fMRI

Title Granger Mediation Analysis of Multiple Time Series with an Application to fMRI
Authors Yi Zhao, Xi Luo
Abstract It has become increasingly popular to perform mediation analysis for complex data from sophisticated experimental studies. In this paper, we present Granger Mediation Analysis (GMA), a new framework for causal mediation analysis of multiple time series. This framework is motivated by a functional magnetic resonance imaging (fMRI) experiment where we are interested in estimating the mediation effects between a randomized stimulus time series and brain activity time series from two brain regions. The stable unit treatment assumption for causal mediation analysis is thus unrealistic for this type of time series data. To address this challenge, our framework integrates two types of models: causal mediation analysis across the variables and vector autoregressive models across the temporal observations. We further extend this framework to handle multilevel data, in order to address individual variability and correlated errors between the mediator and the outcome variables. These models not only provide valid causal mediation for time series data but also model the causal dynamics across time. We show that the modeling parameters in our models are identifiable, and we develop computationally efficient methods to maximize the likelihood-based optimization criteria. Simulation studies show that our method reduces the estimation bias and improves statistical power, compared to existing approaches. On a real fMRI data set, our approach not only infers the causal effects of brain pathways but also accurately captures the feedback effect of the outcome region on the mediator region.
Tasks Time Series
Published 2017-09-15
URL http://arxiv.org/abs/1709.05328v1
PDF http://arxiv.org/pdf/1709.05328v1.pdf
PWC https://paperswithcode.com/paper/granger-mediation-analysis-of-multiple-time
Repo
Framework
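As a rough rendering of the two components the entry above combines (a mediation model across variables and a vector autoregression across time), a single-subject version might read as follows; the notation and lag order are assumptions for illustration, not the paper's exact specification.

```latex
% Stimulus Z_t, mediator-region signal M_t, outcome-region signal R_t
\begin{aligned}
M_t &= Z_t\,\alpha + E_{1,t}, \\
R_t &= Z_t\,\gamma + M_t\,\beta + E_{2,t}, \\
\begin{pmatrix} E_{1,t} \\ E_{2,t} \end{pmatrix}
  &= \sum_{j=1}^{p} \Psi_j \begin{pmatrix} E_{1,t-j} \\ E_{2,t-j} \end{pmatrix}
     + \begin{pmatrix} \epsilon_{1,t} \\ \epsilon_{2,t} \end{pmatrix},
\end{aligned}
```

i.e., a standard mediation model across the variables whose error processes follow a VAR(p) across the temporal observations.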

Dual Language Models for Code Switched Speech Recognition

Title Dual Language Models for Code Switched Speech Recognition
Authors Saurabh Garg, Tanmay Parekh, Preethi Jyothi
Abstract In this work, we present a simple and elegant approach to language modeling for bilingual code-switched text. Since code-switching is a blend of two or more different languages, a standard bilingual language model can be improved upon by using structures of the monolingual language models. We propose a novel technique called dual language models, which involves building two complementary monolingual language models and combining them using a probabilistic model for switching between the two. We evaluate the efficacy of our approach using a conversational Mandarin-English speech corpus. We demonstrate the robustness of our model by showing significant improvements in perplexity over the standard bilingual language model, without the use of any external information. Similar consistent improvements are also reflected in automatic speech recognition error rates.
Tasks Language Modelling, Speech Recognition
Published 2017-11-03
URL http://arxiv.org/abs/1711.01048v2
PDF http://arxiv.org/pdf/1711.01048v2.pdf
PWC https://paperswithcode.com/paper/dual-language-models-for-code-switched-speech
Repo
Framework
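The combination described in the entry above, two monolingual models plus a probabilistic switch, can be sketched with toy bigram models. The interpolation below is an illustrative guess at the mechanism rather than the authors' exact construction; the switch probability, the restart context for the other language, and the corpora are placeholders.

```python
from collections import Counter, defaultdict

def train_bigram(sentences):
    """Toy maximum-likelihood bigram LM: P(word | previous word)."""
    counts = defaultdict(Counter)
    for s in sentences:
        toks = ["<s>"] + s.split()
        for prev, w in zip(toks, toks[1:]):
            counts[prev][w] += 1
    return {prev: {w: c / sum(ws.values()) for w, c in ws.items()}
            for prev, ws in counts.items()}

lm_en = train_bigram(["i like tea", "i like rice"])         # monolingual LM 1
lm_zh = train_bigram(["wo xihuan cha", "wo xihuan mifan"])  # monolingual LM 2
p_switch = 0.2   # placeholder probability of switching languages

def dual_lm_prob(prev, word, prev_lang):
    """P(word | prev): stay in prev_lang with prob 1 - p_switch, else switch."""
    same, other = (lm_en, lm_zh) if prev_lang == "en" else (lm_zh, lm_en)
    p_same = same.get(prev, {}).get(word, 1e-6)
    p_other = other.get("<s>", {}).get(word, 1e-6)  # other LM starts a new phrase
    return (1 - p_switch) * p_same + p_switch * p_other

print(dual_lm_prob("i", "like", "en"))     # monolingual continuation
print(dual_lm_prob("i", "xihuan", "en"))   # code-switch into the other language
```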

Image Matters: Visually modeling user behaviors using Advanced Model Server

Title Image Matters: Visually modeling user behaviors using Advanced Model Server
Authors Tiezheng Ge, Liqin Zhao, Guorui Zhou, Keyu Chen, Shuying Liu, Huimin Yi, Zelin Hu, Bochao Liu, Peng Sun, Haoyu Liu, Pengtao Yi, Sui Huang, Zhiqiang Zhang, Xiaoqiang Zhu, Yu Zhang, Kun Gai
Abstract In Taobao, the largest e-commerce platform in China, billions of items are provided and typically displayed with their images. For better user experience and business effectiveness, Click-Through Rate (CTR) prediction in the online advertising system exploits abundant user historical behaviors to identify whether a user is interested in a candidate ad. Enhancing behavior representations with user behavior images helps understand users’ visual preferences and greatly improves the accuracy of CTR prediction. We therefore propose to model user preference jointly with user behavior ID features and behavior images. However, training with user behavior images brings tens to hundreds of images per sample, giving rise to a great challenge in both communication and computation. To handle these challenges, we propose a novel and efficient distributed machine learning paradigm called Advanced Model Server (AMS). In the well-known Parameter Server (PS) framework, each server node handles a separate part of the parameters and updates them independently. AMS goes beyond this and is designed to learn a unified image descriptor model, shared by all server nodes, which embeds large images into low-dimensional high-level features before transmitting them to worker nodes. AMS thus dramatically reduces the communication load and enables the otherwise arduous joint training process. Based on AMS, methods of effectively combining the images and ID features are carefully studied, and we then propose a Deep Image CTR Model. Our approach is shown to achieve significant improvements in both online and offline evaluations, and has been deployed in the Taobao display advertising system, serving the main traffic.
Tasks Click-Through Rate Prediction
Published 2017-11-17
URL http://arxiv.org/abs/1711.06505v3
PDF http://arxiv.org/pdf/1711.06505v3.pdf
PWC https://paperswithcode.com/paper/image-matters-visually-modeling-user
Repo
Framework
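The communication argument in the entry above is that embedding each behavior image on the server before shipping it to workers is far cheaper than sending raw pixels. A back-of-the-envelope sketch follows; the image size, embedding width, linear stand-in for the descriptor model, and behavior count are all illustrative assumptions.

```python
import numpy as np

# Placeholder sizes: a 224x224 RGB float32 behavior image vs a small embedding.
raw_image_bytes = 224 * 224 * 3 * 4        # ~588 KiB per behavior image
embedding_dim = 64                          # illustrative embedding width
embedding_bytes = embedding_dim * 4         # 256 bytes per behavior image
behaviors_per_sample = 100                  # "tens to hundreds" per sample

def server_side_embed(image, W):
    """Stand-in for the shared image descriptor model learned on server nodes."""
    return np.tanh(image.reshape(-1) @ W)   # low-dimensional feature vector

W = (np.random.randn(224 * 224 * 3, embedding_dim) * 1e-3).astype(np.float32)
image = np.random.rand(224, 224, 3).astype(np.float32)
feature = server_side_embed(image, W)       # this, not the raw pixels, is shipped

print("per-sample traffic, raw images :", behaviors_per_sample * raw_image_bytes, "bytes")
print("per-sample traffic, embeddings :", behaviors_per_sample * embedding_bytes, "bytes")
```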

Continuously Differentiable Exponential Linear Units

Title Continuously Differentiable Exponential Linear Units
Authors Jonathan T. Barron
Abstract Exponential Linear Units (ELUs) are a useful rectifier for constructing deep learning architectures, as they may speed up and otherwise improve learning by virtue of not having vanishing gradients and by having mean activations near zero. However, the ELU activation as parametrized in [1] is not continuously differentiable with respect to its input when the shape parameter alpha is not equal to 1. We present an alternative parametrization which is C1-continuous for all values of alpha, making the rectifier easier to reason about and making alpha easier to tune. This alternative parametrization has several other useful properties that the original parametrization of ELU does not: 1) its derivative with respect to x is bounded, 2) it contains both the linear transfer function and ReLU as special cases, and 3) it is scale-similar with respect to alpha.
Tasks
Published 2017-04-24
URL http://arxiv.org/abs/1704.07483v1
PDF http://arxiv.org/pdf/1704.07483v1.pdf
PWC https://paperswithcode.com/paper/continuously-differentiable-exponential
Repo
Framework
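The alternative parametrization described above is commonly written as CELU(x, alpha) = max(0, x) + min(0, alpha * (exp(x / alpha) - 1)). A small NumPy sketch, with a numerical check that the derivative is continuous at x = 0 for alpha != 1, unlike the original ELU:

```python
import numpy as np

def elu(x, alpha):
    """Original ELU: the derivative jumps from alpha to 1 at x = 0 when alpha != 1."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def celu(x, alpha):
    """Continuously differentiable ELU: C1 for every alpha > 0."""
    return np.maximum(0.0, x) + np.minimum(0.0, alpha * (np.exp(x / alpha) - 1.0))

# Numerical derivative just left and right of zero, with alpha = 2.
alpha, eps = 2.0, 1e-6
for name, f in [("ELU", elu), ("CELU", celu)]:
    left = (f(0.0, alpha) - f(-eps, alpha)) / eps
    right = (f(eps, alpha) - f(0.0, alpha)) / eps
    print(f"{name}: slope left of 0 ~ {left:.4f}, right of 0 ~ {right:.4f}")
# ELU prints ~2.0 vs ~1.0 (a kink); CELU prints ~1.0 vs ~1.0 (smooth).
```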

Classification of postoperative surgical site infections from blood measurements with missing data using recurrent neural networks

Title Classification of postoperative surgical site infections from blood measurements with missing data using recurrent neural networks
Authors Andreas Storvik Strauman, Filippo Maria Bianchi, Karl Øyvind Mikalsen, Michael Kampffmeyer, Cristina Soguero-Ruiz, Robert Jenssen
Abstract Clinical measurements that can be represented as time series constitute an important fraction of electronic health records and are often both uncertain and incomplete. Recurrent neural networks are a special class of neural networks that are particularly suitable for processing time series data but, in their original formulation, cannot explicitly deal with missing data. In this paper, we explore imputation strategies for handling missing values in classifiers based on recurrent neural networks (RNNs) and apply a recently proposed recurrent architecture, the Gated Recurrent Unit with Decay, specifically designed to handle missing data. We focus on the problem of detecting surgical site infection in patients by analyzing time series of their blood sample measurements, and we compare the results obtained with different RNN-based classifiers.
Tasks Imputation, Time Series
Published 2017-11-17
URL http://arxiv.org/abs/1711.06516v1
PDF http://arxiv.org/pdf/1711.06516v1.pdf
PWC https://paperswithcode.com/paper/classification-of-postoperative-surgical-site
Repo
Framework
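The Gated Recurrent Unit with Decay mentioned above imputes a missing input as a mixture of the last observed value and the empirical mean, with a weight that decays with the time since the last observation. Below is a NumPy sketch of that input-side decay only (following the general GRU-D formulation of Che et al.); the decay weights are placeholders rather than learned parameters, and the blood-measurement series is made up.

```python
import numpy as np

def decay_impute_sequence(X, mask, x_mean, w_gamma, b_gamma):
    """Input decay over a sequence: missing values drift from the last
    observation toward the per-variable mean as the gap grows.

    X      : (T, d) measurements with NaN where missing
    mask   : (T, d) 1 where observed, 0 where missing
    x_mean : (d,) empirical per-variable mean from training data
    """
    T, d = X.shape
    x_last = x_mean.copy()          # before anything is observed, fall back to the mean
    delta = np.zeros(d)             # time since each variable was last observed
    X_hat = np.zeros_like(X)
    for t in range(T):
        gamma = np.exp(-np.maximum(0.0, w_gamma * delta + b_gamma))   # decay in (0, 1]
        imputed = gamma * x_last + (1.0 - gamma) * x_mean
        X_hat[t] = np.where(mask[t] == 1, X[t], imputed)
        x_last = np.where(mask[t] == 1, X[t], x_last)
        delta = np.where(mask[t] == 1, 1.0, delta + 1.0)   # unit sampling interval assumed
    return X_hat

# Hypothetical blood-measurement series: 5 time steps, 3 variables.
X = np.array([[1.0, np.nan, 0.5],
              [np.nan, 2.0, np.nan],
              [np.nan, np.nan, np.nan],
              [1.5, 2.2, np.nan],
              [np.nan, 2.4, 0.7]])
mask = (~np.isnan(X)).astype(float)
print(decay_impute_sequence(X, mask, x_mean=np.array([1.2, 2.1, 0.6]),
                            w_gamma=np.ones(3), b_gamma=np.zeros(3)))
```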

Optimizing Memory Efficiency for Convolution Kernels on Kepler GPUs

Title Optimizing Memory Efficiency for Convolution Kernels on Kepler GPUs
Authors Xiaoming Chen, Jianxu Chen, Danny Z. Chen, Xiaobo Sharon Hu
Abstract Convolution is a fundamental operation in many applications, such as computer vision, natural language processing, image processing, etc. Recent successes of convolutional neural networks in various deep learning applications put even higher demand on fast convolution. The high computation throughput and memory bandwidth of graphics processing units (GPUs) make GPUs a natural choice for accelerating convolution operations. However, maximally exploiting the available memory bandwidth of GPUs for convolution is a challenging task. This paper introduces a general model to address the mismatch between the memory bank width of GPUs and computation data width of threads. Based on this model, we develop two convolution kernels, one for the general case and the other for a special case with one input channel. By carefully optimizing memory access patterns and computation patterns, we design a communication-optimized kernel for the special case and a communication-reduced kernel for the general case. Experimental data based on implementations on Kepler GPUs show that our kernels achieve 5.16X and 35.5% average performance improvement over the latest cuDNN library, for the special case and the general case, respectively.
Tasks
Published 2017-05-29
URL http://arxiv.org/abs/1705.10591v1
PDF http://arxiv.org/pdf/1705.10591v1.pdf
PWC https://paperswithcode.com/paper/optimizing-memory-efficiency-for-convolution
Repo
Framework

Complete End-To-End Low Cost Solution To a 3D Scanning System with Integrated Turntable

Title Complete End-To-End Low Cost Solution To a 3D Scanning System with Integrated Turntable
Authors Saed Khawaldeh, Tajwar Abrar Aleef, Usama Pervaiz, Vu Hoang Minh, Yeman Brhane Hagos
Abstract 3D reconstruction is a technique used in computer vision with a wide range of applications in areas like object recognition, city modelling, virtual reality, physical simulations, video games and special effects. Previously, performing a 3D reconstruction required specialized hardware; such systems were often very expensive and were only available for industrial or research purposes. With the rise in availability of high-quality, low-cost 3D sensors, it is now possible to design inexpensive, complete 3D scanning systems. The objective of this work was to design an acquisition and processing system that can perform 3D scanning and reconstruction of objects seamlessly. In addition, the goal also included making the 3D scanning process fully automated by building and integrating a turntable alongside the software, so the user can perform a full 3D scan with the press of a few buttons from our dedicated graphical user interface. Three main steps take the system from acquisition of point clouds to the finished reconstructed 3D model. First, the system acquires point cloud data of a person or object using an inexpensive camera sensor. Second, the acquired point cloud data are aligned and converted into a watertight mesh of good quality. Third, the reconstructed model is exported to a 3D printer to obtain a proper 3D print of the model.
Tasks 3D Reconstruction, Object Recognition, Physical Simulations
Published 2017-09-03
URL http://arxiv.org/abs/1709.02247v1
PDF http://arxiv.org/pdf/1709.02247v1.pdf
PWC https://paperswithcode.com/paper/complete-end-to-end-low-cost-solution-to-a-3d
Repo
Framework
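Because the turntable angle of each capture is known, the per-view point clouds can be brought into a common frame by rotating each one back by its turntable angle before meshing. A minimal NumPy sketch of that coarse registration step; the rotation axis, angle step, and cloud data are placeholders, and a real pipeline such as the one above would refine the alignment (e.g., with ICP) before reconstructing the mesh.

```python
import numpy as np

def rotation_about_z(theta):
    """3x3 rotation matrix about the turntable (z) axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def merge_turntable_scans(scans, step_deg):
    """Rotate the i-th scan back by i*step_deg degrees and stack all points."""
    merged = []
    for i, cloud in enumerate(scans):                 # cloud: (N_i, 3) array
        R = rotation_about_z(-np.deg2rad(i * step_deg))
        merged.append(cloud @ R.T)                    # rotate points into frame 0
    return np.vstack(merged)

# Hypothetical usage: 12 captures taken 30 degrees apart.
scans = [np.random.rand(500, 3) for _ in range(12)]
full_cloud = merge_turntable_scans(scans, step_deg=30.0)
print(full_cloud.shape)   # (6000, 3); this cloud would then be meshed and printed
```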

Paradoxes in Fair Computer-Aided Decision Making

Title Paradoxes in Fair Computer-Aided Decision Making
Authors Andrew Morgan, Rafael Pass
Abstract Computer-aided decision making, in which a human decision-maker is aided by a computational classifier, is becoming increasingly prevalent. For instance, judges in at least nine states make use of algorithmic tools meant to determine “recidivism risk scores” for criminal defendants in sentencing, parole, or bail decisions. A subject of much recent debate is whether such algorithmic tools are “fair” in the sense that they do not discriminate against certain groups (e.g., races) of people. Our main result shows that for “non-trivial” computer-aided decision making, either the classifier must be discriminatory, or a rational decision-maker using the output of the classifier is forced to be discriminatory. We further provide a complete characterization of situations where fair computer-aided decision making is possible.
Tasks Decision Making
Published 2017-11-29
URL http://arxiv.org/abs/1711.11066v2
PDF http://arxiv.org/pdf/1711.11066v2.pdf
PWC https://paperswithcode.com/paper/paradoxes-in-fair-computer-aided-decision
Repo
Framework

Attention-based Wav2Text with Feature Transfer Learning

Title Attention-based Wav2Text with Feature Transfer Learning
Authors Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
Abstract Conventional automatic speech recognition (ASR) typically performs multi-level pattern recognition tasks that map the acoustic speech waveform into a hierarchy of speech units. However, it is widely known that information loss in the earlier stages can propagate through the later stages. After the resurgence of deep learning, interest has emerged in the possibility of developing a purely end-to-end ASR system, from the raw waveform to the transcription, without any predefined alignments and hand-engineered models. However, successful attempts at end-to-end architectures have still used spectral-based features, while successful attempts at using the raw waveform have still been based on the hybrid deep neural network - hidden Markov model (DNN-HMM) framework. In this paper, we construct the first end-to-end attention-based encoder-decoder model that maps directly from the raw speech waveform to the text transcription. We call this model “Attention-based Wav2Text”. To assist the training process of the end-to-end model, we propose to utilize feature transfer learning. Experimental results reveal that the proposed Attention-based Wav2Text model, working directly with the raw waveform, achieves better results than the attentional encoder-decoder model trained on standard front-end filterbank features.
Tasks End-To-End Speech Recognition, Speech Recognition, Transfer Learning
Published 2017-09-22
URL http://arxiv.org/abs/1709.07814v1
PDF http://arxiv.org/pdf/1709.07814v1.pdf
PWC https://paperswithcode.com/paper/attention-based-wav2text-with-feature
Repo
Framework
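The encoder-decoder-with-attention structure described in the entry above can be sketched compactly in PyTorch. This is a generic sketch, not the authors' model: the raw waveform is cut into frames and fed to a recurrent encoder, and a teacher-forced decoder attends over the encoder states with dot-product attention to emit character logits. The framing, layer sizes, and vocabulary size are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyWav2Text(nn.Module):
    """Minimal attention-based encoder-decoder over raw-waveform frames."""
    def __init__(self, frame=256, hidden=128, n_chars=30):
        super().__init__()
        self.frame, self.hidden = frame, hidden
        self.encoder = nn.GRU(frame, hidden, batch_first=True, bidirectional=True)
        self.embed = nn.Embedding(n_chars, hidden)
        self.decoder = nn.GRUCell(hidden + 2 * hidden, hidden)
        self.query = nn.Linear(hidden, 2 * hidden)   # decoder state -> attention query
        self.out = nn.Linear(hidden, n_chars)

    def forward(self, wave, targets):
        # Frame the raw waveform into non-overlapping chunks: (B, T, frame).
        B = wave.size(0)
        usable = wave.size(1) // self.frame * self.frame
        frames = wave[:, :usable].view(B, -1, self.frame)
        enc, _ = self.encoder(frames)                 # (B, T, 2*hidden)
        h = enc.new_zeros(B, self.hidden)             # decoder hidden state
        logits = []
        for t in range(targets.size(1)):              # teacher forcing over characters
            scores = torch.bmm(enc, self.query(h).unsqueeze(2)).squeeze(2)      # (B, T)
            context = torch.bmm(F.softmax(scores, dim=1).unsqueeze(1), enc).squeeze(1)
            h = self.decoder(torch.cat([self.embed(targets[:, t]), context], dim=1), h)
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)             # (B, L, n_chars)

# Hypothetical usage: 1-second 16 kHz waveforms, 12 teacher-forced characters.
model = TinyWav2Text()
logits = model(torch.randn(2, 16000), torch.randint(0, 30, (2, 12)))
print(logits.shape)   # torch.Size([2, 12, 30])
```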