October 17, 2019


Paper Group ANR 857

Real-Time 3D Shape of Micro-Details. Online Deep Metric Learning. On the Convergence of Learning-based Iterative Methods for Nonconvex Inverse Problems. Synthetic data generation for end-to-end thermal infrared tracking. Simple Large-scale Relation Extraction from Unstructured Text. Behaviour Policy Estimation in Off-Policy Policy Evaluation: Calib …

Real-Time 3D Shape of Micro-Details

Title Real-Time 3D Shape of Micro-Details
Authors Maryam Khanian, Ali Sharifi Boroujerdi, Michael Breuss
Abstract Motivated by the growing demand for interactive environments, we propose an accurate real-time 3D shape reconstruction technique. To provide a reliable 3D reconstruction, which is still a challenging task when dealing with real-world applications, we integrate several components including (i) Photometric Stereo (PS), (ii) a perspective Cook-Torrance reflectance model that enables PS to deal with a broad range of possible real-world object reflections, (iii) a realistic lighting situation, (iv) a Recurrent Optimization Network (RON) and finally (v) a heuristic Dijkstra Gaussian Mean Curvature (DGMC) initialization approach. We demonstrate the potential benefits of our hybrid model by providing 3D shape with highly-detailed information from micro-prints for the first time. All real-world images are taken by a mobile phone camera under a simple setup using consumer-level equipment. In addition, complementary synthetic experiments confirm the beneficial properties of our novel method and its superiority over the state-of-the-art approaches.
Tasks 3D Reconstruction
Published 2018-02-16
URL http://arxiv.org/abs/1802.06140v1
PDF http://arxiv.org/pdf/1802.06140v1.pdf
PWC https://paperswithcode.com/paper/real-time-3d-shape-of-micro-details
Repo
Framework

Online Deep Metric Learning

Title Online Deep Metric Learning
Authors Wenbin Li, Jing Huo, Yinghuan Shi, Yang Gao, Lei Wang, Jiebo Luo
Abstract Metric learning learns a metric function from training data to calculate the similarity or distance between samples. From the perspective of feature learning, metric learning essentially learns a new feature space by feature transformation (e.g., Mahalanobis distance metric). However, traditional metric learning algorithms are shallow, which just learn one metric space (feature transformation). Can we further learn a better metric space from the learnt metric space? In other words, can we learn metric progressively and nonlinearly like deep learning by just using the existing metric learning algorithms? To this end, we present a hierarchical metric learning scheme and implement an online deep metric learning framework, namely ODML. Specifically, we take one online metric learning algorithm as a metric layer, followed by a nonlinear layer (i.e., ReLU), and then stack these layers modelled after the deep learning. The proposed ODML enjoys some nice properties, indeed can learn metric progressively and performs superiorly on some datasets. Various experiments with different settings have been conducted to verify these properties of the proposed ODML.
Tasks Metric Learning
Published 2018-05-15
URL http://arxiv.org/abs/1805.05510v1
PDF http://arxiv.org/pdf/1805.05510v1.pdf
PWC https://paperswithcode.com/paper/online-deep-metric-learning
Repo
Framework
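
The stacking idea in the ODML abstract (a metric layer followed by ReLU, repeated) can be illustrated with a minimal sketch. It assumes each metric layer is a linear map `L` (so the layer's Mahalanobis matrix would be `L^T L`) with fixed placeholder weights; the online metric learning algorithm that would actually fit each layer is not shown, and the function names are hypothetical rather than taken from the paper.

```python
import math

def matvec(L, x):
    # Apply the linear feature transformation of one metric layer.
    return [sum(l_ij * x_j for l_ij, x_j in zip(row, x)) for row in L]

def relu(v):
    return [max(0.0, a) for a in v]

def odml_forward(layers, x):
    """Pass x through stacked (metric layer, ReLU) pairs."""
    for L in layers:
        x = relu(matvec(L, x))
    return x

def distance(layers, x, y):
    """Euclidean distance in the final learnt feature space."""
    return math.dist(odml_forward(layers, x), odml_forward(layers, y))

# Placeholder weights (assumption): two 2x2 layers, not learnt from data.
layers = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 0.5]]]
print(distance(layers, [3.0, 4.0], [0.0, 0.0]))
```

Because every layer ends in a ReLU, negative coordinates are clipped before the next layer sees them, which is what makes the stacked transformation nonlinear.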

On the Convergence of Learning-based Iterative Methods for Nonconvex Inverse Problems

Title On the Convergence of Learning-based Iterative Methods for Nonconvex Inverse Problems
Authors Risheng Liu, Shichao Cheng, Yi He, Xin Fan, Zhouchen Lin, Zhongxuan Luo
Abstract Numerous tasks at the core of statistics, learning and vision areas are specific cases of ill-posed inverse problems. Recently, learning-based (e.g., deep) iterative methods have been empirically shown to be useful for these problems. Nevertheless, integrating learnable structures into iterations is still a laborious process, which can only be guided by intuitions or empirical insights. Moreover, there is a lack of rigorous analysis about the convergence behaviors of these reimplemented iterations, and thus the significance of such methods is a little bit vague. This paper moves beyond these limits and proposes Flexible Iterative Modularization Algorithm (FIMA), a generic and provable paradigm for nonconvex inverse problems. Our theoretical analysis reveals that FIMA allows us to generate globally convergent trajectories for learning-based iterative methods. Meanwhile, the devised scheduling policies on flexible modules should also be beneficial for classical numerical methods in the nonconvex scenario. Extensive experiments on real applications verify the superiority of FIMA.
Tasks
Published 2018-08-16
URL http://arxiv.org/abs/1808.05331v1
PDF http://arxiv.org/pdf/1808.05331v1.pdf
PWC https://paperswithcode.com/paper/on-the-convergence-of-learning-based
Repo
Framework

Synthetic data generation for end-to-end thermal infrared tracking

Title Synthetic data generation for end-to-end thermal infrared tracking
Authors Lichao Zhang, Abel Gonzalez-Garcia, Joost van de Weijer, Martin Danelljan, Fahad Shahbaz Khan
Abstract The usage of both off-the-shelf and end-to-end trained deep networks has significantly improved the performance of visual tracking on RGB videos. However, the lack of large labeled datasets hampers the usage of convolutional neural networks for tracking in thermal infrared (TIR) images. Therefore, most state-of-the-art methods for tracking on TIR data are still based on handcrafted features. To address this problem, we propose to use image-to-image translation models. These models allow us to translate the abundantly available labeled RGB data to synthetic TIR data. We explore the usage of both paired and unpaired image translation models for this purpose. These methods provide us with a large labeled dataset of synthetic TIR sequences, on which we can train end-to-end optimal features for tracking. To the best of our knowledge, we are the first to train end-to-end features for TIR tracking. We perform extensive experiments on the VOT-TIR2017 dataset. We show that a network trained on a large dataset of synthetic TIR data obtains better performance than one trained on the available real TIR data. Combining both data sources leads to further improvement. In addition, when we combine the network with motion features we outperform the state of the art with a relative gain of over 10%, clearly showing the efficiency of using synthetic data to train end-to-end TIR trackers.
Tasks Image-to-Image Translation, Synthetic Data Generation, Visual Tracking
Published 2018-06-04
URL http://arxiv.org/abs/1806.01013v2
PDF http://arxiv.org/pdf/1806.01013v2.pdf
PWC https://paperswithcode.com/paper/synthetic-data-generation-for-end-to-end
Repo
Framework

Simple Large-scale Relation Extraction from Unstructured Text

Title Simple Large-scale Relation Extraction from Unstructured Text
Authors Christos Christodoulopoulos, Arpit Mittal
Abstract Knowledge-based question answering relies on the availability of facts, the majority of which cannot be found in structured sources (e.g. Wikipedia info-boxes, Wikidata). One of the major components of extracting facts from unstructured text is Relation Extraction (RE). In this paper we propose a novel method for creating distant (weak) supervision labels for training a large-scale RE system. We also provide new evidence about the effectiveness of neural network approaches by decoupling the model architecture from the feature design of a state-of-the-art neural network system. Surprisingly, a much simpler classifier trained on similar features performs on par with the highly complex neural network system (at 75x reduction to the training time), suggesting that the features are a bigger contributor to the final performance.
Tasks Question Answering, Relation Extraction
Published 2018-03-24
URL http://arxiv.org/abs/1803.09091v1
PDF http://arxiv.org/pdf/1803.09091v1.pdf
PWC https://paperswithcode.com/paper/simple-large-scale-relation-extraction-from
Repo
Framework

Behaviour Policy Estimation in Off-Policy Policy Evaluation: Calibration Matters

Title Behaviour Policy Estimation in Off-Policy Policy Evaluation: Calibration Matters
Authors Aniruddh Raghu, Omer Gottesman, Yao Liu, Matthieu Komorowski, Aldo Faisal, Finale Doshi-Velez, Emma Brunskill
Abstract In this work, we consider the problem of estimating a behaviour policy for use in Off-Policy Policy Evaluation (OPE) when the true behaviour policy is unknown. Via a series of empirical studies, we demonstrate how accurate OPE is strongly dependent on the calibration of estimated behaviour policy models: how precisely the behaviour policy is estimated from data. We show how powerful parametric models such as neural networks can result in highly uncalibrated behaviour policy models on a real-world medical dataset, and illustrate how a simple, non-parametric, k-nearest neighbours model produces better calibrated behaviour policy estimates and can be used to obtain superior importance sampling-based OPE estimates.
Tasks Calibration
Published 2018-07-03
URL http://arxiv.org/abs/1807.01066v2
PDF http://arxiv.org/pdf/1807.01066v2.pdf
PWC https://paperswithcode.com/paper/behaviour-policy-estimation-in-off-policy
Repo
Framework
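
To make the role of the estimated behaviour policy concrete, here is a minimal sketch of ordinary importance-sampling OPE with a k-nearest-neighbours behaviour policy estimate. It assumes scalar states, discrete actions, and per-trajectory returns; the helper names and the toy data are hypothetical and are not taken from the paper.

```python
def knn_behaviour_prob(state, action, data, k):
    """Estimate pi_b(action | state) as the share of the k nearest
    logged (state, action) pairs that took `action`."""
    nearest = sorted(data, key=lambda sa: abs(sa[0] - state))[:k]
    return sum(1 for _, a in nearest if a == action) / k

def is_estimate(trajectories, pi_e, pi_b):
    """Ordinary importance sampling: average of (product of per-step
    probability ratios) * return. Fails if pi_b returns 0 for a logged action."""
    total = 0.0
    for steps, ret in trajectories:
        weight = 1.0
        for s, a in steps:
            weight *= pi_e(s, a) / pi_b(s, a)
        total += weight * ret
    return total / len(trajectories)

# Toy logged data (assumption): (state, action) pairs.
logged = [(0.0, 0), (0.1, 0), (0.2, 1), (5.0, 1)]
print(knn_behaviour_prob(0.05, 0, logged, k=3))
```

The estimated probability sits in the denominator of every importance weight, so a poorly calibrated estimate close to zero inflates the weight and can dominate the whole average.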

An Information-Theoretic View for Deep Learning

Title An Information-Theoretic View for Deep Learning
Authors Jingwei Zhang, Tongliang Liu, Dacheng Tao
Abstract Deep learning has transformed computer vision, natural language processing, and speech recognition. However, two critical questions remain obscure: (1) why do deep neural networks generalize better than shallow networks; and (2) does it always hold that a deeper network leads to better performance? Specifically, letting $L$ be the number of convolutional and pooling layers in a deep neural network, and $n$ be the size of the training sample, we derive an upper bound on the expected generalization error for this network, i.e., $$\mathbb{E}[R(W)-R_S(W)] \leq \exp{\left(-\frac{L}{2}\log{\frac{1}{\eta}}\right)}\sqrt{\frac{2\sigma^2}{n}I(S,W)}$$ where $\sigma >0$ is a constant depending on the loss function, $0<\eta<1$ is a constant depending on the information loss for each convolutional or pooling layer, and $I(S, W)$ is the mutual information between the training sample $S$ and the output hypothesis $W$. This upper bound shows that as the number of convolutional and pooling layers $L$ increases in the network, the expected generalization error will decrease exponentially to zero. Layers with strict information loss, such as the convolutional layers, reduce the generalization error for the whole network; this answers the first question. However, an algorithm with zero expected generalization error does not imply a small test error or $\mathbb{E}[R(W)]$. This is because $\mathbb{E}[R_S(W)]$ is large when the information for fitting the data is lost as the number of layers increases. This suggests that the claim 'the deeper the better' is conditioned on a small training error or $\mathbb{E}[R_S(W)]$. Finally, we show that deep learning satisfies a weak notion of stability and the sample complexity of deep neural networks will decrease as $L$ increases.
Tasks Speech Recognition
Published 2018-04-24
URL http://arxiv.org/abs/1804.09060v8
PDF http://arxiv.org/pdf/1804.09060v8.pdf
PWC https://paperswithcode.com/paper/an-information-theoretic-view-for-deep
Repo
Framework
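
The bound stated in the abstract is easy to evaluate numerically, which shows the exponential decay in the number of layers. The sketch below simply computes the right-hand side; all numeric values are illustrative assumptions, not figures from the paper.

```python
import math

def generalization_bound(num_layers, eta, sigma, n, mutual_info):
    """Right-hand side of the bound:
    exp(-(L/2) * log(1/eta)) * sqrt(2 * sigma^2 / n * I(S, W))."""
    contraction = math.exp(-(num_layers / 2) * math.log(1 / eta))  # equals eta ** (L / 2)
    return contraction * math.sqrt(2 * sigma**2 / n * mutual_info)

# Illustrative values (assumption): eta = 0.25, sigma = 1, n = 2, I(S, W) = 1.
for depth in (0, 2, 4):
    print(depth, generalization_bound(depth, 0.25, 1.0, 2, 1.0))
```

Since the factor in front of the square root equals $\eta^{L/2}$ with $0<\eta<1$, each additional pair of layers multiplies the bound by $\eta$.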

Crossbar-Net: A Novel Convolutional Network for Kidney Tumor Segmentation in CT Images

Title Crossbar-Net: A Novel Convolutional Network for Kidney Tumor Segmentation in CT Images
Authors Qian Yu, Yinghuan Shi, Jinquan Sun, Yang Gao, Yakang Dai, Jianbing Zhu
Abstract Due to the irregular motion, similar appearance and diverse shape, accurate segmentation of kidney tumors in CT images is a difficult and challenging task. To this end, we present a novel automatic segmentation method, termed Crossbar-Net, with the goal of accurately segmenting kidney tumors. Firstly, considering that traditional learning-based segmentation methods normally employ either whole images or squared patches as the training samples, we innovatively sample orthogonal non-squared patches (namely crossbar patches) to fully cover the whole kidney tumor in either the horizontal or the vertical direction. These sampled crossbar patches not only represent the detailed local information of the kidney tumor as traditional patches do, but also describe the global appearance from either the horizontal or the vertical direction using contextual information. Secondly, with the obtained crossbar patches, we train a convolutional neural network with two sub-models (i.e., horizontal sub-model and vertical sub-model) in a cascaded manner, to integrate the segmentation results from two directions (i.e., horizontal and vertical). This cascaded training strategy effectively guarantees the consistency between sub-models, by feeding each other with the most difficult samples, for a better segmentation. In the experiment, we evaluate our method on a real CT kidney tumor dataset, collected from 94 different patients including 3,500 images. Compared with state-of-the-art segmentation methods, the results demonstrate the superiority of our method on Dice ratio score, true positive fraction, centroid distance and Hausdorff distance. Moreover, we have extended Crossbar-Net to a different task, cardiac segmentation, showing promising results that indicate good generalization.
Tasks Cardiac Segmentation
Published 2018-04-27
URL http://arxiv.org/abs/1804.10484v3
PDF http://arxiv.org/pdf/1804.10484v3.pdf
PWC https://paperswithcode.com/paper/crossbar-net-a-novel-convolutional-network
Repo
Framework

Segmental Audio Word2Vec: Representing Utterances as Sequences of Vectors with Applications in Spoken Term Detection

Title Segmental Audio Word2Vec: Representing Utterances as Sequences of Vectors with Applications in Spoken Term Detection
Authors Yu-Hsuan Wang, Hung-yi Lee, Lin-shan Lee
Abstract While Word2Vec represents words (in text) as vectors carrying semantic information, audio Word2Vec was shown to be able to represent signal segments of spoken words as vectors carrying phonetic structure information. Audio Word2Vec can be trained in an unsupervised way from an unlabeled corpus, except the word boundaries are needed. In this paper, we extend audio Word2Vec from word-level to utterance-level by proposing a new segmental audio Word2Vec, in which unsupervised spoken word boundary segmentation and audio Word2Vec are jointly learned and mutually enhanced, so an utterance can be directly represented as a sequence of vectors carrying phonetic structure information. This is achieved by a segmental sequence-to-sequence autoencoder (SSAE), in which a segmentation gate trained with reinforcement learning is inserted in the encoder. Experiments on English, Czech, French and German show very good performance in both unsupervised spoken word segmentation and spoken term detection applications (significantly better than frame-based DTW).
Tasks
Published 2018-08-07
URL http://arxiv.org/abs/1808.02228v1
PDF http://arxiv.org/pdf/1808.02228v1.pdf
PWC https://paperswithcode.com/paper/segmental-audio-word2vec-representing
Repo
Framework

A Unified Dynamic Approach to Sparse Model Selection

Title A Unified Dynamic Approach to Sparse Model Selection
Authors Chendi Huang, Yuan Yao
Abstract Sparse model selection is ubiquitous from linear regression to graphical models where regularization paths, as a family of estimators upon the regularization parameter varying, are computed when the regularization parameter is unknown or decided data-adaptively. Traditional computational methods rely on solving a set of optimization problems where the regularization parameters are fixed on a grid, which might be inefficient. In this paper, we introduce a simple iterative regularization path, which follows the dynamics of a sparse Mirror Descent algorithm or a generalization of Linearized Bregman Iterations with nonlinear loss. Its performance is competitive with glmnet, with a further bias reduction. A path consistency theory is presented that under the Restricted Strong Convexity (RSC) and the Irrepresentable Condition (IRR), the path will first evolve in a subspace with no false positives and reach an estimator that is sign-consistent or of minimax optimal $\ell_2$ error rate. Early stopping regularization is required to prevent overfitting. Application examples are given in sparse logistic regression and Ising models for NIPS coauthorship.
Tasks Model Selection
Published 2018-10-08
URL http://arxiv.org/abs/1810.03608v1
PDF http://arxiv.org/pdf/1810.03608v1.pdf
PWC https://paperswithcode.com/paper/a-unified-dynamic-approach-to-sparse-model
Repo
Framework
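
As a rough illustration of an iterative regularization path, the following sketch runs the standard Linearized Bregman Iteration for a squared loss: a dual variable `z` accumulates negative gradients, and the estimate is a scaled soft-thresholding of `z`. The squared loss, the parameter names `kappa` and `alpha`, and the toy data are assumptions for illustration; the paper's generalization to nonlinear losses is not reproduced here.

```python
import math

def soft_threshold(z, t=1.0):
    return [math.copysign(max(abs(v) - t, 0.0), v) for v in z]

def lbi_path(X, y, kappa, alpha, steps):
    """Return the list of estimates beta_1, ..., beta_steps."""
    n, p = len(X), len(X[0])
    z = [0.0] * p
    beta = [0.0] * p
    path = []
    for _ in range(steps):
        # Residual and gradient of the loss (1 / 2n) * ||y - X beta||^2.
        resid = [yi - sum(xij * bj for xij, bj in zip(row, beta))
                 for row, yi in zip(X, y)]
        grad = [-sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        z = [zj - alpha * gj for zj, gj in zip(z, grad)]
        beta = [kappa * s for s in soft_threshold(z)]
        path.append(beta)
    return path

# Toy problem (assumption): only the first coefficient is truly nonzero.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.0]
path = lbi_path(X, y, kappa=1.0, alpha=0.1, steps=500)
print(path[0], path[-1])
```

Each iterate along `path` is one candidate model, so the iteration count plays the part of the regularization parameter and stopping early yields a sparser estimate.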

TechKG: A Large-Scale Chinese Technology-Oriented Knowledge Graph

Title TechKG: A Large-Scale Chinese Technology-Oriented Knowledge Graph
Authors Feiliang Ren, Yining Hou, Yan Li, Linfeng Pan, Yi Zhang, Xiaobo Liang, Yongkang Liu, Yu Guo, Rongsheng Zhao, Ruicheng Ming, Huiming Wu
Abstract A knowledge graph is a valuable kind of knowledge base that benefits many AI-related applications. Up to now, many large-scale knowledge graphs have been built. However, most of them are non-Chinese and designed for general purposes. In this work, we introduce TechKG, a large-scale Chinese knowledge graph that is technology-oriented. It is built automatically from massive technical papers that are published in Chinese academic journals of different research domains. Some carefully designed heuristic rules are used to extract high-quality entities and relations. In total, it comprises over 260 million triplets that are built upon more than 52 million entities which come from 38 research domains. Our preliminary experiments indicate that TechKG has high adaptability and can be used as a dataset for many diverse AI-related applications. We released TechKG at: http://www.techkg.cn.
Tasks Knowledge Graphs
Published 2018-12-17
URL http://arxiv.org/abs/1812.06722v1
PDF http://arxiv.org/pdf/1812.06722v1.pdf
PWC https://paperswithcode.com/paper/techkg-a-large-scale-chinese-technology
Repo
Framework

Inter-BMV: Interpolation with Block Motion Vectors for Fast Semantic Segmentation on Video

Title Inter-BMV: Interpolation with Block Motion Vectors for Fast Semantic Segmentation on Video
Authors Samvit Jain, Joseph E. Gonzalez
Abstract Models optimized for accuracy on single images are often prohibitively slow to run on each frame in a video. Recent work exploits the use of optical flow to warp image features forward from select keyframes, as a means to conserve computation on video. This approach, however, achieves only limited speedup, even when optimized, due to the accuracy degradation introduced by repeated forward warping, and the inference cost of optical flow estimation. To address these problems, we propose a new scheme that propagates features using the block motion vectors (BMV) present in compressed video (e.g. H.264 codecs), instead of optical flow, and bi-directionally warps and fuses features from enclosing keyframes to capture scene context on each video frame. Our technique, interpolation-BMV, enables us to accurately estimate the features of intermediate frames, while keeping inference costs low. We evaluate our system on the CamVid and Cityscapes datasets, comparing to both a strong single-frame baseline and related work. We find that we are able to substantially accelerate segmentation on video, achieving near real-time frame rates (20+ frames per second) on large images (e.g. 960 x 720 pixels), while maintaining competitive accuracy. This represents an improvement of almost 6x over the single-frame baseline and 2.5x over the fastest prior work.
Tasks Optical Flow Estimation, Semantic Segmentation
Published 2018-10-08
URL http://arxiv.org/abs/1810.04047v1
PDF http://arxiv.org/pdf/1810.04047v1.pdf
PWC https://paperswithcode.com/paper/inter-bmv-interpolation-with-block-motion
Repo
Framework
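
The feature propagation described in the Inter-BMV abstract can be sketched as a per-block lookup followed by a fusion of the two warped maps. This is a simplified illustration only: the sign convention of the motion vectors, the clamping at image borders, and the weighted-average fusion are assumptions, and the function names are hypothetical.

```python
def warp_with_block_vectors(features, vectors, block):
    """Warp a 2D feature map: every position copies the value found at its
    own location displaced by the (dy, dx) vector of the block it lies in."""
    h, w = len(features), len(features[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = vectors[y // block][x // block]
            sy = min(max(y + dy, 0), h - 1)  # clamp to the map (assumption)
            sx = min(max(x + dx, 0), w - 1)
            out[y][x] = features[sy][sx]
    return out

def fuse(forward, backward, weight=0.5):
    """Blend features warped from the previous and the next keyframe."""
    return [[weight * f + (1 - weight) * b for f, b in zip(fr, br)]
            for fr, br in zip(forward, backward)]

keyframe = [[1.0, 2.0], [3.0, 4.0]]
warped = warp_with_block_vectors(keyframe, [[(0, 1)]], block=2)
print(warped)
```

One vector is shared by every position in a block, so the warp needs no per-pixel flow estimation; the vectors are read from the compressed video stream instead.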

Reinforcement Learning with Function-Valued Action Spaces for Partial Differential Equation Control

Title Reinforcement Learning with Function-Valued Action Spaces for Partial Differential Equation Control
Authors Yangchen Pan, Amir-massoud Farahmand, Martha White, Saleh Nabi, Piyush Grover, Daniel Nikovski
Abstract Recent work has shown that reinforcement learning (RL) is a promising approach to control dynamical systems described by partial differential equations (PDE). This paper shows how to use RL to tackle more general PDE control problems that have continuous high-dimensional action spaces with spatial relationship among action dimensions. In particular, we propose the concept of action descriptors, which encode regularities among spatially-extended action dimensions and enable the agent to control high-dimensional action PDEs. We provide theoretical evidence suggesting that this approach can be more sample efficient compared to a conventional approach that treats each action dimension separately and does not explicitly exploit the spatial regularity of the action space. The action descriptor approach is then used within the deep deterministic policy gradient algorithm. Experiments on two PDE control problems, with up to 256-dimensional continuous actions, show the advantage of the proposed approach over the conventional one.
Tasks
Published 2018-06-13
URL http://arxiv.org/abs/1806.06931v1
PDF http://arxiv.org/pdf/1806.06931v1.pdf
PWC https://paperswithcode.com/paper/reinforcement-learning-with-function-valued
Repo
Framework

Image Cartoon-Texture Decomposition Using Isotropic Patch Recurrence

Title Image Cartoon-Texture Decomposition Using Isotropic Patch Recurrence
Authors Ruotao Xu, Yuhui Quan, Yong Xu
Abstract Aiming at separating the cartoon and texture layers from an image, cartoon-texture decomposition approaches resort to image priors to model cartoon and texture respectively. In recent years, patch recurrence has emerged as a powerful prior for image recovery. However, the existing strategies of using patch recurrence are ineffective to cartoon-texture decomposition, as both cartoon contours and texture patterns exhibit strong patch recurrence in images. To address this issue, we introduce the isotropy prior of patch recurrence, that the spatial configuration of similar patches in texture exhibits the isotropic structure which is different from that in cartoon, to model the texture component. Based on the isotropic patch recurrence, we construct a nonlocal sparsification system which can effectively distinguish well-patterned features from contour edges. Incorporating the constructed nonlocal system into morphology component analysis, we develop an effective method to both noiseless and noisy cartoon-texture decomposition. The experimental results have demonstrated the superior performance of the proposed method to the existing ones, as well as the effectiveness of the isotropic patch recurrence prior.
Tasks
Published 2018-11-10
URL http://arxiv.org/abs/1811.04208v1
PDF http://arxiv.org/pdf/1811.04208v1.pdf
PWC https://paperswithcode.com/paper/image-cartoon-texture-decomposition-using
Repo
Framework

An Exercise Fatigue Detection Model Based on Machine Learning Methods

Title An Exercise Fatigue Detection Model Based on Machine Learning Methods
Authors Ming-Yen Wu, Chi-Hua Chen, Chi-Chun Lo
Abstract This study proposes an exercise fatigue detection model based on real-time clinical data which includes time domain analysis, frequency domain analysis, detrended fluctuation analysis, approximate entropy, and sample entropy. Furthermore, this study proposed a feature extraction method which is combined with an analytical hierarchy process to analyze and extract critical features. Finally, machine learning algorithms were adopted to analyze the data of each feature for the detection of exercise fatigue. The practical experimental results showed that the proposed exercise fatigue detection model and feature extraction method could precisely detect the level of exercise fatigue, and the accuracy of exercise fatigue detection could be improved up to 98.65%.
Tasks
Published 2018-03-07
URL http://arxiv.org/abs/1803.07952v1
PDF http://arxiv.org/pdf/1803.07952v1.pdf
PWC https://paperswithcode.com/paper/an-exercise-fatigue-detection-model-based-on
Repo
Framework
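
Of the features named in the abstract, sample entropy is compact enough to sketch directly. The version below follows the common definition SampEn(m, r) = -ln(A / B), where B counts pairs of length-m templates within Chebyshev distance r and A counts the same for length m + 1; using `<=` rather than `<` for the tolerance and the toy series are assumptions, and the paper's exact settings are not given in the abstract.

```python
import math

def sample_entropy(series, m, r):
    """SampEn(m, r) = -ln(A / B); raises an error if A or B is zero."""
    n = len(series)

    def count_matches(length):
        # Use n - m templates for both lengths so A and B are comparable.
        templates = [series[i:i + length] for i in range(n - m)]
        matches = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    matches += 1
        return matches

    b = count_matches(m)
    a = count_matches(m + 1)
    return -math.log(a / b)

print(sample_entropy([0, 1, 0, 1, 0, 1, 0, 1], m=1, r=0.1))
print(sample_entropy([0, 1, 0, 0, 1, 1], m=1, r=0.1))
```

A perfectly periodic series scores zero because every length-m match extends to a length-(m + 1) match, while less regular series score higher.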