January 29, 2020

3054 words 15 mins read

Paper Group ANR 664

Learning a Family of Optimal State-Feedback Controllers with Homotopy and Trajectory Optimisation. A Semi-Supervised Framework for Automatic Pixel-Wise Breast Cancer Grading of Histological Images. State-Regularized Recurrent Neural Networks. Embedding of FRPN in CNN architecture. DDNet: Cartesian-polar Dual-domain Network for the Joint Optic Disc …

Learning a Family of Optimal State-Feedback Controllers with Homotopy and Trajectory Optimisation

Title Learning a Family of Optimal State-Feedback Controllers with Homotopy and Trajectory Optimisation
Authors Christopher Iliffe Sprague, Dario Izzo, Petter Ögren
Abstract Optimal state-feedback controllers, capable of changing between different objective functions, are advantageous to systems in which unexpected situations may arise. However, synthesising such controllers, even for a single objective, is a demanding process. In this paper, we present a novel and straightforward approach to synthesising these policies through a combination of trajectory optimisation, homotopy continuation, and imitation learning. We use numerical continuation to efficiently generate optimal demonstrations across several objectives and boundary conditions, and use these to train our policies. Additionally, we demonstrate the ability of our policies to effectively learn families of optimal state-feedback controllers, which can be used to change objective function online. We illustrate this approach across two trajectory optimisation problems, an inverted pendulum swingup and a spacecraft orbit transfer, and show that the synthesised policies, when evaluated in simulation, produce trajectories that are near-optimal. These results indicate the benefit of trajectory optimisation and homotopy continuation to the synthesis of controllers in dynamic-objective contexts.
Tasks Imitation Learning
Published 2019-02-27
URL https://arxiv.org/abs/1902.10139v2
PDF https://arxiv.org/pdf/1902.10139v2.pdf
PWC https://paperswithcode.com/paper/learning-a-family-of-optimal-state-feedback
Repo
Framework
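
A minimal, hypothetical sketch of the pipeline this abstract describes: sweep a homotopy (objective) parameter while warm-starting a trajectory optimiser, collect state/objective-to-control pairs, and regress a feedback policy on them. The solver below is a random stand-in so the script runs; it is not the authors' optimiser, and all dimensions are illustrative assumptions.

```python
# Sketch: homotopy-continuation demonstration generation + behavioural cloning.
import numpy as np
import torch
import torch.nn as nn

def solve_ocp(alpha, x0, warm_start=None):
    """Stand-in trajectory optimiser: returns (states, controls) for objective
    parameter `alpha` and initial state `x0`. Replace with a real solver."""
    T = 50
    states = np.random.randn(T, 4)       # placeholder state trajectory
    controls = np.random.randn(T, 1)     # placeholder control trajectory
    return states, controls

# 1) Homotopy continuation: sweep the objective parameter, warm-starting each solve.
demos = []
warm = None
for alpha in np.linspace(0.0, 1.0, 11):  # e.g., from one objective to another
    x0 = np.random.randn(4)              # sample a boundary condition
    states, controls = solve_ocp(alpha, x0, warm_start=warm)
    warm = (states, controls)            # reuse as initial guess for the next alpha
    for s, u in zip(states, controls):
        demos.append((np.concatenate([s, [alpha]]), u))

# 2) Imitation learning: regress control on (state, objective parameter).
X = torch.tensor(np.array([d[0] for d in demos]), dtype=torch.float32)
U = torch.tensor(np.array([d[1] for d in demos]), dtype=torch.float32)
policy = nn.Sequential(nn.Linear(5, 64), nn.Tanh(),
                       nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(policy(X), U)
    loss.backward()
    opt.step()
```

Because the objective parameter is an input to the policy, the same network can be switched between objectives online, which is the property the paper targets.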

A Semi-Supervised Framework for Automatic Pixel-Wise Breast Cancer Grading of Histological Images

Title A Semi-Supervised Framework for Automatic Pixel-Wise Breast Cancer Grading of Histological Images
Authors Yanyuet Man, Xiangyun Ding, Xingcheng Yao, Han Bao
Abstract Breast cancer is one of the leading causes of death among women worldwide. Recently, deep learning methods have been developed to automatically grade breast cancer in histological slides. However, the performance of existing deep learning models is limited by the lack of large annotated biomedical datasets. One promising way to relieve the annotation burden is to leverage unannotated datasets to enhance the trained model. In this paper, we first apply an active learning method to breast cancer grading and propose a semi-supervised framework based on an expectation-maximization (EM) model. The proposed EM approach is based on collaborative filtering between the annotated and unannotated datasets. The collaborative filtering method effectively extracts useful and credible samples from the unannotated images. Results of pixel-wise prediction on whole-slide images (WSI) demonstrate that the proposed method not only outperforms state-of-the-art methods, but also significantly reduces the annotation cost by over 70%.
Tasks Active Learning
Published 2019-07-03
URL https://arxiv.org/abs/1907.01696v1
PDF https://arxiv.org/pdf/1907.01696v1.pdf
PWC https://paperswithcode.com/paper/a-semi-supervised-framework-for-automatic
Repo
Framework
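
A rough, generic sketch of an EM-style semi-supervised loop on toy data: fit on the annotated pool, pseudo-label only high-confidence unannotated samples (the "credible" ones), and refit. This is a stand-in for the idea, not the paper's collaborative-filtering formulation or its histology pipeline.

```python
# Sketch: iterative pseudo-labelling as a simple EM-flavoured semi-supervised loop.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_lab = rng.normal(size=(100, 16))                 # toy "annotated" set
y_lab = (X_lab[:, 0] > 0).astype(int)
X_unl = rng.normal(size=(1000, 16))                # toy "unannotated" set

clf = LogisticRegression(max_iter=1000)
for _ in range(5):                                 # EM-style iterations
    clf.fit(X_lab, y_lab)                          # M-step: fit on the current labelled pool
    proba = clf.predict_proba(X_unl)               # E-step: soft labels for unlabelled data
    conf = proba.max(axis=1)
    keep = conf > 0.95                             # keep only "credible" pseudo-labels
    if not keep.any():
        break
    X_lab = np.vstack([X_lab, X_unl[keep]])
    y_lab = np.concatenate([y_lab, proba[keep].argmax(axis=1)])
    X_unl = X_unl[~keep]
```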

State-Regularized Recurrent Neural Networks

Title State-Regularized Recurrent Neural Networks
Authors Cheng Wang, Mathias Niepert
Abstract Recurrent neural networks are a widely used class of neural architectures. They have, however, two shortcomings. First, it is difficult to understand what exactly they learn. Second, they tend to work poorly on sequences requiring long-term memorization, despite having this capacity in principle. We aim to address both shortcomings with a class of recurrent networks that use a stochastic state transition mechanism between cell applications. This mechanism, which we term state-regularization, makes RNNs transition between a finite set of learnable states. We evaluate state-regularized RNNs on (1) regular languages for the purpose of automata extraction; (2) nonregular languages such as balanced parentheses, palindromes, and the copy task, where external memory is required; and (3) real-world sequence learning tasks for sentiment analysis, visual object recognition, and language modeling. We show that state-regularization (a) simplifies the extraction of finite state automata modeling an RNN’s state transition dynamics; (b) forces RNNs to operate more like automata with external memory and less like finite state machines; and (c) improves the interpretability and explainability of RNNs.
Tasks Language Modelling, Object Recognition, Sentiment Analysis
Published 2019-01-25
URL https://arxiv.org/abs/1901.08817v2
PDF https://arxiv.org/pdf/1901.08817v2.pdf
PWC https://paperswithcode.com/paper/state-regularized-recurrent-neural-networks
Repo
Framework
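
A small sketch of what a state-regularized cell could look like: a standard GRU update followed by a softmax-weighted projection onto k learnable centroid states, so the hidden state is pushed toward a finite set of states. The temperature and the deterministic (rather than stochastic) assignment are assumptions, not the paper's exact mechanism.

```python
# Sketch: RNN cell whose hidden state is projected onto learnable centroid states.
import torch
import torch.nn as nn

class StateRegularizedCell(nn.Module):
    def __init__(self, input_size, hidden_size, num_states=10, temperature=1.0):
        super().__init__()
        self.cell = nn.GRUCell(input_size, hidden_size)
        self.centroids = nn.Parameter(torch.randn(num_states, hidden_size))  # learnable states
        self.temperature = temperature

    def forward(self, x, h):
        u = self.cell(x, h)                                    # ordinary recurrent update
        scores = u @ self.centroids.t() / self.temperature     # similarity to each centroid
        alpha = torch.softmax(scores, dim=-1)                  # soft assignment over states
        return alpha @ self.centroids                          # regularized hidden state

# Usage on a toy batch:
cell = StateRegularizedCell(input_size=8, hidden_size=32, num_states=5)
h = torch.zeros(4, 32)
for t in range(6):
    h = cell(torch.randn(4, 8), h)
```

With a low temperature the assignment becomes nearly one-hot, which is what makes reading off a finite state automaton from the visited centroids straightforward.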

Embedding of FRPN in CNN architecture

Title Embedding of FRPN in CNN architecture
Authors Alberto Rossi, Markus Hagenbuchner, Franco Scarselli, Ah Chung Tsoi
Abstract This paper extends the fully recursive perceptron network (FRPN) model for vectorial inputs to include deep convolutional neural networks (CNNs), which can accept multi-dimensional inputs. An FRPN consists of a recursive layer which, given a fixed input, iteratively computes an equilibrium state. The unfolding realized by this iterative mechanism makes it possible to simulate a deep neural network with any number of layers. Extending the FRPN to CNNs results in an architecture, which we call the convolutional-FRPN (C-FRPN), in which the convolutional layers are recursive. The method is evaluated on several image classification benchmarks. It is shown that the C-FRPN consistently outperforms standard CNNs with the same number of parameters. The gap in performance is particularly large for small networks, showing that the C-FRPN is a very powerful architecture, since it achieves performance equivalent to deep CNNs with fewer parameters.
Tasks Image Classification
Published 2019-12-27
URL https://arxiv.org/abs/2001.05851v1
PDF https://arxiv.org/pdf/2001.05851v1.pdf
PWC https://paperswithcode.com/paper/embedding-of-frpn-in-cnn-architecture
Repo
Framework
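
A hedged sketch of a recursive convolutional block in the spirit of the C-FRPN: the block repeatedly mixes a fixed input drive with an evolving state using a shared convolution to approximate an equilibrium. The iteration count, initialisation, and mixing scheme are illustrative assumptions.

```python
# Sketch: a convolutional layer unfolded recursively toward an equilibrium state.
import torch
import torch.nn as nn

class RecursiveConvBlock(nn.Module):
    def __init__(self, channels, iterations=5):
        super().__init__()
        self.input_conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.state_conv = nn.Conv2d(channels, channels, 3, padding=1)  # shared across iterations
        self.iterations = iterations

    def forward(self, x):
        drive = self.input_conv(x)           # fixed contribution of the input
        h = torch.zeros_like(drive)          # initial state
        for _ in range(self.iterations):     # unfold the recursion
            h = torch.relu(drive + self.state_conv(h))
        return h

# Usage:
block = RecursiveConvBlock(channels=16)
out = block(torch.randn(2, 16, 32, 32))
```

Because the same weights are reused at every iteration, the effective depth grows with the number of unfolding steps while the parameter count stays fixed, which is the trade-off the abstract highlights.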

DDNet: Cartesian-polar Dual-domain Network for the Joint Optic Disc and Cup Segmentation

Title DDNet: Cartesian-polar Dual-domain Network for the Joint Optic Disc and Cup Segmentation
Authors Qing Liu, Xiaopeng Hong, Wei Ke, Zailiang Chen, Beiji Zou
Abstract Existing joint optic disc and cup segmentation approaches are developed in either the Cartesian or the polar coordinate system. However, because the optic cup is subtle, the contextual information extracted from a single domain, even by prevailing CNNs, is still insufficient. In this paper, we propose a novel segmentation approach, named the Cartesian-polar dual-domain network (DDNet), which for the first time considers the complementarity of the Cartesian and polar domains. We propose a two-branch domain feature encoder that learns, in parallel, translation-equivariant representations on the rectilinear grid of the Cartesian domain and rotation-equivariant representations on the polar grid of the polar domain. To fuse the features on the two different grids, we propose a dual-domain fusion module. This module builds the correspondence between the two grids via a differentiable polar transform layer and learns element-wise feature importance across the two domains to enhance expressive capability. Finally, the decoder aggregates the fused features from low level to high level and makes dense predictions. We validate the state-of-the-art segmentation performance of our DDNet on the public ORIGA dataset. From the segmentation masks, we estimate the commonly used clinical measure for glaucoma, i.e., the vertical cup-to-disc ratio. The low cup-to-disc ratio estimation error demonstrates the method’s potential application in glaucoma screening.
Tasks Feature Importance
Published 2019-04-18
URL http://arxiv.org/abs/1904.08773v1
PDF http://arxiv.org/pdf/1904.08773v1.pdf
PWC https://paperswithcode.com/paper/ddnet-cartesian-polar-dual-domain-network-for
Repo
Framework
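
A minimal sketch of the dual-domain ingredients: a differentiable Cartesian-to-polar resampling built on grid_sample, followed by a simple element-wise gate that fuses features from the two grids. The gate is a generic placeholder, not DDNet's actual fusion module.

```python
# Sketch: differentiable polar resampling + element-wise fusion of two feature maps.
import math
import torch
import torch.nn.functional as F

def cartesian_to_polar(feat, out_h=64, out_w=64):
    """Resample (N, C, H, W) features onto a (radius, angle) grid centred on the image."""
    n = feat.size(0)
    r = torch.linspace(0, 1, out_h).view(out_h, 1).expand(out_h, out_w)            # radius in [0, 1]
    t = torch.linspace(0, 2 * math.pi, out_w).view(1, out_w).expand(out_h, out_w)  # angle
    x = r * torch.cos(t)                          # normalized sampling coordinates in [-1, 1]
    y = r * torch.sin(t)
    grid = torch.stack([x, y], dim=-1).unsqueeze(0).expand(n, -1, -1, -1)
    return F.grid_sample(feat, grid, align_corners=False)

# Element-wise fusion once both domains live on the same (polar) grid:
cart_feat = torch.randn(1, 8, 128, 128)           # features from the Cartesian branch
polar_feat = torch.randn(1, 8, 64, 64)            # features from the polar branch
cart_on_polar = cartesian_to_polar(cart_feat, 64, 64)
gate = torch.sigmoid(cart_on_polar)               # placeholder gate; DDNet learns this importance
fused = gate * cart_on_polar + (1 - gate) * polar_feat
```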

Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment

Title Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment
Authors Samyak Datta, Karan Sikka, Anirban Roy, Karuna Ahuja, Devi Parikh, Ajay Divakaran
Abstract We address the problem of grounding free-form textual phrases using weak supervision from image-caption pairs. We propose a novel end-to-end model that uses caption-to-image retrieval as a ‘downstream’ task to guide the process of phrase localization. Our method, as a first step, infers the latent correspondences between regions of interest (RoIs) and phrases in the caption and creates a discriminative image representation from these matched RoIs. In a subsequent step, this (learned) representation is aligned with the caption. Our key contribution lies in building this ‘caption-conditioned’ image encoding, which tightly couples both tasks and allows the weak supervision to effectively guide visual grounding. We provide an extensive empirical and qualitative analysis to investigate the different components of our proposed model and compare it with competitive baselines. For phrase localization, we report an improvement of 4.9% (absolute) over the prior state of the art on the VisualGenome dataset. We also report results on par with the state of the art on the downstream caption-to-image retrieval task on the COCO and Flickr30k datasets.
Tasks Image Retrieval, Phrase Grounding
Published 2019-03-27
URL https://arxiv.org/abs/1903.11649v2
PDF https://arxiv.org/pdf/1903.11649v2.pdf
PWC https://paperswithcode.com/paper/align2ground-weakly-supervised-phrase
Repo
Framework
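
A speculative sketch of the matching step described above: phrases softly attend over RoI features, the matched features are pooled into a caption-conditioned image code, and a margin-based retrieval loss aligns it with the caption. Dimensions, pooling, and the loss are assumptions for illustration only.

```python
# Sketch: phrase-to-RoI soft matching pooled into a caption-conditioned image code.
import torch
import torch.nn.functional as F

def caption_conditioned_encoding(roi_feats, phrase_embs):
    """roi_feats: (R, D) regions; phrase_embs: (P, D) phrases -> (D,) image code."""
    sim = F.normalize(phrase_embs, dim=-1) @ F.normalize(roi_feats, dim=-1).t()  # (P, R)
    attn = torch.softmax(sim / 0.1, dim=-1)        # each phrase softly picks its RoIs
    matched = attn @ roi_feats                     # (P, D) phrase-aligned region features
    return matched.mean(dim=0)                     # pool into one caption-conditioned code

# Ranking loss against a matching and a non-matching caption embedding:
roi = torch.randn(36, 256)
phrases_pos, cap_pos = torch.randn(5, 256), torch.randn(256)
cap_neg = torch.randn(256)
img_code = caption_conditioned_encoding(roi, phrases_pos)
s_pos = F.cosine_similarity(img_code, cap_pos, dim=0)
s_neg = F.cosine_similarity(img_code, cap_neg, dim=0)
loss = torch.clamp(0.2 - s_pos + s_neg, min=0)     # margin-based retrieval loss
```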

Post-editing Productivity with Neural Machine Translation: An Empirical Assessment of Speed and Quality in the Banking and Finance Domain

Title Post-editing Productivity with Neural Machine Translation: An Empirical Assessment of Speed and Quality in the Banking and Finance Domain
Authors Samuel Läubli, Chantal Amrhein, Patrick Düggelin, Beatriz Gonzalez, Alena Zwahlen, Martin Volk
Abstract Neural machine translation (NMT) has set new quality standards in automatic translation, yet its effect on post-editing productivity is still pending thorough investigation. We empirically test how the inclusion of NMT, in addition to domain-specific translation memories and termbases, impacts speed and quality in professional translation of financial texts. We find that even with language pairs that have received little attention in research settings and small amounts of in-domain data for system adaptation, NMT post-editing allows for substantial time savings and leads to equal or slightly better quality.
Tasks Machine Translation
Published 2019-06-04
URL https://arxiv.org/abs/1906.01685v1
PDF https://arxiv.org/pdf/1906.01685v1.pdf
PWC https://paperswithcode.com/paper/post-editing-productivity-with-neural-machine
Repo
Framework

Conformal Prediction Interval Estimations with an Application to Day-Ahead and Intraday Power Markets

Title Conformal Prediction Interval Estimations with an Application to Day-Ahead and Intraday Power Markets
Authors Christopher Kath, Florian Ziel
Abstract In this paper, we discuss a concept known as Conformal Prediction (CP). While it originally stems from the world of machine learning, it has never been applied or analyzed in the context of short-term electricity price forecasting. We therefore elaborate on the aspects that make Conformal Prediction worth knowing and explain why its simple yet very efficient idea has worked in other fields of application and why its characteristics are promising for short-term power applications as well. We compare its performance with different state-of-the-art electricity price forecasting models, such as quantile regression averaging (QRA), in an empirical out-of-sample study for three short-term electricity time series. We combine Conformal Prediction with various underlying point forecast models to demonstrate its versatility and behavior under changing conditions. Our findings suggest that Conformal Prediction yields sharp and reliable prediction intervals in short-term power markets. We further inspect the effect of each of Conformal Prediction’s model components and provide a path-based guideline on how to find the best CP model for each market.
Tasks Time Series
Published 2019-05-20
URL https://arxiv.org/abs/1905.07886v1
PDF https://arxiv.org/pdf/1905.07886v1.pdf
PWC https://paperswithcode.com/paper/conformal-prediction-interval-estimations
Repo
Framework
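
A compact sketch of split conformal prediction around an arbitrary point forecaster: absolute residuals on a calibration set give a quantile that widens point forecasts into intervals with approximate 1 - alpha coverage. The plain linear model and synthetic data are placeholders, not the paper's electricity-price models.

```python
# Sketch: split conformal prediction intervals around a generic point forecaster.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(600, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.5, size=600)
X_tr, y_tr = X[:300], y[:300]            # training split
X_cal, y_cal = X[300:500], y[300:500]    # calibration split
X_te = X[500:]                           # test split

model = LinearRegression().fit(X_tr, y_tr)
resid = np.abs(y_cal - model.predict(X_cal))       # nonconformity scores
alpha = 0.1
n = len(resid)
q = np.quantile(resid, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

pred = model.predict(X_te)
lower, upper = pred - q, pred + q        # intervals with roughly 90% coverage
```

Swapping `LinearRegression` for any point forecaster leaves the interval construction unchanged, which is the versatility the abstract emphasises.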

Differentiable Bayesian Neural Network Inference for Data Streams

Title Differentiable Bayesian Neural Network Inference for Data Streams
Authors Namuk Park, Taekyu Lee, Songkuk Kim
Abstract While deep neural networks (NNs) do not provide confidence estimates for their predictions, Bayesian neural networks (BNNs) can estimate prediction uncertainty. However, BNNs have not been widely used in practice due to the computational cost of inference. This prohibitive computational cost is a hindrance especially when processing stream data with low latency. To address this problem, we propose a novel model which approximates BNNs for data streams. Instead of generating a separate prediction for each data sample independently, this model estimates the increment of the prediction for a new data sample from previous predictions. The computational cost of this model is almost the same as that of non-Bayesian NNs. Experiments with semantic segmentation on real-world data show that this model performs significantly faster than BNNs while estimating uncertainty comparable to that of BNNs.
Tasks Semantic Segmentation
Published 2019-07-12
URL https://arxiv.org/abs/1907.05911v1
PDF https://arxiv.org/pdf/1907.05911v1.pdf
PWC https://paperswithcode.com/paper/differentiable-bayesian-neural-network
Repo
Framework

Singing voice conversion with non-parallel data

Title Singing voice conversion with non-parallel data
Authors Xin Chen, Wei Chu, Jinxi Guo, Ning Xu
Abstract Singing voice conversion is the task of converting a song sung by a source singer to the voice of a target singer. In this paper, we propose using a parallel-data-free, many-to-one voice conversion technique on singing voices. A phonetic posterior feature is first generated by decoding singing voices through a robust automatic speech recognition (ASR) engine. Then, a trained recurrent neural network (RNN) with a deep bidirectional long short-term memory (DBLSTM) structure is used to model the mapping from person-independent content to the acoustic features of the target person. F0 and aperiodicity are obtained from the original singing voice and used together with the acoustic features to reconstruct the target singing voice through a vocoder. In the resulting singing voice, the target and source singers sound similar. To our knowledge, this is the first study that uses non-parallel data to train a singing voice conversion system. Subjective evaluations demonstrate that the proposed method effectively converts singing voices.
Tasks Speech Recognition, Voice Conversion
Published 2019-03-11
URL http://arxiv.org/abs/1903.04124v1
PDF http://arxiv.org/pdf/1903.04124v1.pdf
PWC https://paperswithcode.com/paper/singing-voice-conversion-with-non-parallel
Repo
Framework
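
A minimal sketch of the conversion model only, under the assumption of a DBLSTM-style mapping: a deep bidirectional LSTM maps speaker-independent phonetic posteriorgrams (from an ASR system, not shown) to the target singer's acoustic features; F0/aperiodicity handling and the vocoder are omitted, and all dimensions are illustrative.

```python
# Sketch: bidirectional LSTM mapping phonetic posteriorgrams to acoustic features.
import torch
import torch.nn as nn

class PPGToAcoustic(nn.Module):
    def __init__(self, ppg_dim=144, acoustic_dim=60, hidden=256, layers=3):
        super().__init__()
        self.blstm = nn.LSTM(ppg_dim, hidden, num_layers=layers,
                             batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, acoustic_dim)

    def forward(self, ppg):                  # ppg: (batch, frames, ppg_dim)
        out, _ = self.blstm(ppg)
        return self.proj(out)                # (batch, frames, acoustic_dim)

# Training step on dummy frames (real inputs would be ASR posteriors of source singing):
model = PPGToAcoustic()
ppg = torch.randn(2, 100, 144)
target_acoustic = torch.randn(2, 100, 60)
loss = nn.functional.mse_loss(model(ppg), target_acoustic)
loss.backward()
```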

Deep Cooking: Predicting Relative Food Ingredient Amounts from Images

Title Deep Cooking: Predicting Relative Food Ingredient Amounts from Images
Authors Jiatong Li, Ricardo Guerrero, Vladimir Pavlovic
Abstract In this paper, we study the novel problem of not only predicting the ingredients in a food image, but also predicting the relative amounts of the detected ingredients. We propose two deep-learning-based models that output sparse and dense predictions, respectively, coupled with semi-automatic multi-database integrative data pre-processing, to solve the problem. Experiments on a dataset of recipes collected from the Internet show that the models produce encouraging results.
Tasks
Published 2019-09-26
URL https://arxiv.org/abs/1910.00100v1
PDF https://arxiv.org/pdf/1910.00100v1.pdf
PWC https://paperswithcode.com/paper/deep-cooking-predicting-relative-food
Repo
Framework

View Invariant 3D Human Pose Estimation

Title View Invariant 3D Human Pose Estimation
Authors Guoqiang Wei, Cuiling Lan, Wenjun Zeng, Zhibo Chen
Abstract The recent success of deep networks has significantly advanced 3D human pose estimation from 2D images. The diversity of capture viewpoints and the flexibility of human poses, however, remain significant challenges. In this paper, we propose a view-invariant 3D human pose estimation module to alleviate the effects of viewpoint diversity. The framework consists of a base network, which provides an initial estimate of a 3D pose; a view-invariant hierarchical correction network (VI-HC) on top of it, which learns to refine the 3D pose under consistent views; and a view-invariant discriminative network (VID), which enforces high-level constraints over body configurations. In VI-HC, the initial 3D pose inputs are automatically transformed to consistent views for further refinement at the global-body and local-body-part levels, respectively. In VID, under consistent viewpoints, we use adversarial learning to differentiate between estimated poses and real poses in order to avoid implausible 3D poses. Experimental results demonstrate that consistent viewpoints can dramatically enhance performance. Our module is robust across different 3D pose base networks and achieves a significant improvement (about 9%) over a strong baseline on the public 3D pose estimation benchmark Human3.6M.
Tasks 3D Human Pose Estimation, 3D Pose Estimation, Pose Estimation
Published 2019-01-30
URL http://arxiv.org/abs/1901.10841v1
PDF http://arxiv.org/pdf/1901.10841v1.pdf
PWC https://paperswithcode.com/paper/view-invariant-3d-human-pose-estimation
Repo
Framework
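
A toy sketch of the "transform to a consistent view" idea: rotate an initial 3D pose about the vertical axis so the hip line faces a canonical direction before any refinement. The joint indices and the choice of canonical direction are assumptions, not the paper's VI-HC.

```python
# Sketch: canonicalising a 3D pose by a rotation about the up axis.
import numpy as np

def to_consistent_view(pose, left_hip=1, right_hip=4):
    """pose: (J, 3) joints in (x, y, z) with y as the up axis. Rotate about y so the
    left-to-right hip vector aligns with the +x axis."""
    hip_vec = pose[right_hip] - pose[left_hip]
    angle = np.arctan2(hip_vec[2], hip_vec[0])        # current facing angle in the x-z plane
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, 0.0, s],
                  [0.0, 1.0, 0.0],
                  [-s, 0.0, c]])                      # rotation about the y (up) axis
    return pose @ R.T                                 # same pose expressed in a canonical view

canonical = to_consistent_view(np.random.randn(17, 3))   # 17-joint skeleton, for illustration
```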

Unleashing the Unused Potential of I-Vectors Enabled by GPU Acceleration

Title Unleashing the Unused Potential of I-Vectors Enabled by GPU Acceleration
Authors Ville Vestman, Kong Aik Lee, Tomi H. Kinnunen, Takafumi Koshinaka
Abstract Speaker embeddings are continuous-valued vector representations that allow easy comparison between speakers’ voices with simple geometric operations. Among others, the i-vector and the x-vector have emerged as the mainstream methods for speaker embedding. In this paper, we illustrate the use of a modern computation platform to harness the benefits of GPU acceleration for i-vector extraction. In particular, we achieve an acceleration of 3000 times in frame posterior computation compared to real time and 25 times in training the i-vector extractor compared to the CPU baseline from the Kaldi toolkit. This significant speed-up allows the exploration of ideas that were hitherto impossible. In particular, we show that it is beneficial to update the universal background model (UBM) and re-compute frame alignments while training the i-vector extractor. Additionally, we are able to study different variations of i-vector extractors more rigorously than before. In this process, we reveal some undocumented details of Kaldi’s i-vector extractor and show that it outperforms the standard formulation by a margin of 1 to 2% when tested with the VoxCeleb speaker verification protocol. All of our findings are supported by ensemble averaging of the results from multiple runs with random starts.
Tasks Speaker Verification
Published 2019-06-20
URL https://arxiv.org/abs/1906.08556v1
PDF https://arxiv.org/pdf/1906.08556v1.pdf
PWC https://paperswithcode.com/paper/unleashing-the-unused-potential-of-i-vectors
Repo
Framework

The Binary Space Partitioning-Tree Process

Title The Binary Space Partitioning-Tree Process
Authors Xuhui Fan, Bin Li, Scott Anthony Sisson
Abstract The Mondrian process represents an elegant and powerful approach for space partition modelling. However, as it restricts the partitions to be axis-aligned, its modelling flexibility is limited. In this work, we propose a self-consistent Binary Space Partitioning (BSP)-Tree process to generalize the Mondrian process. The BSP-Tree process is an almost surely right continuous Markov jump process that allows uniformly distributed oblique cuts in a two-dimensional convex polygon. The BSP-Tree process can also be extended using a non-uniform probability measure to generate direction differentiated cuts. The process is also self-consistent, maintaining distributional invariance under a restricted subdomain. We use Conditional-Sequential Monte Carlo for inference using the tree structure as the high-dimensional variable. The BSP-Tree process’s performance on synthetic data partitioning and relational modelling demonstrates clear inferential improvements over the standard Mondrian process and other related methods.
Tasks
Published 2019-03-22
URL http://arxiv.org/abs/1903.09343v1
PDF http://arxiv.org/pdf/1903.09343v1.pdf
PWC https://paperswithcode.com/paper/the-binary-space-partitioning-tree-process
Repo
Framework
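
A toy sketch of a single oblique cut in the spirit of the BSP-tree process: sample a cutting direction, place an offset within the data's projected extent, and split points by the side of the cut they fall on. The process's actual cut measure, self-consistency, and budget are not reproduced here.

```python
# Sketch: one random oblique cut of 2-D data, in contrast to an axis-aligned Mondrian cut.
import numpy as np

rng = np.random.default_rng(2)
points = rng.uniform(size=(200, 2))                    # data in a 2-D convex region (unit square)

theta = rng.uniform(0, np.pi)                          # cut direction
normal = np.array([np.cos(theta), np.sin(theta)])      # normal to the cut line
proj = points @ normal
offset = rng.uniform(proj.min(), proj.max())           # offset within the projected extent
left, right = points[proj < offset], points[proj >= offset]  # the two resulting blocks
```

Recursing this split on each block yields a binary partition tree with oblique boundaries, which is the extra flexibility over axis-aligned Mondrian cuts that the abstract describes.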

Relation learning in a neurocomputational architecture supports cross-domain transfer

Title Relation learning in a neurocomputational architecture supports cross-domain transfer
Authors Leonidas A. A. Doumas, Guillermo Puebla, Andrea E. Martin, John E. Hummel
Abstract People readily generalise prior knowledge to novel situations and stimuli. Advances in machine learning and artificial intelligence have begun to approximate and even surpass human performance in specific domains, but machine learning systems struggle to generalise information to untrained situations. We present a model that demonstrates human-like extrapolatory generalisation by learning and explicitly representing an open-ended set of relations characterising regularities within the domains it is exposed to. First, when trained to play one video game (e.g., Breakout), the model generalises to a new game (e.g., Pong) with different rules, dimensions, and characteristics in a single shot. Second, the model can learn representations from a different domain (e.g., 3D shape images) that support learning a video game and generalising to a new game in one shot. By exploiting well-established principles from cognitive psychology and neuroscience, the model learns structured representations without feedback and without requiring the relevant relations to be given a priori. We present additional simulations showing that the representations the model learns support cross-domain generalisation. The model’s ability to generalise between different games demonstrates the flexible generalisation afforded by the capacity to learn not only statistical relations, but also other relations that are useful for characterising the domain to be learned. In turn, this kind of flexible, relational generalisation is only possible because the model can represent relations explicitly, a capacity that is notably absent in extant statistical machine learning algorithms.
Tasks
Published 2019-10-11
URL https://arxiv.org/abs/1910.05065v1
PDF https://arxiv.org/pdf/1910.05065v1.pdf
PWC https://paperswithcode.com/paper/relation-learning-in-a-neurocomputational
Repo
Framework