January 26, 2020

Paper Group ANR 1532

WeNet: Weighted Networks for Recurrent Network Architecture Search. A Strategy for Adaptive Sampling of Multi-fidelity Gaussian Process to Reduce Predictive Uncertainty. Myocardial Infarction Quantification From Late Gadolinium Enhancement MRI Using Top-hat Transforms and Neural Networks. Vision and Language: from Visual Perception to Content Creat …

WeNet: Weighted Networks for Recurrent Network Architecture Search


Title	WeNet: Weighted Networks for Recurrent Network Architecture Search
Authors	Zhiheng Huang, Bing Xiang
Abstract	In recent years, there has been increasing demand for automatic architecture search in deep learning. Numerous approaches have been proposed and led to state-of-the-art results in various applications, including image classification and language modeling. In this paper, we propose a novel way of architecture search by means of weighted networks (WeNet), which consist of a number of networks, with each assigned a weight. These weights are updated with back-propagation to reflect the importance of different networks. Such weighted networks bear similarity to mixture of experts. We conduct experiments on Penn Treebank and WikiText-2. We show that the proposed WeNet can find recurrent architectures which result in state-of-the-art performance.
Tasks	Image Classification, Language Modelling, Neural Architecture Search
Published	2019-04-08
URL	http://arxiv.org/abs/1904.03819v1
PDF	http://arxiv.org/pdf/1904.03819v1.pdf
PWC	https://paperswithcode.com/paper/wenet-weighted-networks-for-recurrent-network
Repo
Framework
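
The weighting idea in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes the per-network weights are normalized with a softmax (the abstract does not say how), replaces the recurrent candidates with three hypothetical scalar functions, and writes the back-propagated gradient of a squared-error loss out by hand.

```python
import math

def softmax(logits):
    """Turn unnormalized logits into weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def update_logits(logits, outputs, target, lr=0.5):
    """One gradient step on loss = 0.5 * (y - target)^2, where
    y = sum_j softmax(logits)_j * outputs_j.
    The chain rule gives dL/dlogit_j = (y - target) * w_j * (o_j - y)."""
    weights = softmax(logits)
    y = sum(w * o for w, o in zip(weights, outputs))
    return [l - lr * (y - target) * w * (o - y)
            for l, w, o in zip(logits, weights, outputs)]

# Hypothetical stand-ins for candidate networks (assumption for illustration).
candidates = [lambda x: 2.0 * x, lambda x: x + 1.0, lambda x: x * x]

# Toy data generated by the third candidate, so its weight should dominate.
data = [(x, x * x) for x in (-2.0, -1.0, 0.5, 1.5, 3.0)]

logits = [0.0, 0.0, 0.0]  # start with equal importance
for _ in range(200):
    for x, target in data:
        outputs = [f(x) for f in candidates]
        logits = update_logits(logits, outputs, target)

weights = softmax(logits)
best = weights.index(max(weights))  # index of the candidate to keep
```

After training, `weights` can be read as learned importance scores, and picking the highest-weighted candidate is one plausible way to turn them into an architecture choice.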

A Strategy for Adaptive Sampling of Multi-fidelity Gaussian Process to Reduce Predictive Uncertainty


Title	A Strategy for Adaptive Sampling of Multi-fidelity Gaussian Process to Reduce Predictive Uncertainty
Authors	Sayan Ghosh, Jesper Kristensen, Yiming Zhang, Waad Subber, Liping Wang
Abstract	Multi-fidelity Gaussian processes are a common approach to computationally demanding tasks such as optimization, calibration and uncertainty quantification. Adaptive sampling for a multi-fidelity Gaussian process is a challenging task because we seek to estimate not only the next sampling location of the design variable, but also the level of the simulator fidelity. This issue is often addressed by including the cost of the simulator as another factor in the search criterion, in conjunction with the uncertainty reduction metric. In this work, we extend the traditional design of experiment framework for the multi-fidelity Gaussian process by partitioning the prediction uncertainty based on the fidelity level and the associated cost of execution. In addition, we utilize the concept of Believer, which quantifies the effect of adding an exploratory design point on the Gaussian process uncertainty prediction. We demonstrate our framework using academic examples as well as an industrial application of steady-state thermodynamic operation point of a fluidized bed process.
Tasks	Calibration
Published	2019-07-26
URL	https://arxiv.org/abs/1907.11739v1
PDF	https://arxiv.org/pdf/1907.11739v1.pdf
PWC	https://paperswithcode.com/paper/a-strategy-for-adaptive-sampling-of-multi
Repo
Framework

Myocardial Infarction Quantification From Late Gadolinium Enhancement MRI Using Top-hat Transforms and Neural Networks


Title	Myocardial Infarction Quantification From Late Gadolinium Enhancement MRI Using Top-hat Transforms and Neural Networks
Authors	Ezequiel de la Rosa, Désiré Sidibé, Thomas Decourselle, Thibault Leclercq, Alexandre Cochet, Alain Lalande
Abstract	Significance: Late gadolinium enhanced magnetic resonance imaging (LGE-MRI) is the gold standard technique for myocardial viability assessment. Although the technique accurately reflects the damaged tissue, there is no clinical standard for quantifying myocardial infarction (MI), which leaves most algorithms expert dependent. Objectives and Methods: In this work, a new automatic method for MI quantification from LGE-MRI is proposed. Our novel segmentation approach is devised for accurately detecting not only hyper-enhanced lesions, but also microvascular-obstructed areas. Moreover, it includes a myocardial disease detection step which extends the algorithm to work on healthy scans. The method is based on a cascade approach where firstly, diseased slices are identified by a convolutional neural network (CNN). Secondly, by means of morphological operations a fast coarse scar segmentation is obtained. Thirdly, the segmentation is refined by a boundary-voxel reclassification strategy using an ensemble of CNNs. For its validation, reproducibility and further comparison against other methods, we tested the method on a large multi-field expert annotated LGE-MRI database including healthy and diseased cases. Results and Conclusion: In an exhaustive comparison against nine reference algorithms, the proposal achieved state-of-the-art segmentation performance and proved to be the only method agreeing in volumetric scar quantification with the expert delineations. Moreover, the method was able to reproduce the intra- and inter-observer variability ranges. It is concluded that the method could suitably be transferred to clinical scenarios.
Tasks
Published	2019-01-09
URL	http://arxiv.org/abs/1901.02911v1
PDF	http://arxiv.org/pdf/1901.02911v1.pdf
PWC	https://paperswithcode.com/paper/myocardial-infarction-quantification-from
Repo
Framework
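
The abstract does not spell out how the top-hat transform from the title is applied, so the following is only a generic sketch of a white top-hat (image minus its morphological opening) on a small 2D grid. The square structuring element, the `radius` parameter, and the border handling (windows clipped to the image) are assumptions made for illustration.

```python
def _window_filter(img, radius, reduce_fn):
    """Apply reduce_fn (min or max) over a square window around each pixel.
    Windows are clipped at the image border."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            window = [img[rr][cc]
                      for rr in range(max(0, r - radius), min(h, r + radius + 1))
                      for cc in range(max(0, c - radius), min(w, c + radius + 1))]
            row.append(reduce_fn(window))
        out.append(row)
    return out

def erode(img, radius):
    return _window_filter(img, radius, min)

def dilate(img, radius):
    return _window_filter(img, radius, max)

def white_top_hat(img, radius):
    """Image minus its opening (erosion followed by dilation).
    Keeps bright structures smaller than the structuring element."""
    opened = dilate(erode(img, radius), radius)
    return [[p - o for p, o in zip(img_row, open_row)]
            for img_row, open_row in zip(img, opened)]

# Flat background of intensity 10 with one bright pixel in the middle.
image = [[10] * 5 for _ in range(5)]
image[2][2] = 50
result = white_top_hat(image, radius=1)
```

In this toy image the opening removes the isolated bright pixel, so the top-hat output is zero on the background and keeps only the bright spot's excess over it.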

Vision and Language: from Visual Perception to Content Creation


Title	Vision and Language: from Visual Perception to Content Creation
Authors	Tao Mei, Wei Zhang, Ting Yao
Abstract	Vision and language are two fundamental capabilities of human intelligence. Humans routinely perform tasks through the interactions between vision and language, supporting the uniquely human capacity to talk about what they see or hallucinate a picture from a natural-language description. The valid question of how language interacts with vision motivates us researchers to expand the horizons of the computer vision area. In particular, “vision to language” is probably one of the most popular topics in the past five years, with a significant growth in both volume of publications and extensive applications, e.g., captioning, visual question answering, visual dialog, language navigation, etc. Such tasks boost visual perception with more comprehensive understanding and diverse linguistic representations. Going beyond the progress made in “vision to language,” language can also contribute to vision understanding and offer new possibilities of visual content creation, i.e., “language to vision.” The process performs as a prism through which to create visual content conditioning on the language inputs. This paper reviews the recent advances along these two dimensions: “vision to language” and “language to vision.” More concretely, the former mainly focuses on the development of image/video captioning, as well as typical encoder-decoder structures and benchmarks, while the latter summarizes the technologies of visual content creation. The real-world deployment or services of vision and language are elaborated as well.
Tasks	Question Answering, Video Captioning, Visual Dialog, Visual Question Answering
Published	2019-12-26
URL	https://arxiv.org/abs/1912.11872v1
PDF	https://arxiv.org/pdf/1912.11872v1.pdf
PWC	https://paperswithcode.com/paper/vision-and-language-from-visual-perception-to
Repo
Framework

Efficient Attention Mechanism for Handling All the Interactions between Many Inputs with Application to Visual Dialog


Title	Efficient Attention Mechanism for Handling All the Interactions between Many Inputs with Application to Visual Dialog
Authors	Van-Quang Nguyen, Masanori Suganuma, Takayuki Okatani
Abstract	It has been a primary concern in recent studies of vision and language tasks to design an effective attention mechanism dealing with interactions between the two modalities. The Transformer has recently been extended and applied to several bi-modal tasks, yielding promising results. For visual dialog, it becomes necessary to consider interactions between three or more inputs, i.e., an image, a question, and a dialog history, or even its individual dialog components. In this paper, we present a neural architecture that can efficiently deal with all the interactions between many such inputs. It has a block structure similar to the Transformer and employs the same design of attention computation, whereas it has only a small number of parameters, yet has sufficient representational power for the purpose. Assuming a standard setting of visual dialog, a network built upon the proposed attention block has fewer than one-tenth of the parameters of its counterpart, a natural Transformer extension. We present its application to the visual dialog task. The experimental results validate the effectiveness of the proposed approach, showing improvements of the best NDCG score on the VisDial v1.0 dataset from 57.59 to 60.92 with a single model, from 64.47 to 66.53 with ensemble models, and even to 74.88 with additional finetuning.
Tasks	Visual Dialog
Published	2019-11-26
URL	https://arxiv.org/abs/1911.11390v1
PDF	https://arxiv.org/pdf/1911.11390v1.pdf
PWC	https://paperswithcode.com/paper/efficient-attention-mechanism-for-handling
Repo
Framework
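
The abstract says the block reuses the Transformer's attention computation. As a reminder of what that computation is, here is a minimal single-head scaled dot-product attention in plain Python. It is a generic sketch, not the paper's multi-input block: learned projections, multiple heads, and the way several inputs are combined are all omitted, and the function names are chosen for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    queries: list of d-dim vectors; keys: list of d-dim vectors;
    values: one vector per key (any dimension)."""
    d = len(keys[0])
    out_dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(out_dim)])
    return outputs

# With identical keys every value gets the same weight, so the output
# is the plain average of the values.
attended = attention(queries=[[1.0, 0.0]],
                     keys=[[0.0, 0.0], [0.0, 0.0]],
                     values=[[2.0, 0.0], [4.0, 2.0]])
```

In a multi-input setting such as visual dialog, the queries could come from one input (for example the question) and the keys and values from another (for example image regions); how the paper arranges these pairings is not described in the abstract.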

Multi-Step Chord Sequence Prediction Based on Aggregated Multi-Scale Encoder-Decoder Network


Title	Multi-Step Chord Sequence Prediction Based on Aggregated Multi-Scale Encoder-Decoder Network
Authors	Tristan Carsault, Andrew McLeod, Philippe Esling, Jérôme Nika, Eita Nakamura, Kazuyoshi Yoshii
Abstract	This paper studies the prediction of chord progressions for jazz music by relying on machine learning models. The motivation of our study comes from the recent success of neural networks for performing automatic music composition. Although high accuracies are obtained in single-step prediction scenarios, most models fail to generate accurate multi-step chord predictions. In this paper, we postulate that this comes from the multi-scale structure of musical information and propose new architectures based on an iterative temporal aggregation of input labels. Specifically, the input and ground truth labels are merged into increasingly large temporal bags, on which we train a family of encoder-decoder networks for each temporal scale. In a second step, we use these pre-trained encoder bottleneck features at each scale in order to train a final encoder-decoder network. Furthermore, we rely on different reductions of the initial chord alphabet into three adapted chord alphabets. We perform evaluations against several state-of-the-art models and show that our multi-scale architecture outperforms existing methods in terms of accuracy and perplexity, while requiring relatively few parameters. We analyze musical properties of the results, showing the influence of downbeat position within the analysis window on accuracy, and evaluate errors using a musically-informed distance metric.
Tasks
Published	2019-11-12
URL	https://arxiv.org/abs/1911.04972v1
PDF	https://arxiv.org/pdf/1911.04972v1.pdf
PWC	https://paperswithcode.com/paper/multi-step-chord-sequence-prediction-based-on
Repo
Framework

Self-supervised Learning of 3D Objects from Natural Images


Title	Self-supervised Learning of 3D Objects from Natural Images
Authors	Hiroharu Kato, Tatsuya Harada
Abstract	We present a method to learn single-view reconstruction of the 3D shape, pose, and texture of objects from categorized natural images in a self-supervised manner. Since this is a severely ill-posed problem, carefully designing a training method and introducing constraints are essential. To avoid the difficulty of training all elements at the same time, we propose training category-specific base shapes with fixed pose distribution and simple textures first, and subsequently training poses and textures using the obtained shapes. Another difficulty is that shapes and backgrounds sometimes become excessively complicated to mistakenly reconstruct textures on object surfaces. To suppress it, we propose using strong regularization and constraints on object surfaces and background images. With these two techniques, we demonstrate that we can use natural image collections such as CIFAR-10 and PASCAL objects for training, which indicates the possibility to realize 3D object reconstruction on diverse object categories beyond synthetic datasets.
Tasks	3D Object Reconstruction, Object Reconstruction
Published	2019-11-20
URL	https://arxiv.org/abs/1911.08850v1
PDF	https://arxiv.org/pdf/1911.08850v1.pdf
PWC	https://paperswithcode.com/paper/self-supervised-learning-of-3d-objects-from
Repo
Framework

Two Causal Principles for Improving Visual Dialog


Title	Two Causal Principles for Improving Visual Dialog
Authors	Jiaxin Qi, Yulei Niu, Jianqiang Huang, Hanwang Zhang
Abstract	This paper unravels the design tricks adopted by us, the champion team MReaL-BDAI, for Visual Dialog Challenge 2019: two causal principles for improving Visual Dialog (VisDial). By “improving”, we mean that they can promote almost every existing VisDial model to the state-of-the-art performance on the leader-board. Such a major improvement is only due to our careful inspection on the causality behind the model and data, finding that the community has overlooked two causalities in VisDial. Intuitively, Principle 1 suggests: we should remove the direct input of the dialog history to the answer model, otherwise a harmful shortcut bias will be introduced; Principle 2 says: there is an unobserved confounder for history, question, and answer, leading to spurious correlations from training data. In particular, to remove the confounder suggested in Principle 2, we propose several causal intervention algorithms, which make the training fundamentally different from the traditional likelihood estimation. Note that the two principles are model-agnostic, so they are applicable in any VisDial model. The code is available at https://github.com/simpleshinobu/visdial-principles.
Tasks	Visual Dialog
Published	2019-11-24
URL	https://arxiv.org/abs/1911.10496v2
PDF	https://arxiv.org/pdf/1911.10496v2.pdf
PWC	https://paperswithcode.com/paper/two-causal-principles-for-improving-visual
Repo
Framework

Simplified_edition_Multi-robot SLAM Multi-view Target Tracking based on Panoramic Vision in Irregular Environment


Title	Simplified_edition_Multi-robot SLAM Multi-view Target Tracking based on Panoramic Vision in Irregular Environment
Authors	R. Q. Wang, Z. Q. Yuan, G. H. Chen
Abstract	To improve the precision of multi-robot SLAM multi-view target tracking, an improved multi-robot SLAM multi-view target tracking algorithm based on panoramic vision in irregular environments is put forward. It adds a correction factor to update the existing Extended Kalman Filter (EKF) model and obtains new coordinates X and Y after two iterations. The paper has been accepted by Computing and Visualization in Science, and this is a simplified version.
Tasks
Published	2019-11-22
URL	https://arxiv.org/abs/1911.09918v1
PDF	https://arxiv.org/pdf/1911.09918v1.pdf
PWC	https://paperswithcode.com/paper/simplified_edition_multi-robot-slam-multi
Repo
Framework

Linguistic Knowledge and Transferability of Contextual Representations


Title	Linguistic Knowledge and Transferability of Contextual Representations
Authors	Nelson F. Liu, Matt Gardner, Yonatan Belinkov, Matthew E. Peters, Noah A. Smith
Abstract	Contextual word representations derived from large-scale neural language models are successful across a diverse set of NLP tasks, suggesting that they encode useful and transferable features of language. To shed light on the linguistic knowledge they capture, we study the representations produced by several recent pretrained contextualizers (variants of ELMo, the OpenAI transformer language model, and BERT) with a suite of seventeen diverse probing tasks. We find that linear models trained on top of frozen contextual representations are competitive with state-of-the-art task-specific models in many cases, but fail on tasks requiring fine-grained linguistic knowledge (e.g., conjunct identification). To investigate the transferability of contextual word representations, we quantify differences in the transferability of individual layers within contextualizers, especially between recurrent neural networks (RNNs) and transformers. For instance, higher layers of RNNs are more task-specific, while transformer layers do not exhibit the same monotonic trend. In addition, to better understand what makes contextual word representations transferable, we compare language model pretraining with eleven supervised pretraining tasks. For any given task, pretraining on a closely related task yields better performance than language model pretraining (which is better on average) when the pretraining dataset is fixed. However, language model pretraining on more data gives the best results.
Tasks	Language Modelling
Published	2019-03-21
URL	http://arxiv.org/abs/1903.08855v5
PDF	http://arxiv.org/pdf/1903.08855v5.pdf
PWC	https://paperswithcode.com/paper/linguistic-knowledge-and-transferability-of
Repo
Framework

Toward 3D Object Reconstruction from Stereo Images


Title	Toward 3D Object Reconstruction from Stereo Images
Authors	Haozhe Xie, Hongxun Yao, Shangchen Zhou, Shengping Zhang, Xiaoshuai Sun, Wenxiu Sun
Abstract	Inferring the 3D shape of an object from an RGB image has shown impressive results; however, existing methods rely primarily on recognizing the most similar 3D model from the training set to solve the problem. These methods suffer from poor generalization and may lead to low-quality reconstructions for unseen objects. Nowadays, stereo cameras are pervasive in emerging devices such as dual-lens smartphones and robots, which enables the use of the two-view nature of stereo images to explore the 3D structure and thus improve the reconstruction performance. In this paper, we propose a new deep learning framework for reconstructing the 3D shape of an object from a pair of stereo images, which reasons about the 3D structure of the object by taking bidirectional disparities and feature correspondences between the two views into account. In addition, we present a large-scale synthetic benchmarking dataset, namely StereoShapeNet, containing 1,052,976 pairs of stereo images rendered from ShapeNet along with the corresponding bidirectional depth and disparity maps. Experimental results on the StereoShapeNet benchmark demonstrate that the proposed framework outperforms the state-of-the-art methods.
Tasks	3D Object Reconstruction, Object Reconstruction
Published	2019-10-18
URL	https://arxiv.org/abs/1910.08223v2
PDF	https://arxiv.org/pdf/1910.08223v2.pdf
PWC	https://paperswithcode.com/paper/toward-3d-object-reconstruction-from-stereo
Repo
Framework

A Multi-Strategy Approach to Overcoming Bias in Community Detection Evaluation


Title	A Multi-Strategy Approach to Overcoming Bias in Community Detection Evaluation
Authors	Jeancarlo Campos Leão, Alberto H. F. Laender, Pedro O. S. Vaz de Melo
Abstract	Community detection is key to understanding the structure of complex networks. However, the lack of appropriate evaluation strategies for this specific task may produce biased and incorrect results that might invalidate further analyses or applications based on such networks. In this context, the main contribution of this paper is an approach that supports a robust quality evaluation when detecting communities in real-world networks. In our approach, we use multiple strategies that capture distinct aspects of the communities. The conclusion on the quality of these communities is based on the consensus among the strategies adopted for the structural evaluation, as well as on the comparison with communities detected by different methods and with their existing ground truths. In this way, our approach allows one to overcome biases in network data, detection algorithms and evaluation metrics, thus providing more consistent conclusions about the quality of the detected communities. Experiments conducted with several real and synthetic networks provided results that show the effectiveness of our approach.
Tasks	Community Detection
Published	2019-09-21
URL	https://arxiv.org/abs/1909.09903v1
PDF	https://arxiv.org/pdf/1909.09903v1.pdf
PWC	https://paperswithcode.com/paper/190909903
Repo
Framework

Shrinking the Upper Confidence Bound: A Dynamic Product Selection Problem for Urban Warehouses


Title	Shrinking the Upper Confidence Bound: A Dynamic Product Selection Problem for Urban Warehouses
Authors	Rong Jin, David Simchi-Levi, Li Wang, Xinshang Wang, Sen Yang
Abstract	The recent rising popularity of ultra-fast delivery services on retail platforms fuels the increasing use of urban warehouses, whose proximity to customers makes fast deliveries viable. The space limit in urban warehouses poses a problem for the online retailers: the number of products (SKUs) they carry is no longer “the more, the better”, yet it can still be significantly large, reaching hundreds or thousands in a product category. In this paper, we study algorithms for dynamically identifying a large number of products (i.e., SKUs) with top customer purchase probabilities on the fly, from an ocean of potential products to offer on retailers’ ultra-fast delivery platforms. We distill the product selection problem into a semi-bandit model with linear generalization. There are in total $N$ different arms, each with a feature vector of dimension $d$. The player pulls $K$ arms in each period and observes the bandit feedback from each of the pulled arms. We focus on the setting where $K$ is much greater than the number of total time periods $T$ or the dimension of product features $d$. We first analyze a standard UCB algorithm and show its regret bound can be expressed as the sum of a $T$-independent part $\tilde O(K d^{3/2})$ and a $T$-dependent part $\tilde O(d\sqrt{KT})$, which we refer to as “fixed cost” and “variable cost” respectively. To reduce the fixed cost for large $K$ values, we propose a novel online learning algorithm, which iteratively shrinks the upper confidence bounds within each period, and show its fixed cost is reduced by a factor of $d$ to $\tilde O(K \sqrt{d})$. Moreover, we test the algorithms on an industrial dataset from Alibaba Group. Experimental results show that our new algorithm reduces the total regret of the standard UCB algorithm by at least 10%.
Tasks
Published	2019-03-19
URL	https://arxiv.org/abs/1903.07844v2
PDF	https://arxiv.org/pdf/1903.07844v2.pdf
PWC	https://paperswithcode.com/paper/conservative-exploration-for-semi-bandits
Repo
Framework
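
To make the semi-bandit setting in the abstract concrete, the sketch below shows a generic linear UCB learner that pulls the top $K$ arms per period. It corresponds roughly to the "standard UCB" baseline the abstract mentions, not to the authors' shrinking algorithm, and the class name, the ridge prior (identity matrix), and the exploration parameter `alpha` are assumptions made for illustration.

```python
import math

def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

class LinearTopKUCB:
    """Linear UCB for a semi-bandit: score every arm by its upper
    confidence bound and pull the K highest-scoring arms."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha  # width of the confidence bonus (assumed constant)
        # Inverse of A = I + sum of x x^T, kept up to date incrementally.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # sum of reward * x

    def ucb(self, x):
        theta = mat_vec(self.A_inv, self.b)  # ridge-regression estimate
        width = math.sqrt(max(dot(x, mat_vec(self.A_inv, x)), 0.0))
        return dot(theta, x) + self.alpha * width

    def select(self, arms, k):
        ranked = sorted(range(len(arms)),
                        key=lambda i: self.ucb(arms[i]), reverse=True)
        return ranked[:k]

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of (A + x x^T)^{-1}.
        Ax = mat_vec(self.A_inv, x)
        denom = 1.0 + dot(x, Ax)
        n = len(x)
        for i in range(n):
            for j in range(n):
                self.A_inv[i][j] -= Ax[i] * Ax[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]

# Three arms with 2-dimensional features; observe one reward, then pick 2 arms.
arms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
agent = LinearTopKUCB(dim=2)
agent.update(arms[0], reward=1.0)
chosen = agent.select(arms, k=2)
```

In a full period the learner would call `update` once for each of the $K$ pulled arms, since semi-bandit feedback reveals the outcome of every pulled arm.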

Improved Exploration through Latent Trajectory Optimization in Deep Deterministic Policy Gradient


Title	Improved Exploration through Latent Trajectory Optimization in Deep Deterministic Policy Gradient
Authors	Kevin Sebastian Luck, Mel Vecerik, Simon Stepputtis, Heni Ben Amor, Jonathan Scholz
Abstract	Model-free reinforcement learning algorithms such as Deep Deterministic Policy Gradient (DDPG) often require additional exploration strategies, especially if the actor is of deterministic nature. This work evaluates the use of model-based trajectory optimization methods used for exploration in Deep Deterministic Policy Gradient when trained on a latent image embedding. In addition, an extension of DDPG is derived using a value function as critic, making use of a learned deep dynamics model to compute the policy gradient. This approach leads to a symbiotic relationship between the deep reinforcement learning algorithm and the latent trajectory optimizer. The trajectory optimizer benefits from the critic learned by the RL algorithm and the latter from the enhanced exploration generated by the planner. The developed methods are evaluated on two continuous control tasks, one in simulation and one in the real world. In particular, a Baxter robot is trained to perform an insertion task, while only receiving sparse rewards and images as observations from the environment.
Tasks	Continuous Control
Published	2019-11-15
URL	https://arxiv.org/abs/1911.06833v1
PDF	https://arxiv.org/pdf/1911.06833v1.pdf
PWC	https://paperswithcode.com/paper/improved-exploration-through-latent
Repo
Framework

A New Method for Atlanta World Frame Estimation


Title	A New Method for Atlanta World Frame Estimation
Authors	Yinlong Liu, Alois Knoll, Guang Chen
Abstract	In this paper, we propose a new Atlanta frame estimation method by considering the relationship between the vertical direction and the horizontal directions. Unlike previous solutions, our method does not solve for all the directions at once. Instead, it estimates the directions sequentially. Concretely, our method first searches for the vertical direction in $\mathbb{S}^2$ globally, then estimates the horizontal directions in one dimension. As a consequence, the dimensionality of each subproblem is low and it can be solved efficiently. In other words, the running time of our method will not greatly increase as the number of horizontal directions increases. The advantages of our method are validated via testing on both synthetic and real-world data.
Tasks
Published	2019-04-29
URL	http://arxiv.org/abs/1904.12717v1
PDF	http://arxiv.org/pdf/1904.12717v1.pdf
PWC	https://paperswithcode.com/paper/a-new-method-for-atlanta-world-frame
Repo
Framework
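
Why the second stage is one-dimensional can be seen from the geometry: once the vertical direction is fixed, every horizontal direction lies in the plane orthogonal to it and is described by a single angle. The sketch below illustrates only this parameterization, not the authors' search procedure. The helper names and the way the in-plane basis is chosen are assumptions.

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def horizontal_basis(vertical):
    """Return two orthonormal vectors spanning the plane orthogonal
    to the given vertical direction."""
    v = normalize(vertical)
    # Use the coordinate axis least aligned with v to avoid a zero cross product.
    idx = min(range(3), key=lambda i: abs(v[i]))
    helper = [1.0 if i == idx else 0.0 for i in range(3)]
    e1 = normalize(cross(v, helper))
    e2 = cross(v, e1)  # unit length because v and e1 are orthonormal
    return e1, e2

def horizontal_direction(vertical, theta):
    """Horizontal unit vector at angle theta (radians) within the plane
    orthogonal to the vertical direction."""
    e1, e2 = horizontal_basis(vertical)
    return [math.cos(theta) * a + math.sin(theta) * b for a, b in zip(e1, e2)]

vertical = [1.0, 2.0, 3.0]
direction = horizontal_direction(vertical, theta=0.7)
```

With this parameterization, each additional horizontal direction adds only one scalar unknown, which is consistent with the abstract's claim that running time grows slowly with the number of horizontal directions.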