October 19, 2019

Paper Group ANR 222

Multi-lingual Common Semantic Space Construction via Cluster-consistent Word Embedding. Markerless Visual Robot Programming by Demonstration. Towards Spectral Estimation from a Single RGB Image in the Wild. Visual Question Reasoning on General Dependency Tree. Multi-View Frame Reconstruction with Conditional GAN. Category Trees. Use Of Vapnik-Cherv …

Multi-lingual Common Semantic Space Construction via Cluster-consistent Word Embedding


Title	Multi-lingual Common Semantic Space Construction via Cluster-consistent Word Embedding
Authors	Lifu Huang, Kyunghyun Cho, Boliang Zhang, Heng Ji, Kevin Knight
Abstract	We construct a multilingual common semantic space based on distributional semantics, where words from multiple languages are projected into a shared space to enable knowledge and resource transfer across languages. Beyond word alignment, we introduce multiple cluster-level alignments and enforce the word clusters to be consistently distributed across multiple languages. We exploit three signals for clustering: (1) neighbor words in the monolingual word embedding space; (2) character-level information; and (3) linguistic properties (e.g., apposition, locative suffix) derived from linguistic structure knowledge bases available for thousands of languages. We introduce a new cluster-consistent correlational neural network to construct the common semantic space by aligning words as well as clusters. Intrinsic evaluation on monolingual and multilingual QVEC tasks shows our approach achieves significantly higher correlation with linguistic features than state-of-the-art multi-lingual embedding learning methods do. Using low-resource language name tagging as a case study for extrinsic evaluation, our approach achieves up to 24.5% absolute F-score gain over the state of the art.
Tasks	Word Alignment
Published	2018-04-21
URL	http://arxiv.org/abs/1804.07875v1
PDF	http://arxiv.org/pdf/1804.07875v1.pdf
PWC	https://paperswithcode.com/paper/multi-lingual-common-semantic-space
Repo
Framework

Markerless Visual Robot Programming by Demonstration


Title	Markerless Visual Robot Programming by Demonstration
Authors	Raphael Memmesheimer, Ivanna Mykhalchyshyna, Viktor Seib, Nick Theisen, Dietrich Paulus
Abstract	In this paper we present an approach for learning to imitate human behavior on a semantic level by markerless visual observation. We analyze a set of spatial constraints on human pose data extracted using convolutional pose machines and object information extracted from 2D image sequences. A scene analysis, based on an ontology of objects and affordances, is combined with continuous human pose estimation and spatial object relations. Using a set of constraints, we associate the observed human actions with a set of executable robot commands. We demonstrate our approach in a kitchen task, where the robot learns to prepare a meal.
Tasks	Pose Estimation
Published	2018-07-30
URL	http://arxiv.org/abs/1807.11541v1
PDF	http://arxiv.org/pdf/1807.11541v1.pdf
PWC	https://paperswithcode.com/paper/markerless-visual-robot-programming-by
Repo
Framework

Towards Spectral Estimation from a Single RGB Image in the Wild


Title	Towards Spectral Estimation from a Single RGB Image in the Wild
Authors	Berk Kaya, Yigit Baran Can, Radu Timofte
Abstract	In contrast to the current literature, we address the problem of estimating the spectrum from a single common trichromatic RGB image obtained under unconstrained settings (e.g. unknown camera parameters, unknown scene radiance, unknown scene contents). For this we use a reference spectrum as provided by a hyperspectral image camera, and propose efficient deep learning solutions for sensitivity function estimation and spectral reconstruction from a single RGB image. We further expand the concept of spectral reconstruction so that it works for RGB images taken in the wild, and propose a solution based on a convolutional network conditioned on the estimated sensitivity function. Besides the proposed solutions, we also study generic and sensitivity-specialized models and discuss their limitations. We achieve results competitive with the state of the art on the standard example-based spectral reconstruction benchmarks: ICVL, CAVE, NUS and NTIRE. Moreover, our experiments show that, for the first time, accurate spectral estimation from a single RGB image in the wild is within our reach.
Tasks	Spectral Estimation From A Single Rgb Image
Published	2018-12-03
URL	http://arxiv.org/abs/1812.00805v1
PDF	http://arxiv.org/pdf/1812.00805v1.pdf
PWC	https://paperswithcode.com/paper/towards-spectral-estimation-from-a-single-rgb
Repo
Framework

Visual Question Reasoning on General Dependency Tree


Title	Visual Question Reasoning on General Dependency Tree
Authors	Qingxing Cao, Xiaodan Liang, Bailing Li, Guanbin Li, Liang Lin
Abstract	Collaborative reasoning for understanding each image-question pair is critical but under-explored for an interpretable Visual Question Answering (VQA) system. Although very recent works have also tried explicit compositional processes to assemble multiple sub-tasks embedded in the questions, their models rely heavily on annotations or hand-crafted rules to obtain a valid reasoning layout, leading to either heavy labor or poor performance on compositional reasoning. In this paper, to enable global context reasoning for better aligning image and language domains in diverse and unrestricted cases, we propose a novel reasoning network called Adversarial Composition Modular Network (ACMN). This network comprises two collaborative modules: i) an adversarial attention module to exploit the local visual evidence for each word parsed from the question; ii) a residual composition module to compose the previously mined evidence. Given a dependency parse tree for each question, the adversarial attention module progressively discovers salient regions of one word by densely combining regions of child word nodes in an adversarial manner. Then the residual composition module merges the hidden representations of an arbitrary number of children through sum pooling and a residual connection. Our ACMN is thus capable of building an interpretable VQA system that gradually dives into the image cues following a question-driven reasoning route and makes global reasoning by incorporating the learned knowledge of all attention modules in a principled manner. Experiments on relational datasets demonstrate the superiority of our ACMN, and visualization results show the explainable capability of our reasoning system.
Tasks	Question Answering, Visual Question Answering
Published	2018-03-31
URL	http://arxiv.org/abs/1804.00105v1
PDF	http://arxiv.org/pdf/1804.00105v1.pdf
PWC	https://paperswithcode.com/paper/visual-question-reasoning-on-general
Repo
Framework
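
The sum pooling and residual connection mentioned in the abstract can be illustrated with a small sketch. This is only a toy under stated assumptions, not the ACMN implementation: the dependency tree, the two-dimensional evidence vectors, and the function name compose are hypothetical, and the learned transformations a real network would apply are left out.

```python
from typing import Dict, List

Vector = List[float]


def compose(node: str,
            children: Dict[str, List[str]],
            evidence: Dict[str, Vector]) -> Vector:
    """Return a hidden vector for `node` in a dependency tree.

    The hidden vectors of all children are sum-pooled (so any number of
    children is allowed) and added to the node's own evidence vector,
    which plays the role of the residual connection.
    """
    pooled = [0.0] * len(evidence[node])
    for child in children.get(node, []):
        child_hidden = compose(child, children, evidence)
        pooled = [p + c for p, c in zip(pooled, child_hidden)]
    # Residual connection: own evidence plus the pooled child signal.
    return [e + p for e, p in zip(evidence[node], pooled)]


# Hypothetical parse of "what color is the small dog", rooted at "is".
tree = {"is": ["color", "dog"], "dog": ["small"]}
# Hypothetical per-word visual evidence (assumed to come from attention).
word_evidence = {
    "is": [1.0, 0.0],
    "color": [0.0, 1.0],
    "dog": [0.5, 0.5],
    "small": [0.25, 0.0],
}
root_hidden = compose("is", tree, word_evidence)
```

Because the pooling is a plain sum, the same function handles leaves, nodes with one child, and nodes with many children without any special cases.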

Multi-View Frame Reconstruction with Conditional GAN


Title	Multi-View Frame Reconstruction with Conditional GAN
Authors	Tahmida Mahmud, Mohammad Billah, Amit K. Roy-Chowdhury
Abstract	Multi-view frame reconstruction is an important problem, particularly when multiple frames are missing and the past and future frames within the camera are far apart from the missing ones. Realistic coherent frames can still be reconstructed using corresponding frames from other overlapping cameras. We propose an adversarial approach to learn the spatio-temporal representation of the missing frame using a conditional Generative Adversarial Network (cGAN). The conditional input to each cGAN is the preceding or following frames within the camera or the corresponding frames in other overlapping cameras, all of which are merged together using a weighted average. Representations learned from frames within the camera are given more weight than the ones learned from other cameras when they are close to the missing frames, and vice versa. Experiments on two challenging datasets demonstrate that our framework produces results comparable with the state-of-the-art reconstruction method in a single camera and achieves promising performance in the multi-camera scenario.
Tasks
Published	2018-09-27
URL	http://arxiv.org/abs/1809.10352v1
PDF	http://arxiv.org/pdf/1809.10352v1.pdf
PWC	https://paperswithcode.com/paper/multi-view-frame-reconstruction-with
Repo
Framework

Category Trees


Title	Category Trees
Authors	Kieran Greer
Abstract	This paper presents a batch classifier that improves on the earlier version and fixes a mistake in the earlier paper. Two important changes have been made. Each category is represented by a classifier, where each classifier classifies its own subset of data rows, using batch input values to represent the centroid. The first change is to use the category centroid as the desired category output. When the classifier represents more than one category, it creates a new layer and splits, to represent each category separately in the new layer. The second change, therefore, is to allow the classifier to branch to new levels when there is a split in the data, or when some data rows are incorrectly classified. Each layer can therefore branch like a tree - not for distinguishing features, but for distinguishing categories. The paper then suggests further innovations, by adding fixed value ranges through bands, for each column or feature of the input dataset. When considering features, it is shown that some of the data can be classified directly through fixed value ranges, while the rest can be classified using the classifier technique. Tests show that the method can successfully classify a diverse set of benchmark datasets to better than the state-of-the-art. The paper also discusses a biological analogy with neurons and neuron links.
Tasks
Published	2018-11-06
URL	https://arxiv.org/abs/1811.02617v5
PDF	https://arxiv.org/pdf/1811.02617v5.pdf
PWC	https://paperswithcode.com/paper/an-improved-batch-classifier-with-bands-and
Repo
Framework

Use Of Vapnik-Chervonenkis Dimension in Model Selection


Title	Use Of Vapnik-Chervonenkis Dimension in Model Selection
Authors	Merlin Mpoudeu
Abstract	In this dissertation, I derive a new method to estimate the Vapnik-Chervonenkis Dimension (VCD) for the class of linear functions. This method is inspired by the technique developed by Vapnik et al. (1994). My contribution rests on the approximation of the expected maximum difference between two empirical losses (EMDBTEL). In fact, I use a cross-validated form of the error to compute the EMDBTEL, and I make the bound on the EMDBTEL tighter by minimizing a constant in its right upper bound. I also derive two bounds for the true unknown risk using the additive (ERM1) and the multiplicative (ERM2) Chernoff bounds. These bounds depend on the estimated VCD and the empirical risk. These bounds can be used to perform model selection and to declare that, with high probability, the chosen model will perform better, without making strong assumptions about the data generating process (DG). I measure the accuracy of my technique on simulated datasets and also on three real datasets. The model selection provided by VCD was always as good as, if not better than, the other methods under reasonable conditions.
Tasks	Model Selection
Published	2018-08-20
URL	http://arxiv.org/abs/1808.06684v1
PDF	http://arxiv.org/pdf/1808.06684v1.pdf
PWC	https://paperswithcode.com/paper/use-of-vapnik-chervonenkis-dimension-in-model
Repo
Framework
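
To make the idea of bound-based model selection concrete, here is a minimal sketch. It does not reproduce the dissertation's ERM1 or ERM2 bounds; as a stand-in it uses the classic Vapnik generalization bound, where n is the sample size, h the VC dimension, and eta the allowed failure probability. The candidate models and their empirical risks are made up for illustration.

```python
import math
from typing import Dict, Tuple


def vc_bound(emp_risk: float, n: int, h: int, eta: float = 0.05) -> float:
    """Classic Vapnik upper bound on the true risk.

    With probability at least 1 - eta:
        R <= R_emp + sqrt((h * (ln(2n/h) + 1) - ln(eta/4)) / n)
    """
    if not 0 < h <= n:
        raise ValueError("need 0 < h <= n")
    if not 0 < eta < 1:
        raise ValueError("eta must lie in (0, 1)")
    complexity = (h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n
    return emp_risk + math.sqrt(complexity)


def select_model(candidates: Dict[str, Tuple[float, int]], n: int) -> str:
    """Pick the candidate whose risk bound is smallest."""
    return min(candidates, key=lambda name: vc_bound(*candidates[name][:1], n, candidates[name][1]))


# Hypothetical candidates: name -> (empirical risk, estimated VC dimension).
models = {
    "linear_3": (0.20, 3),
    "linear_10": (0.15, 10),
    "linear_40": (0.14, 40),
}
best = select_model(models, n=1000)
```

In this toy setting the richest model has the lowest empirical risk, but its complexity penalty dominates, so the bound favors the smallest model.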

Translating Questions into Answers using DBPedia n-triples


Title	Translating Questions into Answers using DBPedia n-triples
Authors	Mihael Arcan
Abstract	In this paper we present a question answering system using a neural network to interpret questions learned from the DBpedia repository. We train a sequence-to-sequence neural network model with n-triples extracted from the DBpedia Infobox Properties. Since these properties do not represent natural language, we further used question-answer dialogues from movie subtitles. Although the automatic evaluation shows a low overlap of the generated answers compared to the gold standard set, a manual inspection of the generated answers showed promising outcomes from the experiment for further work.
Tasks	Question Answering
Published	2018-03-07
URL	http://arxiv.org/abs/1803.02914v1
PDF	http://arxiv.org/pdf/1803.02914v1.pdf
PWC	https://paperswithcode.com/paper/translating-questions-into-answers-using
Repo
Framework

Textually Enriched Neural Module Networks for Visual Question Answering


Title	Textually Enriched Neural Module Networks for Visual Question Answering
Authors	Khyathi Raghavi Chandu, Mary Arpita Pyreddy, Matthieu Felix, Narendra Nath Joshi
Abstract	Problems at the intersection of language and vision, like visual question answering, have recently been gaining a lot of attention in the field of multi-modal machine learning as computer vision research moves beyond traditional recognition tasks. There has been recent success in visual question answering using deep neural network models which use the linguistic structure of the questions to dynamically instantiate network layouts. In the process of converting the question to a network layout, the question is simplified, which results in loss of information in the model. In this paper, we enrich the image information with textual data using image captions and external knowledge bases to generate more coherent answers. We achieve 57.1% overall accuracy on the test-dev open-ended questions from the visual question answering (VQA 1.0) real image dataset.
Tasks	Image Captioning, Question Answering, Visual Question Answering
Published	2018-09-23
URL	http://arxiv.org/abs/1809.08697v1
PDF	http://arxiv.org/pdf/1809.08697v1.pdf
PWC	https://paperswithcode.com/paper/textually-enriched-neural-module-networks-for
Repo
Framework

MSCE: An edge preserving robust loss function for improving super-resolution algorithms


Title	MSCE: An edge preserving robust loss function for improving super-resolution algorithms
Authors	Ram Krishna Pandey, Nabagata Saha, Samarjit Karmakar, A G Ramakrishnan
Abstract	With the recent advancement in deep learning technologies such as CNNs and GANs, there is significant improvement in the quality of the images reconstructed by deep learning based super-resolution (SR) techniques. In this work, we propose a robust loss function based on the preservation of edges obtained by the Canny operator. This loss function, when combined with an existing loss function such as mean square error (MSE), gives better SR reconstruction measured in terms of PSNR and SSIM. Our proposed loss function guarantees improved performance on any existing algorithm using the MSE loss function, without any increase in the computational complexity during testing.
Tasks	Super-Resolution
Published	2018-08-25
URL	http://arxiv.org/abs/1809.00961v1
PDF	http://arxiv.org/pdf/1809.00961v1.pdf
PWC	https://paperswithcode.com/paper/msce-an-edge-preserving-robust-loss-function
Repo
Framework
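
A rough sketch of how an edge term can be combined with MSE is given below. It is not the paper's MSCE loss: the Canny operator is replaced by a much simpler forward-difference gradient magnitude so that the example stays dependency-free, and the weight edge_weight and the tiny test images are assumptions made for illustration.

```python
from typing import List

Image = List[List[float]]


def mse(a: Image, b: Image) -> float:
    """Mean squared error over all pixels of two equally sized images."""
    count = sum(len(row) for row in a)
    total = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / count


def edge_map(img: Image) -> Image:
    """Stand-in edge detector (NOT Canny): |dx| + |dy| by forward differences."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            dx = img[r][c + 1] - img[r][c] if c + 1 < w else 0.0
            dy = img[r + 1][c] - img[r][c] if r + 1 < h else 0.0
            out[r][c] = abs(dx) + abs(dy)
    return out


def combined_loss(pred: Image, target: Image, edge_weight: float = 0.5) -> float:
    """Pixel MSE plus a weighted MSE between the two edge maps."""
    return mse(pred, target) + edge_weight * mse(edge_map(pred), edge_map(target))


# A sharp vertical edge and a blurred reconstruction of it.
sharp = [[0.0, 0.0, 1.0, 1.0],
         [0.0, 0.0, 1.0, 1.0]]
blurred = [[0.0, 0.5, 0.5, 1.0],
           [0.0, 0.5, 0.5, 1.0]]
pixel_only = mse(blurred, sharp)
with_edges = combined_loss(blurred, sharp)
```

The edge term only affects training; at test time the network is evaluated as usual, which matches the abstract's remark that inference cost does not increase.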

Image-Based Reconstruction for a 3D-PFHS Heat Transfer Problem by ReConNN


Title	Image-Based Reconstruction for a 3D-PFHS Heat Transfer Problem by ReConNN
Authors	Yu Li, Hu Wang, Xinjian Deng
Abstract	The heat transfer performance of the Plate Fin Heat Sink (PFHS) has been investigated experimentally and extensively. Commonly, the objective function of the PFHS design is based on the responses of simulations. In contrast to existing studies, the purpose of this study is to move from an analysis-based model to an image-based one for heat sink designs. Compared with the popular objective functions based on maximum, mean, variance values, etc., more information should be involved in the image-based model, and thus a more objective model should be constructed. This means that the sequential optimization should be based on images instead of responses, and more reasonable solutions should be obtained. Therefore, an image-based reconstruction model of a heat transfer process for a 3D-PFHS is established. Unlike image recognition, such a procedure cannot be implemented directly by existing recognition algorithms (e.g. Convolutional Neural Network). Therefore, a Reconstructive Neural Network (ReConNN), integrating supervised and unsupervised learning techniques, is suggested and improved to achieve higher accuracy. According to the experimental results, the heat transfer process can be observed in more detail and more clearly, and the reconstructed results are meaningful for further optimization.
Tasks
Published	2018-11-06
URL	http://arxiv.org/abs/1811.02102v2
PDF	http://arxiv.org/pdf/1811.02102v2.pdf
PWC	https://paperswithcode.com/paper/image-based-reconstruction-for-a-3d-pfhs-heat
Repo
Framework

Discontinuity-Sensitive Optimal Control Learning by Mixture of Experts


Title	Discontinuity-Sensitive Optimal Control Learning by Mixture of Experts
Authors	Gao Tang, Kris Hauser
Abstract	This paper proposes a discontinuity-sensitive approach to learn the solutions of parametric optimal control problems with high accuracy. Many tasks, ranging from model predictive control to reinforcement learning, may be solved by learning optimal solutions as a function of problem parameters. However, nonconvexity, discrete homotopy classes, and control switching cause discontinuity in the parameter-solution mapping, thus making learning difficult for traditional continuous function approximators. A mixture of experts (MoE) model composed of a classifier and several regressors is proposed to address this issue. The optimal trajectories of different parameters are clustered such that in each cluster the trajectories are a continuous function of the problem parameters. Numerical examples on benchmark problems show that training the classifier and regressors individually outperforms joint training of the MoE. With suitably chosen clusters, this approach not only achieves lower prediction error with less training data and fewer model parameters, but also leads to dramatic improvements in the reliability of trajectory tracking compared to traditional universal function approximation models (e.g., neural networks).
Tasks
Published	2018-03-07
URL	https://arxiv.org/abs/1803.02493v2
PDF	https://arxiv.org/pdf/1803.02493v2.pdf
PWC	https://paperswithcode.com/paper/discontinuity-sensitive-optimal-control
Repo
Framework
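
The classifier-plus-regressors idea can be sketched on a one-dimensional toy problem. Everything here is an assumption made for illustration: the discontinuous mapping solution, the given cluster labels, the nearest-neighbor classifier, and the straight-line experts are stand-ins, not the models or benchmarks used in the paper.

```python
from typing import Dict, List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit, returning (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def solution(p: float) -> float:
    """Hypothetical parameter-to-solution map with a jump at p = 0."""
    return 2.0 * p if p < 0 else 2.0 * p + 5.0


params = [i / 10 for i in range(-20, 21)]
sols = [solution(p) for p in params]
labels = [0 if p < 0 else 1 for p in params]  # cluster labels, assumed given

# Train each expert separately on its own cluster.
experts: Dict[int, Tuple[float, float]] = {}
for k in (0, 1):
    xs = [p for p, lab in zip(params, labels) if lab == k]
    ys = [s for s, lab in zip(sols, labels) if lab == k]
    experts[k] = fit_line(xs, ys)


def classify(p: float) -> int:
    """1-nearest-neighbor classifier over the training parameters."""
    nearest = min(range(len(params)), key=lambda i: abs(params[i] - p))
    return labels[nearest]


def predict(p: float) -> float:
    """Route the query to one expert, then evaluate that expert."""
    slope, intercept = experts[classify(p)]
    return slope * p + intercept


# Baseline: one continuous model fitted to all data at once.
g_slope, g_intercept = fit_line(params, sols)
moe_error = abs(predict(0.55) - solution(0.55))
single_error = abs(g_slope * 0.55 + g_intercept - solution(0.55))
```

The single line has to smear the jump across the whole range, while each expert only sees data from one side of the discontinuity, which is the intuition behind clustering the trajectories first.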

Reduction of the Pareto Set in Bicriteria Asymmetric Traveling Salesman Problem


Title	Reduction of the Pareto Set in Bicriteria Asymmetric Traveling Salesman Problem
Authors	Aleksey O. Zakharov, Yulia V. Kovalenko
Abstract	We consider the bicriteria asymmetric traveling salesman problem (bi-ATSP). The optimal solution to a multicriteria problem is usually supposed to be the Pareto set, which is rather wide in real-world problems. We apply to the bi-ATSP the axiomatic approach of the Pareto set reduction proposed by V. Noghin. We identify a series of “quanta of information” that guarantee the reduction of the Pareto set for particular cases of the bi-ATSP. An approximation of the Pareto set to the bi-ATSP is constructed by a new multi-objective genetic algorithm. The experimental evaluation carried out in this paper shows the degree of reduction of the Pareto set approximation for various “quanta of information” and various structures of the bi-ATSP instances generated randomly.
Tasks
Published	2018-05-27
URL	http://arxiv.org/abs/1805.10606v1
PDF	http://arxiv.org/pdf/1805.10606v1.pdf
PWC	https://paperswithcode.com/paper/reduction-of-the-pareto-set-in-bicriteria
Repo
Framework
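
For readers unfamiliar with the Pareto set, the following sketch filters a list of bicriteria tour costs down to its non-dominated members. It covers only that basic definition, assuming both criteria are minimized; the tour values are invented, and neither the genetic algorithm nor the axiomatic reduction from the paper is implemented here.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if `a` is no worse than `b` in both criteria and differs from it.

    Both criteria are minimized, so "no worse" means less than or equal.
    """
    return a[0] <= b[0] and a[1] <= b[1] and a != b


def pareto_set(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by the first criterion."""
    unique = sorted(set(points))
    return [p for p in unique if not any(dominates(q, p) for q in unique)]


# Hypothetical (criterion 1, criterion 2) values for five tours.
tours = [(10.0, 7.0), (8.0, 9.0), (12.0, 6.0), (11.0, 8.0), (9.0, 9.0)]
front = pareto_set(tours)
```

Even in this tiny example three of the five tours survive, which hints at why the Pareto set tends to be wide in practice and why a further reduction step is useful.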

Initialize globally before acting locally: Enabling Landmark-free 3D US to MRI Registration


Title	Initialize globally before acting locally: Enabling Landmark-free 3D US to MRI Registration
Authors	Julia Rackerseder, Maximilian Baust, Rüdiger Göbl, Nassir Navab, Christoph Hennersperger
Abstract	Registration of partial-view 3D US volumes with MRI data is influenced by initialization. The standard of practice is using extrinsic or intrinsic landmarks, which can be very tedious to obtain. To overcome the limitations of registration initialization, we present a novel approach that is based on Euclidean distance maps derived from easily obtainable coarse segmentations. We evaluate our approach quantitatively on the publicly available RESECT dataset and show that it is robust regarding overlap of target area and initial position. Furthermore, our method provides initializations that fall within the capture ranges of state-of-the-art nonlinear, deformable image registration algorithms.
Tasks	Image Registration
Published	2018-06-12
URL	http://arxiv.org/abs/1806.04368v1
PDF	http://arxiv.org/pdf/1806.04368v1.pdf
PWC	https://paperswithcode.com/paper/initialize-globally-before-acting-locally
Repo
Framework
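
A Euclidean distance map of a segmentation can be illustrated with a brute-force sketch. This is a 2D toy with a made-up mask, whereas the paper works with 3D volumes, and it shows only how such a map is defined, not the registration method itself. Practical code would use an optimized distance transform rather than this quadratic loop.

```python
import math
from typing import List


def distance_map(mask: List[List[int]]) -> List[List[float]]:
    """Distance from every cell to the nearest foreground (nonzero) cell."""
    foreground = [(r, c)
                  for r, row in enumerate(mask)
                  for c, value in enumerate(row) if value]
    if not foreground:
        raise ValueError("mask has no foreground cells")
    return [[min(math.hypot(r - fr, c - fc) for fr, fc in foreground)
             for c in range(len(row))]
            for r, row in enumerate(mask)]


# Hypothetical coarse segmentation: 1 marks the structure of interest.
segmentation = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
dmap = distance_map(segmentation)
```

Unlike the binary mask, the distance map varies smoothly away from the structure, so two maps can still be compared meaningfully when the segmentations do not overlap.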

Improved and Robust Controversy Detection in General Web Pages Using Semantic Approaches under Large Scale Conditions


Title	Improved and Robust Controversy Detection in General Web Pages Using Semantic Approaches under Large Scale Conditions
Authors	Jasper Linmans, Bob van de Velde, Evangelos Kanoulas
Abstract	Detecting controversy in general web pages is a daunting task, but increasingly essential to efficiently moderate discussions and effectively filter problematic content. Unfortunately, controversies occur across many topics and domains, with great changes over time. This paper investigates neural classifiers as a more robust methodology for controversy detection in general web pages. Current models have often cast controversy detection on general web pages as Wikipedia linking, or exact lexical matching tasks. The diverse and changing nature of controversies suggests that semantic approaches are better able to detect controversy. We train neural networks that can capture semantic information from texts using weak signal data. By leveraging the semantic properties of word embeddings, we robustly improve on existing controversy detection methods. To evaluate model stability over time and on unseen topics, we assess model performance under varying training conditions to test cross-temporal, cross-topic, cross-domain performance and annotator congruence. In doing so, we demonstrate that weak-signal based neural approaches are closer to human estimates of controversy and are more robust to the inherent variability of controversies.
Tasks	Word Embeddings
Published	2018-12-02
URL	http://arxiv.org/abs/1812.00382v1
PDF	http://arxiv.org/pdf/1812.00382v1.pdf
PWC	https://paperswithcode.com/paper/improved-and-robust-controversy-detection-in
Repo
Framework