January 27, 2020


Paper Group ANR 1285

Discriminative training of conditional random fields with probably submodular constraints. Deep-Learning Assisted High-Resolution Binocular Stereo Depth Reconstruction. DR-KFS: A Differentiable Visual Similarity Metric for 3D Shape Reconstruction. Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret. Statistical Learning and …

Discriminative training of conditional random fields with probably submodular constraints


Title	Discriminative training of conditional random fields with probably submodular constraints
Authors	Maxim Berman, Matthew B. Blaschko
Abstract	Problems of segmentation, denoising, registration and 3D reconstruction are often addressed with the graph cut algorithm. However, solving an unconstrained graph cut problem is NP-hard. For tractable optimization, pairwise potentials have to fulfill the submodularity inequality. In our learning paradigm, pairwise potentials are created as the dot product of a learned vector w with positive feature vectors. In order to constrain such a model to remain tractable, previous approaches have enforced the weight vector to be positive for pairwise potentials in which the labels differ, and set pairwise potentials to zero in the case that the label remains the same. Such constraints are sufficient to guarantee that the resulting pairwise potentials satisfy the submodularity inequality. However, we show that such an approach unnecessarily restricts the capacity of the learned models. Guaranteeing submodularity for all possible inputs, no matter how improbable, reduces inference error to effectively zero, but increases model error. In contrast, we relax the requirement of guaranteed submodularity to solutions that are probably approximately submodular. We show that the conceptually simple strategy of enforcing submodularity on the training examples guarantees with low sample complexity that test images will also yield submodular pairwise potentials. Results are presented in the binary and multiclass settings, showing substantial improvement from the resulting increased model capacity.
Tasks	3D Reconstruction, Denoising
Published	2019-11-25
URL	https://arxiv.org/abs/1911.10819v1
PDF	https://arxiv.org/pdf/1911.10819v1.pdf
PWC	https://paperswithcode.com/paper/discriminative-training-of-conditional-random
Repo
Framework
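
The submodularity inequality mentioned in the abstract above can be illustrated for a single binary edge. The following is a minimal sketch, not the authors' code: the names `pairwise_table`, `is_submodular`, the weight layout `(2, 2, d)` and all numeric values are assumptions chosen for illustration. It contrasts the restricted parameterization (zero potentials for equal labels, non-negative weights otherwise) with an unrestricted one that is submodular on a typical input but not on an unusual one.

```python
import numpy as np

def pairwise_table(w, phi):
    """Build a 2x2 table theta[a, b] = <w[a, b], phi> for one edge.

    w has shape (2, 2, d) and phi has shape (d,); both are hypothetical
    stand-ins for the learned weights and the positive edge features.
    """
    return np.tensordot(w, phi, axes=([2], [0]))

def is_submodular(theta, tol=1e-12):
    """Binary submodularity: theta(0,0) + theta(1,1) <= theta(0,1) + theta(1,0)."""
    return bool(theta[0, 0] + theta[1, 1] <= theta[0, 1] + theta[1, 0] + tol)

# Restricted model: zero potentials for equal labels, non-negative weights otherwise.
w_restricted = np.zeros((2, 2, 2))
w_restricted[0, 1] = [0.5, 1.0]
w_restricted[1, 0] = [0.5, 1.0]

# Unrestricted model: the equal-label weights may take any sign (assumed values).
w_free = w_restricted.copy()
w_free[0, 0] = [4.0, -5.0]

phi_typical = np.array([1.0, 1.0])  # stands in for a feature vector seen in training
phi_unusual = np.array([3.0, 0.1])  # stands in for an improbable input

for name, w in [("restricted", w_restricted), ("free", w_free)]:
    for label, phi in [("typical", phi_typical), ("unusual", phi_unusual)]:
        print(name, label, is_submodular(pairwise_table(w, phi)))
```

The restricted weights pass the check for any positive feature vector, while the free weights pass only on some inputs, which is the gap the abstract proposes to close by enforcing the inequality on the training examples.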

Deep-Learning Assisted High-Resolution Binocular Stereo Depth Reconstruction


Title	Deep-Learning Assisted High-Resolution Binocular Stereo Depth Reconstruction
Authors	Yaoyu Hu, Weikun Zhen, Sebastian Scherer
Abstract	This work presents dense stereo reconstruction using high-resolution images for infrastructure inspections. The state-of-the-art stereo reconstruction methods, both learning and non-learning ones, consume too much computational resource on high-resolution data. Recent learning-based methods achieve top ranks on most benchmarks. However, they suffer from the generalization issue due to lack of task-specific training data. We propose to use a less resource demanding non-learning method, guided by a learning-based model, to handle high-resolution images and achieve accurate stereo reconstruction. The deep-learning model produces an initial disparity prediction with uncertainty for each pixel of the down-sampled stereo image pair. The uncertainty serves as a self-measurement of its generalization ability and the per-pixel searching range around the initially predicted disparity. The downstream process performs a modified version of the Semi-Global Block Matching method with the up-sampled per-pixel searching range. The proposed deep-learning assisted method is evaluated on the Middlebury dataset and high-resolution stereo images collected by our customized binocular stereo camera. The combination of learning and non-learning methods achieves better performance on 12 out of 15 cases of the Middlebury dataset. In our infrastructure inspection experiments, the average 3D reconstruction error is less than 0.004m.
Tasks	3D Reconstruction
Published	2019-11-23
URL	https://arxiv.org/abs/1912.05012v2
PDF	https://arxiv.org/pdf/1912.05012v2.pdf
PWC	https://paperswithcode.com/paper/deep-learning-assisted-high-resolution
Repo
Framework

DR-KFS: A Differentiable Visual Similarity Metric for 3D Shape Reconstruction


Title	DR-KFS: A Differentiable Visual Similarity Metric for 3D Shape Reconstruction
Authors	Jiongchao Jin, Akshay Gadi Patil, Zhang Xiong, Hao Zhang
Abstract	We introduce a differential visual similarity metric to train deep neural networks for 3D reconstruction, aimed at improving reconstruction quality. The metric compares two 3D shapes by measuring distances between multi-view images differentiably rendered from the shapes. Importantly, the image-space distance is also differentiable and measures visual similarity, rather than pixel-wise distortion. Specifically, the similarity is defined by mean-squared errors over HardNet features computed from probabilistic keypoint maps of the compared images. Our differential visual shape similarity metric can be easily plugged into various 3D reconstruction networks, replacing their distortion-based losses, such as Chamfer or Earth Mover distances, so as to optimize the network weights to produce reconstructions with better structural fidelity and visual quality. We demonstrate this both objectively, using well-known shape metrics for retrieval and classification tasks that are independent from our new metric, and subjectively through a perceptual study.
Tasks	3D Reconstruction
Published	2019-11-20
URL	https://arxiv.org/abs/1911.09204v4
PDF	https://arxiv.org/pdf/1911.09204v4.pdf
PWC	https://paperswithcode.com/paper/dr-kfd-a-differentiable-visual-metric-for-3d
Repo
Framework

Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret


Title	Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret
Authors	Alon Cohen, Tomer Koren, Yishay Mansour
Abstract	We present the first computationally-efficient algorithm with $\widetilde O(\sqrt{T})$ regret for learning in Linear Quadratic Control systems with unknown dynamics. By that, we resolve an open question of Abbasi-Yadkori and Szepesvári (2011) and Dean, Mania, Matni, Recht, and Tu (2018).
Tasks
Published	2019-02-17
URL	http://arxiv.org/abs/1902.06223v2
PDF	http://arxiv.org/pdf/1902.06223v2.pdf
PWC	https://paperswithcode.com/paper/learning-linear-quadratic-regulators
Repo
Framework

Statistical Learning and Estimation of Piano Fingering


Title	Statistical Learning and Estimation of Piano Fingering
Authors	Eita Nakamura, Yasuyuki Saito, Kazuyoshi Yoshii
Abstract	Automatic estimation of piano fingering is important for understanding the computational process of music performance and applicable to performance assistance and education systems. While a natural way to formulate the quality of fingerings is to construct models of the constraints/costs of performance, it is generally difficult to find appropriate parameter values for these models. Here we study an alternative data-driven approach based on statistical modeling in which the appropriateness of a given fingering is described by probabilities. Specifically, we construct two types of hidden Markov models (HMMs) and their higher-order extensions. We also study deep neural network (DNN)-based methods for comparison. Using a newly released dataset of fingering annotations, we conduct systematic evaluations of these models as well as a representative constraint-based method. We find that the methods based on high-order HMMs outperform the other methods in terms of estimation accuracies. We also quantitatively study individual difference of fingering and propose evaluation measures that can be used with multiple ground truth data. We conclude that the HMM-based methods are currently state of the art and generate acceptable fingerings in most parts and that they have certain limitations such as ignorance of phrase boundaries and interdependence of the two hands.
Tasks
Published	2019-04-23
URL	https://arxiv.org/abs/1904.10237v2
PDF	https://arxiv.org/pdf/1904.10237v2.pdf
PWC	https://paperswithcode.com/paper/statistical-learning-and-estimation-of-piano
Repo
Framework

Incorporating Textual Evidence in Visual Storytelling


Title	Incorporating Textual Evidence in Visual Storytelling
Authors	Tianyi Li, Sujian Li
Abstract	Previous work on visual storytelling mainly focused on exploring image sequence as evidence for storytelling and neglected textual evidence for guiding story generation. Motivated by human storytelling process which recalls stories for familiar images, we exploit textual evidence from similar images to help generate coherent and meaningful stories. To pick the images which may provide textual experience, we propose a two-step ranking method based on image object recognition techniques. To utilize textual information, we design an extended Seq2Seq model with two-channel encoder and attention. Experiments on the VIST dataset show that our method outperforms state-of-the-art baseline models without heavy engineering.
Tasks	Object Recognition, Visual Storytelling
Published	2019-11-21
URL	https://arxiv.org/abs/1911.09334v1
PDF	https://arxiv.org/pdf/1911.09334v1.pdf
PWC	https://paperswithcode.com/paper/incorporating-textual-evidence-in-visual
Repo
Framework

Keep it Consistent: Topic-Aware Storytelling from an Image Stream via Iterative Multi-agent Communication


Title	Keep it Consistent: Topic-Aware Storytelling from an Image Stream via Iterative Multi-agent Communication
Authors	Ruize Wang, Zhongyu Wei, Piji Li, Haijun Shan, Ji Zhang, Qi Zhang, Xuanjing Huang
Abstract	Visual storytelling aims to generate a narrative paragraph from a sequence of images automatically. Existing approaches construct text description independently for each image and roughly concatenate them as a story, which leads to the problem of generating semantically incoherent content. In this paper, we propose a new way for visual storytelling by introducing a topic description task to detect the global semantic context of an image stream. A story is then constructed with the guidance of the topic description. In order to combine the two generation tasks, we propose a multi-agent communication framework that regards the topic description generator and the story generator as two agents and learns them simultaneously via an iterative updating mechanism. We validate our approach on VIST, where quantitative results, ablations, and human evaluation demonstrate our method’s good ability in generating stories with higher quality compared to state-of-the-art methods.
Tasks	Visual Storytelling
Published	2019-11-11
URL	https://arxiv.org/abs/1911.04192v1
PDF	https://arxiv.org/pdf/1911.04192v1.pdf
PWC	https://paperswithcode.com/paper/keep-it-consistent-topic-aware-storytelling
Repo
Framework

Causal Discovery from Heterogeneous/Nonstationary Data


Title	Causal Discovery from Heterogeneous/Nonstationary Data
Authors	Biwei Huang, Kun Zhang, Jiji Zhang, Joseph Ramsey, Ruben Sanchez-Romero, Clark Glymour, Bernhard Schölkopf
Abstract	It is commonplace to encounter heterogeneous or nonstationary data, of which the underlying generating process changes across domains or over time. Such a distribution shift feature presents both challenges and opportunities for causal discovery. In this paper, we develop a framework for causal discovery from such data, called Constraint-based causal Discovery from heterogeneous/NOnstationary Data (CD-NOD), to find causal skeleton and directions and estimate the properties of mechanism changes. First, we propose an enhanced constraint-based procedure to detect variables whose local mechanisms change and recover the skeleton of the causal structure over observed variables. Second, we present a method to determine causal orientations by making use of independent changes in the data distribution implied by the underlying causal model, benefiting from information carried by changing distributions. After learning the causal structure, next, we investigate how to efficiently estimate the 'driving force' of the nonstationarity of a causal mechanism. That is, we aim to extract from data a low-dimensional representation of changes. The proposed methods are nonparametric, with no hard restrictions on data distributions and causal mechanisms, and do not rely on window segmentation. Furthermore, we find that data heterogeneity benefits causal structure identification even with particular types of confounders. Finally, we show the connection between heterogeneity/nonstationarity and soft intervention in causal discovery. Experimental results on various synthetic and real-world data sets (task-fMRI and stock market data) are presented to demonstrate the efficacy of the proposed methods.
Tasks	Causal Discovery
Published	2019-03-05
URL	https://arxiv.org/abs/1903.01672v3
PDF	https://arxiv.org/pdf/1903.01672v3.pdf
PWC	https://paperswithcode.com/paper/causal-discovery-and-hidden-driving-force
Repo
Framework

High-Dimensional Stochastic Gradient Quantization for Communication-Efficient Edge Learning


Title	High-Dimensional Stochastic Gradient Quantization for Communication-Efficient Edge Learning
Authors	Yuqing Du, Sheng Yang, Kaibin Huang
Abstract	Edge machine learning involves the deployment of learning algorithms at the wireless network edge so as to leverage massive mobile data for enabling intelligent applications. The mainstream edge learning approach, federated learning, has been developed based on distributed gradient descent. Based on the approach, stochastic gradients are computed at edge devices and then transmitted to an edge server for updating a global AI model. Since each stochastic gradient is typically high-dimensional (with millions to billions of coefficients), communication overhead becomes a bottleneck for edge learning. To address this issue, we propose in this work a novel framework of hierarchical stochastic gradient quantization and study its effect on the learning performance. First, the framework features a practical hierarchical architecture for decomposing the stochastic gradient into its norm and normalized block gradients, and efficiently quantizes them using a uniform quantizer and a low-dimensional codebook on a Grassmann manifold, respectively. Subsequently, the quantized normalized block gradients are scaled and cascaded to yield the quantized normalized stochastic gradient using a so-called hinge vector designed under the criterion of minimum distortion. The hinge vector is also efficiently compressed using another low-dimensional Grassmannian quantizer. The other feature of the framework is a bit-allocation scheme for reducing the quantization error. The scheme determines the resolutions of the low-dimensional quantizers in the proposed framework. The framework is proved to guarantee model convergency by analyzing the convergence rate as a function of the quantization bits. Furthermore, by simulation, our design is shown to substantially reduce the communication overhead compared with the state-of-the-art signSGD scheme, while both achieve similar learning accuracies.
Tasks	Quantization
Published	2019-10-09
URL	https://arxiv.org/abs/1910.03865v1
PDF	https://arxiv.org/pdf/1910.03865v1.pdf
PWC	https://paperswithcode.com/paper/high-dimensional-stochastic-gradient
Repo
Framework
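
The decomposition described in the abstract above (a gradient split into its norm and normalized block gradients, recombined through a unit-norm scaling vector) can be sketched without any quantization. This is an illustrative reading of the abstract only, not the paper's implementation: the function names, the block size, and the interpretation of the "hinge vector" as the block norms divided by the full norm are assumptions, and the uniform and Grassmannian quantizers are deliberately left out.

```python
import numpy as np

def decompose(g, block_size):
    """Split g into (norm, hinge, directions).

    Assumes len(g) is divisible by block_size and that no block is all zeros.
    'hinge' is a hypothetical name for the unit vector of relative block norms.
    """
    norm = np.linalg.norm(g)
    blocks = g.reshape(-1, block_size)
    block_norms = np.linalg.norm(blocks, axis=1)
    directions = blocks / block_norms[:, None]  # unit-norm block gradients
    hinge = block_norms / norm                  # unit-norm vector of block scales
    return norm, hinge, directions

def reconstruct(norm, hinge, directions):
    """Scale each unit block by its hinge entry and concatenate the blocks."""
    return (norm * hinge[:, None] * directions).reshape(-1)

rng = np.random.default_rng(0)
g = rng.normal(size=12)
norm, hinge, directions = decompose(g, block_size=4)
g_hat = reconstruct(norm, hinge, directions)
print(np.allclose(g, g_hat))
```

In a scheme like the one the abstract describes, each of the three parts would be quantized separately before transmission; here they are kept exact so the reconstruction identity is easy to verify.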

Deep learning-based color holographic microscopy


Title	Deep learning-based color holographic microscopy
Authors	Tairan Liu, Zhensong Wei, Yair Rivenson, Kevin de Haan, Yibo Zhang, Yichen Wu, Aydogan Ozcan
Abstract	We report a framework based on a generative adversarial network (GAN) that performs high-fidelity color image reconstruction using a single hologram of a sample that is illuminated simultaneously by light at three different wavelengths. The trained network learns to eliminate missing-phase-related artifacts, and generates an accurate color transformation for the reconstructed image. Our framework is experimentally demonstrated using lung and prostate tissue sections that are labeled with different histological stains. This framework is envisaged to be applicable to point-of-care histopathology, and presents a significant improvement in the throughput of coherent microscopy systems given that only a single hologram of the specimen is required for accurate color imaging.
Tasks	Image Reconstruction
Published	2019-07-15
URL	https://arxiv.org/abs/1907.06727v1
PDF	https://arxiv.org/pdf/1907.06727v1.pdf
PWC	https://paperswithcode.com/paper/deep-learning-based-color-holographic
Repo
Framework

Long-range Event-level Prediction and Response Simulation for Urban Crime and Global Terrorism with Granger Networks


Title	Long-range Event-level Prediction and Response Simulation for Urban Crime and Global Terrorism with Granger Networks
Authors	Timmy Li, Yi Huang, James Evans, Ishanu Chattopadhyay
Abstract	Large-scale trends in urban crime and global terrorism are well-predicted by socio-economic drivers, but focused, event-level predictions have had limited success. Standard machine learning approaches are promising, but lack interpretability, are generally interpolative, and ineffective for precise future interventions with costly and wasteful false positives. Here, we are introducing Granger Network inference as a new forecasting approach for individual infractions with demonstrated performance far surpassing past results, yet transparent enough to validate and extend social theory. Considering the problem of predicting crime in the City of Chicago, we achieve an average AUC of ~90% for events predicted a week in advance within spatial tiles approximately $1000$ ft across. Instead of pre-supposing that crimes unfold across contiguous spaces akin to diffusive systems, we learn the local transport rules from data. As our key insights, we uncover indications of suburban bias – how law-enforcement response is modulated by socio-economic contexts with disproportionately negative impacts in the inner city – and how the dynamics of violent and property crimes co-evolve and constrain each other – lending quantitative support to controversial pro-active policing policies. To demonstrate broad applicability to spatio-temporal phenomena, we analyze terror attacks in the middle-east in the recent past, and achieve an AUC of ~80% for predictions made a week in advance, and within spatial tiles measuring approximately 120 miles across. We conclude that while crime operates near an equilibrium quickly dissipating perturbations, terrorism does not. Indeed terrorism aims to destabilize social order, as shown by its dynamics being susceptible to run-away increases in event rates under small perturbations.
Tasks
Published	2019-11-04
URL	https://arxiv.org/abs/1911.05647v1
PDF	https://arxiv.org/pdf/1911.05647v1.pdf
PWC	https://paperswithcode.com/paper/long-range-event-level-prediction-and
Repo
Framework

Recommending Related Tables


Title	Recommending Related Tables
Authors	Shuo Zhang, Krisztian Balog
Abstract	Tables are an extremely powerful visual and interactive tool for structuring and manipulating data, making spreadsheet programs one of the most popular computer applications. In this paper we introduce and address the task of recommending related tables: given an input table, identifying and returning a ranked list of relevant tables. One of the many possible application scenarios for this task is to provide users of a spreadsheet program proactively with recommendations for related structured content on the Web. At its core, the related table recommendation task boils down to computing the similarity between a pair of tables. We develop a theoretically sound framework for performing table matching. Our approach hinges on the idea of representing table elements in multiple semantic spaces, and then combining element-level similarities using a discriminative learning model. Using a purpose-built test collection from Wikipedia tables, we demonstrate that the proposed approach delivers state-of-the-art performance.
Tasks
Published	2019-07-08
URL	https://arxiv.org/abs/1907.03595v2
PDF	https://arxiv.org/pdf/1907.03595v2.pdf
PWC	https://paperswithcode.com/paper/recommending-related-tables
Repo
Framework

Towards computer vision powered color-nutrient assessment of pureed food


Title	Towards computer vision powered color-nutrient assessment of pureed food
Authors	Kaylen J. Pfisterer, Robert Amelard, Braeden Syrnyk, Alexander Wong
Abstract	With one in four individuals afflicted with malnutrition, computer vision may provide a way of introducing a new level of automation in the nutrition field to reliably monitor food and nutrient intake. In this study, we present a novel approach to modeling the link between color and vitamin A content using transmittance imaging of a pureed foods dilution series in a computer vision powered nutrient sensing system via a fine-tuned deep autoencoder network, which in this case was trained to predict the relative concentration of sweet potato purees. Experimental results show the deep autoencoder network can achieve an accuracy of 80% across beginner (6 month) and intermediate (8 month) commercially prepared pureed sweet potato samples. Prediction errors may be explained by fundamental differences in optical properties which are further discussed.
Tasks
Published	2019-05-01
URL	http://arxiv.org/abs/1905.00310v1
PDF	http://arxiv.org/pdf/1905.00310v1.pdf
PWC	https://paperswithcode.com/paper/towards-computer-vision-powered-color
Repo
Framework

Robust estimation of tree structured Gaussian Graphical Model


Title	Robust estimation of tree structured Gaussian Graphical Model
Authors	Ashish Katiyar, Jessica Hoffmann, Constantine Caramanis
Abstract	Consider jointly Gaussian random variables whose conditional independence structure is specified by a graphical model. If we observe realizations of the variables, we can compute the covariance matrix, and it is well known that the support of the inverse covariance matrix corresponds to the edges of the graphical model. Instead, suppose we only have noisy observations. If the noise at each node is independent, we can compute the sum of the covariance matrix and an unknown diagonal. The inverse of this sum is (in general) dense. We ask: can the original independence structure be recovered? We address this question for tree structured graphical models. We prove that this problem is unidentifiable, but show that this unidentifiability is limited to a small class of candidate trees. We further present additional constraints under which the problem is identifiable. Finally, we provide an O(n^3) algorithm to find this equivalence class of trees.
Tasks
Published	2019-01-25
URL	http://arxiv.org/abs/1901.08770v1
PDF	http://arxiv.org/pdf/1901.08770v1.pdf
PWC	https://paperswithcode.com/paper/robust-estimation-of-tree-structured-gaussian
Repo
Framework
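
The observation in the abstract above, that adding an unknown diagonal to the covariance makes the inverse dense, can be checked numerically on a small example. This is a minimal sketch under assumed values: a five-node chain (a simple tree) with a hand-picked tridiagonal precision matrix and uniform noise variance 0.5, none of which come from the paper.

```python
import numpy as np

n = 5
# Precision matrix of a chain-structured Gaussian: nonzeros only on tree edges.
precision = 2.0 * np.eye(n) - 0.8 * (np.eye(n, k=1) + np.eye(n, k=-1))
cov = np.linalg.inv(precision)

# Independent noise at each node adds a diagonal to the covariance (assumed value).
noisy_cov = cov + 0.5 * np.eye(n)

clean_inverse = np.linalg.inv(cov)
noisy_inverse = np.linalg.inv(noisy_cov)

# Nodes 0 and 4 are not adjacent in the chain.
print(abs(clean_inverse[0, 4]))  # numerically zero: the support reveals the tree
print(abs(noisy_inverse[0, 4]))  # clearly nonzero: the support no longer does
```

Thresholding the inverse covariance therefore stops working once noise is present, which is the setting the abstract studies.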

LIT: Light-field Inference of Transparency for Refractive Object Localization


Title	LIT: Light-field Inference of Transparency for Refractive Object Localization
Authors	Zheming Zhou, Xiaotong Chen, Odest Chadwicke Jenkins
Abstract	Translucency is prevalent in everyday scenes. As such, perception of transparent objects is essential for robots to perform manipulation. Compared with texture-rich or texture-less Lambertian objects, transparency induces significant uncertainty on object appearances. Ambiguity can be due to changes in lighting, viewpoint, and backgrounds, each of which brings challenges to existing object pose estimation algorithms. In this work, we propose LIT, a two-stage method for transparent object pose estimation using light-field sensing and photorealistic rendering. LIT employs multiple filters specific to light-field imagery in deep networks to capture transparent material properties, with robust depth and pose estimators based on generative sampling. Along with the LIT algorithm, we introduce the light-field transparent object dataset ProLIT for the tasks of recognition, localization and pose estimation. With respect to this ProLIT dataset, we demonstrate that LIT can outperform both state-of-the-art end-to-end pose estimation methods and a generative pose estimator on transparent objects.
Tasks	Object Localization, Pose Estimation
Published	2019-10-02
URL	https://arxiv.org/abs/1910.00721v4
PDF	https://arxiv.org/pdf/1910.00721v4.pdf
PWC	https://paperswithcode.com/paper/lite-light-field-transparency-estimation-for
Repo
Framework