January 26, 2020

Paper Group ANR 1382

Depth-Preserving Real-Time Arbitrary Style Transfer. Low-rank Kernel Learning for Graph-based Clustering. Residual Objectness for Imbalance Reduction. TAPA-MVS: Textureless-Aware PAtchMatch Multi-View Stereo. Meeting Transcription Using Virtual Microphone Arrays. Towards Precise Robotic Grasping by Probabilistic Post-grasp Displacement Estimation. …

Depth-Preserving Real-Time Arbitrary Style Transfer


Title	Depth-Preserving Real-Time Arbitrary Style Transfer
Authors	Konstantin Kozlovtsev, Victor Kitov
Abstract	Style transfer is the process of rendering one image, which supplies the content, in the style of another image, which supplies the style. Liu et al. (2017) showed that rendering quality improves significantly when the traditional methods of Gatys et al. (2016) and Johnson et al. (2016) are augmented with a regularizer that forces preservation of the content image's depth map. However, these traditional methods are either computationally inefficient or require training a separate neural network for each new style. The AdaIN method of Huang et al. (2017) transfers arbitrary styles efficiently without training a separate model, but it cannot reproduce the depth map of the content image. We propose an extension to this method that allows depth map preservation. Qualitative analysis and the results of a user evaluation study indicate that the proposed method provides better stylizations than the original style transfer methods of Gatys et al. (2016) and Huang et al. (2017).
Tasks	Style Transfer
Published	2019-06-03
URL	https://arxiv.org/abs/1906.01123v1
PDF	https://arxiv.org/pdf/1906.01123v1.pdf
PWC	https://paperswithcode.com/paper/depth-preserving-real-time-arbitrary-style
Repo
Framework
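
The AdaIN operation mentioned in the abstract can be illustrated with a minimal NumPy sketch. This shows only the baseline normalization step of Huang et al. (2017), not the depth-preserving extension proposed in the paper. The function name `adain`, the `(C, H, W)` feature layout, and the `eps` constant are assumptions made for illustration; in practice the inputs would be encoder feature maps rather than random arrays.

```python
import numpy as np

def adain(content_feat, style_feat, eps=1e-5):
    """Adaptive instance normalization on feature maps shaped (C, H, W)."""
    # Per-channel statistics over the spatial dimensions.
    c_mean = content_feat.mean(axis=(1, 2), keepdims=True)
    c_std = content_feat.std(axis=(1, 2), keepdims=True) + eps
    s_mean = style_feat.mean(axis=(1, 2), keepdims=True)
    s_std = style_feat.std(axis=(1, 2), keepdims=True) + eps
    # Normalize the content features, then rescale them to the style statistics.
    return s_std * (content_feat - c_mean) / c_std + s_mean

# Random arrays stand in for encoder features (hypothetical data).
rng = np.random.default_rng(0)
content = rng.normal(0.0, 1.0, size=(4, 8, 8))
style = rng.normal(3.0, 2.0, size=(4, 8, 8))
out = adain(content, style)
```

After the call, each channel of `out` carries the mean and (up to `eps`) the standard deviation of the corresponding style channel, while the spatial layout still comes from the content features.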

Low-rank Kernel Learning for Graph-based Clustering


Title	Low-rank Kernel Learning for Graph-based Clustering
Authors	Zhao Kang, Liangjian Wen, Wenyu Chen, Zenglin Xu
Abstract	Constructing the adjacency graph is fundamental to graph-based clustering. Graph learning in kernel space has shown impressive performance on a number of benchmark data sets. However, its performance is largely determined by the chosen kernel matrix. To address this issue, the previous multiple kernel learning algorithm has been applied to learn an optimal kernel from a group of predefined kernels. This approach might be sensitive to noise and limits the representation ability of the consensus kernel. In contrast to existing methods, we propose to learn a low-rank kernel matrix which exploits the similarity nature of the kernel matrix and seeks an optimal kernel from the neighborhood of candidate kernels. By formulating graph construction and kernel learning in a unified framework, the graph and consensus kernel can be iteratively enhanced by each other. Extensive experimental results validate the efficacy of the proposed method.
Tasks	graph construction
Published	2019-03-14
URL	http://arxiv.org/abs/1903.05962v1
PDF	http://arxiv.org/pdf/1903.05962v1.pdf
PWC	https://paperswithcode.com/paper/low-rank-kernel-learning-for-graph-based
Repo
Framework

Residual Objectness for Imbalance Reduction


Title	Residual Objectness for Imbalance Reduction
Authors	Joya Chen, Dong Liu, Bin Luo, Xuezheng Peng, Tong Xu, Enhong Chen
Abstract	For a long time, object detectors have suffered from extreme imbalance between foregrounds and backgrounds. While several sampling/reweighting schemes have been explored to alleviate the imbalance, they are usually heuristic and demand laborious hyper-parameter tuning, which makes optimality hard to achieve. In this paper, we first reveal that such imbalance can be addressed in a learning-based manner. Guided by this observation, we propose a novel Residual Objectness (ResObj) mechanism that addresses the imbalance by end-to-end optimization, so that no further hand-crafted sampling/reweighting is required. Specifically, by applying multiple cascaded objectness-related modules with residual connections, we formulate a consecutive refinement procedure for distinguishing the foregrounds from backgrounds, thereby progressively addressing the imbalance. Extensive experiments demonstrate the effectiveness of our method, as well as its compatibility and adaptivity for both region-based and one-stage detectors: RetinaNet-ResObj, YOLOv3-ResObj and FasterRCNN-ResObj achieve relative Average Precision (AP) improvements of 3.6%, 3.9% and 3.2%, respectively, over their vanilla models on COCO.
Tasks
Published	2019-08-24
URL	https://arxiv.org/abs/1908.09075v1
PDF	https://arxiv.org/pdf/1908.09075v1.pdf
PWC	https://paperswithcode.com/paper/residual-objectness-for-imbalance-reduction
Repo
Framework

TAPA-MVS: Textureless-Aware PAtchMatch Multi-View Stereo


Title	TAPA-MVS: Textureless-Aware PAtchMatch Multi-View Stereo
Authors	Andrea Romanoni, Matteo Matteucci
Abstract	One of the most successful approaches in Multi-View Stereo estimates a depth map and a normal map for each view via PatchMatch-based optimization and fuses them into a consistent 3D point cloud. This approach relies on photo-consistency to evaluate the goodness of a depth estimate. It generally produces very accurate results; however, the reconstructed model often lacks completeness, especially in broad untextured areas where the photo-consistency metrics are unreliable. Assuming the untextured areas to be piecewise planar, in this paper we generate novel PatchMatch hypotheses so as to expand reliable depth estimates into neighboring untextured regions. At the same time, we modify the photo-consistency measure so as to favor standard or novel PatchMatch depth hypotheses depending on the textureness of the considered area. We also propose a depth refinement step to filter wrong estimates and to fill the gaps in both the depth maps and normal maps while preserving the discontinuities. The effectiveness of our new methods has been tested against several state-of-the-art algorithms on the publicly available ETH3D dataset, which contains a wide variety of high- and low-resolution images.
Tasks
Published	2019-03-26
URL	http://arxiv.org/abs/1903.10929v1
PDF	http://arxiv.org/pdf/1903.10929v1.pdf
PWC	https://paperswithcode.com/paper/tapa-mvs-textureless-aware-patchmatch-multi
Repo
Framework

Meeting Transcription Using Virtual Microphone Arrays


Title	Meeting Transcription Using Virtual Microphone Arrays
Authors	Takuya Yoshioka, Zhuo Chen, Dimitrios Dimitriadis, William Hinthorn, Xuedong Huang, Andreas Stolcke, Michael Zeng
Abstract	We describe a system that generates speaker-annotated transcripts of meetings by using a virtual microphone array, a set of spatially distributed asynchronous recording devices such as laptops and mobile phones. The system is composed of continuous audio stream alignment, blind beamforming, speech recognition, speaker diarization using prior speaker information, and system combination. When utilizing seven input audio streams, our system achieves a word error rate (WER) of 22.3% and comes within 3% of the close-talking microphone WER on the non-overlapping speech segments. The speaker-attributed WER (SAWER) is 26.7%. The relative gains in SAWER over the single-device system are 14.8%, 20.3%, and 22.4% for three, five, and seven microphones, respectively. The presented system achieves a 13.6% diarization error rate when 10% of the speech duration contains more than one speaker. The contribution of each component to the overall performance is also investigated, and we validate the system with experiments on the NIST RT-07 conference meeting test set.
Tasks	Speaker Diarization, Speech Recognition
Published	2019-05-03
URL	https://arxiv.org/abs/1905.02545v2
PDF	https://arxiv.org/pdf/1905.02545v2.pdf
PWC	https://paperswithcode.com/paper/meeting-transcription-using-virtual
Repo
Framework
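
The word error rate (WER) quoted in the abstract is conventionally the word-level edit distance between reference and hypothesis divided by the reference length. The sketch below shows that standard definition together with a relative-gain helper; it is not the paper's scoring pipeline, and the function names and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[len(hyp)] / len(ref)

def relative_gain(baseline_error, new_error):
    """Relative error reduction of a system over a baseline."""
    return (baseline_error - new_error) / baseline_error

# Hypothetical sentences: one substitution and one deletion over six words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

A speaker-attributed WER would additionally require the speaker label of each word to match; that bookkeeping is left out here.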

Towards Precise Robotic Grasping by Probabilistic Post-grasp Displacement Estimation


Title	Towards Precise Robotic Grasping by Probabilistic Post-grasp Displacement Estimation
Authors	Jialiang Zhao, Jacky Liang, Oliver Kroemer
Abstract	Precise robotic grasping is important for many industrial applications, such as assembly and palletizing, where the location of the object needs to be controlled and known. However, achieving precise grasps is challenging due to noise in sensing and control, as well as unknown object properties. We propose a method to plan robotic grasps that are both robust and precise by training two convolutional neural networks: one to predict the robustness of a grasp and another to predict a distribution of post-grasp object displacements. Our networks are trained with depth images in simulation on a dataset of over 1000 industrial parts and were successfully deployed on a real robot without further fine-tuning. The proposed displacement estimator achieves mean prediction errors of 0.68 cm and 3.42 deg on novel objects in real-world experiments.
Tasks	Robotic Grasping
Published	2019-09-04
URL	https://arxiv.org/abs/1909.02129v1
PDF	https://arxiv.org/pdf/1909.02129v1.pdf
PWC	https://paperswithcode.com/paper/towards-precise-robotic-grasping-by
Repo
Framework

Real-time Tracking-by-Detection of Human Motion in RGB-D Camera Networks


Title	Real-time Tracking-by-Detection of Human Motion in RGB-D Camera Networks
Authors	Alessandro Malaguti, Marco Carraro, Mattia Guidolin, Luca Tagliapietra, Emanuele Menegatti, Stefano Ghidoni
Abstract	This paper presents a novel real-time tracking system capable of improving body pose estimation algorithms in distributed camera networks. The first stage of our approach introduces a linear Kalman filter operating at the body joints level, used to fuse single-view body poses coming from different detection nodes of the network and to ensure temporal consistency between them. The second stage, instead, refines the Kalman filter estimates by fitting a hierarchical model of the human body having constrained link sizes in order to ensure the physical consistency of the tracking. The effectiveness of the proposed approach is demonstrated through a broad experimental validation, performed on a set of sequences whose ground truth references are generated by a commercial marker-based motion capture system. The obtained results show how the proposed system outperforms the considered state-of-the-art approaches, granting accurate and reliable estimates. Moreover, the developed methodology constrains neither the number of persons to track, nor the number, position, synchronization, frame-rate, and manufacturer of the RGB-D cameras used. Finally, the real-time performances of the system are of paramount importance for a large number of real-world applications.
Tasks	Motion Capture, Pose Estimation
Published	2019-07-28
URL	https://arxiv.org/abs/1907.12112v1
PDF	https://arxiv.org/pdf/1907.12112v1.pdf
PWC	https://paperswithcode.com/paper/real-time-tracking-by-detection-of-human
Repo
Framework
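
The first stage described in the abstract, a linear Kalman filter at the joint level that fuses detections from several views, can be sketched for a single joint coordinate. This is a simplified scalar filter with a random-walk motion model, not the authors' filter; the function name, the noise variances, and the two-camera measurement stream are all assumptions made for illustration.

```python
def kalman_1d(measurements, process_var=1e-3, meas_var=0.05, x0=0.0, p0=1.0):
    """Scalar Kalman filter with a random-walk state model."""
    x, p = x0, p0  # state estimate and its variance
    estimates = []
    for z in measurements:
        # Predict: the state is carried over, its uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + meas_var)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates

# Hypothetical stream: two cameras alternately report the same joint
# coordinate with opposite biases around a true value of 1.0.
stream = [0.9, 1.1] * 25
fused = kalman_1d(stream)
```

Feeding detections from different nodes into the same update step in arrival order is one simple way to fuse asynchronous views; a full tracker would keep a position and velocity state per joint and per axis.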

Randomization as Regularization: A Degrees of Freedom Explanation for Random Forest Success


Title	Randomization as Regularization: A Degrees of Freedom Explanation for Random Forest Success
Authors	Lucas Mentch, Siyu Zhou
Abstract	Random forests remain among the most popular off-the-shelf supervised machine learning tools with a well-established track record of predictive accuracy in both regression and classification settings. Despite their empirical success as well as a bevy of recent work investigating their statistical properties, a full and satisfying explanation for their success has yet to be put forth. Here we aim to take a step forward in this direction by demonstrating that the additional randomness injected into individual trees serves as a form of implicit regularization, making random forests an ideal model in low signal-to-noise ratio (SNR) settings. Specifically, from a model-complexity perspective, we show that the mtry parameter in random forests serves much the same purpose as the shrinkage penalty in explicitly regularized regression procedures like lasso and ridge regression. To highlight this point, we design a randomized linear-model-based forward selection procedure intended as an analogue to tree-based random forests and demonstrate its surprisingly strong empirical performance. Numerous demonstrations on both real and synthetic data are provided.
Tasks
Published	2019-11-01
URL	https://arxiv.org/abs/1911.00190v1
PDF	https://arxiv.org/pdf/1911.00190v1.pdf
PWC	https://paperswithcode.com/paper/randomization-as-regularization-a-degrees-of
Repo
Framework

Evolutionary Clustering via Message Passing


Title	Evolutionary Clustering via Message Passing
Authors	Natalia M. Arzeno, Haris Vikalo
Abstract	We are often interested in clustering objects that evolve over time and identifying solutions to the clustering problem for every time step. Evolutionary clustering provides insight into cluster evolution and temporal changes in cluster memberships while enabling performance superior to that achieved by independently clustering data collected at different time points. In this paper we introduce evolutionary affinity propagation (EAP), an evolutionary clustering algorithm that groups data points by exchanging messages on a factor graph. EAP promotes temporal smoothness of the solution to clustering time-evolving data by linking the nodes of the factor graph that are associated with adjacent data snapshots, and introduces consensus nodes to enable cluster tracking and identification of cluster births and deaths. Unlike existing evolutionary clustering methods that require additional processing to approximate the number of clusters or match them across time, EAP determines the number of clusters and tracks them automatically. A comparison with existing methods on simulated and experimental data demonstrates effectiveness of the proposed EAP algorithm.
Tasks
Published	2019-12-27
URL	https://arxiv.org/abs/1912.11970v1
PDF	https://arxiv.org/pdf/1912.11970v1.pdf
PWC	https://paperswithcode.com/paper/evolutionary-clustering-via-message-passing
Repo
Framework

A Voice Interactive Multilingual Student Support System using IBM Watson


Title	A Voice Interactive Multilingual Student Support System using IBM Watson
Authors	Kennedy Ralston, Yuhao Chen, Haruna Isah, Farhana Zulkernine
Abstract	Systems powered by artificial intelligence are being developed to be more user-friendly by communicating with users in a progressively human-like conversational way. Chatbots, also known as dialogue systems, interactive conversational agents, or virtual agents, are an example of such systems and are used in a wide variety of applications ranging from customer support in the business domain to companionship in the healthcare sector. It is becoming increasingly important to develop chatbots that can best respond to the personalized needs of their users so that they can be as helpful to the user as possible in a real human way. This paper investigates and compares three popular existing chatbot API offerings and then proposes and develops a voice-interactive, multilingual chatbot that can effectively respond to users' mood, tone, and language using IBM Watson Assistant, Tone Analyzer, and Language Translator. The chatbot was evaluated using a use case targeted at responding to users' needs regarding exam stress, based on university student survey data collected using Google Forms. The results of measuring the chatbot's effectiveness at analyzing responses regarding exam stress indicate that the chatbot responds appropriately to user queries about how they are feeling about exams in 76.5% of cases. The chatbot could also be adapted for use in other application areas such as student info-centers, government kiosks, and mental health support systems.
Tasks	Chatbot
Published	2019-12-20
URL	https://arxiv.org/abs/2001.00471v1
PDF	https://arxiv.org/pdf/2001.00471v1.pdf
PWC	https://paperswithcode.com/paper/a-voice-interactive-multilingual-student
Repo
Framework

All-neural online source separation, counting, and diarization for meeting analysis


Title	All-neural online source separation, counting, and diarization for meeting analysis
Authors	Thilo von Neumann, Keisuke Kinoshita, Marc Delcroix, Shoko Araki, Tomohiro Nakatani, Reinhold Haeb-Umbach
Abstract	Automatic meeting analysis comprises the tasks of speaker counting, speaker diarization, and the separation of overlapped speech, followed by automatic speech recognition. This all has to be carried out on arbitrarily long sessions and, ideally, in an online or block-online manner. While significant progress has been made on individual tasks, this paper presents for the first time an all-neural approach to simultaneous speaker counting, diarization and source separation. The NN-based estimator operates in a block-online fashion and tracks speakers even if they remain silent for a number of time blocks, thus learning a stable output order for the separated sources. The neural network is recurrent over time as well as over the number of sources. The simulation experiments show that state of the art separation performance is achieved, while at the same time delivering good diarization and source counting results. It even generalizes well to an unseen large number of blocks.
Tasks	Speaker Diarization, Speech Recognition
Published	2019-02-21
URL	http://arxiv.org/abs/1902.07881v1
PDF	http://arxiv.org/pdf/1902.07881v1.pdf
PWC	https://paperswithcode.com/paper/all-neural-online-source-separation-counting
Repo
Framework

Comparing Apples and Oranges: Measuring Differences between Exploratory Data Mining Results


Title	Comparing Apples and Oranges: Measuring Differences between Exploratory Data Mining Results
Authors	Nikolaj Tatti, Jilles Vreeken
Abstract	Deciding whether the results of two different mining algorithms provide significantly different information is an important, yet understudied, open problem in exploratory data mining. Whether the goal is to select the most informative result for analysis, or to decide which mining approach will most likely provide the most novel insight, it is essential that we can tell how much the information provided by different results, possibly from different methods, differs. In this paper we take a first step towards comparing exploratory data mining results on binary data. We propose to meaningfully convert results into sets of noisy tiles, and compare these sets by Maximum Entropy modelling and Kullback-Leibler divergence, well-founded notions from Information Theory. We thus construct a measure that is highly flexible and allows us to naturally include background knowledge, such that differences in results can be measured from the perspective of what a user already knows. Furthermore, adding to its interpretability, it coincides with Jaccard dissimilarity when we only consider exact tiles. Our approach provides a means to study and tell differences between results of different exploratory data mining methods. As an application, we show that our measure can also be used to identify which parts of results best redescribe other results. Furthermore, we study its use for iterative data mining, where one iteratively wants to find the result that will provide maximal novel information. Experimental evaluation shows our measure gives meaningful results, correctly identifies methods that are similar in nature, automatically provides sound redescriptions of results, and is highly applicable for iterative data mining.
Tasks
Published	2019-02-18
URL	http://arxiv.org/abs/1902.07165v2
PDF	http://arxiv.org/pdf/1902.07165v2.pdf
PWC	https://paperswithcode.com/paper/comparing-apples-and-oranges-measuring
Repo
Framework
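
The abstract notes that the proposed measure coincides with Jaccard dissimilarity for exact tiles. A minimal sketch of Jaccard dissimilarity on plain sets follows. Representing a tile as the set of (row, column) cells it covers is an assumption made for illustration; the abstract does not say how the paper maps tiles onto sets, and the Maximum Entropy part of the measure is not shown.

```python
def jaccard_dissimilarity(a, b):
    """1 - |A intersect B| / |A union B|, defined as 0.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def tile_cells(rows, cols):
    """Cells of a binary data matrix covered by a tile (rows x cols)."""
    return {(r, c) for r in rows for c in cols}

# Two hypothetical tiles that share one row.
tile_a = tile_cells([0, 1], [0, 1])
tile_b = tile_cells([1, 2], [0, 1])
distance = jaccard_dissimilarity(tile_a, tile_b)
```

Identical results give a dissimilarity of 0 and disjoint results give 1, which is what makes the measure easy to interpret in the exact-tile case.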

Transfer in Deep Reinforcement Learning using Knowledge Graphs


Title	Transfer in Deep Reinforcement Learning using Knowledge Graphs
Authors	Prithviraj Ammanabrolu, Mark O. Riedl
Abstract	Text adventure games, in which players must make sense of the world through text descriptions and declare actions through text descriptions, provide a stepping stone toward grounding action in language. Prior work has demonstrated that using a knowledge graph as a state representation and question-answering to pre-train a deep Q-network facilitates faster control policy transfer. In this paper, we explore the use of knowledge graphs as a representation for domain knowledge transfer for training text-adventure playing reinforcement learning agents. Our methods are tested across multiple computer generated and human authored games, varying in domain and complexity, and demonstrate that our transfer learning methods let us learn a higher-quality control policy faster.
Tasks	Knowledge Graphs, Question Answering, Transfer Learning
Published	2019-08-19
URL	https://arxiv.org/abs/1908.06556v1
PDF	https://arxiv.org/pdf/1908.06556v1.pdf
PWC	https://paperswithcode.com/paper/transfer-in-deep-reinforcement-learning-using-2
Repo
Framework

Small-footprint Keyword Spotting with Graph Convolutional Network


Title	Small-footprint Keyword Spotting with Graph Convolutional Network
Authors	Xi Chen, Shouyi Yin, Dandan Song, Peng Ouyang, Leibo Liu, Shaojun Wei
Abstract	Despite the recent successes of deep neural networks, it remains challenging to achieve high-precision keyword spotting (KWS) on resource-constrained devices. In this study, we propose a novel context-aware and compact architecture for the keyword spotting task. Based on residual connections and a bottleneck structure, we design a compact and efficient network for the KWS task. To leverage the long-range dependencies and global context of the convolutional feature maps, a graph convolutional network is introduced to encode the non-local relations. Evaluated on the Google Speech Command Dataset, the proposed method achieves state-of-the-art performance and outperforms prior work by a large margin at lower computational cost.
Tasks	Keyword Spotting, Small-Footprint Keyword Spotting
Published	2019-12-11
URL	https://arxiv.org/abs/1912.05124v1
PDF	https://arxiv.org/pdf/1912.05124v1.pdf
PWC	https://paperswithcode.com/paper/small-footprint-keyword-spotting-with-graph
Repo
Framework
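
To make the graph-convolution step in the abstract concrete, here is a minimal NumPy sketch of one graph convolutional layer using the commonly used symmetric normalization with self-loops. This is a generic layer, not the paper's architecture: how the authors build the graph over feature-map positions is not stated in the abstract, and the names `gcn_layer`, `adj`, `feats`, and `weight` are assumptions made for illustration.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    # Add self-loops so each node keeps its own features.
    a_hat = adj + np.eye(adj.shape[0])
    # Symmetric degree normalization of the adjacency matrix.
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Aggregate neighbour features, project them, apply ReLU.
    return np.maximum(a_norm @ feats @ weight, 0.0)

# Hypothetical fully connected graph over three feature-map positions.
adj = np.ones((3, 3)) - np.eye(3)
feats = np.array([[1.0], [2.0], [3.0]])
weight = np.array([[1.0]])
out = gcn_layer(adj, feats, weight)
```

With a fully connected graph every position receives the average of all positions, which is the sense in which such a layer can carry global context that a local convolution would miss.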

Pixel-Attentive Policy Gradient for Multi-Fingered Grasping in Cluttered Scenes


Title	Pixel-Attentive Policy Gradient for Multi-Fingered Grasping in Cluttered Scenes
Authors	Bohan Wu, Iretiayo Akinola, Peter K. Allen
Abstract	Recent advances in on-policy reinforcement learning (RL) methods enabled learning agents in virtual environments to master complex tasks with high-dimensional and continuous observation and action spaces. However, leveraging this family of algorithms in multi-fingered robotic grasping remains a challenge due to large sim-to-real fidelity gaps and the high sample complexity of on-policy RL algorithms. This work aims to bridge these gaps by first reinforcement-learning a multi-fingered robotic grasping policy in simulation that operates in the pixel space of the input: a single depth image. Using a mapping from pixel space to Cartesian space according to the depth map, this method transfers to the real world with high fidelity and introduces a novel attention mechanism that substantially improves grasp success rate in cluttered environments. Finally, the direct-generative nature of this method allows learning of multi-fingered grasps that have flexible end-effector positions, orientations and rotations, as well as all degrees of freedom of the hand.
Tasks	Robotic Grasping
Published	2019-03-08
URL	https://arxiv.org/abs/1903.03227v4
PDF	https://arxiv.org/pdf/1903.03227v4.pdf
PWC	https://paperswithcode.com/paper/pixel-attentive-policy-gradient-for-multi
Repo
Framework
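
The mapping from pixel space to Cartesian space according to the depth map, as mentioned in the abstract, can be sketched with the standard pinhole back-projection. The pinhole camera model and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions made for illustration; the abstract does not specify the camera model or the calibration values used.

```python
def pixel_to_camera_xyz(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with metric depth into camera coordinates."""
    # Offset from the principal point, scaled by depth over focal length.
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# Hypothetical intrinsics for a 640x480 depth camera.
fx, fy, cx, cy = 600.0, 600.0, 320.0, 240.0
centre_point = pixel_to_camera_xyz(320, 240, 0.5, fx, fy, cx, cy)
side_point = pixel_to_camera_xyz(920, 240, 1.0, fx, fy, cx, cy)
```

A policy that selects a pixel in the depth image can then be turned into a 3D end-effector target in the camera frame; a further fixed transform, not shown here, would move that point into the robot base frame.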