January 25, 2020


Paper Group ANR 1649

Building Dynamic Knowledge Graphs from Text-based Games. Drishtikon: An advanced navigational aid system for visually impaired people. Unsupervised routine discovery in egocentric photo-streams. Action Recognition Using Volumetric Motion Representations. Coloring the Black Box: Visualizing neural network behavior with a self-introspective model. Op …

Building Dynamic Knowledge Graphs from Text-based Games


Title	Building Dynamic Knowledge Graphs from Text-based Games
Authors	Mikuláš Zelinka, Xingdi Yuan, Marc-Alexandre Côté, Romain Laroche, Adam Trischler
Abstract	We are interested in learning how to update Knowledge Graphs (KG) from text. In this preliminary work, we propose a novel Sequence-to-Sequence (Seq2Seq) architecture to generate elementary KG operations. Furthermore, we introduce a new dataset for KG extraction built upon text-based game transitions (over 300k data points). We conduct experiments and discuss the results.
Tasks	Knowledge Graphs
Published	2019-10-21
URL	https://arxiv.org/abs/1910.09532v3
PDF	https://arxiv.org/pdf/1910.09532v3.pdf
PWC	https://paperswithcode.com/paper/building-dynamic-knowledge-graphs-from-text-2
Repo
Framework
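
The abstract above describes generating elementary KG operations from text. As a minimal sketch of what applying such operations to a graph could look like, the snippet below assumes a hypothetical `("add" | "delete", subject, relation, object)` command format over triples. The operation names, the triple layout, and the example entities are illustrative assumptions, not the paper's actual syntax or data.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]
Operation = Tuple[str, str, str, str]  # hypothetical format: (op, subject, relation, object)


def apply_operations(graph: Set[Triple], operations: Iterable[Operation]) -> Set[Triple]:
    """Return a new graph after applying add/delete operations in order."""
    updated = set(graph)  # copy so the input graph is left untouched
    for op, subj, rel, obj in operations:
        triple = (subj, rel, obj)
        if op == "add":
            updated.add(triple)
        elif op == "delete":
            updated.discard(triple)  # deleting a missing triple is a no-op
        else:
            raise ValueError(f"unknown operation: {op}")
    return updated


# Made-up game transition: the player picks up an apple from the table.
graph = {("player", "at", "kitchen"), ("apple", "on", "table")}
ops = [("delete", "apple", "on", "table"), ("add", "apple", "in", "inventory")]
new_graph = apply_operations(graph, ops)
```

Returning a fresh set keeps every intermediate graph state available, which is convenient when each game transition is stored as a separate data point.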

Drishtikon: An advanced navigational aid system for visually impaired people


Title	Drishtikon: An advanced navigational aid system for visually impaired people
Authors	Shashank Kotyan, Nishant Kumar, Pankaj Kumar Sahu, Venkanna Udutalapally
Abstract	Today, most aid systems deployed for visually impaired people serve a single purpose, be it navigation, object detection, or distance perception. Also, most deployed aid systems use indoor navigation, which requires prior knowledge of the environment. These aid systems often fail to help visually impaired people in unfamiliar scenarios. In this paper, we propose an aid system developed using object detection and depth perception to navigate a person without colliding with objects. The prototype detects 90 different types of objects and computes their distances from the user. We also implemented a navigation feature that takes the target destination as input from the user and navigates the impaired person to that destination using the Google Directions API. With this system, we built a multi-feature, high-accuracy navigational aid system that can be deployed in the wild and help visually impaired people in their daily lives by navigating them effortlessly to their desired destination.
Tasks	Object Detection
Published	2019-04-23
URL	http://arxiv.org/abs/1904.10351v1
PDF	http://arxiv.org/pdf/1904.10351v1.pdf
PWC	https://paperswithcode.com/paper/drishtikon-an-advanced-navigational-aid
Repo
Framework

Unsupervised routine discovery in egocentric photo-streams


Title	Unsupervised routine discovery in egocentric photo-streams
Authors	Estefania Talavera, Nicolai Petkov, Petia Radeva
Abstract	The routine of a person is defined by the occurrence of activities throughout different days, and can directly affect the person’s health. In this work, we address the recognition of routine-related days. To do so, we rely on egocentric images, which are recorded by a wearable camera and allow the life of the user to be monitored from a first-person perspective. We propose an unsupervised model that identifies routine-related days, following an outlier detection approach. We test the proposed framework over a total of 72 days in the form of photo-streams covering around 2 weeks of the life of 5 different camera wearers. Our model achieves an average of 76% Accuracy and 68% Weighted F-Score for all the users. Thus, we show that our framework is able to recognise routine-related days and opens the door to the understanding of the behaviour of people.
Tasks	Outlier Detection
Published	2019-05-10
URL	https://arxiv.org/abs/1905.04076v1
PDF	https://arxiv.org/pdf/1905.04076v1.pdf
PWC	https://paperswithcode.com/paper/unsupervised-routine-discovery-in-egocentric
Repo
Framework

Action Recognition Using Volumetric Motion Representations


Title	Action Recognition Using Volumetric Motion Representations
Authors	Michael Peven, Gregory D. Hager, Austin Reiter
Abstract	Traditional action recognition models are constructed around the paradigm of 2D perspective imagery. Though sophisticated time-series models have pushed the field forward, much of the information is still not exploited by confining the domain to 2D. In this work, we introduce a novel representation of motion as a voxelized 3D vector field and demonstrate how it can be used to improve performance of action recognition networks. This volumetric representation is a natural fit for 3D CNNs, and allows out-of-plane data augmentation techniques during training of these networks. Both the construction of this representation from RGB-D video and inference can be run in real time. We demonstrate superior results using this representation with our network design on the open-source NTU RGB+D dataset where it outperforms state-of-the-art on both of the defined evaluation metrics. Furthermore, we experimentally show how the out-of-plane augmentation techniques create viewpoint invariance and allow the model trained using this representation to generalize to unseen camera angles. Code is available here: https://github.com/mpeven/ntu_rgb.
Tasks	Data Augmentation, Time Series
Published	2019-11-19
URL	https://arxiv.org/abs/1911.08511v1
PDF	https://arxiv.org/pdf/1911.08511v1.pdf
PWC	https://paperswithcode.com/paper/action-recognition-using-volumetric-motion
Repo
Framework

Coloring the Black Box: Visualizing neural network behavior with a self-introspective model


Title	Coloring the Black Box: Visualizing neural network behavior with a self-introspective model
Authors	Arturo Pardo, José A. Gutiérrez-Gutiérrez, José Miguel López-Higuera, Brian W. Pogue, Olga M. Conde
Abstract	The following work presents how autoencoding all the possible hidden activations of a network for a given problem can provide insight about its structure, behavior, and vulnerabilities. The method, termed self-introspection, can show that a trained model showcases similar activation patterns (albeit randomly distributed due to initialization) when shown data belonging to the same category, and that classification errors occur in fringe areas where the activations are not as clearly defined. This suggests some form of random, slowly varying, implicit encoding occurring within deep networks that can be observed with this representation. Additionally, obtaining a low-dimensional representation of all the activations allows for (1) real-time model evaluation in the context of a multiclass classification problem, (2) the rearrangement of all hidden layers by their relevance in obtaining a specific output, and (3) a framework for studying possible countermeasures to noise and adversarial attacks. Self-introspection can show how damaged input data can modify the hidden activations, producing an erroneous response. A few illustrative examples are implemented for feedforward and convolutional models and the MNIST and CIFAR-10 datasets, showcasing its capabilities as a model evaluation framework.
Tasks
Published	2019-10-10
URL	https://arxiv.org/abs/1910.04903v2
PDF	https://arxiv.org/pdf/1910.04903v2.pdf
PWC	https://paperswithcode.com/paper/coloring-the-black-box-visualizing-neural
Repo
Framework

Optimistic Regret Minimization for Extensive-Form Games via Dilated Distance-Generating Functions


Title	Optimistic Regret Minimization for Extensive-Form Games via Dilated Distance-Generating Functions
Authors	Gabriele Farina, Christian Kroer, Tuomas Sandholm
Abstract	We study the performance of optimistic regret-minimization algorithms for both minimizing regret in, and computing Nash equilibria of, zero-sum extensive-form games. In order to apply these algorithms to extensive-form games, a distance-generating function is needed. We study the use of the dilated entropy and dilated Euclidean distance functions. For the dilated Euclidean distance function we prove the first explicit bounds on the strong-convexity parameter for general treeplexes. Furthermore, we show that the use of dilated distance-generating functions enables us to decompose the mirror descent algorithm, and its optimistic variant, into local mirror descent algorithms at each information set. This decomposition mirrors the structure of the counterfactual regret minimization framework, and enables important techniques in practice, such as distributed updates and pruning of cold parts of the game tree. Our algorithms provably converge at a rate of $T^{-1}$, which is superior to prior counterfactual regret minimization algorithms. We experimentally compare to the popular algorithm CFR+, which has a theoretical convergence rate of $T^{-0.5}$, but is known to often converge at a rate of $T^{-1}$, or better, in practice. We give an example matrix game where CFR+ experimentally converges at a relatively slow rate of $T^{-0.74}$, whereas our optimistic methods converge faster than $T^{-1}$. We go on to show that our fast rate also holds in the Kuhn poker game, which is an extensive-form game. For games with deeper game trees, however, we find that CFR+ is still faster. Finally, we show that when the goal is minimizing regret, rather than computing a Nash equilibrium, our optimistic methods can outperform CFR+, even in deep game trees.
Tasks
Published	2019-10-24
URL	https://arxiv.org/abs/1910.10906v2
PDF	https://arxiv.org/pdf/1910.10906v2.pdf
PWC	https://paperswithcode.com/paper/optimistic-regret-minimization-for-extensive
Repo
Framework
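
To make the idea of an optimistic regret minimizer concrete, here is a minimal sketch for the simplest possible setting: a zero-sum matrix game, where mirror descent with an entropy distance-generating function reduces to multiplicative weights. This is not the paper's dilated-DGF algorithm for extensive-form games. The function names, the step size `eta`, the iteration count, and the rock-paper-scissors payoff matrix are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def _normalize(weights: Sequence[float]) -> List[float]:
    total = sum(weights)
    return [w / total for w in weights]


def optimistic_hedge(payoff: Matrix, x0: Sequence[float], y0: Sequence[float],
                     eta: float = 0.1, steps: int = 2000) -> Tuple[List[float], List[float]]:
    """Run optimistic multiplicative weights for both players; return average strategies.

    payoff[i][j] is the row player's payoff; the column player receives its negative.
    """
    n, m = len(payoff), len(payoff[0])
    x, y = list(x0), list(y0)
    prev_ux, prev_uy = [0.0] * n, [0.0] * m
    sum_x, sum_y = [0.0] * n, [0.0] * m
    for _ in range(steps):
        sum_x = [s + xi for s, xi in zip(sum_x, x)]
        sum_y = [s + yj for s, yj in zip(sum_y, y)]
        # Utility of each pure action against the opponent's current strategy.
        ux = [sum(payoff[i][j] * y[j] for j in range(m)) for i in range(n)]
        uy = [-sum(payoff[i][j] * x[i] for i in range(n)) for j in range(m)]
        # Optimistic step: use 2*u_t - u_{t-1}, i.e. predict the next utility to be u_t.
        x = _normalize([x[i] * math.exp(eta * (2 * ux[i] - prev_ux[i])) for i in range(n)])
        y = _normalize([y[j] * math.exp(eta * (2 * uy[j] - prev_uy[j])) for j in range(m)])
        prev_ux, prev_uy = ux, uy
    return [s / steps for s in sum_x], [s / steps for s in sum_y]


def saddle_point_gap(payoff: Matrix, x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of both players' best-response gains; zero exactly at a Nash equilibrium."""
    n, m = len(payoff), len(payoff[0])
    best_row = max(sum(payoff[i][j] * y[j] for j in range(m)) for i in range(n))
    best_col = min(sum(payoff[i][j] * x[i] for i in range(n)) for j in range(m))
    return best_row - best_col


# Rock-paper-scissors, started away from the uniform equilibrium.
RPS = [[0.0, -1.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]]
avg_x, avg_y = optimistic_hedge(RPS, [0.6, 0.3, 0.1], [0.2, 0.2, 0.6])
gap = saddle_point_gap(RPS, avg_x, avg_y)
```

The saddle-point gap of the averaged strategies is the quantity whose decay over `steps` corresponds to the convergence rates discussed in the abstract.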

Hierarchical Reinforcement Learning Method for Autonomous Vehicle Behavior Planning


Title	Hierarchical Reinforcement Learning Method for Autonomous Vehicle Behavior Planning
Authors	Zhiqian Qiao, Zachariah Tyree, Priyantha Mudalige, Jeff Schneider, John M. Dolan
Abstract	In this work, we propose a hierarchical reinforcement learning (HRL) structure which is capable of performing autonomous vehicle planning tasks in simulated environments with multiple sub-goals. In this hierarchical structure, the network is capable of 1) learning one task with multiple sub-goals simultaneously; 2) extracting attentions of states according to changing sub-goals during the learning process; 3) reusing the well-trained network of sub-goals for other similar tasks with the same sub-goals. The states are defined as processed observations which are transmitted from the perception system of the autonomous vehicle. A hybrid reward mechanism is designed for different hierarchical layers in the proposed HRL structure. Compared to traditional RL methods, our algorithm is more sample-efficient since its modular design allows reusing the policies of sub-goals across similar tasks. The results show that the proposed method converges to an optimal policy faster than traditional RL methods.
Tasks	Hierarchical Reinforcement Learning
Published	2019-11-09
URL	https://arxiv.org/abs/1911.03799v1
PDF	https://arxiv.org/pdf/1911.03799v1.pdf
PWC	https://paperswithcode.com/paper/hierarchical-reinforcement-learning-method
Repo
Framework

Domain Discrepancy Measure for Complex Models in Unsupervised Domain Adaptation


Title	Domain Discrepancy Measure for Complex Models in Unsupervised Domain Adaptation
Authors	Jongyeong Lee, Nontawat Charoenphakdee, Seiichi Kuroki, Masashi Sugiyama
Abstract	Appropriately evaluating the discrepancy between domains is essential for the success of unsupervised domain adaptation. In this paper, we first point out that existing discrepancy measures are less informative when complex models such as deep neural networks are used, in addition to the facts that they can be computationally highly demanding and their range of applications is limited only to binary classification. We then propose a novel domain discrepancy measure, called the paired hypotheses discrepancy (PHD), to overcome these shortcomings. PHD is computationally efficient and applicable to multi-class classification. Through generalization error bound analysis, we theoretically show that PHD is effective even for complex models. Finally, we demonstrate the practical usefulness of PHD through experiments.
Tasks	Domain Adaptation, Unsupervised Domain Adaptation
Published	2019-01-30
URL	https://arxiv.org/abs/1901.10654v3
PDF	https://arxiv.org/pdf/1901.10654v3.pdf
PWC	https://paperswithcode.com/paper/domain-discrepancy-measure-using-complex
Repo
Framework

Eliminating all bad Local Minima from Loss Landscapes without even adding an Extra Unit


Title	Eliminating all bad Local Minima from Loss Landscapes without even adding an Extra Unit
Authors	Jascha Sohl-Dickstein, Kenji Kawaguchi
Abstract	Recent work has noted that all bad local minima can be removed from neural network loss landscapes, by adding a single unit with a particular parameterization. We show that the core technique from these papers can be used to remove all bad local minima from any loss landscape, so long as the global minimum has a loss of zero. This procedure does not require the addition of auxiliary units, or even that the loss be associated with a neural network. The method of action involves all bad local minima being converted into bad (non-local) minima at infinity in terms of auxiliary parameters.
Tasks
Published	2019-01-12
URL	http://arxiv.org/abs/1901.03909v1
PDF	http://arxiv.org/pdf/1901.03909v1.pdf
PWC	https://paperswithcode.com/paper/eliminating-all-bad-local-minima-from-loss
Repo
Framework

PitchNet: Unsupervised Singing Voice Conversion with Pitch Adversarial Network


Title	PitchNet: Unsupervised Singing Voice Conversion with Pitch Adversarial Network
Authors	Chengqi Deng, Chengzhu Yu, Heng Lu, Chao Weng, Dong Yu
Abstract	Singing voice conversion converts one singer’s voice into another’s without changing the singing content. Recent work shows that unsupervised singing voice conversion can be achieved with an autoencoder-based approach [1]. However, the converted singing voice can easily be out of key, showing that the existing approach cannot model the pitch information precisely. In this paper, we propose to advance the existing unsupervised singing voice conversion method proposed in [1] to achieve more accurate pitch translation and flexible pitch manipulation. Specifically, the proposed PitchNet adds an adversarially trained pitch regression network to force the encoder network to learn a pitch-invariant phoneme representation, and a separate module to feed pitch extracted from the source audio to the decoder network. Our evaluation shows that the proposed method can greatly improve the quality of the converted singing voice (2.92 vs 3.75 in MOS). We also demonstrate that the pitch of converted singing can be easily controlled during generation by changing the levels of the extracted pitch before passing it to the decoder network.
Tasks	Voice Conversion
Published	2019-12-04
URL	https://arxiv.org/abs/1912.01852v2
PDF	https://arxiv.org/pdf/1912.01852v2.pdf
PWC	https://paperswithcode.com/paper/pitchnet-unsupervised-singing-voice
Repo
Framework
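
The abstract mentions controlling the output pitch by changing the extracted pitch before it reaches the decoder. A minimal sketch of one such manipulation follows, assuming the pitch is a per-frame fundamental frequency contour in Hz in which 0 marks unvoiced frames. That representation and the `shift_pitch` helper are assumptions for illustration, not details confirmed by the paper.

```python
from typing import List, Sequence


def shift_pitch(f0_hz: Sequence[float], semitones: float) -> List[float]:
    """Transpose a pitch contour by a number of semitones, leaving unvoiced frames at 0."""
    factor = 2.0 ** (semitones / 12.0)  # equal temperament: 12 semitones double the frequency
    return [f * factor if f > 0 else 0.0 for f in f0_hz]


# Made-up contour: a voiced frame, an unvoiced frame, another voiced frame.
contour = [220.0, 0.0, 440.0]
octave_up = shift_pitch(contour, 12)
```

In a pipeline like the one described, the shifted contour would be fed to the decoder in place of the original extracted pitch.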

Robot Learning and Execution of Collaborative Manipulation Plans from YouTube Cooking Videos


Title	Robot Learning and Execution of Collaborative Manipulation Plans from YouTube Cooking Videos
Authors	Hejia Zhang, Stefanos Nikolaidis
Abstract	People often watch videos on the web to learn how to cook new recipes, assemble furniture or repair a computer. We wish to give robots the very same capability. This is challenging; there is a large variation in manipulation actions and some videos even involve multiple persons, who collaborate by sharing and exchanging objects and tools. Furthermore, the learned representations need to be general enough to be transferable to robotic systems. On the other hand, previous work has shown that the space of human manipulation actions has a linguistic, hierarchical structure that relates actions to manipulated objects and tools. Building upon this theory of language for action, we propose a framework for understanding and executing demonstrated action sequences from full-length, unconstrained cooking videos on the web. The framework takes as input a cooking video annotated with object labels and bounding boxes, and outputs a collaborative manipulation action plan for one or more robotic arms. We demonstrate performance of the system on a standardized dataset of 100 YouTube cooking videos, as well as on three full-length YouTube videos that include collaborative actions between two participants. We additionally propose an open-source platform for executing the learned plans in a simulation environment as well as with an actual robotic arm.
Tasks
Published	2019-11-25
URL	https://arxiv.org/abs/1911.10686v3
PDF	https://arxiv.org/pdf/1911.10686v3.pdf
PWC	https://paperswithcode.com/paper/robot-learning-and-execution-of-collaborative
Repo
Framework

Conditional Transferring Features: Scaling GANs to Thousands of Classes with 30% Less High-quality Data for Training


Title	Conditional Transferring Features: Scaling GANs to Thousands of Classes with 30% Less High-quality Data for Training
Authors	Chunpeng Wu, Wei Wen, Yiran Chen, Hai Li
Abstract	Generative adversarial networks (GANs) have greatly improved the quality of unsupervised image generation. Previous GAN-based methods often require a large amount of high-quality training data while producing a small number (e.g., tens) of classes. This work aims to scale up GANs to thousands of classes while reducing the use of high-quality data in training. We propose an image generation method based on conditional transferring features, which can capture pixel-level semantic changes when transforming low-quality images into high-quality ones. Moreover, self-supervised learning is integrated into our GAN architecture to provide more label-free semantic supervisory information observed from the training data. As such, training our GAN architecture requires far fewer high-quality images, together with a small number of additional low-quality images. The experiments on CIFAR-10 and STL-10 show that even after removing 30% of the high-quality images from the training set, our method can still outperform previous ones. The scalability on object classes has been experimentally validated: our method with 30% fewer high-quality images obtains the best quality in generating 1,000 ImageNet classes, as well as generating all 3,755 classes of CASIA-HWDB1.0 Chinese handwriting characters.
Tasks	Image Generation
Published	2019-09-25
URL	https://arxiv.org/abs/1909.11308v1
PDF	https://arxiv.org/pdf/1909.11308v1.pdf
PWC	https://paperswithcode.com/paper/conditional-transferring-features-scaling
Repo
Framework

Singing Voice Conversion with Disentangled Representations of Singer and Vocal Technique Using Variational Autoencoders


Title	Singing Voice Conversion with Disentangled Representations of Singer and Vocal Technique Using Variational Autoencoders
Authors	Yin-Jyun Luo, Chin-Chen Hsu, Kat Agres, Dorien Herremans
Abstract	We propose a flexible framework that deals with both singer conversion and singers' vocal technique conversion. The proposed model is trained on non-parallel corpora, accommodates many-to-many conversion, and leverages recent advances in variational autoencoders. It employs separate encoders to learn disentangled latent representations of singer identity and vocal technique, with a joint decoder for reconstruction. Conversion is carried out by simple vector arithmetic in the learned latent spaces. Both a quantitative analysis and a visualization of the converted spectrograms show that our model is able to disentangle singer identity and vocal technique and successfully perform conversion of these attributes. To the best of our knowledge, this is the first work to jointly tackle conversion of singer identity and vocal technique based on a deep learning approach.
Tasks	Voice Conversion
Published	2019-12-03
URL	https://arxiv.org/abs/1912.02613v3
PDF	https://arxiv.org/pdf/1912.02613v3.pdf
PWC	https://paperswithcode.com/paper/singing-voice-conversion-with-disentangled
Repo
Framework
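
The abstract states that conversion is carried out by simple vector arithmetic in the learned latent spaces. One common form of such arithmetic, sketched below, subtracts the mean latent vector of the source attribute and adds the mean of the target attribute. Whether the paper uses exactly this rule is not stated in this listing, so the formula, the `convert_latent` helper, and the toy vectors are all assumptions.

```python
from typing import List, Sequence


def convert_latent(z: Sequence[float], source_mean: Sequence[float],
                   target_mean: Sequence[float]) -> List[float]:
    """Move a latent code from the source attribute towards the target attribute.

    Assumed rule: z_converted = z - mean(source) + mean(target), applied elementwise.
    """
    if not (len(z) == len(source_mean) == len(target_mean)):
        raise ValueError("all latent vectors must have the same dimension")
    return [zi - s + t for zi, s, t in zip(z, source_mean, target_mean)]


# Toy 2-D latent code for one utterance, with made-up per-singer means.
z_singer = [1.0, 2.0]
converted = convert_latent(z_singer, source_mean=[0.5, 1.0], target_mean=[2.0, 0.0])
```

With separate encoders for singer identity and vocal technique, the same operation could be applied to either latent code independently before both are passed to the joint decoder.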

Distributed Iterative Gating Networks for Semantic Segmentation


Title	Distributed Iterative Gating Networks for Semantic Segmentation
Authors	Rezaul Karim, Md Amirul Islam, Neil D. B. Bruce
Abstract	In this paper, we present a canonical structure for controlling information flow in neural networks with an efficient feedback routing mechanism based on a strategy of Distributed Iterative Gating (DIGNet). The structure of this mechanism derives from a strong conceptual foundation and presents a light-weight mechanism for adaptive control of computation similar to recurrent convolutional neural networks by integrating feedback signals with a feed-forward architecture. In contrast to other RNN formulations, DIGNet generates feedback signals in a cascaded manner that implicitly carries information from all the layers above. This cascaded feedback propagation by means of the propagator gates is found to be more effective compared to other feedback mechanisms that use feedback from the output of either the corresponding stage or from the previous stage. Experiments reveal the high degree of capability that this recurrent approach with cascaded feedback presents over feed-forward baselines and other recurrent models for pixel-wise labeling problems on three challenging datasets, PASCAL VOC 2012, COCO-Stuff, and ADE20K.
Tasks	Semantic Segmentation
Published	2019-09-28
URL	https://arxiv.org/abs/1909.12996v1
PDF	https://arxiv.org/pdf/1909.12996v1.pdf
PWC	https://paperswithcode.com/paper/distributed-iterative-gating-networks-for
Repo
Framework

Reducing Noise in GAN Training with Variance Reduced Extragradient


Title	Reducing Noise in GAN Training with Variance Reduced Extragradient
Authors	Tatjana Chavdarova, Gauthier Gidel, François Fleuret, Simon Lacoste-Julien
Abstract	We study the effect of the stochastic gradient noise on the training of generative adversarial networks (GANs) and show that it can prevent the convergence of standard game optimization methods, while the batch version converges. We address this issue with a novel stochastic variance-reduced extragradient (SVRE) optimization algorithm that improves upon the best convergence rates proposed in the literature. We observe empirically that SVRE performs similarly to a batch method on MNIST while being computationally cheaper, and that SVRE yields more stable GAN training on standard datasets.
Tasks
Published	2019-04-18
URL	https://arxiv.org/abs/1904.08598v2
PDF	https://arxiv.org/pdf/1904.08598v2.pdf
PWC	https://paperswithcode.com/paper/reducing-noise-in-gan-training-with-variance
Repo
Framework
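
For readers unfamiliar with extragradient, the sketch below shows the plain deterministic step on the toy bilinear game min over x, max over y of x*y, next to simultaneous gradient descent-ascent. It illustrates only the extrapolation step that SVRE builds on; there is no stochastic noise or variance reduction here. The learning rate, step count, and starting point are arbitrary assumptions.

```python
import math
from typing import Callable, Tuple

Step = Callable[[float, float, float], Tuple[float, float]]


def gda_step(x: float, y: float, lr: float) -> Tuple[float, float]:
    """Simultaneous gradient descent (on x) and ascent (on y) for f(x, y) = x * y."""
    return x - lr * y, y + lr * x


def extragradient_step(x: float, y: float, lr: float) -> Tuple[float, float]:
    """Extrapolate to a look-ahead point, then update using the gradients found there."""
    x_half, y_half = x - lr * y, y + lr * x
    return x - lr * y_half, y + lr * x_half


def distance_after(step: Step, x: float, y: float, lr: float = 0.1, steps: int = 1000) -> float:
    """Distance from the equilibrium (0, 0) after running the given update rule."""
    for _ in range(steps):
        x, y = step(x, y, lr)
    return math.hypot(x, y)


gda_distance = distance_after(gda_step, 1.0, 1.0)
eg_distance = distance_after(extragradient_step, 1.0, 1.0)
```

On this game each descent-ascent step multiplies the squared distance to the equilibrium by 1 + lr^2, so the iterates spiral outward, while each extragradient step multiplies it by 1 - lr^2 + lr^4, so they spiral inward for any lr below 1.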