Vision-based navigation with language-based assistance