VisionLLM

2023

Purpose:

VisionLLM is a project to build a novel framework. We are experimenting with combining Computer Vision (CV) models and Large Language Models (LLMs), backed by embedded data and vector stores, to build a robust and versatile vision-and-language navigation (VLN) application. The current focus is a VLN assistant for blind and visually impaired people, but the general framework can be applied to navigation for anyone or anything, especially robotic systems, where increasing interactivity with users and versatility in capabilities is key.

Here is the link to the GitHub repo.

I focused on implementing multi-modal embeddings, which enabled our team to experiment with various data types, including structured data, images, text, and audio.
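
To make the idea concrete, here is a minimal sketch of how multi-modal embeddings can unify different data types in one vector store: a CLIP-style model maps both text and images into a shared embedding space, and a similarity index retrieves text relevant to a camera frame. The model name, file name, and FAISS setup are illustrative assumptions, not the project's actual code.

```python
# Sketch: text and images embedded into a shared vector space, then
# searched with a similarity index. Assumes sentence-transformers' CLIP
# wrapper and FAISS; "scene.jpg" is a hypothetical input frame.
import faiss
import numpy as np
from PIL import Image
from sentence_transformers import SentenceTransformer

# CLIP embeds images and text into the same vector space.
model = SentenceTransformer("clip-ViT-B-32")  # assumed model choice

# Embed some text descriptions of navigation-relevant scenes.
texts = ["a crosswalk with a pedestrian signal", "stairs leading down"]
text_vecs = model.encode(texts, normalize_embeddings=True)

# Embed a camera frame with the same model.
image = Image.open("scene.jpg")
image_vec = model.encode([image], normalize_embeddings=True)

# Flat inner-product index; with unit vectors this is cosine similarity.
index = faiss.IndexFlatIP(text_vecs.shape[1])
index.add(np.asarray(text_vecs, dtype="float32"))

# Query the text store with the image: the nearest description of the
# scene can then be handed to an LLM to generate a navigation instruction.
scores, ids = index.search(np.asarray(image_vec, dtype="float32"), 1)
print(texts[ids[0][0]], scores[0][0])
```

The same pattern extends to audio or structured data: anything that can be embedded into the shared space becomes searchable alongside the rest.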
