An attempt at AI + krpano

  • Hi everyone,

    Happy New Year.

    Demo
    For Chinese without VPN (same content)

    This is an attempt to integrate AI into krpano. Image recognition technology based on AI is used to analyze what the user sees in the panorama.
    You can ask all kinds of questions, such as what objects are there, what is it, what color, how many, etc.

    AI sees what you see.
    I added some restrictions, such as a limit on the length of each response. The main reason is to avoid running up large bills.*rolleyes*

    Tip: You don't have to use English. You can also communicate with it in other languages.

  • Hi, thanks for sharing the example. In my opinion it would be very interesting if you could direct the camera to the subject of the question, for example if you say "wooden door" the camera would turn towards that door.

    There is indeed AI that can do this and can add bounding boxes to the identified objects. If you're interested in doing your own research, you can check out Google Cloud Vision and Microsoft Azure Computer Vision.
    I don't find this very interesting, though: in the time it takes to type the request, the user could have turned the view with the mouse.
    Usually this is only suitable for special projects, where objects need to be annotated.
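
The idea of turning the camera toward a detected object can be sketched in a few lines. This is a minimal Python sketch, not the demo's code. It assumes a detection service has returned a pixel bounding box for the current screenshot, that the screenshot is a plain rectilinear view with a known horizontal field of view, and that angles follow the convention where negative vertical values point up. The function names are hypothetical; only the resulting `lookto(h,v)` string is meant as a krpano action.

```python
import math

def bbox_to_lookto(box, image_size, hlookat, vlookat, hfov_deg):
    """Map the center of a pixel bounding box to spherical angles (degrees).

    box        -- (left, top, right, bottom) in screenshot pixels
    image_size -- (width, height) of the screenshot
    hlookat, vlookat -- current view direction in degrees (assumed: down is positive)
    hfov_deg   -- assumed horizontal field of view of a rectilinear screenshot
    """
    width, height = image_size
    left, top, right, bottom = box
    # Offset of the box center from the image center, in pixels.
    dx = (left + right) / 2 - width / 2
    dy = (top + bottom) / 2 - height / 2
    # Focal length in pixels for a rectilinear (pinhole) projection.
    focal = (width / 2) / math.tan(math.radians(hfov_deg) / 2)
    # Rotate the camera-space ray (dx, dy, focal) by the current pitch.
    pitch = math.radians(vlookat)
    world_x = dx
    world_y = dy * math.cos(pitch) + focal * math.sin(pitch)
    world_z = -dy * math.sin(pitch) + focal * math.cos(pitch)
    # Convert the ray to horizontal and vertical angles, then add the yaw.
    ath = hlookat + math.degrees(math.atan2(world_x, world_z))
    atv = math.degrees(math.atan2(world_y, math.hypot(world_x, world_z)))
    ath = (ath + 180) % 360 - 180  # keep the result in [-180, 180)
    return ath, atv

def lookto_action(ath, atv):
    """Format the angles as a lookto action string."""
    return f"lookto({ath:.2f},{atv:.2f})"
```

A box centered on the image leaves the view direction unchanged, while a box centered on the right edge of a 90 degree view moves the horizontal angle by about 45 degrees. Fisheye or other distorted view projections would need a different mapping.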

  • Thanks for pointing it out, it's been fixed.
    The original logic answered questions based only on the image and did not treat the image as the place where the user is.

    But not every question can be answered, such as how large an object is or how much it is worth. The AI has no way of knowing that.
    As for the name of the plant, this would probably require domain-specific AI to recognize. It's just a general AI, not a botanist.

  • I wouldn't limit it to looking at a subject. But beyond using AI to comment on a space, when the medium is a 360 panorama rather than a photo or a text, I think it makes sense to direct the view to the subject. Otherwise it's AI linked to any image, not to a panoramic image. That's my point.

    Note: It's just a personal opinion, I love AI and I just commented on my first impression when I saw the example. It's not a criticism in any way, just a suggestion to achieve a more interesting effect and combine AI with JS and KRPano programming even if it's at a very basic level. Something that may still be utopian.

    Edited once, last by Fernando (January 3, 2025 at 1:23 AM).

  • Yes. Everyone has different thoughts and viewpoints, and ideas collide.

    This was just a flash of inspiration for me, and then I spent two days making an example. I've just taken the first step.

    Although my current work focus is not on 360 panorama or krpano, I still hope to see more examples of AI integration with krpano this year.

  • it is certainly an impressive demo, congrats.

    can you explain the technical details a bit?

    when you send the question, is the visible view uploaded as part of it?


    btw. it might have a use for visually impaired people,

    if the view can be described by text or even voice


    ps.

    the other way would also be interesting, creating a panorama from text.

    i did some tests with that 1-2 yrs ago but

    sd wasn't able to create a proper equirectangular image and

    the max resolution was nowhere near what we would require.

  • Yes, the currently visible view is attached when a question is asked.
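
As a rough illustration of attaching the view to a question, here is a minimal Python sketch of how such a request body might be assembled. It is not the demo's actual code: the field names, the two limits, and the assumption that the view arrives as JPEG bytes are all hypothetical. The caps reflect the cost-related restrictions mentioned in the first post.

```python
import base64
import json

# Hypothetical limits, chosen only for illustration.
MAX_QUESTION_CHARS = 200
MAX_ANSWER_TOKENS = 150

def build_request(question: str, view_jpeg: bytes) -> str:
    """Bundle the user's question with the currently visible view as JSON."""
    question = question.strip()
    if not question:
        raise ValueError("question is empty")
    payload = {
        # Truncate long questions so the request size stays bounded.
        "question": question[:MAX_QUESTION_CHARS],
        # Binary image data is base64-encoded so it can travel inside JSON.
        "image_base64": base64.b64encode(view_jpeg).decode("ascii"),
        # Ask the backend to keep the answer short (assumed field name).
        "max_answer_tokens": MAX_ANSWER_TOKENS,
    }
    return json.dumps(payload)
```

A backend would decode the image, forward both parts to whatever vision model it uses, and enforce the answer limit on its side as well, since a client-side cap alone can be bypassed.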

    Voice interaction is feasible; it only needs one more step, voice to text.
    However, this increases response time and incurs additional STT and TTS costs.
    Since this is a small experiment, I don't want to take on that cost yet.*rolleyes*
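
To make the added latency concrete, here is a minimal Python sketch of a voice turn that chains speech-to-text, the image question, and text-to-speech, timing each stage. The three stage functions are passed in as parameters and are purely hypothetical placeholders; no specific STT, vision, or TTS service is implied.

```python
import time
from typing import Callable, Dict, Tuple

def run_voice_turn(
    audio: bytes,
    view_jpeg: bytes,
    stt: Callable[[bytes], str],
    ask: Callable[[str, bytes], str],
    tts: Callable[[str], bytes],
) -> Tuple[bytes, Dict[str, float]]:
    """Run STT -> image question -> TTS and record seconds spent per stage."""
    timings: Dict[str, float] = {}

    start = time.perf_counter()
    question = stt(audio)  # extra step 1: voice to text
    timings["stt"] = time.perf_counter() - start

    start = time.perf_counter()
    answer = ask(question, view_jpeg)  # the existing text + image step
    timings["ask"] = time.perf_counter() - start

    start = time.perf_counter()
    speech = tts(answer)  # extra step 2: text to voice
    timings["tts"] = time.perf_counter() - start

    return speech, timings
```

Because the stages run one after another, the total response time is the sum of all three, which is why each added service makes the reply feel slower.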

    In fact, I did another experiment in threejs that adds body movements and expressions to character models. The interaction goes through AI, which matches body movements and expressions to the content of the reply. It also replies with voice, and goes one step further by generating mouth motions from the speech.
    But this way, each response takes 5 to 10 seconds.
    Of course, that may also have something to do with the fact that I use the free plan for some products. I'm stingy when it comes to non-profit projects, haha.*tongue*

    "create a panorama by text":
    There are already some companies doing it, like SKYBOX.
