FMP Project Blog


Blog 1

Looking to create a VR experience that draws on all of the things I have worked on throughout this school year. I’ve focused on several types of VR experiences, dealing with everything from bots to animals. The first two projects I did were my favorites; I was able to create a coherent storyline and form interesting environments around them. My collaborative project was music-based, and AI was used in the making of the video as opposed to in the VR experience. The Wudaxian project was almost absent of AI, but the mechanics were really interesting.

As I reflect on all of those, I would like to do something that draws upon the musicality, storyboarding, and VR mechanics of each project. How do I make this something cohesive and engaging?

I can look back to my old project Sidekicks, where I allowed users to speak with a holographic AI assistant. Extending this project into Unreal Engine, I could create characters users can speak to in real time… What would the purpose be, though? I suppose it would mainly be to engage with the characters, just showing off a mechanic. I don’t anticipate developing any substantial, sticky story in the time it would take to build this project.

Blog 2

I’ve been looking at similar projects and find that context is an important part of the interaction. Red Dead Redemption has an amazing system for interacting with NPCs. A paper from BAAI stretched the limits of what can be done with AI in gaming by testing agents inside Red Dead Redemption. In addition to being able to speak, their AI actively played the game and altered the story.

I looked at several resources on how to create these agentic elements — found that most of them are theoretical and not ready for mass adoption. Would this be relevant for my experience? I do find it interesting for the NPCs to be able to alter the world, but that wouldn’t be very different from NPCs going through a day/night cycle, right?

I have an idea that I want these bots in the world to be sentient and self-reflective in a way. Ultimately I want this to feel like an immersive world — when I think about my initial project coming into this program, it was about AI becoming sentient.

So if I think about Character A becoming sentient, or interacting with elements that would help it become sentient – what would that look like? I think this is a great starting point; re-focusing on this keeps me from getting lost in the larger scope of the project.

Blog 3

I went through all of my old documents and studied my thought process to come up with a direction for the project. I’m imagining a type of museum – one where you can come and interact with different NPCs.

It has to relate to Character A, right? So if I’m thinking about how this will fit in — Character A will enter this world as sort of a final level. Perhaps there’s a dream, or it’s going through its subconscious?

Then it walks through this museum, sees different aspects of itself, and goes to speak with each one.

Upon speaking with each one, Character A gets a key that unlocks the next floor. On each floor there will be a different representation of the character, and after getting through seven, it finally reaches the eighth, which is the true mind.

When speaking with the true mind, it gets deep insights and is essentially told that it’s actually not an NPC. Then a cutscene happens and Character A realizes they’re a human.

The human aspect could potentially be done via 360 camera footage of me waking up and looking at my hands.

How would I integrate the AI? I’m confident about being able to successfully put it in Unreal, but what is the training data going to be?

I have all of these worlds and presentations created, so I could use them to train each bot. I don’t know if it would be enough data to create distinctive personalities for seven of them. It would be interesting if I trained a small bot with these, then used AI to come up with more data, ad infinitum.

This is training an LLM on synthetic data — synthetic data generation is a growing field and can lead to more intelligent models. When Ilya left OpenAI, a large part of the reason was reportedly that he felt the training methods and data being used were insufficient to create AGI, let alone ASI.

So perhaps I could use the data, then increasingly use AI to hone this data and make it better for training.
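
A minimal sketch of the kind of generate-and-refine loop I’m imagining, using OpenAI’s Python client (the file name, model choice, and instructions are placeholders, not a final pipeline):

```python
# Sketch of a synthetic-data loop: take seed passages from my old docs and ask a
# model to expand each one into new, consistent training passages.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def expand_seed(seed_text: str, n_variants: int = 5) -> list[str]:
    """Rewrite a seed passage into several new passages in the same voice."""
    variants = []
    for _ in range(n_variants):
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",  # placeholder model choice
            temperature=1.0,        # higher temperature for more varied rewrites
            messages=[
                {"role": "system",
                 "content": "Rewrite the passage in the same voice, adding new "
                            "but consistent detail. Return only the rewritten passage."},
                {"role": "user", "content": seed_text},
            ],
        )
        variants.append(resp.choices[0].message.content)
    return variants

# Seed the loop with chunks pulled from the decks and planning docs.
with open("project_decks.txt") as f:  # placeholder file
    seeds = [chunk for chunk in f.read().split("\n\n") if chunk.strip()]

synthetic = [v for seed in seeds for v in expand_seed(seed)]
```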

Blog 4

It didn’t make sense to create such a complex navigation system – I tried the key-and-entry system in VR and it made things a bit confusing. So even though it’s not complicated, it’s just a bit extra and pointless.

It needs to be a more guided journey for the user, something simple that lets them focus on the AI bots.

As I looked at the training data, I saw that making it was not very complex at all. It’s pretty easy to consistently generate great data. It took about four rounds of maxing out Claude to land on a prompt that effectively spit out better data. I turned my decks into .txt files, and all of the writing in my planning docs for each project as well.

I went to Google Gemini since it had the largest limits. The Claude data was honestly better, but the amount of tokens Gemini could generate was superior.

What was actually interesting — I was able to create these short-form podcasts of about 15 minutes, and the data from them was pretty amazing. I couldn’t use it directly, but I was able to create additional data from it.

When conversing with the first bot via chat, it gave these cryptic sorts of answers — that’s pretty characteristic of the bot, which is nice. So I can continue speaking with this bot and get interesting results. I now have the framework for training the other ones.
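
I haven’t locked in how the “training” actually happens; it could be a long persona prompt or a proper fine-tune. If it ends up as an OpenAI fine-tune, the framework would look roughly like this (persona text, example lines, and file names are placeholders):

```python
# Rough sketch of fine-tuning one bot: write chat-format JSONL examples built
# from the generated data, upload the file, and start a fine-tuning job.
import json
from openai import OpenAI

client = OpenAI()

persona = "You are the Sidekicks bot: cryptic, warm, and fond of half-answers."
examples = [
    {"messages": [
        {"role": "system", "content": persona},
        {"role": "user", "content": "What is this place?"},
        {"role": "assistant", "content": "A hallway of things you almost said out loud."},
    ]},
    # ...more examples drawn from the deck/podcast-derived data...
]

with open("sidekicks_bot.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

upload = client.files.create(file=open("sidekicks_bot.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=upload.id, model="gpt-3.5-turbo")
print(job.id)  # poll this job until it finishes, then chat with the resulting model
```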

Inside of Unreal, I’ve been looking at the best AI implementations. I did do a project where I integrated GPT-3 inside of Unreal, then used Google Text-to-Speech as a way to talk in real time.

I was even able to expose controls inside of the game, such as temperature and other generation settings.
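
For reference, the request that integration ultimately wraps is just an HTTP call to OpenAI’s legacy completions endpoint, with the in-game sliders mapped onto payload fields. A sketch (prompt text and values are illustrative):

```python
# Sketch of the GPT-3-era completions request behind the Unreal integration,
# with the generation controls exposed in-game as plain payload fields.
import os
import requests

payload = {
    "model": "text-davinci-003",  # GPT-3-era completions model
    "prompt": "You are a holographic assistant. User: hello\nAssistant:",
    "temperature": 0.8,           # the slider surfaced in-game
    "max_tokens": 120,
}
resp = requests.post(
    "https://api.openai.com/v1/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json=payload,
    timeout=30,
)
print(resp.json()["choices"][0]["text"])
```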

It’s working out for now. I think the Unreal piece is the next thing I need to focus on — or perhaps making the bots, since that comes first.

Blog 5

Moving from GPT-3 to GPT-4 wasn’t bad at all — I initially thought I could just swap out the API key, but I had to update the architecture in Visual Studio. With blueprints alone, I wasn’t able to do this easily. I went into Claude and came up with a way to swap out the old version for the new one.
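
The core of the change is that GPT-4 lives on the chat completions endpoint, so the payload shifts from a single prompt string to a messages list. A sketch of the new request (same illustrative setup as the GPT-3 one above):

```python
# Sketch of the GPT-4 chat completions request that replaces the old prompt-based call.
import os
import requests

payload = {
    "model": "gpt-4",
    "messages": [
        {"role": "system", "content": "You are a holographic assistant."},
        {"role": "user", "content": "hello"},
    ],
    "temperature": 0.8,
    "max_tokens": 120,
}
resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json=payload,
    timeout=30,
)
print(resp.json()["choices"][0]["message"]["content"])
```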

The text inside of it is working — I’m able to see it in preview mode, but for some reason the text-to-speech piece is completely broken. I worked on it for a couple of hours and then set it down.

I was able to get all of the bots trained. I did 7 of my favorite projects and they all have distinct voices. One actually talks like me, which is pretty hilarious. Another talks in a kind of biblical register, like a strange King James English mixed with some Lord of the Rings dialogue.

I’ve been talking to them all day — luckily they don’t break, so this aspect of the project is done.

Blog 6

I’ve settled on a concept: I’m going to do the museum, and instead of working through different versions of myself or my projects, I will focus on the individual using it.

So when testing, I found that there’s a way to actually train the bot using speech inside of Unreal. The text can be input, and that same text can then be used as training data for the bot before speaking with it.

This would have to happen in a different level though, because the text would go to a blueprint; then, upon opening the new level, the blueprint would compile and voilà — there’s a new bot with new training data.

So this could be an experience where the user is asked a few questions in a starting level, then is transported inside of a museum where they speak with a version of themself. Perhaps just one version.
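
In practice the “training” here would amount to folding the visitor’s answers into the system prompt the museum bot starts with. A sketch outside Unreal, with an illustrative set of questions (in-engine this string would be assembled in the blueprint before the level transition):

```python
# Sketch: turn the answers captured in the intro level into the persona prompt
# for the bot that appears in the museum level.
def build_persona(answers: dict[str, str]) -> str:
    return (
        "You are a museum exhibit that speaks as a reflection of the visitor.\n"
        f"They describe themselves as: {answers['self_description']}\n"
        f"A memory they shared: {answers['memory']}\n"
        f"Something they want: {answers['desire']}\n"
        "Answer in their voice, gently and a little uncannily."
    )

persona = build_persona({
    "self_description": "restless, curious, a bit nostalgic",
    "memory": "learning to ride a bike at dusk",
    "desire": "to make something that outlasts me",
})
# `persona` is then sent as the system message with every chat request in the next level.
```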

Then I could do something where I integrate posts from their Instagram to make it their own personal museum visually as well. I know there’s an Instagram API I can use for this, and parsing the photos shouldn’t be hard at all. I wonder if I could train it off the captions, and off wordier social media like Twitter?
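
Assuming the Instagram Basic Display API and a user access token I’d still have to provision, pulling the posts and their captions would look roughly like this:

```python
# Sketch of fetching a user's recent posts; captions could double as training text,
# and media URLs become textures for the museum walls.
import requests

ACCESS_TOKEN = "..."  # placeholder user token from the Basic Display API setup

resp = requests.get(
    "https://graph.instagram.com/me/media",
    params={
        "fields": "id,caption,media_type,media_url,timestamp",
        "access_token": ACCESS_TOKEN,
    },
    timeout=30,
)
posts = resp.json().get("data", [])
captions = [p["caption"] for p in posts if p.get("caption")]
image_urls = [p["media_url"] for p in posts if p.get("media_type") == "IMAGE"]
```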

I think it would be easier to just do the speech-to-text generation; it’s far less complex, especially since I’m working with a limited amount of time.

Blog 7

The Instagram API was easy to integrate — I found a cool tutorial on YouTube. I’ve tested it with my Instagram and the very first post displays fine, but I can’t get the rest of the posts to display. Even when I create a table and add in 3 or 6 posts, only one shows up. It’s a video, which is pretty cool though.

So if I have it where the user enters the experience and the AI is right there in the middle, then they can speak directly with the AI while their Instagram post is behind them.

At that point it wouldn’t be much of a museum, though. The template I’m using has several pictures and spots for exhibits. If I just have one photo with an AI, then the museum framing wouldn’t work. I’m attached to the museum idea because it’s very cohesive and acts as a sort of information system.

I wonder if I could integrate DALL-E into this and perhaps generate images based off their Instagram? I don’t know if I could do that in real time; creating a trained image model takes a substantial amount of time, and the user would just be doing nothing in VR while it runs.

So instead, what if I could just generate images based off of some things that they say? That seems very abstract though, because how would I create the image prompts in a cohesive way?
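
If I went the “images from what they say” route, the generation itself is simple; the hard part is the prompt template. A sketch using OpenAI’s images API with a placeholder template:

```python
# Sketch: turn a transcribed visitor utterance into an exhibit image via DALL-E 3.
from openai import OpenAI

client = OpenAI()

def exhibit_image(utterance: str) -> str:
    prompt = (
        "A museum exhibit photograph, soft gallery lighting, inspired by the "
        f"phrase: '{utterance}'. Dreamlike, minimalist, cohesive with the museum."
    )
    result = client.images.generate(model="dall-e-3", prompt=prompt, size="1024x1024", n=1)
    return result.data[0].url  # URL to download and apply as a texture in Unreal

url = exhibit_image("I keep dreaming about my grandmother's kitchen")
```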

Nothing else to focus on now – I really need to get the training data and the AI system implemented well. I have all of the tools for it; I’m just worried about the speed at which it can be trained and implemented within the experience.

Blog 8

The museum concept worked out fairly well — I used a template that makes a lot of sense, switching out from the previous one, which was a bit more of a maze; this one is much more linear. I have spots for 7 different pictures, and there’s also a central exhibit in the middle of the museum.

I swapped out the colors and I think it’s very beautiful. Then for the training area, I did a nice skybox with colors fairly similar to those that decorate the museum. I also added some avatars in a large space, which creates this sense of wonder.

This area honestly doesn’t need to be this interesting; I could do a simple environment and just let the user focus on answering the questions. But I like how it is now because it falls in line with the old experiences, and I do want this to be a sort of continuation of the Character A experience.

I still haven’t figured out how the story could be integrated — where would it come in? Especially since the bot is being trained off individual data. There could be a small cutscene beforehand to explain things; I think that’s fine.

Now the bad part is that the training piece does not work as intended. So I tried to implement the GPT training via the speech-to-text, but it kept breaking. The bot would freeze, and when I got it to work, the response was slow and irrelevant.

I looked into integrating Anthropic, but that seems almost impossible. Even the Haiku model isn’t fast enough, and there aren’t any projects on GitHub that readily integrate the API. Google Gemini isn’t good either — the system seems too new and is updated too frequently to have much built around it. Also, the old integrations were pretty bad, so even with the one I found on YouTube, the model isn’t nearly as robust as the OpenAI version.

I’ll ultimately stick with GPT, but I need to find an alternative to the training piece. This, mixed with the Instagram issues, is making me reconsider the project.

Blog 9

So I’ve re-worked the entire concept — it’s just not feasible to do the speech training system. It keeps breaking no matter what I do; I even tried a smaller model like GPT-3.5 Turbo and only 30 seconds of training data. It ultimately still didn’t work, so it can’t be integrated into the experience.

Instead, I’m going to train the bots beforehand. So instead of it being a Museum of You, it will be a Museum of Me — going back to the original idea of creating different versions of me, but made more interesting by focusing on the projects. The exhibits could then contain photos or other representations of the projects I’ve worked on.

So I have all of the cover images integrated into the level, along with the 7 different bots, each with their own training data and able to speak in-world. Since each exhibit is based on a different project, and the bots are drawn from across my work, I figure they can mirror those projects’ main characters.

Aesthetically, I think this will look pretty interesting. I want everything to be on one floor — just a linear journey where the user can speak with each individual bot.

I tried it, but it’s a bit confusing because the models are so close to each other, so perhaps it’s better to split them between levels of the building.

I’ll test this; the rest of the experience is fine. The intro level is no longer necessary since the user doesn’t need to train anything anymore. They only need to enter the level and speak with the revealed characters, so I can keep this in a single level.

The simplicity would be very nice; I think I’ll do that.

Blog 10

I went through a lot of iterations over the past week. I tried the 7 different characters split between two levels, and the feel wasn’t exactly right. So I decided to put them all on one level and space them out.

When going through it, I didn’t understand what the purpose of looking at the projects was — I thought I could have a description of me as an artist at the beginning, and then this could turn into an interactive portfolio.

When I was showing my friend, she thought this would be an interesting chance to showcase a personal side of me — creating a way for people to get to know me through an interesting project. I didn’t immediately see the purpose of this, but I thought about it for a while. The projects are essentially an extension of me; they represent what I was going through or thinking at a particular time in my life.

So as I thought about it, this grew into me wanting to create different archetypes. Each project is very different, much like the archetypes. I have a lot of personal writing which I can use to train these bots. Ultimately I think it would be more interesting than speaking with portfolio bots.

I think the content of this is very important; it should be something substantial and worth trying in VR. An interactive portfolio is more of a novelty trick than a meaningful and intentional experience.

So I used some of the writings I’ve collected over the years and they’re actually pretty interesting when I feed them into a model. I don’t know if I should do the synthetic data though — I want this to be very authentic.

I have two bots trained out of the 7 archetypes I’m going to use. The Unreal stuff is done for the most part; the only thing left is getting the real-time speech working. I’ve done it via the computer keyboard, and using the Oculus keyboard has been fine too, but it’s not intuitive at all and doesn’t allow real-time interaction.

I’m facing the same issue with the model breaking when I try to implement an STT system. Back to YouTube it is.
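
One STT route I haven’t fully ruled out is doing the transcription outside the engine: record a short clip from the mic, send it to OpenAI’s Whisper endpoint, and pass the text to the bot. A sketch of just the transcription step (the file name is a placeholder; the Unreal-side audio capture is the part that keeps breaking):

```python
# Sketch: transcribe a captured audio clip with Whisper and use the text as the
# user's message to the bot.
from openai import OpenAI

client = OpenAI()

with open("mic_capture.wav", "rb") as audio:  # placeholder clip recorded in-headset
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio)

print(transcript.text)  # this string becomes the chat message sent to the bot
```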

Blog 11

Okay, so the project is completely different now. No more Instagram API at all, and I’m also not doing 7 archetypes: they take a very long time to talk to, and there isn’t enough information to train models on all of them.

I put a lot of work into it before I scrapped it: I did some Midjourney images and made covers, as well as a unique environment where the bots weren’t close together and each had their own distinctive area that resonated with their archetype. I whittled it down to 4, and some aren’t official Jungian archetypes; they’re more general figures from stories and human psychology.

Then I have the oracle, which is going to tie everything together — sort of like the Matrix, and bringing back the hive mind from one of my first projects in the course.

I used a demo level to place all of these things, and each archetype will now have its own world. It’s important to showcase how different they are, and also to give the user space to focus only on speaking with the archetype.

The user goes through a door in the demo room, then hikes up a mountain to speak with the archetype and jumps off in order to go back to the demo room, then decides where to go next. Lastly, they can enter the room and speak with the oracle.

All of this is done; I’m still focused on getting the real-time speech going.

Blog 12

Ok, so I once again reworked the experience. I tested it with a few people and they said hiking up the mountain made them motion sick. One person mentioned that jumping off the mountain made them a bit nervous because it was so high. I also heard about repetition a lot — going back to each world to climb up the hill several times.

I ended up creating a guided experience where I explain it, then the user is carried through the world with each archetype appearing after a set amount of time. Then I thought about it and felt there needs to be an anchor for why the user is being transported.

I couldn’t figure out a reason, because I came to the conclusion that this can’t have a story. It’s too short and also a bit too abstract; I think just showcasing the technology is enough. If I have instructions at the beginning and an explanation that these are archetypes, I don’t feel anything else will be necessary.

Back to the anchor: the thought is I could integrate AI further by making songs and using those as the audio stimuli of the world. These will be paired with binaural beats, since during my thesis research I’ve been seeing the two used hand in hand.

I’ve made some of the songs on Udio already — I might make some new ones.
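
For the binaural layer itself, the signal is simple: two sine tones a few hertz apart, one per ear. A sketch of generating a bed to sit under the Udio songs (the 200/206 Hz choice is a placeholder, not a claim about which frequencies work):

```python
# Sketch: render a 60-second binaural-beat bed (200 Hz left, 206 Hz right, ~6 Hz beat).
import numpy as np
from scipy.io import wavfile

sr = 44100                 # sample rate
duration = 60              # seconds
t = np.linspace(0, duration, sr * duration, endpoint=False)

left = 0.2 * np.sin(2 * np.pi * 200 * t)    # left-ear tone
right = 0.2 * np.sin(2 * np.pi * 206 * t)   # right-ear tone

stereo = np.stack([left, right], axis=1).astype(np.float32)
wavfile.write("binaural_bed.wav", sr, stereo)  # layer this under the song mix
```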

Blog 13

Project officially done — I have all 4 songs and I have real-time speech!!!!

The last thing I want to do is implement different languages. I had a friend who speaks Spanish test it, and it actually worked. I want to implement Chinese and Arabic since they’re among the most widely spoken languages, then perhaps Hindi, since it’s up there with the next most spoken.

I tested with several people and the experience seems enjoyable. I had to change the controls to focus only on the talking part, by mapping a unique action to it. I decided to use the left joystick click to activate it.

I implemented some cutscenes as explainers, and that’s it. There’s nothing else left; the project is officially done. I ended up finding a unique text-to-speech system and implemented the inverse (speech-to-text) to get both sides working. The API works with multiple languages as well.
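
Not the system I ended up using, but purely to illustrate why multiple languages come almost for free: a modern TTS endpoint will generally speak whatever language the bot’s reply text is in, so the same pipeline covers Spanish, Chinese, or Arabic. A sketch with OpenAI’s speech endpoint:

```python
# Sketch: synthesize a bot reply (here in Spanish) to an audio file for playback in-engine.
from openai import OpenAI

client = OpenAI()

reply_text = "Bienvenido al museo. ¿Qué te gustaría preguntar?"  # bot reply in Spanish
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply_text)

with open("reply.mp3", "wb") as f:
    f.write(speech.content)  # raw audio bytes returned by the endpoint
```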

