It's called fine-tuning. I haven't tried it, but oobabooga's text-generation-webui has a tab for it, and I believe it's pretty straightforward.
Fine-tune a base model on your dataset, and then you'll need to format your prompts the way your AIM logs are organized, e.g. append "<ch00f>" at the end of your text-completion prompt. The model will complete it in the style it learned.
If you don't have the GPU for it, many companies, like Mistral, offer fine-tuning as a service.
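To make the completion format above concrete, here's a rough sketch of turning logs into prompt/completion pairs. The "<ch00f>" speaker tag and the log layout are assumptions for illustration, not a real AIM export schema:

```python
# Sketch: convert AIM-style logs into text-completion training examples.
# The "<ch00f>" tag and (user, text) tuple layout are illustrative assumptions.
import json

def logs_to_examples(log_lines, you="ch00f"):
    """Pair each message from someone else with your reply as prompt/completion."""
    examples = []
    for prev, cur in zip(log_lines, log_lines[1:]):
        prev_user, prev_text = prev
        cur_user, cur_text = cur
        if cur_user == you and prev_user != you:
            # The prompt ends with your own tag so the model learns to "speak" next.
            prompt = f"<{prev_user}> {prev_text}\n<{you}>"
            examples.append({"prompt": prompt, "completion": " " + cur_text})
    return examples

log = [("friend42", "yo did you see that"), ("ch00f", "lol yeah"),
       ("friend42", "wild right"), ("ch00f", "totally")]
for ex in logs_to_examples(log):
    print(json.dumps(ex))
```

At inference time you'd send the same "<friend42> …\n<ch00f>" shape as the prompt and let the tuned model fill in the reply.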
Why would you want this??? Anything I wrote from 16 years ago is so beyond cringey. You must have been a stellar kid.
Because funny
I have 26 years of saved outgoing email.
Recently I needed to redo a fix I learned about and implemented back in 1998. I implemented it again to install a crappy software project that, judging by its composition, cannot have been written before the post-Y2K firing of so many mentors.
I only remembered it after three hours of searching, saving myself another few hours and surely a nervous breakdown. After filtering AD on the client end, the project installed easily.
That's the best example, but for the questions I don't discover I already answered on Stack Overflow, I discover I answered them years ago in email.
Putting aside why you'd want to do this, it'd actually be pretty easy. You'd still use a big model like GPT-4 or Claude as your "base," but you would do two things:
- Give it a knowledge base using your conversations. You can manually vectorize them into a vector database like Pinecone and build yourself an agent using a toolchain like Langchain, or just use a service (OpenAI Agents lets you upload data from your browser)
- Have one of the big LLMs (with a large context size) ingest all of those conversations and build out a prompt that describes “you”
You would then:
- Feed that generated prompt (with your own edits, of course) back into either your custom Langchain agent or OpenAI Agent
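To illustrate the "vectorize and retrieve" step without a real service, here's a toy sketch. A real agent would use an embedding model plus a vector DB (e.g. Pinecone); the bag-of-words vectors and cosine similarity here are just stand-ins to show the mechanics:

```python
# Toy retrieval sketch: stand-in for "vectorize conversations, then query them".
# Bag-of-words + cosine similarity substitute for real embeddings + a vector DB.
import math
from collections import Counter

def embed(text):
    # Stand-in "embedding": a sparse word-count vector.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    # Rank stored conversations by similarity to the query; return the top k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

convos = [
    "we argued about which linux distro was best",
    "planning the road trip to the coast",
    "debugging that perl script at 2am",
]
print(retrieve("which distro did we like", convos))
```

The retrieved snippets would then be stuffed into the context window alongside the persona prompt described above.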
Not hard with Huggingface PEFT
You might try https://github.com/instructlab. You'll need to transform those conversations into a specific YAML format.
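The exact schema lives in the InstructLab taxonomy docs and has changed across versions, so treat this as a rough, unverified sketch of the qna.yaml shape rather than the current spec:

```yaml
# Rough sketch only -- check the instructlab taxonomy repo for the current schema.
version: 3
created_by: your-github-handle
seed_examples:
  - question: what was I like in high school?
    answer: mostly arguing about linux distros over AIM.
```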
The real question is: why do you have 64 MB of AIM conversations?
Because I communicated with a lot of people over AIM? It's actually more than just high school; it covers 2004 to around 2012. Also, it's 64 MB zipped; the actual size is much larger.