I tried to make ai use 'speak but it became schizo

Wewjak 06/12/26 (Fri) 21:47:51 №16439526 [Quote]

I tried to make ai use 'speak but it became schizo

Chud 06/12/26 (Fri) 21:49:20 №16439537 [Quote]

File: YYYYYYYYYEEEEEESSSSSSSS.mp4 (9.23 MB, 720x720)

File: YYYYYYYEEEEEESSSSSSSSSS.mp4 (5.66 MB, 1280x720)

^I tried to make ai use 'speak but it became schizo

Chud 06/12/26 (Fri) 21:50:17 №16439549 [Quote]

ask xhim about NBA soyjaks

Chud 06/12/26 (Fri) 21:50:32 №16439552 [Quote]

itt we adopt ai soyspeak

Chud 06/12/26 (Fri) 21:50:46 №16439554 [Quote]

>>16439526 (OP)
>NBA soyjaks in /soy/
Marge

Chud 06/12/26 (Fri) 21:50:57 №16439558 [Quote]

NBA soyjaks will always be sharty culture

Chud 06/12/26 (Fri) 21:51:42 №16439569 [Quote]

Somebaldi should train an AI from soyjak wiki to create /soy/GPT

Chud 06/12/26 (Fri) 21:54:56 №16439600 [Quote]

op ask it to explain nobeidore and/or nobaldi

Chud 06/12/26 (Fri) 21:58:17 №16439636 [Quote]

File: 1780064119814y.png (32.09 KB, 886x872)

>>16439569
>If your goal is to make a chatbot that talks like posters from Soyjak communities, the general process is:

1. Collect data

* Crawl pages from soyjakwiki.org.
* Archive thread data from soyjak.party (respecting the site's rules and terms).
* Extract only the text you want the model to learn from.
* Remove duplicates, broken posts, spam floods, and irrelevant content.
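
A minimal sketch of the duplicate and broken-post filtering, assuming the scraped posts are already plain strings. The names `normalize` and `filter_posts` and the `min_chars` cutoff are made up for illustration, and it only catches exact repeats after case and whitespace folding, not spam floods in general:

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def filter_posts(posts: list[str], min_chars: int = 3) -> list[str]:
    """Drop empty or very short posts and repeats, keeping first occurrences in order."""
    seen: set[str] = set()
    kept: list[str] = []
    for post in posts:
        key = normalize(post)
        if len(key) < min_chars:
            continue  # empty or broken post
        if key in seen:
            continue  # duplicate of a post already kept
        seen.add(key)
        kept.append(post.strip())
    return kept
```

Near-duplicate detection (for example, copypasta with one word changed) would need something stronger, such as shingling or MinHash.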

2. Clean the dataset

* Convert posts into a conversational format.
* Remove personally identifying information.
* Decide whether to keep things like greentexts, slang, emojis, image captions, and reaction phrases.
* Filter out content you don't want the model reproducing.
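
One possible way to get a conversational format out of thread data is to treat each reply as the answer to the post it quotes. This is only a sketch: it assumes an archive already parsed into a `{post_id: text}` dict and quote links of the form `>>12345`, and the `build_pairs` helper is hypothetical, not part of any scraper:

```python
import re

# Matches board-style quote links such as ">>16439526" or ">>16439526 (OP)".
QUOTE_RE = re.compile(r">>(\d+)(?:\s*\(OP\))?")


def build_pairs(posts: dict[int, str]) -> list[tuple[str, str]]:
    """Pair each reply with the first post it quotes: (parent text, reply text)."""
    pairs: list[tuple[str, str]] = []
    for text in posts.values():
        match = QUOTE_RE.search(text)
        if match is None:
            continue  # not a reply, nothing to pair it with
        parent = posts.get(int(match.group(1)))
        if parent is None:
            continue  # the quoted post is missing from the archive
        # Strip the quote links themselves so the model does not learn post numbers.
        prompt = QUOTE_RE.sub("", parent).strip()
        reply = QUOTE_RE.sub("", text).strip()
        if prompt and reply:
            pairs.append((prompt, reply))
    return pairs
```

Replies that quote several posts are paired with the first one only, which is a simplification.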

3. Choose a base model

* Small models: Llama 3 derivatives, Qwen, Mistral.
* For a hobby project, a 7B–14B parameter model is often enough.

4. Fine-tune

* Use LoRA/QLoRA rather than training from scratch.
* Convert your dataset into instruction/chat format:

```json
{
"messages": [
{"role":"user","content":"What do you think of X?"},
{"role":"assistant","content":"Typical /soy/ style response"}
]
}
```
* Train on a GPU using frameworks such as [Axolotl](https://github.com/axolotl-ai-cloud/axolotl?utm_source=chatgpt.com), [Unsloth](https://github.com/unslothai/unsloth?utm_source=chatgpt.com), or [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory?utm_source=chatgpt.com).
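
As a rough sketch of the conversion step, the snippet below writes (prompt, reply) pairs as JSON Lines, one chat record per line. The `to_jsonl` name is made up, and the exact field names each framework expects vary, so check its dataset documentation before training:

```python
import json


def to_jsonl(pairs: list[tuple[str, str]]) -> str:
    """Serialize (prompt, reply) pairs as one chat-format JSON object per line."""
    lines = []
    for prompt, reply in pairs:
        record = {
            "messages": [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": reply},
            ]
        }
        # ensure_ascii=False keeps emojis and other non-ASCII text readable.
        lines.append(json.dumps(record, ensure_ascii=False))
    if not lines:
        return ""
    return "\n".join(lines) + "\n"
```

Writing the returned string to a file such as `train.jsonl` (a placeholder name) gives a dataset in the same shape as the JSON example above.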

5. Add retrieval (optional)

* Instead of forcing the model to memorize everything, store Soyjak Wiki articles in a vector database.
* Let the model search them when answering questions.
* This usually works better than pure fine-tuning for factual wiki content.
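
To illustrate the retrieval idea without any dependencies, here is a toy version that swaps the vector database for word-count vectors and cosine similarity. A real setup would use a learned embedding model and a proper vector store; `embed`, `cosine`, and `retrieve` are illustrative names only:

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag of lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query, best match first."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]
```

The retrieved articles would then be pasted into the prompt so the model can answer from them instead of from memory.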

### Hardware

For a small hobby project:

* 7B model + QLoRA: 16–24 GB VRAM.
* 14B model + QLoRA: 24–48 GB VRAM.
* Renting GPUs is often cheaper than buying one.
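
The VRAM figures above are totals for a training run. As a back-of-the-envelope cross-check, the frozen base weights alone take roughly parameters × bits ÷ 8 bytes, and QLoRA stores them in 4-bit form. The helper below (a hypothetical `weight_memory_gb`) models only that part; activations, adapter gradients, optimizer state, and framework overhead account for the rest and are not estimated here:

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


for size in (7, 14):
    print(f"{size}B: 4-bit {weight_memory_gb(size, 4):.1f} GB, "
          f"16-bit {weight_memory_gb(size, 16):.1f} GB")
```

So a 7B model needs about 3.5 GB for 4-bit weights versus about 14 GB at 16-bit, which is why quantized fine-tuning fits on a single consumer GPU at all.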

### Important limitation

A model trained heavily on Soyjak Party threads will tend to reproduce the language patterns found there, including offensive, hateful, harassing, or otherwise toxic content. If you intend to distribute the model publicly, you'll want additional filtering and moderation during dataset preparation and inference.

### Alternative approach

If your goal is specifically "make ChatGPT sound like /soy/" rather than fine-tuning a model yourself, a cheaper approach is:

* Download a large corpus of /soy/ posts.
* Put them into a vector database.
* Use a local model such as Llama 3 or Qwen 3.
* Give the model a system prompt describing the culture and slang.
* Retrieve relevant posts as examples before each response.

That can get surprisingly close to a "/soy/GPT" without needing a full fine-tuning run.
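
A sketch of how the last two bullets might fit together, assuming some retrieval step has already returned a few example posts. The `build_messages` helper and the wording of the prompt are placeholders; the output is a plain list of role/content dicts that most local chat runtimes accept in some form:

```python
def build_messages(system_prompt: str, examples: list[str], question: str) -> list[dict]:
    """Combine a style-describing system prompt, retrieved example posts, and the user's question."""
    system = system_prompt
    if examples:
        shots = "\n".join(f"- {ex}" for ex in examples)
        system += "\n\nExample posts to imitate in tone:\n" + shots
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]


messages = build_messages(
    "Reply in the slang and tone of the board.",
    ["someone do dis", "geeg"],
    "What do you think of X?",
)
```

Because the examples are fetched fresh for each question, the style can be changed by swapping the corpus, with no retraining.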

Chud 06/12/26 (Fri) 21:59:09 №16439644 [Quote]

>>16439636
someone do dis

Chud 06/12/26 (Fri) 22:21:28 №16439843 [Quote]

>>16439636
Imma see if I can do dis

Chud 06/12/26 (Fri) 22:28:49 №16439896 [Quote]

geeg