/soy/ - Soyjaks


File: Screenshot_20260612-223432_WhatsApp.jpg (176.23 KB, 720x1237)

File: dipxwqw-a5a3554b-62cc-45a8-ab4a-c1ded2713d5d.png (82.32 KB, 600x800)

№16439526

I tried to make ai use 'speak but it became schizo

№16439537

File: YYYYYYYYYEEEEEESSSSSSSS.mp4 (9.23 MB, 720x720)

File: YYYYYYYEEEEEESSSSSSSSSS.mp4 (5.66 MB, 1280x720)

^I tried to make ai use 'speak but it became schizo

№16439549

ask xhim about NBA soyjaks

№16439552

itt we adopt ai soyspeak

№16439554

>>16439526 (OP)
>NBA soyjaks in /soy/
Marge

№16439558

NBA soyjaks will always be sharty culture

№16439569

Somebaldi should train an AI from soyjak wiki to create /soy/GPT

№16439600

op ask it to explain nobeidore and/or nobaldi

№16439636

File: 1780064119814y.png (32.09 KB, 886x872)

>>16439569
>If your goal is to make a chatbot that talks like posters from Soyjak communities, the general process is:

1. Collect data

* Crawl pages from soyjakwiki.org.
* Archive thread data from soyjak.party (respecting the site's rules and terms).
* Extract only the text you want the model to learn from.
* Remove duplicates, broken posts, spam floods, and irrelevant content.
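
The clean-up in the last bullet can be sketched in plain Python. This is a minimal illustration, not a full crawler pipeline: the function names and the `min_chars` threshold are assumptions made up for the example, and it only catches duplicates that match after lowercasing and whitespace collapsing.

```python
import re

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()

def clean_posts(posts: list[str], min_chars: int = 3) -> list[str]:
    """Drop empty or very short posts and repeated posts, keeping first occurrences in order."""
    seen: set[str] = set()
    kept: list[str] = []
    for post in posts:
        key = normalize(post)
        if len(key) < min_chars or key in seen:
            continue  # too short (likely broken) or already seen (duplicate/flood)
        seen.add(key)
        kept.append(post.strip())
    return kept

raw = ["geeg", "  GEEG ", "", "someone do dis", "someone  do dis"]
print(clean_posts(raw))  # ['geeg', 'someone do dis']
```

Near-duplicate floods that differ by a few characters would need something stronger, such as shingling or MinHash, which this sketch does not attempt.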

2. Clean the dataset

* Convert posts into a conversational format.
* Remove personally identifying information.
* Decide whether to keep things like greentexts, slang, emojis, image captions, and reaction phrases.
* Filter out content you don't want the model reproducing.
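
One possible way to do the first two bullets, assuming the archived thread is already a mapping of post number to post text. Replies are paired with the post they quote via `>>number`, and e-mail addresses are masked. The regexes and function names here are illustrative assumptions; real PII removal needs far more than an e-mail pattern.

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")  # rough e-mail pattern, not exhaustive
QUOTE_RE = re.compile(r">>(\d+)")                    # imageboard-style reply link

def redact(text: str) -> str:
    """Mask e-mail addresses; extend with other patterns as needed."""
    return EMAIL_RE.sub("[email]", text)

def to_chat_pairs(posts: dict[int, str]) -> list[dict]:
    """Turn each reply into a user/assistant pair: quoted post -> reply."""
    pairs = []
    for body in posts.values():
        match = QUOTE_RE.search(body)
        if not match:
            continue  # not a reply, nothing to pair it with
        parent_id = int(match.group(1))
        if parent_id not in posts:
            continue  # quoted post was not archived
        parent = QUOTE_RE.sub("", posts[parent_id]).strip()
        reply = QUOTE_RE.sub("", body).strip()
        if parent and reply:
            pairs.append({"messages": [
                {"role": "user", "content": redact(parent)},
                {"role": "assistant", "content": redact(reply)},
            ]})
    return pairs

posts = {
    1: "ask it about the wiki, mail me at anon@example.com",
    2: ">>1\nsomeone do dis",
    3: "no quote here",
}
pairs = to_chat_pairs(posts)
print(pairs[0]["messages"][0]["content"])  # ask it about the wiki, mail me at [email]
```

Only the first `>>number` in a reply is used as the parent; posts that quote several others are a design decision left open here.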

3. Choose a base model

* Small models: Llama 3 derivatives, Qwen, Mistral.
* For a hobby project, a 7B–14B parameter model is often enough.

4. Fine-tune

* Use LoRA/QLoRA rather than training from scratch.
* Convert your dataset into instruction/chat format:

```json
{
  "messages": [
    {"role": "user", "content": "What do you think of X?"},
    {"role": "assistant", "content": "Typical /soy/ style response"}
  ]
}
```
* Train on a GPU using frameworks such as [Axolotl](https://github.com/axolotl-ai-cloud/axolotl), [Unsloth](https://github.com/unslothai/unsloth), or [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory).
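
To see why LoRA is so much cheaper than full training, it helps to count parameters. LoRA freezes a weight matrix of shape `d_out x d_in` and trains two small matrices of shapes `d_out x r` and `r x d_in` instead. The hidden size 4096 and rank 16 below are assumed example values, not figures from any specific model config.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Trainable weights if the whole matrix is updated."""
    return d_in * d_out

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights for a LoRA adapter: B (d_out x r) plus A (r x d_in)."""
    return rank * (d_in + d_out)

d = 4096  # assumed hidden size, for illustration only
r = 16    # assumed LoRA rank

full = full_params(d, d)
lora = lora_params(d, d, r)
print(full, lora)            # 16777216 131072
print(f"{lora / full:.2%}")  # 0.78%
```

So for one square projection matrix at these sizes, the adapter trains under 1% of the weights; QLoRA additionally keeps the frozen base weights in a quantized format to save memory.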

5. Add retrieval (optional)

* Instead of forcing the model to memorize everything, store Soyjak Wiki articles in a vector database.
* Let the model search them when answering questions.
* This usually works better than pure fine-tuning for factual wiki content.
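
A toy version of the retrieval step, assuming nothing beyond the standard library. A real setup would use a learned embedding model and a vector database; here word counts stand in for embeddings and a sorted list stands in for the database, purely to show the search-by-similarity idea. The article snippets are placeholders, not real wiki text.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in 'embedding': bag-of-words counts (a real system would use a neural encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def search(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

articles = [  # placeholder snippets
    "QLoRA fine-tunes a quantized base model with small adapter matrices.",
    "A vector database stores embeddings and returns the nearest neighbours of a query.",
    "Renting a GPU by the hour can be cheaper than buying one.",
]
print(search("how does a vector database find neighbours", articles))
```

The retrieved text is then pasted into the model's prompt, so factual answers come from the stored articles rather than from whatever the fine-tune happened to memorize.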

### Hardware

For a small hobby project:

* 7B model + QLoRA: 16–24 GB VRAM.
* 14B model + QLoRA: 24–48 GB VRAM.
* Renting GPUs is often cheaper than buying one.
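
A rough back-of-the-envelope check on those figures: the memory taken by the base weights alone is parameter count times bits per weight. This sketch assumes 4-bit weights for QLoRA and 16-bit for an unquantized model, and deliberately ignores activations, adapter gradients, optimizer state, and framework overhead, which make up the rest of the ranges quoted above and vary with batch size and sequence length.

```python
def weight_memory_gb(n_params_billion: float, bits_per_weight: int) -> float:
    """Memory for the model weights only, in GB (1 GB = 1e9 bytes)."""
    return n_params_billion * bits_per_weight / 8

for size in (7, 14):
    q4 = weight_memory_gb(size, 4)     # 4-bit quantized base, as in QLoRA
    fp16 = weight_memory_gb(size, 16)  # unquantized half precision
    print(f"{size}B: {q4:.1f} GB at 4-bit, {fp16:.1f} GB at 16-bit")
# 7B: 3.5 GB at 4-bit, 14.0 GB at 16-bit
# 14B: 7.0 GB at 4-bit, 28.0 GB at 16-bit
```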

### Important limitation

A model trained heavily on Soyjak Party threads will tend to reproduce the language patterns found there, including offensive, hateful, harassing, or otherwise toxic content. If you intend to distribute the model publicly, you'll want additional filtering and moderation during dataset preparation and inference.

### Alternative approach

If your goal is specifically "make ChatGPT sound like /soy/" rather than building a model from scratch, a cheaper approach is:

* Download a large corpus of /soy/ posts.
* Put them into a vector database.
* Use a local model such as Llama 3 or Qwen 3.
* Give the model a system prompt describing the culture and slang.
* Retrieve relevant posts as examples before each response.

That can get surprisingly close to a "/soy/GPT" without needing a full fine-tuning run.
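
The last two bullets amount to assembling a prompt. A minimal sketch, assuming the retrieved posts are already in hand and the local model accepts the common system/user chat message format. The function name and the wording of the system text are made up for illustration.

```python
def build_messages(system_prompt: str, examples: list[str], question: str) -> list[dict]:
    """Combine a style description, retrieved example posts, and the user's question."""
    shots = "\n".join(f"- {post}" for post in examples)
    system = f"{system_prompt}\n\nExample posts to imitate in tone:\n{shots}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

messages = build_messages(
    "You reply in the slang and tone of the board.",  # hypothetical system prompt
    ["someone do dis", "geeg"],                       # would come from the retrieval step
    "What do you think of X?",
)
for m in messages:
    print(m["role"], "->", m["content"])
```

The resulting list can be handed to whatever chat interface the chosen local model exposes; the examples change with every question, which is what lets this work without a training run.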

№16439644

>>16439636
someone do dis

№16439843

>>16439636
Imma see if I can do dis

№16439896

geeg


