I want to know what /tech/ thinks about this subject.
I tried to talk about this on the sharty a few weeks ago, but I'd forgotten this board existed at the time.
However, since Quote mentioned one of the papers that originally inspired this post in his request for that archive website to stop scraping us, I think now's a good time to bring the topic back.
Link to the paper on arXiv if you don't trust my PDF:
https://arxiv.org/abs/2602.16800
My thoughts are that while the paper is scary, reading it reveals we still have a couple of years until things get really bad. Small communities are more at risk right now than larger ones, though, since effectiveness drops off as the candidate pool grows.
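To see why pool size matters, here's a toy simulation, nothing to do with the paper's actual model and with made-up numbers: give the true author a noisy similarity score that's higher on average than everyone else's, then check how often they still come out on top as the pool grows.
[code]
import random

random.seed(1)

def recall_at_1(pool_size, trials=2000):
    # Fraction of trials where the true author outscores every distractor.
    hits = 0
    for _ in range(trials):
        true_score = random.gauss(1.0, 0.5)   # true author scores higher on average
        best_distractor = max(random.gauss(0.0, 0.5) for _ in range(pool_size - 1))
        if true_score > best_distractor:
            hits += 1
    return hits / trials

for n in (10, 100, 1000):
    print(n, recall_at_1(n))
[/code]
Even with a matcher whose per-pair accuracy never changes, top-1 accuracy craters once there are thousands of plausible candidates.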
The 67% recall rate mentioned in their abstract also comes from accounts the authors admit
>are likely easier to deanonymize than an average profile
The success rate is much lower when matching posts made by the same RedditBVLL across different subreddits (8.5% on average, although it reached 48.1% for very active users).
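For anyone wondering what "matching posts by the same user" even looks like mechanically, here's a crude sketch. This is NOT the paper's method (they use an LLM end to end); it's just character n-gram TF-IDF cosine similarity as a stand-in for writing-style features, with made-up posts.
[code]
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up candidate authors and an unattributed post.
known = {
    "anon_a": "desu I reckon the compiler is at fault here, lurk more",
    "anon_b": "Honestly, one should always verify the checksums first.",
}
unknown = "desu the linker is at fault here, lurk more"

vec = TfidfVectorizer(analyzer="char", ngram_range=(2, 4))
tfidf = vec.fit_transform(list(known.values()) + [unknown])
scores = cosine_similarity(tfidf[-1], tfidf[:-1])[0]
print(max(zip(known, scores), key=lambda t: t[1]))  # best stylistic match
[/code]
The LLM version in the paper is doing something far richer than n-gram overlap (it picks up on topics, phrasing habits, biographical slips), which is exactly why it costs so much per query.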
So while it's not as bad as the abstract makes it out to be, these numbers are guaranteed to rise as LLMs get better.
Not really sure what an individual could do to mitigate this besides being extremely careful with what you write and how you write it, and including red herrings in your posts.
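If you want to audit yourself, the same crude similarity trick from above works in reverse: score a draft against your own post history and rewrite until the score drops. A minimal sketch, with an arbitrary cutoff and placeholder posts:
[code]
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

my_old_posts = [
    "placeholder old post one, pretend this is your history",
    "placeholder old post two, same deal",
]
draft = "the post you are about to make goes here"

vec = TfidfVectorizer(analyzer="char", ngram_range=(2, 4))
m = vec.fit_transform(my_old_posts + [draft])
score = cosine_similarity(m[-1], m[:-1]).max()
if score > 0.3:  # arbitrary cutoff, tune it yourself
    print("reads too much like you, rewrite it:", score)
[/code]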
On the other hand I'm optimistic we could use this to our advantage when doxing in the future.
Thoughever, this would require someone to set up a pipeline like the one mentioned in the paper to keep costs down, and even then, at $1-$4 per query, costs could balloon very quickly, so it would have to be limited to trusted users and only used on doxing targets we otherwise can't make progress against.
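Something like this is what I have in mind for keeping costs sane. The two-stage structure (cheap local filter first, then the paid model only on a shortlist, under a hard budget cap) is the idea; every name, price and threshold below is my own assumption, not the paper's:
[code]
import random

BUDGET_USD = 50.0
COST_PER_QUERY = 4.0   # worst case of the $1-$4 range above
SHORTLIST = 20

def cheap_score(target_posts, candidate_posts):
    # Placeholder: a real version would run local stylometry here.
    return random.random()

def llm_match(target_posts, candidate_posts):
    # Placeholder: a real version would call the paid API here.
    return False

def run(target_posts, candidates):
    # Stage 1: shortlist candidates with the free local scorer.
    shortlist = sorted(candidates,
                       key=lambda name: cheap_score(target_posts, candidates[name]),
                       reverse=True)[:SHORTLIST]
    # Stage 2: spend money only on the shortlist, never past the cap.
    spent = 0.0
    for name in shortlist:
        if spent + COST_PER_QUERY > BUDGET_USD:
            break  # hard stop so costs can't balloon
        spent += COST_PER_QUERY
        if llm_match(target_posts, candidates[name]):
            return name, spent
    return None, spent

print(run(["target's posts"], {f"anon_{i}": ["their posts"] for i in range(100)}))
[/code]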
I know there's that one site that already uses AI on data breaches, but it runs a local model that's nowhere near the level of what was used in this paper.