[ home / overboard ] [ soy / qa / raid / r ] [ int / pol ] [ a / an / asp / biz / mtv / r9k / tech / v / sude / x ] [ q / news / chive / rules / pass / bans / status ] [ wiki / booru / irc ]


/tech/ - Soyence and Technology


File: 1775685033136r.png (173.6 KB, 600x800)

File: 1772757287913n.pdf (811.87 KB)

 №30289[Quote]

I want to know what /tech/ thinks about this subject.
I tried to talk about this on the sharty a few weeks ago but I forgot this board existed then.
However, since Quote mentioned one of the papers that inspired me to post about this originally in his request for that archive website to stop scraping us, I think now's a good time to bring the topic back.

Link to the paper on arXiv if you don't trust my PDF
https://arxiv.org/abs/2602.16800

My thoughts are that, while scary, reading the paper reveals we still have a couple of years until things get really bad. Small communities are more at risk right now than larger ones, though, since effectiveness drops off as the candidate pool grows.

The 67% recall rate mentioned in their abstract also comes from accounts the authors admit
>are likely easier to deanonymize than an average profile
The success rate is much lower when matching posts made by the same RedditBVLL on different subreddits (8.5% on average, although it reached 48.1% for very active users).

So while it's not as bad as the abstract makes it out to be, these numbers are guaranteed to rise as LLMs get better.
Not really sure what an individual could do to mitigate this besides being extremely careful with what you write and how you write it, and including red herrings in your posts.

On the other hand I'm optimistic we could use this to our advantage when doxing in the future.
Thoughever, this would require someone to set up a pipeline like the one mentioned in the paper to mitigate costs and even then, because $1-$4 per query could balloon costs very quickly it would have to be limited to trusted users and only used for doxing targets we otherwise are unable to make progress against.
I know there's that one site that already uses AI on data breaches but it runs a local model that's nowhere near the level of what was used in this paper.

 №30291[Quote]

File: 1772375764363f.gif (2.58 MB, 374x234)

>>30289 (OP)
this shit is scary

 №30294[Quote]

snca

 №30295[Quote]

File: 1775054945051m.png (9.12 KB, 594x624)

File: 1771653177058n.png (5.09 KB, 594x624)

File: 1775224542904t.png (9.25 KB, 594x624)

File: 1772841797010k.png (7.74 KB, 594x624)

>>30294
Shit I care about though

 №30296[Quote]

File: d699ea0b596a63463274a0d252….jpg (180.5 KB, 860x1148)


 №30298[Quote]

as always
don't use the username twice
behave differently on different identities (don't spam your autistic obsessions on your every account)

 №30299[Quote]

>>30298
fucking jarty DO THIS!!!

 №30302[Quote]

bump

 №30349[Quote]

there is a virus in the pdf

 №30460[Quote]

File: 1772412934644x.png (236.57 KB, 1153x1215)

>there is a virus in the pdf

 №31146[Quote]

Negrobumping my thread one (1) time only because I'm an obsessed shitskin nigger from Lesotho and want more engagement with this topic.

 №31151[Quote]

>>31146
You shouldn't be concerned as long as you don't namefag across threads and don't post real data (hobbies included) on any accounts. 'teens usually don't datamine themselves as much as normigroids, excluding namefags. Some red herrings will break these LLMs apart doe
It'll be to our advantage. Unless these LLMs will get access to 'arty's system identifiers of posters. Highly unlikely, for it a joon must scrape and 'chive all threads for a long time, nobaldi will do dis

 №31162[Quote]

>>31151
I already take those kinds of precautions but my fear is that LLMs will eventually get sophisticated enough that they'll be able to connect posts and accounts through analysis of the various idiosyncrasies in writing style and how an individual constructs their posts.
<'dit
Although, reading back what I wrote above, I do think I have a tendency towards paranoia and catastrophizing stuff like this so maybe you're right and we shouldn't worry too much.

 №31172[Quote]

>>31162
we don't write huge walls of text here geg
but indeed some nusois overuse particular words and stickers, don't do this and stay neutral o algo

 №31176[Quote]

File: 1777556977536s.gif (24.77 KB, 600x800)

>>31162
I don't think that's too unrealistic.
I was a NEET my entire life up until I was 16 (don't ask) and from age 10-11, I spent all day every day on 4chan. This affected my social development in a very odd way.
I'm awful at picking up social cues in person as you'd expect, but I'm extremely good at identifying and distinguishing between people from their writings alone.
It happens completely automatically, the best way I can describe it is different posters have different "voices" when I read their posts in my head. It doesn't turn off outside of imageboards either, I catch people trying to abandon old identities by switching to new accounts on forums all the time with it.

My theory is that since all my social interactions during my formative years were on anonymous imageboards, my brain trained itself to pick up on writing intricacies rather than traditional social cues.
And if a human can be trained to do this, then I'm sure machine learning can be used to do the same thing much faster and at a much larger scale.

 №31182[Quote]

>>31176
What is the % of posts where you can clearly identify the posters? And how sure are you about it?
For example, do it for this thread

 №31191[Quote]

>>31182
>What is the % of posts where you can clearly identify the posters?
Not sure of the exact percentage but it heavily depends on how long the post is. There's not much info to go off with single word replies or short sentences (though briefness itself can be an identifier at times) because there are only so many ways to express the same sentiment. It gets way easier at around the 50+ word mark since at that point there's room for deviation in sentence structuring.
>And how sure are you about it?
I'm generally not since I can't confirm it, but when I call people out for samefagging or using alts they don't usually deny it.
>For example, do it for this thread
You feel like >>31151 and >>31172
>>30298 seems distinct and I don't think he's made any other posts ITT
>>31146 and >>31162 have a distinct style too but he outright states that he's OP and anyone can infer the rest from the reply order so it doesn't count.
Nobody else ITT really said enough for me to get a read on them

 №31192[Quote]

>>31191
>>>30298 seems distinct and I don't think he's made any other posts ITT
Actually reading it again, I feel like he might be the same poster as >>31182 >>31172 >>31151

 №31209[Quote]

>>31191
Your first impression was right, >>31151 >>31172 >>31182 are my reppies, >>30298 isn't. Ev&doe >>31172 is with slightly different writing style
Seems relatively easy because there are like 3-4 'teens here. Anyway I'm impressed, try for this thread: soyjak.st/r9k/thread/87169.html
We definitely need to research this more

 №31210[Quote]

didn't want to sage btw

 №31214[Quote]

>>30289 (OP)
This method really only works for long-format posts; sentences of 7 words or fewer (most posts here) are worthless and cannot identify you

 №31257[Quote]

>>30289 (OP)
I remember some guy did dis, but he scraped 2 billion jewtube comments and it had an unusually high success rate

 №31264[Quote]

>>31257
because jewtube has profiles or however the data is mined

 №31279[Quote]

VP, I want >>31191 this neuroGOD to test his abilities

 №31280[Quote]

I don't think AI will ever have enough intuition to do shit like this >>31191

 №31281[Quote]

>>31191
i have the same ability
i tranny heart stylometry

 №31283[Quote]

>>31281
then try for this thread o algo: soyjak.st/r9k/thread/87169.html

 №31293[Quote]

This shit doesn't matter no matter what way you look at it. Glowniggers have everyone raped at hardware level, if they wanted to de-anonymize you, they would in a flash.


