[ home / overboard ] [ soy / qa / raid / r ] [ soy2 / tdh ] [ ss / craft ] [ int / pol ] [ a / an / asp / biz / mtv / r9k / tech / v / x ] [ q / news / chive / rules / pass / bans / status ] [ wiki / booru / irc ]


/tech/ - Soyence and Technology


File: Screenshot 2025-11-19 at 2….png (1.3 MB, 4078x2336)

File: Screenshot 2025-11-19 at 2….png (584.95 KB, 4060x1406)

21351

I'm currently working on my fourth attempt at making a decent dark web search engine.

My first attempt was more of a test to see if I could actually use Tor in my code. It didn't have a front end, and searching was limited to each site's description with no real search algorithm apart from "LIKE '%query%'". It all ran in the terminal without actually serving any site to the Tor network.
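
For anyone wondering what that kind of search looks like in practice, here is a rough sketch of a plain substring match with SQL LIKE. It assumes SQLite and a made-up `sites(url, description)` table; the post doesn't say which database or schema was actually used, so treat every name here as hypothetical.

```python
import sqlite3


def search_descriptions(conn, query):
    """Return (url, description) rows whose description contains the query."""
    # Escape LIKE wildcards so a literal % or _ in the query is matched as text.
    escaped = (
        query.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    pattern = f"%{escaped}%"
    cur = conn.execute(
        "SELECT url, description FROM sites "
        "WHERE description LIKE ? ESCAPE '\\' ORDER BY url",
        (pattern,),
    )
    return cur.fetchall()


def make_demo_db():
    """Build a tiny in-memory database with hypothetical example rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sites (url TEXT PRIMARY KEY, description TEXT)")
    conn.executemany(
        "INSERT INTO sites VALUES (?, ?)",
        [
            ("aaa.onion", "Privacy focused email provider"),
            ("bbb.onion", "Forum about Linux and privacy"),
            ("ccc.onion", "Image hosting"),
        ],
    )
    return conn


if __name__ == "__main__":
    for url, description in search_descriptions(make_demo_db(), "privacy"):
        print(url, "-", description)
```

The obvious limits: there is no ranking, no handling of word order, and a leading `%` generally forces a full table scan, which is part of why this approach stops being usable as the index grows.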

My second attempt went better. This time I wanted to focus on the front end (the part I hate the most). I figured out how to host hidden services with Flask and, after spending way too long generating a cool domain, I finally had a working site up and running. It had the same search algo, if you can even call it that, as the last. It also took forever to load (around 2 minutes without cache) because Nginx was being a selfish little fuck and not letting me use it with Tor. This could have been because I had 2 hidden services running at the same time, since I was also working on another project (Onion365) back then. Picrel is what this attempt looked like. I got carried away and added way too much bloat to the homepage. This one had 129644 sites indexed, though most were homepages.

My third attempt was just back-end stuff again. This included improving the filters, reworking the scraper to be more efficient, and entirely remaking the search algorithm to use tokenizers for way better search results. This had no front end, and I didn't do much testing with the tokenizers before moving on, so I'm not sure how much that actually helped.
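
To make the tokenizer idea concrete, here is a minimal sketch of token-based search using an inverted index. This is not the code from the third attempt (the post doesn't show it); the regex tokenizer, the `InvertedIndex` class, and the scoring by raw term count are all assumptions picked to keep the example short.

```python
import re
from collections import Counter, defaultdict

# Assumed tokenizer: lowercase runs of ASCII letters and digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


class InvertedIndex:
    """Maps each token to the pages containing it, with occurrence counts."""

    def __init__(self):
        self.postings = defaultdict(Counter)  # token -> {url: count}

    def add(self, url, text):
        for token in tokenize(text):
            self.postings[token][url] += 1

    def search(self, query):
        """Rank pages by how often they contain the query's tokens."""
        scores = Counter()
        for token in set(tokenize(query)):
            for url, count in self.postings.get(token, {}).items():
                scores[url] += count
        # Highest score first; ties are broken alphabetically by URL.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [url for url, _ in ranked]


if __name__ == "__main__":
    index = InvertedIndex()
    index.add("a.onion", "secure email email hosting")
    index.add("b.onion", "email forum")
    index.add("c.onion", "image hosting")
    print(index.search("email hosting"))
```

Unlike a substring match, this finds pages that contain the query words in any order and gives a crude ranking. A real setup would probably want something like TF-IDF or BM25 weighting on top, but that goes beyond what the post describes.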

My current attempt is focused on remaking the crawler to be more efficient and work asynchronously (the latter I just implemented). It's already working way faster than any of my previous ones, at around 0.5 seconds per scrape instead of 3-5. Another one of my goals for this is to finally get Nginx to run correctly. I'm also selectively caching websites (only the pure HTML, no media) this time, since current archival hidden services are unreliable. I plan to implement AI-assisted filters to avoid the false detections you get from just using keywords. I will rework part of the front end (more search parameters and such), but not much will change. I have over 200k domains queued up to be crawled.
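
For reference, the usual shape of an asyncio crawler is a pool of coroutines capped by a semaphore. The sketch below shows only that pattern and is not the actual crawler: `crawl` and `fake_fetch` are hypothetical names, and `fake_fetch` just sleeps instead of making a real request through Tor, so the example runs with the standard library alone.

```python
import asyncio


async def crawl(urls, fetch, max_concurrency=20):
    """Fetch every URL concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)
    results = {}

    async def worker(url):
        async with semaphore:
            try:
                results[url] = await fetch(url)
            except Exception as exc:
                # Record the failure and keep crawling the rest of the queue.
                results[url] = exc

    await asyncio.gather(*(worker(url) for url in urls))
    return results


async def fake_fetch(url):
    """Stand-in for a real HTTP request routed through a Tor SOCKS proxy."""
    await asyncio.sleep(0.01)  # simulated network latency
    if "dead" in url:
        raise ConnectionError(f"{url} unreachable")
    return f"<html>{url}</html>"


if __name__ == "__main__":
    queue = ["a.onion", "b.onion", "dead.onion"]
    pages = asyncio.run(crawl(queue, fake_fetch, max_concurrency=2))
    for url, page in pages.items():
        print(url, "->", page)
```

The speedup comes from overlapping the waiting time of many slow requests rather than from making any single request faster. The semaphore cap matters here because onion services are slow and flaky, so an unbounded fan-out over a large queue would mostly produce timeouts.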

I did not "vibe code" any of this. I strongly dislike that term and those who do or promote it. I like to keep LLM usage at a minimum for my projects, but I did need the asyncio library explained to me before I could implement it and fix the bugs myself. As I said, I suck at front end, so I also needed some help getting the CSS to look how I wanted it to.

I'm unsure whether it is against the rules to link to my hidden service, even though I aggressively filter out any and all pornography and erotic content from search results, so I'll hold off on that for now. If this post includes anything against the rules, it was not on purpose. I know that the dark web is a touchy subject, but I just wanted to share my hobby project with you guys.

Leave any suggestions or questions you have ITT.

