In this section, we will build the core functionality of the data scraper to collect posts.
Most cyber-crime forums offer free access to all users, with paid tiers unlocking extras such as posting in certain categories. When scraping with a free account, forums often limit how much content you can view or comment on within a 24-hour window. To work around this, throttle your requests and distribute the work across multiple accounts.
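As a rough illustration, the sketch below shows one way to throttle requests and rotate accounts in Python with the requests library. The cookie values and the fetch helper are hypothetical placeholders, not part of the scraper we build later; assume each account has already been logged in and its session cookie captured.

import itertools
import random
import time

import requests

# Hypothetical session cookies for three free accounts; in practice you
# would log in with each account first and capture its cookie value.
ACCOUNT_COOKIES = [
    {"session": "account-1-cookie"},
    {"session": "account-2-cookie"},
    {"session": "account-3-cookie"},
]

# One requests.Session per account, rotated round-robin so no single
# account absorbs all of the traffic.
sessions = []
for cookies in ACCOUNT_COOKIES:
    s = requests.Session()
    s.cookies.update(cookies)
    sessions.append(s)
rotation = itertools.cycle(sessions)

def fetch(url: str) -> str:
    """Fetch a URL with the next account in rotation, pausing between requests."""
    session = next(rotation)
    response = session.get(url, timeout=30)
    response.raise_for_status()
    # A random delay keeps each account under the forum's daily limits and
    # makes the request pattern less uniform.
    time.sleep(random.uniform(2, 5))
    return response.text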
The tornet_forum includes a marketplace category with paginated content. Each page displays a table of 10 rows, one per post. With hundreds of pages, that is a large volume of content, and we need to scrape it thoroughly without missing any posts.
Here’s an example of the marketplace pagination:

My approach is to create a script that takes a pagination URL and the maximum page number as inputs. It then generates the list of pagination URLs and divides it into batches of 10. For example, with 12 pagination pages the script creates two batches: one with 10 pagination URLs and one with 2. Since each page contains 10 posts, each bot will scrape up to 100 post links per full batch of 10 pages. A sketch of this batching logic follows the example below.
Here’s an example of a batch structure:
http://127.0.0.1:5000/category/marketplace/Sellers?page=1
http://127.0.0.1:5000/category/marketplace/Sellers?page=2
http://127.0.0.1:5000/category/marketplace/Sellers?page=3
...
http://127.0.0.1:5000/category/marketplace/Sellers?page=10
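
A minimal sketch of this batching step might look like the following; build_batches is a hypothetical helper name, and the real script will wire it into the bots later.

def build_batches(base_url: str, max_page: int, batch_size: int = 10) -> list[list[str]]:
    """Generate pagination URLs for pages 1..max_page, split into batches."""
    urls = [f"{base_url}?page={n}" for n in range(1, max_page + 1)]
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]

# Example: 12 pages produce one batch of 10 URLs and one batch of 2.
batches = build_batches(
    "http://127.0.0.1:5000/category/marketplace/Sellers", max_page=12
)
for i, batch in enumerate(batches, start=1):
    print(f"Batch {i}: {len(batch)} pages")

With each page holding 10 posts, a full batch of 10 pages yields up to 100 post links for a single bot to collect.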