Crawling Subdirectories
Option 1: Only Subdirectory
You can crawl a single subdirectory of a website using the Only Subdirectory feature in the advanced options under Other Options / Filter Links. Use a working start URL that points to the subdirectory's landing page, and don't forget to turn Start URL as root ON. Enter only the subdirectory path, e.g. /subdirectory/. Note: You must have an active subscription to use this feature.
1. URL: https://www.example.com/subdirectory
2. Open Basic Options and set Start URL as root to ON
3. Open Advanced Options / Filter Links Before Crawl
4. Add the subdirectory in the Only Subdirectory field: /subdirectory/
5. Then click Start Crawling
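As a rough illustration of what restricting a crawl to one subdirectory means, the sketch below checks whether a discovered link sits under the chosen path on the same host as the start URL. It is only an assumption about how such a filter could behave, not the crawler's actual implementation; the function name `in_subdirectory` and the sample links are hypothetical.

```python
from urllib.parse import urlparse


def in_subdirectory(url: str, start_url: str, subdirectory: str) -> bool:
    """Return True if url is on the start URL's host and under subdirectory.

    Hypothetical helper: the real crawler's matching rules may differ.
    """
    target, start = urlparse(url), urlparse(start_url)
    # Normalize "/subdirectory/" so the landing page "/subdirectory" also matches.
    prefix = "/" + subdirectory.strip("/")
    path = target.path.rstrip("/") or "/"
    same_host = target.netloc == start.netloc
    return same_host and (path == prefix or path.startswith(prefix + "/"))


start_url = "https://www.example.com/subdirectory"
links = [
    "https://www.example.com/subdirectory",
    "https://www.example.com/subdirectory/page-1",
    "https://www.example.com/about",
    "https://www.example.com/subdirectory-old/page",
]

# Only links under /subdirectory/ would be followed.
kept = [u for u in links if in_subdirectory(u, start_url, "/subdirectory/")]
print(kept)
```

Matching on a path boundary (the trailing slash) is a deliberate choice in this sketch: it keeps a sibling such as /subdirectory-old/ from being treated as part of /subdirectory/.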
Option 2: Include
You can filter a website crawl using the Include feature in Advanced Options / Filter Links After Crawl. Use a working start URL that points to the subdirectory's landing page, and don't forget to turn Start URL as root ON. Enter only the subdirectory path, e.g. /subdirectory/. Because Include is applied after the crawl, this method crawls the entire website before filtering down to your directories.
1. URL: https://www.example.com/subdirectory
2. Open Basic Options and set Start URL as root to ON
3. Open Advanced Options / Filter Links After Crawl
4. Add the subdirectory that you want to include, e.g. /subdirectory/
5. Select your match type and click the Enter icon
6. Start Crawling
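The post-crawl Include filter can be pictured as a pass over the finished list of crawled URLs that keeps only the matching ones. The sketch below assumes three match types ("contains", "starts_with", "ends_with"); these names are hypothetical stand-ins for whatever match types the interface offers, and the crawled list is made up for illustration.

```python
from urllib.parse import urlparse


def include_after_crawl(crawled_urls, pattern, match_type="contains"):
    """Keep only URLs whose path matches pattern under the given match type.

    The match type names here are assumptions, not the tool's actual options.
    """
    matchers = {
        "contains": lambda path: pattern in path,
        "starts_with": lambda path: path.startswith(pattern),
        "ends_with": lambda path: path.endswith(pattern),
    }
    matches = matchers[match_type]
    return [u for u in crawled_urls if matches(urlparse(u).path)]


# The whole site has already been crawled at this point.
crawled = [
    "https://www.example.com/",
    "https://www.example.com/subdirectory/",
    "https://www.example.com/subdirectory/page-1",
    "https://www.example.com/blog/subdirectory/archive",
    "https://www.example.com/about",
]

included_prefix = include_after_crawl(crawled, "/subdirectory/", "starts_with")
included_anywhere = include_after_crawl(crawled, "/subdirectory/", "contains")
print(included_prefix)
print(included_anywhere)
```

In this sketch the choice of match type matters: a "contains" style match also keeps /blog/subdirectory/archive, while a "starts_with" style match keeps only pages directly under /subdirectory/.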
Restricting Subdirectories
Option 1: Skip Links
Use the Skip Links feature to skip pages that match a certain URL pattern during your website crawl. This feature is useful for keeping unnecessary pages out of your sitemap build and content inventory. Skip Links is a pre-crawl function.
- Open Advanced Options / Filter Links Before Crawl
- On the right side, in the Skip Links column:
  - Select a matching rule
  - Enter a part of the URL
  - Press the Enter Icon
- Repeat if you have other URLs that you would like to skip
- Click START CRAWLING
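To make the idea of a pre-crawl skip rule concrete, here is a minimal sketch of a breadth-first crawl over a small in-memory site in which matching links are dropped before they are ever queued. The site map, the `crawl` function, and the "contains" style rule are all hypothetical; the real crawler's rules and traversal order may differ.

```python
from collections import deque

# Hypothetical site: each page maps to the links found on it.
SITE = {
    "/": ["/products/", "/blog/", "/tag/news"],
    "/products/": ["/products/widget"],
    "/products/widget": [],
    "/blog/": ["/blog/post-1", "/tag/news"],
    "/blog/post-1": [],
    "/tag/news": ["/tag/news/page-2"],
    "/tag/news/page-2": [],
}


def crawl(start, skip_rules):
    """Breadth-first crawl that drops matching links before they are queued."""
    seen, queue, visited = {start}, deque([start]), []
    while queue:
        page = queue.popleft()
        visited.append(page)
        for link in SITE.get(page, []):
            # Assumed "contains" rule: skip any link containing a rule string.
            skipped = any(rule in link for rule in skip_rules)
            if link not in seen and not skipped:
                seen.add(link)
                queue.append(link)
    return visited


pages = crawl("/", skip_rules=["/tag/"])
print(pages)
```

Because the rule is applied before a page is fetched, /tag/news is never visited in this sketch, so /tag/news/page-2, which is linked only from it, is never discovered either.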
Option 2: Exclude
Exclude is a post-crawl function: it removes matching pages from the results after the crawl has completed. Use the Exclude Links feature to exclude only the pages that match a certain URL pattern. Because the filter is applied to the pages that were actually crawled, it is a very accurate method of exclusion.
- Open Advanced Options / Filter Links After Crawl
- On the left side, in the Exclude Links column:
  - Select a matching rule
  - Enter a part of the URL
  - Press the Enter Icon
- Repeat if you have other URLs that you would like to exclude
- Click START CRAWLING
To copy and paste your rules as line statements, click Switch to Text Editor.
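The post-crawl Exclude filter described above can be sketched as a pass over the completed crawl results that drops every URL matching any rule. This is an illustration under assumptions: the function name `exclude_after_crawl`, the "contains" style matching, and the sample URLs are hypothetical.

```python
def exclude_after_crawl(crawled_urls, rules):
    """Drop every crawled URL that contains any of the rule strings."""
    return [u for u in crawled_urls if not any(rule in u for rule in rules)]


# Results of a completed crawl (made-up example data).
crawled = [
    "https://www.example.com/",
    "https://www.example.com/blog/post-1",
    "https://www.example.com/tag/news",
    "https://www.example.com/blog/?page=2",
    "https://www.example.com/products/widget",
]

remaining = exclude_after_crawl(crawled, rules=["/tag/", "?page="])
print(remaining)
```

Unlike a pre-crawl skip rule, this kind of filter runs on pages that have already been visited, so pages linked only from an excluded page can still appear in the results.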