Don't forget to follow best practices when crawling private pages of a website or software application.
BEST PRACTICES FOR CRAWLING PASSWORD-PROTECTED WEBSITES
Use a read-only account when crawling. As you crawl your protected site, you’ll come across things that need to be changed. You may be tempted to fix them yourself as you go, but remember why you’re crawling the website in the first place: one reason is to compile a list of issues for the developers to sort out. They’re the professionals you’ve hired to deal with these kinds of problems, and if you lack the expertise, you may end up doing more harm than good. There is also a risk from the crawler itself. If you crawl with a full administrative username and password, you give the crawler full access to the entire admin section. The best crawlers are built to avoid making changes, but with full administrative access you still risk your crawler following links that change the site’s themes and plugins, or even delete posts. This is why it is imperative that you set up a read-only account for crawling your website.
Always exclude your admin pages. This follows from the previous point: even when you crawl with the read-only account you’ve set up, configure the crawl to exclude the administrative back-end pages and folders so that you avoid making undesirable changes to your website. For WordPress, exclude the entire /wp-admin/ section; for Joomla, exclude the /administrator/ section.
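If your crawler lets you supply your own URL filter, the exclusion rule can be sketched in a few lines of Python. This is a minimal illustration, not the configuration of any particular crawling tool: the function name `is_crawlable` and the constant `EXCLUDED_PREFIXES` are hypothetical, and the sketch assumes the site is installed at the root of the domain, so the admin section starts right after the hostname.

```python
from urllib.parse import urlsplit

# Admin sections named in the text above. The tuple itself is an
# illustrative assumption; add prefixes for other platforms as needed.
EXCLUDED_PREFIXES = ("/wp-admin/", "/administrator/")


def is_crawlable(url, excluded_prefixes=EXCLUDED_PREFIXES):
    """Return False when the URL's path falls under an excluded admin section."""
    # Lowercase the path so "/WP-Admin/" is excluded too (a cautious assumption:
    # it is safer to skip too much of the back end than too little).
    path = urlsplit(url).path.lower()

    # Add a trailing slash so "/wp-admin" matches the "/wp-admin/" prefix,
    # while "/wp-admin-tips/" still does not.
    if not path.endswith("/"):
        path += "/"

    return not any(path.startswith(prefix) for prefix in excluded_prefixes)


if __name__ == "__main__":
    for candidate in (
        "https://example.com/members/profile",
        "https://example.com/wp-admin/post.php",
        "https://example.com/administrator/index.php",
    ):
        print(candidate, is_crawlable(candidate))
```

Call a check like this before every request, so that an admin URL discovered through a link is skipped, not fetched. Most commercial crawlers offer an equivalent exclude-by-path setting in their own configuration, so in practice you may only need to enter the two prefixes there.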