New SEO spider features include the following –
  • Word count – The SEO spider now counts the number of words on a given URL between the body tags. This is useful for finding low-content pages. You can read our word count definition here.
  • URL rewriting – The SEO spider now allows you to rewrite URLs. This is particularly useful for sites with session IDs or excess parameters, as you can now simply remove them from the URLs using this feature. You can read about URL rewriting in our user guide.
  • Auto check for updates – You don’t have to manually check for updates anymore; we let you know when one is available. (You can also disable this feature!)
  • Remove URLs – We allow you to delete URLs completely from the SEO spider (with a right click). So if you only wish to export certain URLs, or create a sitemap with specific URLs, you can do it in the interface (rather than exporting to Excel).
  • Advanced exports – We have renamed the ‘export’ option in the top level menu to ‘Advanced export’ to differentiate it from the usual ‘export’ option. This area allows you to export in bulk, rather than just from the window in your current view. We have also included additional exports under this section, including exports of all alt text and anchor text. You can read more about the advanced export feature in our user guide.
  • Crawling outside of sub folders / domains – By default, the SEO spider has always crawled from the sub domain or sub directory forwards. This is really useful for most sites, but there are some configurations where it can be a pain. So, we have provided a couple of extra options to crawl outside of the start sub folder or sub domain for greater control of the crawl. For example, you can now start a crawl from anywhere you’d like on the site and we will crawl all URLs. Both of these new options can be found under the ‘include’ option in configuration.
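To make the URL rewriting idea above concrete, here is a minimal Python sketch of stripping session IDs and other unwanted parameters from a URL. It is only an illustration of the general technique, not how the SEO spider implements it; the function name `remove_parameters` and the parameter names in `PARAMS_TO_REMOVE` are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names to strip (an assumption for this example,
# not a list taken from the SEO spider).
PARAMS_TO_REMOVE = {"sessionid", "sid"}


def remove_parameters(url, params=PARAMS_TO_REMOVE):
    """Return the URL with the named query parameters removed."""
    parts = urlsplit(url)
    # Keep every query pair whose name is not in the removal set
    # (names are compared case-insensitively).
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in params
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


print(remove_parameters("http://example.com/page?id=5&sessionid=abc123"))
# prints http://example.com/page?id=5
```

With the session ID removed, two URLs that differ only by session collapse to the same address, so the page is only counted once.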
The list above contains the most significant new features within version 2.00. However, we have also made a number of smaller changes, including –
  • Amended the definition of ‘internal’ and ‘external’ – Historically, links from the domain you are crawling could be included under the ‘external’ tab as well as the ‘internal’ tab. If you crawled from a sub folder, for example, anything outside of that sub folder would be treated as ‘external’, including links from the domain you are crawling. This was by design for a number of reasons, but we understand that it has at times been a cause of some confusion. Hence, we have changed our crawling. The ‘external’ tab will now only show links pointing to other domains (or subdomains).
  • Renamed ‘Meta & Canonical’ – We amended this tab name to ‘directives’, as it makes more sense. This gives us more room to include additional directives under this tab such as rel=alternate and rel=prev/next etc.
  • Fixed Keep Alive Headers Issue – There was a bug in the Mac version of the software with keep alive headers, which has now been fixed.
  • Ubuntu version now supports openjdk-6-jre – There were some bugs in openjdk-6-jre which meant we couldn’t support it, but we now can.
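The amended ‘internal’ and ‘external’ definition described above can be illustrated with a short Python sketch. It assumes that a link is ‘internal’ when its hostname exactly matches the hostname of the start URL, so other domains and other subdomains both count as ‘external’. The function name `classify_link` is hypothetical and the spider’s actual logic may differ.

```python
from urllib.parse import urlsplit


def classify_link(start_url, link_url):
    """Label a link 'internal' or 'external' relative to the crawl start URL.

    Assumption for this sketch: only an exact hostname match is internal.
    """
    start_host = urlsplit(start_url).hostname
    link_host = urlsplit(link_url).hostname
    return "internal" if link_host == start_host else "external"


start = "http://www.example.com/blog/"
# Same domain, outside the start sub folder: internal under the new definition.
print(classify_link(start, "http://www.example.com/about/"))
# A different subdomain: external.
print(classify_link(start, "http://shop.example.com/"))
```

Note that the path of the link plays no part in the decision here, which mirrors the change described above: a URL outside the start sub folder is no longer treated as ‘external’ just because of where the crawl began.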
Enough reading, what are you waiting for? Go download the new version of the Screaming Frog SEO spider now!