Today I noticed my site getting a thorough spidering by the user agent “Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com)”, apparently coming from an Amazon Web Services IP address.
The Page-store.com web site is minimal, with just a single page and a robots.txt that forbids all crawlers. It does describe what they are trying to do, though: spider everything and then sell some digested form of the gathered information on to new search engines so they don’t have to do the work themselves. In their words:
Page-store positions itself as a web wholesaler, supplying page and link information to vertical search engine companies on a per-use basis. The effect is to level the playing field between vertical search and general horizontal internet search.
If nothing else it scores highly on the buzzword bingo scale.
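For anyone wanting to spot this sort of visitor in their own logs, here is a minimal Python sketch that pulls the crawler name, version and operator URL out of a user agent string like the one quoted above. The function name `identify_crawler` and the regular expression are my own assumptions about the "name/version +url" convention, not anything published by Page-store or Heritrix, so treat it as a starting point rather than a reliable detector.

```python
import re

# Assumed pattern: a "name/version +url" token, as seen inside the
# parentheses of the user agent quoted in this post.
CRAWLER_TOKEN = re.compile(
    r"(?P<name>[A-Za-z][\w.-]*)/(?P<version>[\d.]+)\s+\+(?P<url>https?://[^\s;)]+)"
)


def identify_crawler(user_agent):
    """Return (name, version, url) for a crawler-style user agent, else None."""
    match = CRAWLER_TOKEN.search(user_agent)
    if match is None:
        return None
    return match.group("name"), match.group("version"), match.group("url")


ua = "Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com)"
print(identify_crawler(ua))
```

Since a user agent is trivially spoofed, this only identifies crawlers that choose to announce themselves.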