<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule"	>
<channel>
	<title>Comments on: Page-store.com</title>
	<atom:link href="http://www.csamuel.org/2007/07/22/page-storecom/feed" rel="self" type="application/rss+xml" />
	<link>http://www.csamuel.org/2007/07/22/page-storecom</link>
	<description>The Thoughts and Feelings of a Melbourne Person</description>
	<lastBuildDate>Fri, 05 Mar 2010 20:34:51 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Toppy</title>
		<link>http://www.csamuel.org/2007/07/22/page-storecom/comment-page-1#comment-16179</link>
		<dc:creator>Toppy</dc:creator>
		<pubDate>Tue, 15 Jan 2008 07:59:26 +0000</pubDate>
		<guid isPermaLink="false">http://www.csamuel.org/2007/07/22/page-storecom/#comment-16179</guid>
		<description>Our site www.kantelpunten.com is currently being visited by this mysterious page-store crawler. Unfortunately I cannot change the firewall settings. The crawler does not seem to respect any rules, because it also follows links marked rel=&quot;nofollow&quot;. Our site is completely dynamic and has a lot of such links that should not be cached at all (the crawler is not looping, but it will be busy for a long while). Currently I am returning a page that tells it not to crawl us (so it will not find any new links). But I&#039;m thinking of writing a script that jumps to other generated pages forever and fills up their sellable database with bogus data.

I assume they are not crawling just for fun. So does that mean that somebody wants to buy our content and make a similar site?

Robots and crawlers for search engines are welcome, but I want them to follow robots.txt and other directives; otherwise they will be banned from my site.

There is one other thing I noticed: this crawler remembers cookie information and uses it in its requests.

Does anyone know what they are doing with the crawled data? For example, what is the selling price?</description>
		<content:encoded><![CDATA[<p>Our site <a href="http://www.kantelpunten.com" rel="nofollow">http://www.kantelpunten.com</a> is currently being visited by this mysterious page-store crawler. Unfortunately I cannot change the firewall settings. The crawler does not seem to respect any rules, because it also follows links marked rel=&#8220;nofollow&#8221;. Our site is completely dynamic and has a lot of such links that should not be cached at all (the crawler is not looping, but it will be busy for a long while). Currently I am returning a page that tells it not to crawl us (so it will not find any new links). But I&#8217;m thinking of writing a script that jumps to other generated pages forever and fills up their sellable database with bogus data.</p>
<p>I assume they are not crawling just for fun. So does that mean that somebody wants to buy our content and make a similar site?</p>
<p>Robots and crawlers for search engines are welcome, but I want them to follow robots.txt and other directives; otherwise they will be banned from my site.</p>
<p>There is one other thing I noticed: this crawler remembers cookie information and uses it in its requests.</p>
<p>Does anyone know what they are doing with the crawled data? For example, what is the selling price?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jack</title>
		<link>http://www.csamuel.org/2007/07/22/page-storecom/comment-page-1#comment-16162</link>
		<dc:creator>Jack</dc:creator>
		<pubDate>Sun, 13 Jan 2008 18:41:16 +0000</pubDate>
		<guid isPermaLink="false">http://www.csamuel.org/2007/07/22/page-storecom/#comment-16162</guid>
		<description>When you get one, try this URL:

http://XXX.XXX.XXX.XXX:8080/.

Replace the XXX with the IP of the crawler. Don&#039;t forget the trailing dot.

Maybe you can shut it down. If you figure out how, please post it here.</description>
		<content:encoded><![CDATA[<p>When you get one, try this URL:</p>
<p><a href="http://XXX.XXX.XXX.XXX:8080/" rel="nofollow">http://XXX.XXX.XXX.XXX:8080/</a>.</p>
<p>Replace the XXX with the IP of the crawler. Don&#8217;t forget the trailing dot.</p>
<p>Maybe you can shut it down. If you figure out how, please post it here.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Phil Tillman</title>
		<link>http://www.csamuel.org/2007/07/22/page-storecom/comment-page-1#comment-8334</link>
		<dc:creator>Phil Tillman</dc:creator>
		<pubDate>Mon, 06 Aug 2007 00:10:39 +0000</pubDate>
		<guid isPermaLink="false">http://www.csamuel.org/2007/07/22/page-storecom/#comment-8334</guid>
		<description>Same story on a site that I manage. It tried to browse a directory protected by sessions and cookies; my site just redirected it to a login page.

The curious thing, though, is that it tried to access pages that are not linked from anywhere and don&#039;t have particularly obvious names. I wonder how it knew the pages were there? My logs are not public, but perhaps some of my users have left their tracks somewhere.</description>
		<content:encoded><![CDATA[<p>Same story on a site that I manage. It tried to browse a directory protected by sessions and cookies; my site just redirected it to a login page.</p>
<p>The curious thing, though, is that it tried to access pages that are not linked from anywhere and don&#8217;t have particularly obvious names. I wonder how it knew the pages were there? My logs are not public, but perhaps some of my users have left their tracks somewhere.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sugerør &#187; Blog Archive &#187; Who is page-store.com?</title>
		<link>http://www.csamuel.org/2007/07/22/page-storecom/comment-page-1#comment-8080</link>
		<dc:creator>Sugerør &#187; Blog Archive &#187; Who is page-store.com?</dc:creator>
		<pubDate>Fri, 27 Jul 2007 15:34:38 +0000</pubDate>
		<guid isPermaLink="false">http://www.csamuel.org/2007/07/22/page-storecom/#comment-8080</guid>
		<description>[...] I am not the only one who is being visited, it seems &#8230; [...]</description>
		<content:encoded><![CDATA[<p>[...] I am not the only one who is being visited, it seems &#8230; [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Robert Sweetnam</title>
		<link>http://www.csamuel.org/2007/07/22/page-storecom/comment-page-1#comment-8060</link>
		<dc:creator>Robert Sweetnam</dc:creator>
		<pubDate>Thu, 26 Jul 2007 00:12:43 +0000</pubDate>
		<guid isPermaLink="false">http://www.csamuel.org/2007/07/22/page-storecom/#comment-8060</guid>
		<description>Tonight (26 July 07) I noticed the exact same spider crawling my site for the first time.

It was following every link and ignoring my robots.txt, so the easiest remedy was to block the address on my firewall. I also sent an e-mail to Amazon, but I don&#039;t expect a reply.</description>
		<content:encoded><![CDATA[<p>Tonight (26 July 07) I noticed the exact same spider crawling my site for the first time.</p>
<p>It was following every link and ignoring my robots.txt, so the easiest remedy was to block the address on my firewall. I also sent an e-mail to Amazon, but I don&#8217;t expect a reply.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: TimC</title>
		<link>http://www.csamuel.org/2007/07/22/page-storecom/comment-page-1#comment-8047</link>
		<dc:creator>TimC</dc:creator>
		<pubDate>Mon, 23 Jul 2007 22:26:26 +0000</pubDate>
		<guid isPermaLink="false">http://www.csamuel.org/2007/07/22/page-storecom/#comment-8047</guid>
		<description>And I notice they don&#039;t say what user agent they honour in robots.txt.

Might have to ban them at the firewall instead.</description>
		<content:encoded><![CDATA[<p>And I notice they don&#8217;t say what user agent they honour in robots.txt.</p>
<p>Might have to ban them at the firewall instead.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
