There have been signs over the weekend that Live Search's bot, MSNBOT-MEDIA, is crawling JavaScript. The first evidence was that URLs for site feedback were being crawled, even though references to those URLs existed only in JavaScript files. The feedback pages themselves were also reachable only through JavaScript.
Further research suggests that MSNBOT-MEDIA has its sights set on JavaScript / AJAX (Asynchronous JavaScript and XML) because of the growing amount of media content being put on the web. Image galleries, Flash files, and AJAX elements are gaining ground in some categories, so it is logical for bots to start crawling JavaScript.
A couple of points back this theory:
1) The bot is called “MSNBOT-MEDIA” after all, so it would make sense that it’s intended to crawl media elements for Microsoft’s Live Search.
2) A lot of media is "hidden" behind JavaScript / AJAX / Flash, so indexing this media would require bots to understand JavaScript / AJAX / Flash.
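To illustrate why URLs that appear only inside JavaScript are still discoverable, here is a minimal Python sketch that pulls URL-like string literals out of a script without executing it. This is an assumption about one possible technique, not a description of how MSNBOT-MEDIA actually works; the regular expression, the function name `extract_urls`, and the sample script and paths are hypothetical.

```python
import re

# Hypothetical pattern: a quoted string that starts with http(s):// or "/".
# Simplification: the opening and closing quote characters are not required to match.
URL_IN_JS = re.compile(r"""["']((?:https?://[^"'\s]+)|(?:/[^"'\s]*))["']""")


def extract_urls(js_source: str) -> list[str]:
    """Return URL-like string literals from JavaScript source, in order, without duplicates."""
    return list(dict.fromkeys(m.group(1) for m in URL_IN_JS.finditer(js_source)))


# Hypothetical script: the feedback URL is referenced only here, never in plain HTML.
SAMPLE_JS = """
function openFeedback() {
    window.open("/feedback/form.html", "fb");
}
var gallery = 'http://www.example.com/gallery/photo1.jpg';
var half = 5 / 2;
"""

print(extract_urls(SAMPLE_JS))
```

A plain text scan like this never runs the script, which is the point: a bot does not need a full JavaScript engine to find and follow links that a page only exposes through script code.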
Hiding Content
For those of you who have been relying on JavaScript to keep content from being crawled and indexed, that approach may no longer work. It might be time to open up the old robots.txt file and make some changes.
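As a minimal sketch of that kind of change, the Python below parses a hypothetical robots.txt with the standard library's `urllib.robotparser` and checks which URLs a crawler may fetch. The user-agent token `msnbot-media`, the `/feedback/` and `/scripts/` paths, and the example.com URLs are assumptions for illustration, not rules taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the paths are placeholders, and "msnbot-media" is
# assumed to be the user-agent token the bot announces.
ROBOTS_TXT = """\
User-agent: msnbot-media
Disallow: /feedback/
Disallow: /scripts/

User-agent: *
Disallow: /feedback/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rp = build_parser(ROBOTS_TXT)

# Check a few (user agent, URL) pairs against the rules.
for agent, url in [
    ("msnbot-media", "http://www.example.com/feedback/form.html"),
    ("msnbot-media", "http://www.example.com/scripts/gallery.js"),
    ("msnbot-media", "http://www.example.com/gallery/index.html"),
    ("Googlebot", "http://www.example.com/feedback/form.html"),
]:
    print(agent, url, "allowed" if rp.can_fetch(agent, url) else "blocked")
```

Run as written, it should report the feedback and script URLs as blocked for `msnbot-media`, the gallery page as allowed, and the feedback page as blocked for any other bot through the `*` group. Keep in mind that robots.txt only asks well-behaved crawlers not to fetch those paths; it is not access control.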