Guest
Unregistered user
After update 2.4.10 => 2.4.11: Login no longer possible
Posted: 23.03.2008 19:30
It's a nice tool:
GSiteCrawler Features
In general, the GSiteCrawler will take a listing of your website's URLs, let you edit the settings and generate Google Sitemap files. However, the GSiteCrawler is very flexible and allows you to do a whole lot more than "just" that!
Capture URLs for your site using
* a normal website crawl - emulating a Googlebot, looking for all links and pages within your website
* an import of an existing Google Sitemap file
* an import of a server log file
* an import of any text file with URLs in it
The Crawler
* does a text-based crawl of each page, even finding URLs in javascript
* respects your robots.txt file
* respects robots meta tags for index / follow
* can run up to 15 times in parallel
* can be throttled with a user defined wait-time between URLs
* can be controlled with filters, bans, automatic URL modifications
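As a rough illustration of what a text-based crawl that respects robots.txt can look like, here is a minimal Python sketch. It is not GSiteCrawler's actual code: the regular expression, the function names `extract_links` and `allowed_links`, and the example.com URLs are assumptions made for this example. It scans the raw page text for quoted absolute or root-relative URLs (which also catches URLs inside JavaScript strings) and filters them against robots.txt rules.

```python
import re
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical pattern: any quoted string that starts with http(s):// or "/".
LINK_RE = re.compile(r"""["']((?:https?://|/)[^"'\s<>]+)["']""")


def extract_links(page_text, base_url):
    """Return same-host URLs found anywhere in the raw page text."""
    host = urlparse(base_url).netloc
    found = []
    for match in LINK_RE.finditer(page_text):
        url = urljoin(base_url, match.group(1))
        if urlparse(url).netloc == host and url not in found:
            found.append(url)
    return found


def allowed_links(page_text, base_url, robots_txt, user_agent="Googlebot"):
    """Keep only the links that the given robots.txt text permits."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in extract_links(page_text, base_url)
            if parser.can_fetch(user_agent, url)]


page = ('<a href="/a.html">A</a>'
        '<script>var next = "/private/b.html";</script>'
        '<a href="http://other.example/x">external</a>')
robots = "User-agent: *\nDisallow: /private/"
print(allowed_links(page, "http://example.com/", robots))
```

A real crawler would also need the wait time between requests and the robots meta tags mentioned above; both are left out here to keep the sketch short.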
With each page, it
* checks date (from the server or using a date meta-tag) and size of the page
* checks title, description and keyword tags
* keeps track of the time required to download and crawl the page
Once the pages are in the database, you can
* modify Google Sitemap settings like "priority" and "change frequency"
* search for pages by URL parts, title, description or keywords tags
* filter pages based on custom criteria - adjust their settings globally
* edit, add and delete pages manually
And once you have everything the way you want it, you can export it as
* a Google Sitemap file in XML format (of course) - with or without the optional attributes like "change date", "priority" or "change frequency"
* a text URL listing for other programs (or for use as a UrlList for Yahoo!)
* a simple RSS feed
* Excel / CSV files with URLs, settings and attributes like title, description, keywords
* a Google Base Bulk-Import file
* a ROR (Resources of Resources) XML file
* a static HTML sitemap file (with relative or absolute paths)
* a new robots.txt file based on your chosen filters
* ... or almost any type of file you want - the export function uses a user-adjustable text-based template-system
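To give an idea of what the XML export contains, the following Python sketch builds a small sitemap with the optional attributes. It is only an illustration under assumptions: the function name `build_sitemap`, the dictionary layout of the page records and the example.com URLs are invented for this example, and the namespace is the one from the sitemaps.org 0.9 protocol rather than anything taken from GSiteCrawler itself.

```python
import xml.etree.ElementTree as ET

# Namespace of the sitemaps.org 0.9 protocol (assumed target format).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build a sitemap string from dicts with 'loc' and optional attributes."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        # Optional attributes are written only when the record has them.
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    return ET.tostring(urlset, encoding="unicode")


pages = [
    {"loc": "http://example.com/", "changefreq": "daily", "priority": 0.8},
    {"loc": "http://example.com/article?id=1&lang=de"},
]
print(build_sitemap(pages))
```

ElementTree escapes the `&` in the second URL automatically, which is easy to get wrong when a sitemap is written by hand.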
For more information, it also generates
* a general site overview with the number of URLs (total, crawlable, still in queue), oldest URLs, etc
* a listing of all broken URLs linked in your site (or otherwise inaccessible URLs from the crawl)
* an overview of your site's speed with the largest pages, slowest pages by total download time or download speed (unusually server-intensive pages), and those with the most processing time (many links)
* an overview of URLs leading to "duplicate content" - with the option of automatically disabling those pages for the Google Sitemap file
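One simple way to spot URLs that lead to duplicate content is to hash each page body and group the URLs that share a hash. The Python sketch below shows that idea only; how GSiteCrawler actually compares pages is not described here, and the function name `find_duplicates` and the sample data are assumptions.

```python
import hashlib
from collections import defaultdict


def find_duplicates(pages):
    """Group URLs whose bodies are identical after whitespace normalization.

    `pages` maps URL -> page body (hypothetical input format).
    """
    groups = defaultdict(list)
    for url, body in pages.items():
        normalized = " ".join(body.split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    # Only groups with more than one URL count as duplicates.
    return [urls for urls in groups.values() if len(urls) > 1]


pages = {
    "http://example.com/": "Hello  world",
    "http://example.com/index.html": "Hello world\n",
    "http://example.com/about": "About us",
}
print(find_duplicates(pages))
```

Exact hashing only finds pages that are identical apart from whitespace; near-duplicates would need a fuzzier comparison.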
Additionally ...
* It can run on just about any Windows version from Windows 95b on up (tested on Windows Vista beta 1 and all server versions).
* It can use local MS-Access databases for re-use with other tools
* It can also use SQL-Server or MSDE databases for larger sites (requires a separate installation file).
* It can be run in a network environment, splitting crawlers over multiple computers - sharing the same database (for both Access and SQL-Server).
* It can be run automated, either locally on the server or on a remote workstation with automatic FTP upload of the sitemap file.
* It tests for and recognizes non-standard file-not-found pages (without HTTP result code 404).
... and much more! (if you can't find it here, send me a note and I'll either add it or show you how you can do it!)
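The test for non-standard file-not-found pages can be pictured like this: request a URL that almost certainly does not exist, and if the server still answers with 200, remember that body as the site's error template. The Python sketch below is a guess at the general technique, not the tool's real implementation. The `fetch` callable (returning a status code and a body), the function names and the fake site are all hypothetical.

```python
import uuid
from urllib.parse import urljoin


def probe_error_template(fetch, base_url):
    """Return the body served for a bogus URL, or None if the status is not 200."""
    bogus = urljoin(base_url, "/" + uuid.uuid4().hex + ".html")
    status, body = fetch(bogus)
    return body if status == 200 else None


def is_soft_404(fetch, url, error_template):
    """True if the URL answers 200 but serves the known error template."""
    status, body = fetch(url)
    return status == 200 and error_template is not None and body == error_template


# Fake site for demonstration: unknown paths return 200 with an error page.
SITE = {"http://example.com/": "Welcome"}


def fake_fetch(url):
    return 200, SITE.get(url, "Sorry, page not found")


template = probe_error_template(fake_fetch, "http://example.com/")
print(is_soft_404(fake_fetch, "http://example.com/missing", template))
print(is_soft_404(fake_fetch, "http://example.com/", template))
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP library and makes it easy to try out without network access.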
In the end, this produces a complete sitemap.xml, which can then be submitted to Google, or a urllist.txt, which can be submitted to Yahoo.
According to the current projection, all media, articles, ... together will produce about 23,000 entries that Google can then index.
At the moment, only 72 pages of my site are indexed...
Ciao,
Boby