For website optimization, search engine log analysis is essential. Whether you run a small site with a few hundred indexed pages or a large or medium-sized site with hundreds of millions indexed, doing SEO well requires analyzing the logs scientifically. The log is a record of every event on the web server, including user visits and search engine crawls. For some large sites the daily log runs to several gigabytes, and Linux commands can be used to split it up. On large websites the log files are kept confidential, and ordinary staff cannot see them, because visitor trends and regional trends can be read out of them. For SEO we do not need that much data: it is enough to extract the search engine crawl records and analyze those. Filtered down like that, the data is not particularly large, and with hard disks as cheap as they are now, archiving the log files is worth considering. So what log data do we mainly analyze?

1. Each search engine's overall crawl volume (and its trend)

The log file clearly records each search engine's crawl volume; the crawl records of Baidu, Google, Sogou and the rest can all be pulled out with DOS or Linux commands. How much a search engine indexes is determined by crawl volume and crawl quality: with quality equal, the more the spiders crawl, the more pages get indexed. So when analyzing logs we must know exactly how much each spider crawls every day, and record it daily. The absolute number may not say much on its own, but it lets us watch the trend: the day the crawl volume trends downward, we go and find the reason.
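A minimal sketch of this workflow with standard Linux commands, assuming an Apache/Nginx combined log format; the log lines, IPs and file names below are fabricated for illustration (Baiduspider, Googlebot and Sogou are the real user-agent tokens of those spiders):

```shell
#!/bin/sh
# Fabricated access.log in combined log format; the user-agent (last field)
# identifies the crawler.
cat > access.log <<'EOF'
1.2.3.4 - - [01/Aug/2012:10:00:01 +0800] "GET / HTTP/1.1" 200 512 "-" "Baiduspider"
1.2.3.4 - - [01/Aug/2012:10:00:02 +0800] "GET /news/a.html HTTP/1.1" 200 512 "-" "Baiduspider"
5.6.7.8 - - [01/Aug/2012:10:00:03 +0800] "GET / HTTP/1.1" 200 512 "-" "Googlebot"
1.2.3.4 - - [02/Aug/2012:09:00:01 +0800] "GET /bbs/t.html HTTP/1.1" 200 512 "-" "Baiduspider"
EOF

# Split the big log: keep only spider records, which shrinks the file a lot.
grep -E 'Baiduspider|Googlebot|Sogou' access.log > spider.log

# Overall crawl volume per engine.
grep -c 'Baiduspider' spider.log    # prints 3
grep -c 'Googlebot' spider.log      # prints 1

# Daily trend for one engine: count fetches per date
# ($4 holds "[dd/Mon/yyyy:hh:mm:ss", so characters 2-12 are the date).
grep 'Baiduspider' spider.log | awk '{print substr($4, 2, 11)}' | sort | uniq -c
```

Recording these per-day counts somewhere (for example, appending them to a CSV from a cron job) is what turns the absolute numbers into the trend line the text recommends watching.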
2. Each search engine's non-duplicate crawl volume

Once the previous step has given us each spider's crawl volume, the next thing to analyze is the non-duplicate crawl volume, i.e. how much of that crawling is not repeat fetches of the same URL. For most pages, one fetch is enough to get them indexed, but in practice many pages are crawled over and over. Google's technology is more advanced and its repeat-crawl rate may be lower, but for Baidu and other search engines the repeat-crawl rate is very high, as the logs will show: out of a daily crawl volume in the millions, the home page alone may be fetched tens of thousands of times. You have to dig into this data, and once you analyze it you will see how serious the problem is.

3. Each directory's crawl volume, per search engine

Building on the overall and non-duplicate crawl volumes from the two steps above, we then analyze how each search engine crawls each directory, which supports optimizing the site section by section. For example, when site traffic rises, you can tell which directory's traffic grew and check how crawling of that directory has changed.
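The non-duplicate crawl volume from step 2 can be measured the same way: count total fetches versus unique URLs for one spider. A sketch under the same assumptions (fabricated combined-format log lines, hypothetical spider.log file):

```shell
#!/bin/sh
# Fabricated spider.log: the home page "/" is fetched three times.
cat > spider.log <<'EOF'
1.2.3.4 - - [01/Aug/2012:10:00:01 +0800] "GET / HTTP/1.1" 200 512 "-" "Baiduspider"
1.2.3.4 - - [01/Aug/2012:10:05:01 +0800] "GET / HTTP/1.1" 200 512 "-" "Baiduspider"
1.2.3.4 - - [01/Aug/2012:10:09:01 +0800] "GET / HTTP/1.1" 200 512 "-" "Baiduspider"
1.2.3.4 - - [01/Aug/2012:10:10:01 +0800] "GET /news/a.html HTTP/1.1" 200 512 "-" "Baiduspider"
1.2.3.4 - - [01/Aug/2012:10:11:01 +0800] "GET /bbs/b.html HTTP/1.1" 200 512 "-" "Baiduspider"
EOF

# Total fetches vs. unique URLs ($7 is the request path in combined format).
total=$(grep -c 'Baiduspider' spider.log)
unique=$(grep 'Baiduspider' spider.log | awk '{print $7}' | sort -u | wc -l)
echo "total=$total unique=$unique"

# Repeat-crawl rate = 1 - unique/total, computed in awk for floating point.
awk -v t="$total" -v u="$unique" \
  'BEGIN { printf "repeat rate: %.0f%%\n", (1 - u/t) * 100 }'   # prints repeat rate: 40%
```

Here 5 fetches cover only 3 unique URLs, a 40% repeat-crawl rate; on a real log with millions of fetches, this ratio is the number that exposes a home page being crawled tens of thousands of times.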
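For step 3, the same URL field can be cut down to its first path segment to get per-directory crawl volume for one spider; run it once per spider user-agent to compare engines. Again a sketch with fabricated log lines:

```shell
#!/bin/sh
# Fabricated spider.log spanning three site sections: "/", "/news/", "/bbs/".
cat > spider.log <<'EOF'
1.2.3.4 - - [01/Aug/2012:10:00:01 +0800] "GET / HTTP/1.1" 200 512 "-" "Baiduspider"
1.2.3.4 - - [01/Aug/2012:10:01:01 +0800] "GET /news/a.html HTTP/1.1" 200 512 "-" "Baiduspider"
1.2.3.4 - - [01/Aug/2012:10:02:01 +0800] "GET /news/b.html HTTP/1.1" 200 512 "-" "Baiduspider"
1.2.3.4 - - [01/Aug/2012:10:03:01 +0800] "GET /news/c.html HTTP/1.1" 200 512 "-" "Baiduspider"
1.2.3.4 - - [01/Aug/2012:10:04:01 +0800] "GET /bbs/t.html HTTP/1.1" 200 512 "-" "Baiduspider"
EOF

# Reduce each crawled URL ($7) to its top-level directory, then count
# fetches per directory, biggest first.
grep 'Baiduspider' spider.log \
  | awk '{ n = split($7, p, "/"); print (n > 2 ? "/" p[2] "/" : "/") }' \
  | sort | uniq -c | sort -rn
# /news/ comes out on top with 3 fetches.
```

Comparing this per-directory table across days is what links a traffic change in one section of the site to a change in how the spiders crawl it.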