What Is a Web Crawler?

Web crawlers and similar technologies rely on algorithms, which are key to producing targeted search results.
Article Details
  • Originally Written By: Heather Kaefer
  • Revised By: Bott
  • Edited By: Lucy Oppenheimer
  • Last Modified Date: 25 July 2014
  • Copyright Protected:
    2003-2014
    Conjecture Corporation

A web crawler is a relatively simple automated program, or script, that methodically scans or "crawls" through Internet pages to create an index of the data it is looking for. Such programs may be written for a single one-time job, but they can also be programmed for long-term use. Crawlers have several uses, perhaps the most popular being search engines that use them to provide web surfers with relevant websites. Other users include linguists and market researchers, or anyone trying to gather information from the Internet in an organized manner. Alternative names for a web crawler include web spider, web robot, bot, crawler, and automatic indexer. Crawler programs can be purchased on the Internet or from many companies that sell computer software, and they can be downloaded to most computers.

Common Uses

There are various uses for web crawlers, but essentially a web crawler may be used by anyone seeking to collect information from the Internet. Search engines frequently use web crawlers to collect information about what is available on public web pages. Their primary purpose is to collect data so that when Internet surfers enter a search term on the search engine's site, the engine can quickly provide them with relevant websites. Linguists may use a web crawler to perform a textual analysis; that is, they may comb the Internet to determine what words are commonly used today. Market researchers may use a web crawler to determine and assess trends in a given market.
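As a rough illustration of the linguists' textual analysis described above, the sketch below counts how often each word appears across a handful of page texts. The function name word_frequencies and the simple letters-and-apostrophes definition of a "word" are assumptions made for this example, not part of any particular crawler.

```python
import re
from collections import Counter


def word_frequencies(pages, top=3):
    """Return the `top` most common words across an iterable of page texts."""
    counts = Counter()
    for text in pages:
        # Lowercase the text and treat runs of letters/apostrophes as words.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts.most_common(top)


# Hypothetical page texts standing in for crawled content.
pages = ["The cat sat.", "The dog sat on the mat."]
print(word_frequencies(pages, top=2))
```

A real study would also need to decide how to treat punctuation, other languages, and very common words, which this sketch ignores.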

Web crawling is an important method for collecting data on, and keeping up with, the rapidly expanding Internet. A vast number of web pages are added every day, and information is constantly changing. A web crawler is a way for search engines and other users to regularly ensure that their databases are up to date. There are also illegal uses of web crawlers, such as hacking a server to obtain more information than is freely given.

How it Works

When a search engine's web crawler visits a web page, it "reads" the visible text, the hyperlinks, and the content of the various tags used on the page, such as keyword-rich meta tags. Using the information gathered by the crawler, a search engine will then determine what the site is about and index the information. The website is then included in the search engine's database and its page ranking process.
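The "reading" step described above can be sketched with Python's standard html.parser module. This is a minimal illustration, not how any particular search engine works: the class name PageReader, the helper read_page, and the sample HTML are all invented for the example, and "visible text" is approximated as any text outside script and style tags.

```python
from html.parser import HTMLParser


class PageReader(HTMLParser):
    """Collects visible text, hyperlinks, and named meta tag content."""

    # Text inside these tags is not shown to a human reader.
    HIDDEN_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self.meta = {}
        self._hidden_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.HIDDEN_TAGS:
            self._hidden_depth += 1
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag in self.HIDDEN_TAGS and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits outside hidden tags.
        if self._hidden_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def read_page(html):
    """Parse an HTML string and return its text, links, and meta tags."""
    reader = PageReader()
    reader.feed(html)
    reader.close()
    return {
        "text": " ".join(reader.text_parts),
        "links": reader.links,
        "meta": reader.meta,
    }


# Hypothetical page used only to exercise the sketch.
SAMPLE = """<html><head>
<meta name="keywords" content="crawler, spider, index">
<style>p { color: red; }</style>
</head><body>
<p>Crawlers read <a href="/about.html">linked pages</a> too.</p>
<script>var x = 1;</script>
</body></html>"""

print(read_page(SAMPLE))
```

The links collected here are what a crawler would follow next, while the text and meta content are what a search engine would pass on to its indexing step.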

Web crawlers may operate one time only, say for a particular one-time project. If a crawler's purpose is long-term, as is the case with search engines, it may be programmed to comb through the Internet periodically to determine whether there have been any significant changes. If a site is experiencing heavy traffic or technical difficulties, the spider may be programmed to note that and revisit the site later, ideally after the technical issues have subsided.
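One way to picture the periodic revisiting and the "come back later" behavior described above is the sketch below. It assumes a crawler that remembers a hash of each page's content to detect changes and schedules an earlier retry when a fetch fails. The class name RevisitTracker, the delay values, and the injected fetch function are assumptions for illustration; a real crawler would pass in a function that downloads the page.

```python
import hashlib


def fingerprint(content):
    """Return a SHA-256 hex digest used to detect content changes."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


class RevisitTracker:
    """Tracks page fingerprints and when each URL should be visited next."""

    def __init__(self, retry_delay=300, revisit_delay=86400):
        self.retry_delay = retry_delay      # seconds to wait after a failure
        self.revisit_delay = revisit_delay  # seconds between routine visits
        self.seen = {}        # url -> fingerprint of the last successful fetch
        self.next_visit = {}  # url -> earliest time for the next visit

    def visit(self, url, fetch, now):
        """Fetch url; return 'new', 'changed', 'unchanged', or 'retry'."""
        try:
            content = fetch(url)
        except OSError:
            # Site is busy or down: note it and come back sooner than usual.
            self.next_visit[url] = now + self.retry_delay
            return "retry"
        digest = fingerprint(content)
        previous = self.seen.get(url)
        self.seen[url] = digest
        self.next_visit[url] = now + self.revisit_delay
        if previous is None:
            return "new"
        return "changed" if digest != previous else "unchanged"


# Hypothetical in-memory "web" so the sketch runs without a network.
fake_web = {"http://example.com/": "version 1"}


def fake_fetch(url):
    if url not in fake_web:
        raise OSError("site unavailable")
    return fake_web[url]


tracker = RevisitTracker(retry_delay=60, revisit_delay=3600)
print(tracker.visit("http://example.com/", fake_fetch, now=0))
print(tracker.visit("http://example.com/missing", fake_fetch, now=0))
```

Comparing hashes flags any change at all, so a crawler that only cares about "significant" changes, as the paragraph puts it, would need a coarser comparison than this sketch uses.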

Discuss this Article

vittu
Post 23

what is the link between a webcrawler and a master which assigns map tasks to the mappers?

anon108933
Post 21

I want to set up a website that gathers information from various other sites, so that my website becomes a reference point, more akin to online news. Can I do this with a web crawler? If so, how?

anon107578
Post 20

If you're interested in web crawling, you should try 80legs. They have free web crawling available, but you can buy some more powerful services for decent prices.

anon106962
Post 19

Yeah, there are several third party web crawlers you can use to crawl sites and gather data. 80legs is a good one - free plan lets you crawl 100,000 pages free and more options avail. Mozenda is pricey (5,000 pages for $99), but it's got a nice user interface tool.

We use these to crawl some sites as part of our business strategy at work. Some techies from our development group got us started with them.

anon104280
Post 18

What are the basic differences between Google search and a web crawler?

cvpkarthik
Post 17

What is a crawler? Please give me an idea. Where is it used? In programming?

stellabiz
Post 14

Dhananjay- I'm running a small business and using a web crawler called Mozenda for data gathering and marketing research. It's really simple and not very expensive. I think it can be used for everything  from extensive data mining for corporations to personal use (comparison shopping or researching colleges etc). I'm actually a bit addicted to it.

anon73138
Post 13

does anyone know what blp_bbot is?

anon69510
Post 12

There are some third-party services for web crawling.

anon66011
Post 11

Is a web crawler used to download complete sites automatically? And can they be read offline? Please reply soon. It's urgent.

Dhananjay
Post 9

Which are actual users of Web crawlers other than search engines? What are the uses of the web crawler in day to day Internet surfing?

anon55873
Post 6

How do they index the data? I'm sure one is necessary.

anon52425
Post 5

Very well written. :)

anon49895
Post 4

Depending on whether or not the e-Mail supports HTML formatting, you could always try doing this:

Send Mail

You can alter the subject as well. Just change "hello" to whatever you'd like, and "again" to whatever you like. The %20 is the URL-encoded form of a space. So if you would like it to say something like "E-mail to the webmaster", it would be subject=E-mail%20to%20the%20webmaster. Best of luck.

bettylou
Post 2

I have a webstore. I just learned how to do a signature in my e-mails so my webstore is at the bottom. But it is not blue in color like most e-mail links are. Can someone tell me what I need to do, so people can just click on the signature and get to the website?

Thank you, Betty
