"Googling" has long been a habit, and it does not matter which search engine is actually used for it. Googling is the idea; how to implement it is a second question.
Whatever search engine a person uses, the aim is to get the right solution quickly. In most cases it would be enough to look through the relevant books and find the information there, but people always want to do everything faster and better.
Classic information parsing
Reading books is parsing. What does that mean? A person simply understands what is read and evaluates it with regard to the author and the publisher. This is a very effective process, although a long and laborious one.
It's much more efficient to use Internet search engines: they are quick and return a lot of information, so there is a choice.
However, an internet search:
- does not guarantee the freshness of the result;
- does not guarantee the author's authority;
- involves no publisher, no editor, not even a single censor.
But searching the Internet is quick and voluminous, so there is a choice. And if the sample is large, generalizing over the results gives the necessary guarantees.
You can parse the results in PHP and then automatically evaluate the freshness of each element of the sample. However, hardly any search engine fails to check whether the visitor is a robot: most require a captcha or otherwise try to confirm that the visit is made by a person and not by a robot or a spider.
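The automatic freshness check on a sample can be sketched roughly as follows. The sketch is written in Python for brevity; the same logic carries over to PHP. It assumes that a publication date has already been extracted for each element of the sample. The names `Item`, `is_fresh`, and `fresh_share`, as well as the 365-day threshold, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Item:
    """One element of the sample (hypothetical structure)."""
    url: str
    published: date  # assumed to be already extracted from the page


def is_fresh(item, today, max_age_days=365):
    """Return True if the item is no older than max_age_days."""
    return (today - item.published) <= timedelta(days=max_age_days)


def fresh_share(items, today, max_age_days=365):
    """Return the fraction of fresh items in the sample (0.0 for an empty sample)."""
    if not items:
        return 0.0
    fresh = sum(1 for item in items if is_fresh(item, today, max_age_days))
    return fresh / len(items)
```

With a large sample, a value such as `fresh_share(items, date.today())` gives one simple way to generalize over the results instead of judging each source by hand.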
Internet parsing
There are websites and there are search engines on the Internet. The former provide information; the latter offer information that they have collected themselves by analyzing numerous sites over a long time.
Finding the right sites for a specific purpose is not easy. Using a search engine is simple for a person, but not for a PHP script, an "intelligent" AJAX request, or any other original way of automating the task.
Search engines try to work for people and don't intend to give away "for free" the results of many years of work on algorithms for searching and parsing information.
Not every PHP script can answer a captcha, so the question of how to parse sites actually means how to create your own search engine. Many reputable search engines do not limit themselves to a captcha when checking who made the request: there are many simpler ways to spot a robot or a spider. In that case the returned selection will not be what the "seeker" of information wanted.
Goal definition
Information search means searching for sites or other sources of information. Working with books and other classical forms of expressing knowledge and experience, confirmed by authoritative authors, editors, and publishers, is not parsing in this sense; it is a long but convincingly correct process of finding the necessary information.
And in the modern information world, what does "parse" mean? A specific script, written by a specific programmer to solve a specific problem, solves that problem. The person who sets the task may have no idea what this script does or how, but always knows what they want to find and how.
In any situation, determining the customer's goal is the contractor's task. But the question is not even how fully they understand each other; the question is how to do high-quality parsing.
A good idea is to set the goal of finding information that is fresh, accurate, and objectively reliable. A great idea is to define achieving the goal as moving correctly through the page's tags. HTML is the actual medium in which information is presented, and its structure makes it possible to distinguish the necessary information from advertising spam.
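Moving through page tags can be sketched with a small event-based HTML parser. The example below is in Python, using only the standard library's `html.parser`; in PHP the same idea could be built on `DOMDocument`. It assumes reasonably well-formed HTML, that useful text lives in `<p>` elements, and that advertising blocks carry one of a few class names. The class list `AD_CLASSES` and the names `ContentExtractor` and `extract_paragraphs` are hypothetical; real pages need their own rules.

```python
from html.parser import HTMLParser

# Hypothetical markers: class names this sketch treats as advertising.
AD_CLASSES = {"ad", "ads", "banner", "sponsored"}
# Void elements have no closing tag, so they must not affect the depth counter.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ContentExtractor(HTMLParser):
    """Collects paragraph text, skipping everything inside ad-like blocks."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0        # > 0 while inside an ad block
        self.in_paragraph = False
        self.paragraphs = []
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth += 1   # nested tag inside an ad block
            return
        classes = set((dict(attrs).get("class") or "").split())
        if classes & AD_CLASSES:
            self.skip_depth = 1    # entering an ad block
        elif tag == "p":
            self.in_paragraph = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1
            return
        if tag == "p" and self.in_paragraph:
            # Normalize whitespace and keep only non-empty paragraphs.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self.in_paragraph = False

    def handle_data(self, data):
        if not self.skip_depth and self.in_paragraph:
            self._buffer.append(data)


def extract_paragraphs(html):
    """Return the text of all paragraphs that lie outside ad blocks."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

The depth counter is the key design choice: once an ad block is entered, every nested tag is counted until the block closes, so nothing inside it leaks into the result.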