PHP is incredibly powerful and lets you do some really cool stuff, like crawling all the links on a website of your choice. Once you have this data, you can do whatever you’d like with it, such as saving it to a database or manipulating it to suit your needs.
Building a Simple PHP Crawler
The code below will output all the hyperlinks on the target URL. To use it, simply create a new PHP document and save it to your server.
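As a minimal sketch of what such a script could look like (not necessarily the exact listing this article was written around), the example below fetches a page with cURL and pulls out its links with a regular expression. The function names `fetch_page` and `extract_links`, the user agent string and the `url` parameter are all assumptions made for illustration.

```php
<?php
// Minimal link-crawler sketch. Function names, the user agent and the "url"
// parameter are illustrative assumptions, not part of PHP or cURL.

/** Download a page with cURL and return its HTML, or null on failure. */
function fetch_page(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => 10,   // seconds to wait for a connection
        CURLOPT_TIMEOUT        => 30,   // seconds allowed for the whole request
        CURLOPT_USERAGENT      => 'SimplePHPLinkCrawler/0.1', // made-up name
    ]);
    $html = curl_exec($ch); // false on failure

    return is_string($html) ? $html : null;
}

/** Return every link in $html as a ['url' => ..., 'text' => ...] pair. */
function extract_links(string $html): array
{
    // Quoted href values only; unquoted attributes are not handled here.
    $pattern = '/<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1[^>]*>(.*?)<\/a>/is';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $links = [];
    foreach ($matches as $match) {
        $links[] = [
            'url'  => html_entity_decode($match[2]),
            'text' => trim(strip_tags($match[3])), // empty for image-only links
        ];
    }
    return $links;
}

// Crawl only when a target was supplied, via ?url=... or as a CLI argument.
$target = $_GET['url'] ?? ($argv[1] ?? null);
if (is_string($target) && preg_match('#^https?://#i', $target)) {
    if (PHP_SAPI !== 'cli') {
        header('Content-Type: text/plain; charset=utf-8');
    }
    $html = fetch_page($target);
    if ($html === null) {
        echo 'Could not fetch ', $target, PHP_EOL;
    } else {
        foreach (extract_links($html) as $link) {
            echo $link['text'], ' -> ', $link['url'], PHP_EOL;
        }
    }
}
```

Saved as, say, `crawler.php` (a hypothetical filename), this could be run with `php crawler.php https://example.com/` or requested in a browser as `crawler.php?url=https://example.com/`.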
You can see from this code that creating a really powerful spider with PHP cURL functions isn’t that hard to do.
Let’s Discuss More about this Link Spider
Some would hesitate to call this an actual spider, since it only crawls one specified page, but I beg to differ. These are the makings of a basic search spider; it just needs some additional automation and AI. You can easily use this code as a starting point for a more complex PHP crawler.
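To give a rough idea of what that extra automation might involve, here is a sketch of a breadth-first crawl that follows links from page to page. `crawl_site`, its parameters and the same-host restriction are assumptions made for this example; the page fetcher is passed in as a callable so any cURL wrapper can be plugged in.

```php
<?php
// Sketch of a multi-page crawl. crawl_site() and its parameters are
// hypothetical names chosen for this example.

/**
 * Visit pages breadth-first, starting at $startUrl.
 * $fetch is any callable that takes a URL and returns HTML or null.
 * Returns the list of URLs that were visited, in visiting order.
 */
function crawl_site(string $startUrl, callable $fetch, int $maxPages = 20): array
{
    $host    = parse_url($startUrl, PHP_URL_HOST);
    $queue   = [$startUrl]; // pages waiting to be fetched
    $visited = [];          // url => true for every page already handled

    while ($queue !== [] && count($visited) < $maxPages) {
        $url = array_shift($queue);
        if (isset($visited[$url])) {
            continue; // queued twice before it was visited
        }
        $visited[$url] = true;

        $html = $fetch($url);
        if (!is_string($html)) {
            continue; // fetch failed, move on to the next page
        }

        // Collect quoted href values from the page.
        preg_match_all('/<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1/is', $html, $m);
        foreach ($m[2] as $href) {
            // Follow only absolute http(s) links on the starting host.
            $sameHost = parse_url($href, PHP_URL_HOST) === $host;
            if (preg_match('#^https?://#i', $href) && $sameHost && !isset($visited[$href])) {
                $queue[] = $href;
            }
        }
    }
    return array_keys($visited);
}
```

The `$maxPages` cap and the same-host check are there to keep an experiment from wandering across the web; relative links are simply skipped in this sketch and would need to be resolved against the current page in a fuller crawler.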
The cURL PHP Function Library
cURL is PHP’s “client URL” function library, meaning it’s the set of functions that allow you to query remote servers. It’s your first step toward creating a PHP-based search engine, robot or link/keyword checker. The library allows you to connect to and communicate with various types of servers over different protocols, such as HTTP, HTTPS and FTP.
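A typical cURL request follows the same few steps: create a handle, set options, execute, then inspect the result. The sketch below shows one way to do that and to report the HTTP status and any error alongside the body. The helper names `curl_options_for` and `fetch_with_info` are made up for this example; the `curl_*` functions and `CURLOPT_*`/`CURLINFO_*` constants are the real ones from PHP's cURL extension.

```php
<?php
// Sketch of a cURL request that also reports what happened.
// curl_options_for() and fetch_with_info() are illustrative helper names.

/** Build the option array passed to curl_setopt_array(). */
function curl_options_for(string $url, int $timeoutSeconds = 15): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true, // hand the body back as a string
        CURLOPT_FOLLOWLOCATION => true, // follow redirects...
        CURLOPT_MAXREDIRS      => 5,    // ...but not forever
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ];
}

/** Fetch $url and return the body together with status details. */
function fetch_with_info(string $url): array
{
    $ch = curl_init();
    curl_setopt_array($ch, curl_options_for($url));

    $body = curl_exec($ch); // false when the transfer failed

    return [
        'ok'     => $body !== false,
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE),    // 0 if no response
        'type'   => curl_getinfo($ch, CURLINFO_CONTENT_TYPE), // e.g. text/html
        'error'  => curl_error($ch), // empty string when nothing went wrong
        'body'   => $body === false ? '' : $body,
    ];
}
```

Keeping the options in their own function makes it easy to adjust timeouts or add a user agent in one place as the crawler grows.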
cURL and Regular Expressions
Using a loop and regular expressions allows you to really fine-tune the spider to pull specific on-page elements like images, videos and links, as seen in this demo. It’s actually possible to develop your spider to learn from its mistakes using regular expressions.
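As an illustration of that loop-plus-regex idea, the sketch below runs one pattern per element type over a page and collects the matching URLs. The function name `extract_elements` and the three patterns are assumptions for this example; they only handle quoted attribute values and will miss unusual markup, which is a general limitation of parsing HTML with regular expressions.

```php
<?php
// Sketch: pull several kinds of elements out of a page by looping over a
// set of regular expressions. extract_elements() is a hypothetical name.

function extract_elements(string $html): array
{
    // One pattern per element type; group 2 captures the quoted URL.
    $patterns = [
        'links'  => '/<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1/is',
        'images' => '/<img\s[^>]*?src\s*=\s*(["\'])(.*?)\1/is',
        'videos' => '/<(?:video|source)\s[^>]*?src\s*=\s*(["\'])(.*?)\1/is',
    ];

    $found = [];
    foreach ($patterns as $kind => $pattern) {
        preg_match_all($pattern, $html, $matches);
        // Drop duplicates and reindex so each list is a plain 0..n array.
        $found[$kind] = array_values(array_unique($matches[2]));
    }
    return $found;
}
```

Adding another element type is then just a matter of adding one more entry to `$patterns`.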
Some Sites Don’t Work?
Most likely, the sites you’re having trouble crawling are blocking automated requests and therefore not returning any data. These are typically larger sites like Facebook and Google.
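One way to tell what went wrong is to look at the HTTP status code and body that came back. The sketch below is a rough classifier along those lines; `diagnose_response` and its messages are invented for this example, and real sites may signal a block in other ways (for instance by returning a normal page with a login prompt).

```php
<?php
// Sketch: guess why a crawl returned nothing, based on status and body.
// diagnose_response() and its messages are made up for this example.

function diagnose_response(int $status, string $body): string
{
    if ($status === 0) {
        // cURL reports status 0 when no HTTP response was received at all.
        return 'no response (connection failed or timed out)';
    }
    if ($status === 403 || $status === 429) {
        // 403 Forbidden and 429 Too Many Requests often mean bots are unwelcome.
        return 'blocked (server refused or rate-limited the request)';
    }
    if ($status >= 400) {
        return 'error (HTTP ' . $status . ')';
    }
    if (trim($body) === '') {
        return 'empty (request succeeded but no content came back)';
    }
    return 'ok';
}
```

The status would typically come from `curl_getinfo($ch, CURLINFO_HTTP_CODE)` after `curl_exec()` has run.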
Why are there blank lines?
Those are the links returned without any text. Perhaps they wrap an image or are used for some other purpose.
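If the blank lines are a nuisance, one option is to fall back to something else when a link has no text. The sketch below tries the `alt` text of a wrapped image first and then the URL itself; `link_label` and the bracketed label formats are assumptions made for this example.

```php
<?php
// Sketch: choose a display label for a link whose anchor text may be empty.
// link_label() and the bracketed formats are hypothetical.

function link_label(string $innerHtml, string $url): string
{
    // Normal case: the link contains visible text.
    $text = trim(strip_tags($innerHtml));
    if ($text !== '') {
        return $text;
    }

    // Image-only link: use the image's quoted alt attribute if it has one.
    $altPattern = '/<img\s[^>]*?alt\s*=\s*(["\'])(.*?)\1/is';
    if (preg_match($altPattern, $innerHtml, $m) && trim($m[2]) !== '') {
        return '[image: ' . trim($m[2]) . ']';
    }

    // Nothing usable inside the link, so show where it points instead.
    return '[no text: ' . $url . ']';
}
```

Here `$innerHtml` is whatever sits between the opening and closing `<a>` tags, and `$url` is the link's `href` value.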
Expanding on this Functionality
The possibilities for this code are endless and depend on your ingenuity. Know what your goal is and strive to develop an app that meets that objective efficiently. Use the community as your resource and never stop pushing the limits of what you already know.
Have a question? Confused? Leave a comment below and don’t forget to “Like” WordImpress on Facebook! Hope you enjoyed this article.