Screen Scraping with Goutte

We have all been in situations where we need content or information from a website but have no access to a REST API or any other backend feed.

In these cases screen scraping is the only way to get the information needed to finish an integration. You can do that directly with cURL, but that gets tedious. It is far easier to use a nicely packaged solution that combines a component simulating web browser behavior with a component that eases DOM navigation for HTML and XML documents. Meet Goutte!

STEP 0

Install via Composer.

    composer require fabpot/goutte

 

STEP 1

Log in to a website and navigate to the page that has the information you need.

    require __DIR__ . '/vendor/autoload.php';

    use Goutte\Client;

    $client = new Client();
    $crawler = $client->request('GET', 'https://www.page.de/login.php');

    // select the form by its submit button and fill in some values
    $form = $crawler->selectButton('Login')->form();
    $form['f_loginname'] = 'HelloMe';
    $form['f_loginpass'] = 'securepass';

    // submit the form; the client keeps the session cookies for later requests
    $crawler = $client->submit($form);

    // go to the next page
    $crawler = $client->request('GET', 'https://www.page.de/overview.php');
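If the login does not go through, it can help to look at what the form will actually send before submitting it. The following is a minimal offline sketch, assuming the Symfony DomCrawler and CssSelector packages that Composer pulls in alongside Goutte. The HTML snippet and the URL are hypothetical placeholders standing in for a real login page; only the field names are taken from the example above.

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical markup standing in for the real login page.
$html = <<<'HTML'
<html><body>
  <form action="login.php" method="post">
    <input type="text" name="f_loginname">
    <input type="password" name="f_loginpass">
    <input type="submit" value="Login">
  </form>
</body></html>
HTML;

// The second argument is the page URI, used to resolve the relative form action.
$crawler = new Crawler($html, 'https://www.page.de/login.php');

// selectButton() finds the submit button by its label, form() returns the enclosing form.
$form = $crawler->selectButton('Login')->form();
$form['f_loginname'] = 'HelloMe';
$form['f_loginpass'] = 'securepass';

// Inspect what would be sent, without making any request.
echo $form->getMethod(), ' ', $form->getUri(), PHP_EOL;
print_r($form->getValues());
```

Checking the method, target URI and field values this way is a quick test of whether the button label and field names match the real page.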

 

STEP 2

Get the data you need.

    // loop over the table rows and print the HTML of each cell
    $crawler->filter('table.clients tr')->each(function ($node) {
        $node->filter('td')->each(function ($sub_node) {
            echo $sub_node->html();
        });
    });
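Echoing the cells is fine for a first look, but for an integration you usually want the data in an array. The sketch below relies on the fact that each() returns the values returned by its closure, so nesting it gives one array of cell texts per row. It runs against a hypothetical sample table instead of a live page, and again assumes the Symfony DomCrawler and CssSelector packages installed with Goutte.

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical sample markup standing in for the overview page.
$html = <<<'HTML'
<table class="clients">
  <tr><td>Alice</td><td>alice@example.com</td></tr>
  <tr><td>Bob</td><td>bob@example.com</td></tr>
</table>
HTML;

$crawler = new Crawler($html);

// The outer each() collects one entry per row, the inner each() one entry per cell.
$rows = $crawler->filter('table.clients tr')->each(function (Crawler $row) {
    return $row->filter('td')->each(function (Crawler $cell) {
        return trim($cell->text());
    });
});

print_r($rows);
```

With a live page, the $crawler returned by the client in step 1 takes the place of the one built from the sample string here.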

Goutte @ GitHub
BrowserKit Documentation
DOM Crawler Documentation

Enjoy coding …

 

Alex

