<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>ScrapingBee – The Best Web Scraping API</title><link>https://www.scrapingbee.com/</link><description>Recent content on ScrapingBee – The Best Web Scraping API</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Sun, 23 Nov 2025 00:00:00 +0000</lastBuildDate><atom:link href="https://www.scrapingbee.com/index.xml" rel="self" type="application/rss+xml"/><item><title>How to handle infinite scroll pages in C#</title><link>https://www.scrapingbee.com/tutorials/how-to-handle-infinite-scroll-pages-in-c/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/how-to-handle-infinite-scroll-pages-in-c/</guid><description>&lt;p>Nowadays, most websites use various techniques to reduce the load on their servers and the amount of data served to their clients’ devices. One of these techniques is infinite scroll.&lt;/p>
&lt;p>In this tutorial, we will see how we can scrape &lt;a href="https://www.scrapingbee.com/blog/infinite-scroll-puppeteer/" >infinite scroll&lt;/a> web pages using a &lt;a href="https://www.scrapingbee.com/documentation/js-scenario/" >js_scenario&lt;/a>, specifically the &lt;code>scroll_y&lt;/code> and &lt;code>scroll_x&lt;/code> features. We will use &lt;a href="https://demo.scrapingbee.com/infinite_scroll.html" >this page&lt;/a> as a demo. Only 9 boxes are loaded when we first open the page; each time we scroll to the bottom, 9 more are loaded.&lt;/p></description></item><item><title>Adding items to an eCommerce shopping cart</title><link>https://www.scrapingbee.com/tutorials/adding-items-to-an-ecommerce-shopping-cart/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/adding-items-to-an-ecommerce-shopping-cart/</guid><description>&lt;p>Here is a quick tutorial on how you can add items to a shopping cart on eCommerce websites using the ScrapingBee API via a JS scenario in Python.&lt;/p>
&lt;p>1. You first need to identify a CSS selector that uniquely identifies the button or 'add to cart' element you wish to click. This can be done via the inspect element option in any browser; more details can be found in this tutorial:&lt;br>&lt;a href="https://www.scrapingbee.com/tutorials/how-to-extract-css-selectors-using-chrome/" >https://www.scrapingbee.com/tutorials/how-to-extract-css-selectors-using-chrome/&lt;/a>&lt;/p></description></item><item><title>Data extraction in C#</title><link>https://www.scrapingbee.com/tutorials/data-extraction-in-c/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/data-extraction-in-c/</guid><description>&lt;p>One of the most important features of ScrapingBee is the ability to extract exactly the data you need, without having to post-process the request’s content using external libraries.&lt;/p>
&lt;p>We can use this feature by specifying an additional parameter named &lt;code>extract_rules&lt;/code>. We specify a label for each element we want to extract, along with its CSS selector, and ScrapingBee will do the rest!&lt;/p>
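For illustration, here is a minimal sketch of how such a request can be assembled with Python's standard library. The API key, target URL, and the selectors in `rules` are placeholders, not real values:

```python
import json
from urllib.parse import urlencode

# Placeholders: substitute your own API key, target URL and selectors.
API_KEY = "YOUR-API-KEY"
BASE_URL = "https://app.scrapingbee.com/api/v1/?"

def build_extract_request(target_url: str, extract_rules: dict) -> str:
    """Build a ScrapingBee request URL carrying extract_rules as JSON."""
    params = {
        "api_key": API_KEY,
        "url": target_url,
        "extract_rules": json.dumps(extract_rules),  # label -> CSS selector
    }
    return BASE_URL + urlencode(params)

# Each label maps to the CSS selector whose content we want back.
rules = {"title": "h1", "price": "span.price"}
request_url = build_extract_request("https://example.com/product", rules)
print(request_url)
```

Sending a GET request to the printed URL returns a JSON object keyed by your labels.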
&lt;p>Let’s say that we want to extract the title &amp;amp; the subtitle of the &lt;a href="https://www.scrapingbee.com/documentation/data-extraction/" >data extraction documentation page&lt;/a>. Their CSS selectors are &lt;code>h1&lt;/code> and &lt;code>span.text-[20px]&lt;/code> respectively. To make sure that they’re the correct ones, you can run &lt;code>document.querySelector(&amp;quot;CSS_SELECTOR&amp;quot;)&lt;/code> in that page’s developer tools console.&lt;/p></description></item><item><title>How to extract content from a Shadow DOM</title><link>https://www.scrapingbee.com/tutorials/how-to-extract-content-from-a-shadow-dom/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/how-to-extract-content-from-a-shadow-dom/</guid><description>&lt;p>Certain websites may hide all of their page content inside a shadow root, which makes scraping them quite challenging. This is because most scrapers cannot directly access HTML content embedded within a shadow root. Here is a guide on how you can extract such data via ScrapingBee.&lt;/p>
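One common approach (a sketch under assumptions, not necessarily this guide's exact method) is a JS Scenario evaluate instruction that copies the shadow root's HTML into a regular element so it appears in the returned document. The host selector, API key, and URL below are hypothetical, and this only works for open shadow roots:

```python
import json
from urllib.parse import urlencode

# Hypothetical selector for the element that owns the shadow root.
SHADOW_HOST = "my-shadow-host"

# JS that runs in the page: it copies the (open) shadow root's HTML into a
# plain div so that it shows up in the HTML ScrapingBee returns.
js = (
    "var host = document.querySelector('" + SHADOW_HOST + "');"
    "if (host && host.shadowRoot) {"
    "  var dump = document.createElement('div');"
    "  dump.id = 'shadow-dump';"
    "  dump.innerHTML = host.shadowRoot.innerHTML;"
    "  document.body.appendChild(dump);"
    "}"
)
scenario = {"instructions": [{"evaluate": js}]}

# Query string for the API call (key and URL are placeholders).
params = urlencode({
    "api_key": "YOUR-API-KEY",
    "url": "https://example.com/",
    "js_scenario": json.dumps(scenario),
})
print(params)
```

Closed shadow roots cannot be read this way, since `shadowRoot` returns `null` for them.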
&lt;hr>
&lt;p>We will use quite a popular site as an example: &lt;a href="http://www.msn.com/" >www.msn.com&lt;/a>&lt;br>If you inspect any article on this page (let’s use this &lt;a href="https://www.msn.com/en-us/lifestyle/lifestyle-buzz/kate-middleton-and-prince-william-s-new-home-forest-lodge-almost-went-to-a-different-royal-couple/ar-AA1KNPyI?ocid=hpmsn&amp;amp;amp;cvid=68a6f93e8ed04131b40f3dc49ecfba6c&amp;amp;amp;ei=18" >one&lt;/a>), you can see that all of its contents are inside a shadow root:&lt;br>&lt;img src="https://www.scrapingbee.com/uploads/image-11.png" alt="">&lt;/p></description></item><item><title>How to extract curl requests from Chrome</title><link>https://www.scrapingbee.com/tutorials/how-to-extract-curl-requests-from-chrome/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/how-to-extract-curl-requests-from-chrome/</guid><description>&lt;ol>
&lt;li>Open the &lt;a href="https://developer.chrome.com/docs/devtools/network/" >Network&lt;/a> tab in the &lt;a href="https://developer.chrome.com/docs/devtools/overview/" >DevTools&lt;/a>&lt;/li>
&lt;li>Right click (or Ctrl-click) a request&lt;/li>
&lt;li>Click &amp;quot;Copy&amp;quot; → &amp;quot;Copy as cURL&amp;quot;&lt;/li>
&lt;li>You can now paste it into the relevant &lt;a href="https://www.scrapingbee.com/curl-converter/" >curl converter&lt;/a> to translate it into the language you want&lt;/li>
&lt;/ol>
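For example, a copied command like `curl 'https://example.com/items' -H 'Accept: application/json'` (a made-up request, not one from a real site) converts to roughly this standard-library Python:

```python
import urllib.request

# Hypothetical equivalent of:
#   curl 'https://example.com/items' -H 'Accept: application/json'
req = urllib.request.Request(
    "https://example.com/items",
    headers={"Accept": "application/json"},
    method="GET",
)
# response = urllib.request.urlopen(req)  # uncomment to actually send it
print(req.full_url, req.get_header("Accept"))
```

Converters for other languages follow the same pattern: the URL, method, headers, and body of the copied cURL command map onto the HTTP client of your choice.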
&lt;img src="https://www.scrapingbee.com/uploads/cleanshot-2022-08-08-at-16-54-342x.png" width="984" height="1228" /></description></item><item><title>How to extract curl requests from Firefox</title><link>https://www.scrapingbee.com/tutorials/how-to-extract-curl-requests-from-firefox/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/how-to-extract-curl-requests-from-firefox/</guid><description>&lt;ol>
&lt;li>Open the &lt;a href="https://developer.mozilla.org/en-US/docs/Tools/Network_Monitor" >Network Monitor&lt;/a> tab in the &lt;a href="https://developer.mozilla.org/en-US/docs/Tools" >Developer Tools&lt;/a>&lt;/li>
&lt;li>Right click (or Ctrl-click) a request&lt;/li>
&lt;li>Click &amp;quot;Copy&amp;quot; → &amp;quot;Copy as cURL&amp;quot;&lt;/li>
&lt;li>You can now paste it into the relevant &lt;a href="https://www.scrapingbee.com/curl-converter/" >curl converter&lt;/a> to translate it into the language you want&lt;/li>
&lt;/ol>
&lt;img src="https://www.scrapingbee.com/uploads/cleanshot-2022-08-08-at-17-01-292x.png" width="2314" height="676" /></description></item><item><title>How to extract curl requests from Safari</title><link>https://www.scrapingbee.com/tutorials/how-to-extract-curl-requests-from-safari/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/how-to-extract-curl-requests-from-safari/</guid><description>&lt;ol>
&lt;li>Open the &lt;a href="https://support.apple.com/en-us/guide/safari-developer/dev1f3525e58/mac" >Network&lt;/a> tab in the &lt;a href="https://support.apple.com/en-us/guide/safari-developer/dev073038698/mac" >Developer Tools&lt;/a>&lt;/li>
&lt;li>Right click (or Ctrl-click or two-finger click) a request&lt;/li>
&lt;li>Click &amp;quot;Copy as cURL&amp;quot; in the dropdown menu&lt;/li>
&lt;li>You can now paste it into the relevant &lt;a href="https://www.scrapingbee.com/curl-converter/" >curl converter&lt;/a> to translate it into the language you want&lt;/li>
&lt;/ol>
&lt;img src="https://www.scrapingbee.com/uploads/cleanshot-2022-08-08-at-16-48-052x.png" width="1508" height="446" /></description></item><item><title>How to remove any element from the HTML response</title><link>https://www.scrapingbee.com/tutorials/how-to-remove-any-element-from-the-html-response/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/how-to-remove-any-element-from-the-html-response/</guid><description>&lt;p>Sometimes you may need to remove specific HTML elements from the page's content, either to get cleaner results for your &lt;a href="https://www.scrapingbee.com/documentation/data-extraction/" >data extraction rules&lt;/a>, or to simply delete unnecessary content from your response.&lt;/p>
&lt;p>To achieve that using ScrapingBee, you can use a &lt;a href="https://www.scrapingbee.com/documentation/js-scenario/" >JavaScript Scenario&lt;/a> with an evaluate instruction to execute this custom JS code:&lt;/p>
&lt;pre tabindex="0">&lt;code>document.querySelectorAll(&amp;#34;ELEMENT-CSS-SELECTOR&amp;#34;).forEach(function(e){e.remove();});​
&lt;/code>&lt;/pre>&lt;p>For example, to remove all of the &amp;lt;style&amp;gt; elements from the response, you can use this JavaScript Scenario:&lt;/p></description></item><item><title>Scrolling and loading more content via a JS scenario</title><link>https://www.scrapingbee.com/tutorials/scrolling-and-loading-more-content-via-a-js-scenario/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/scrolling-and-loading-more-content-via-a-js-scenario/</guid><description>&lt;p>Certain websites may require you to scroll in order to load more results on the page or within a specific element. &lt;br>&lt;br>&lt;img src="https://www.scrapingbee.com/uploads/image-1.png" alt="">&lt;br>&lt;br>This is a quick guide on how to achieve different scrolling behaviors using a JavaScript Scenario.&lt;br>&lt;strong>*Note that the JavaScript Scenario has a maximum execution time limit of 40 seconds. Requests exceeding this limit will result in a timeout:&lt;/strong> &lt;a href="https://www.scrapingbee.com/documentation/js-scenario/#timeout" >&lt;strong>https://www.scrapingbee.com/documentation/js-scenario/#timeout&lt;/strong>&lt;/a>&lt;/p>
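As a sketch of what such a scenario can look like, this helper builds a scroll-and-wait `js_scenario` string and guards against the 40-second limit mentioned above. It uses the `scroll_y` instruction; the `wait` pause between scrolls, the pixel step, and the wait duration are assumptions chosen for illustration:

```python
import json

MAX_SCENARIO_MS = 40_000  # documented JS Scenario execution limit (40 s)

def build_scroll_scenario(scrolls: int, step_px: int = 1080, wait_ms: int = 1500) -> str:
    """Return a js_scenario JSON string that scrolls `scrolls` times,
    pausing between scrolls so new content has time to load."""
    instructions = []
    for _ in range(scrolls):
        instructions.append({"scroll_y": step_px})
        instructions.append({"wait": wait_ms})
    total_wait = scrolls * wait_ms
    if total_wait >= MAX_SCENARIO_MS:
        raise ValueError(f"{total_wait} ms of waits would exceed the 40 s scenario limit")
    return json.dumps({"instructions": instructions})

print(build_scroll_scenario(5))
```

The returned string is what you would pass as the `js_scenario` parameter of the API call.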
&lt;hr>
&lt;h3 id="1-scrolling-a-specific-element">1. Scrolling a Specific Element&lt;/h3>
&lt;p>Some page elements, such as tables or graphs, may contain content that only becomes visible after scrolling. &lt;br>&lt;img src="https://www.scrapingbee.com/uploads/image-8.png" alt="">&lt;/p></description></item><item><title>Scrolling via page API</title><link>https://www.scrapingbee.com/tutorials/scrolling-via-page-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/scrolling-via-page-api/</guid><description>&lt;p>Some pages load more content only after you click “Load more results” or scroll and wait. In reality, the page often fetches additional results from its own API. If ScrapingBee can’t load those results, you can target the site’s API URL directly. &lt;br>&lt;br>Here’s how to do that using this URL as an example: &lt;a href="https://www.reuters.com/technology" >https://www.reuters.com/technology&lt;/a>&lt;br>&lt;img src="https://www.scrapingbee.com/uploads/image-2.png" alt="">&lt;br>&lt;strong>*Note that the JavaScript Scenario has a maximum execution time limit of 40 seconds. Requests exceeding this limit will result in a timeout:&lt;/strong> &lt;a href="https://www.scrapingbee.com/documentation/js-scenario/#timeout" >&lt;strong>https://www.scrapingbee.com/documentation/js-scenario/#timeout&lt;/strong>&lt;/a>&lt;/p></description></item><item><title>Make concurrent requests in C#</title><link>https://www.scrapingbee.com/tutorials/make-concurrent-requests-in-c/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/make-concurrent-requests-in-c/</guid><description>&lt;p>Our API is designed to allow you to have multiple concurrent scraping operations. That means you can speed up scraping for hundreds, thousands or even millions of pages per day, depending on your plan.&lt;/p>
&lt;p>The higher your concurrent request limit, the more calls you can have active in parallel, and the faster you can scrape.&lt;/p>
&lt;pre tabindex="0">&lt;code>using System;
using System.IO;
using System.Net;
using System.Web;
using System.Threading;

namespace test {
 class test{

 private static string BASE_URL = &amp;#34;https://app.scrapingbee.com/api/v1/?&amp;#34;;
 private static string API_KEY = &amp;#34;YOUR-API-KEY&amp;#34;;

 public static string Get(string uri)
 {
 HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
 request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

 using(HttpWebResponse response = (HttpWebResponse)request.GetResponse())
 using(Stream stream = response.GetResponseStream())
 using(StreamReader reader = new StreamReader(stream))
 {
 return reader.ReadToEnd();
 }
 }

 public static bool Scrape(string uri, string path) {
 Console.WriteLine(&amp;#34;Scraping &amp;#34; + uri);
 var query = HttpUtility.ParseQueryString(string.Empty);
 query[&amp;#34;api_key&amp;#34;] = API_KEY;
 query[&amp;#34;url&amp;#34;] = uri;
 string queryString = query.ToString(); // Transforming the URL queries to string

 string output = Get(BASE_URL+queryString); // Make the request
 try {
 using (StreamWriter sw = File.CreateText(path))
 {
 sw.Write(output);
 }
 return true;
 } catch {return false;}
 }

 public static void Main(string[] args) {
 Thread thread1 = new Thread(() =&amp;gt; Scrape(&amp;#34;https://scrapingbee.com/blog&amp;#34;, &amp;#34;./scrapingbeeBlog.html&amp;#34;));
 Thread thread2 = new Thread(() =&amp;gt; Scrape(&amp;#34;https://scrapingbee.com/documentation&amp;#34;, &amp;#34;./scrapingbeeDocumentation.html&amp;#34;));
 thread1.Start();
 thread2.Start();

 }
 }
}
&lt;/code>&lt;/pre></description></item><item><title>Make concurrent requests in Go</title><link>https://www.scrapingbee.com/tutorials/make-concurrent-requests-in-go/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/make-concurrent-requests-in-go/</guid><description>&lt;p>Our API is designed to allow you to have multiple concurrent scraping operations. That means you can speed up scraping for hundreds, thousands or even millions of pages per day, depending on your plan.&lt;/p>
&lt;p>The higher your concurrent request limit, the more calls you can have active in parallel, and the faster you can scrape.&lt;/p>
&lt;p>Making concurrent requests in Go is as easy as adding the “go” keyword before our scraping function calls! The code below will make two concurrent requests to ScrapingBee’s pages, and save the content in an HTML file.&lt;/p></description></item><item><title>Make concurrent requests in NodeJS</title><link>https://www.scrapingbee.com/tutorials/make-concurrent-requests-in-nodejs/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/make-concurrent-requests-in-nodejs/</guid><description>&lt;p>Our API is designed to allow you to have multiple concurrent scraping operations. That means you can speed up scraping for hundreds, thousands or even millions of pages per day, depending on your plan.&lt;/p>
&lt;p>The higher your concurrent request limit, the more calls you can have active in parallel, and the faster you can scrape.&lt;/p>
&lt;p>Making concurrent requests in NodeJS is very straightforward using the Cluster module. The code below will make two concurrent requests to ScrapingBee’s pages, and save the content in an HTML file.&lt;/p></description></item><item><title>Make concurrent requests in PHP</title><link>https://www.scrapingbee.com/tutorials/make-concurrent-requests-in-php/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/make-concurrent-requests-in-php/</guid><description>&lt;p>Our API is designed to allow you to have multiple concurrent scraping operations. That means you can speed up scraping for hundreds, thousands or even millions of pages per day, depending on your plan.&lt;/p>
&lt;p>The higher your concurrent request limit, the more calls you can have active in parallel, and the faster you can scrape.&lt;/p>
&lt;p>Making concurrent requests in PHP is as easy as creating threads for our scraping functions! The code below will make two concurrent requests to ScrapingBee’s pages and display the HTML content of each page:&lt;/p></description></item><item><title>Make concurrent requests in Python</title><link>https://www.scrapingbee.com/tutorials/make-concurrent-requests-in-python/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/make-concurrent-requests-in-python/</guid><description>&lt;p>Our API is designed to allow you to have multiple concurrent scraping operations. That means you can speed up scraping for hundreds, thousands or even millions of pages per day, depending on your plan.&lt;/p>
&lt;p>The higher your concurrent request limit, the more calls you can have active in parallel, and the faster you can scrape.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">import&lt;/span> concurrent.futures
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">import&lt;/span> time
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> scrapingbee &lt;span style="color:#f92672">import&lt;/span> ScrapingBeeClient &lt;span style="color:#75715e"># Importing SPB&amp;#39;s client&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>client &lt;span style="color:#f92672">=&lt;/span> ScrapingBeeClient(api_key&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;YOUR-API-KEY&amp;#39;&lt;/span>) &lt;span style="color:#75715e"># Initialize the client with your API Key, and using screenshot_full_page parameter to take a screenshot!&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>MAX_RETRIES &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#ae81ff">5&lt;/span> &lt;span style="color:#75715e"># Setting the maximum number of retries if we have failed requests to 5.&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>MAX_THREADS &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#ae81ff">4&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>urls &lt;span style="color:#f92672">=&lt;/span> [&lt;span style="color:#e6db74">&amp;#34;http://scrapingbee.com/blog&amp;#34;&lt;/span>, &lt;span style="color:#e6db74">&amp;#34;http://reddit.com/&amp;#34;&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">def&lt;/span> &lt;span style="color:#a6e22e">scrape&lt;/span>(url):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">for&lt;/span> _ &lt;span style="color:#f92672">in&lt;/span> range(MAX_RETRIES):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> response &lt;span style="color:#f92672">=&lt;/span> client&lt;span style="color:#f92672">.&lt;/span>get(url, params&lt;span style="color:#f92672">=&lt;/span>{&lt;span style="color:#e6db74">&amp;#39;screenshot&amp;#39;&lt;/span>: &lt;span style="color:#66d9ef">True&lt;/span>}) &lt;span style="color:#75715e"># Scrape!&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">if&lt;/span> response&lt;span style="color:#f92672">.&lt;/span>ok: &lt;span style="color:#75715e"># If we get a successful request&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">with&lt;/span> open(&lt;span style="color:#e6db74">&amp;#34;./&amp;#34;&lt;/span>&lt;span style="color:#f92672">+&lt;/span>str(time&lt;span style="color:#f92672">.&lt;/span>time())&lt;span style="color:#f92672">+&lt;/span>&lt;span style="color:#e6db74">&amp;#34;screenshot.png&amp;#34;&lt;/span>, &lt;span style="color:#e6db74">&amp;#34;wb&amp;#34;&lt;/span>) &lt;span style="color:#66d9ef">as&lt;/span> f:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> f&lt;span style="color:#f92672">.&lt;/span>write(response&lt;span style="color:#f92672">.&lt;/span>content) &lt;span style="color:#75715e"># Save the screenshot in the file &amp;#34;screenshot.png&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">break&lt;/span> &lt;span style="color:#75715e"># Then get out of the retry loop&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">else&lt;/span>: &lt;span style="color:#75715e"># If we get a failed request, then we continue the loop&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(response&lt;span style="color:#f92672">.&lt;/span>content)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">with&lt;/span> concurrent&lt;span style="color:#f92672">.&lt;/span>futures&lt;span style="color:#f92672">.&lt;/span>ThreadPoolExecutor(max_workers&lt;span style="color:#f92672">=&lt;/span>MAX_THREADS) &lt;span style="color:#66d9ef">as&lt;/span> executor:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> executor&lt;span style="color:#f92672">.&lt;/span>map(scrape, urls)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>Make concurrent requests in Ruby</title><link>https://www.scrapingbee.com/tutorials/make-concurrent-requests-in-ruby/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/make-concurrent-requests-in-ruby/</guid><description>&lt;p>Our API is designed to allow you to have multiple concurrent scraping operations. That means you can speed up scraping for hundreds, thousands or even millions of pages per day, depending on your plan.&lt;/p>
&lt;p>The higher your concurrent request limit, the more calls you can have active in parallel, and the faster you can scrape.&lt;/p>
&lt;p>Making concurrent requests in Ruby is as easy as creating threads for our scraping functions! The code below will make two concurrent requests to ScrapingBee’s pages and display the HTML content of each page:&lt;/p></description></item><item><title>Retry failed requests in C#</title><link>https://www.scrapingbee.com/tutorials/retry-failed-requests-in-c/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/retry-failed-requests-in-c/</guid><description>&lt;p>For most websites, your first requests will almost always be successful; however, it’s inevitable that some of them will fail. For these failed requests, the API will return a 500 status code and won’t charge you for the request.&lt;/p>
&lt;p>In this case, we can make our code retry the request until it succeeds or until we reach a maximum number of retries:&lt;/p>
&lt;pre tabindex="0">&lt;code>using System;
using System.IO;
using System.Net;
using System.Web;
using System.Collections.Generic;

namespace test {
 class test{

 private static string BASE_URL = @&amp;#34;https://app.scrapingbee.com/api/v1/?&amp;#34;;
 private static string API_KEY = &amp;#34;YOUR-API-KEY&amp;#34;;

 public static Dictionary&amp;lt;string, dynamic&amp;gt; Get(string uri)
 {
 HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
 request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

 using(HttpWebResponse response = (HttpWebResponse)request.GetResponse())
 using(Stream stream = response.GetResponseStream())
 using(StreamReader reader = new StreamReader(stream))
 {
 Dictionary&amp;lt;string, dynamic&amp;gt; OutputList = new Dictionary&amp;lt;string, dynamic&amp;gt;();
 OutputList.Add(&amp;#34;StatusCode&amp;#34;, response.StatusCode);
 OutputList.Add(&amp;#34;Response&amp;#34;, reader.ReadToEnd());
 return OutputList;
 }
 }

 public static void Main(string[] args) {

 var query = HttpUtility.ParseQueryString(string.Empty);
 query[&amp;#34;api_key&amp;#34;] = API_KEY;
 query[&amp;#34;url&amp;#34;] = @&amp;#34;https://scrapingbee.com/blog&amp;#34;;
 string queryString = query.ToString(); // Transforming the URL queries to string

 const int MAX_RETRIES = 5; // Set the maximum number of retries we&amp;#39;re looking to execute

 for (int i = 0; i &amp;lt; MAX_RETRIES; i++) {
 try {

 var output = Get(BASE_URL+queryString); // Make the request
 var StatusCode = output[&amp;#34;StatusCode&amp;#34;];
 var content = output[&amp;#34;Response&amp;#34;];

 if (StatusCode == HttpStatusCode.OK) { // If the response is 200/OK
 string path = @&amp;#34;./ScrapingBeeBlog.html&amp;#34;; // Output file
 // Create a file to write to.
 using (StreamWriter sw = File.CreateText(path))
 {
 sw.Write(output);
 }
 Console.WriteLine(&amp;#34;Done!&amp;#34;);
 break;
 } else {
 Console.WriteLine(&amp;#34;Failed request; Status code: &amp;#34; + StatusCode);
 Console.WriteLine(&amp;#34;Retrying...&amp;#34;);
 }

 } catch (Exception ex) {
 Console.WriteLine(&amp;#34;An error has occured:&amp;#34; + ex.Message);
 Console.WriteLine(&amp;#34;Retrying...&amp;#34;);
 }

 }

 }
 }
}
&lt;/code>&lt;/pre></description></item><item><title>Retry failed requests in Go</title><link>https://www.scrapingbee.com/tutorials/retry-failed-requests-in-go/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/retry-failed-requests-in-go/</guid><description>&lt;p>For most websites, your first requests will almost always be successful; however, it’s inevitable that some of them will fail. For these failed requests, the API will return a 500 status code and won’t charge you for the request.&lt;/p>
&lt;p>In this case, we can make our code retry the request until it succeeds or until we reach a maximum number of retries:&lt;/p>
&lt;pre tabindex="0">&lt;code>package main

import (
 &amp;#34;fmt&amp;#34;
 &amp;#34;io&amp;#34;
 &amp;#34;net/http&amp;#34;
 &amp;#34;os&amp;#34;
)

const API_KEY = &amp;#34;YOUR-API-KEY&amp;#34;
const SCRAPINGBEE_URL = &amp;#34;https://app.scrapingbee.com/api/v1&amp;#34;

func save_page_to_html(target_url string, file_path string) (interface{}, error) { // Using sync.Waitgroup to wait for goroutines to finish

 req, err := http.NewRequest(&amp;#34;GET&amp;#34;, SCRAPINGBEE_URL, nil)
 if err != nil {
 return nil, fmt.Errorf(&amp;#34;Failed to build the request: %s&amp;#34;, err)
 }

 q := req.URL.Query()
 q.Add(&amp;#34;api_key&amp;#34;, API_KEY)
 q.Add(&amp;#34;url&amp;#34;, target_url)
 req.URL.RawQuery = q.Encode()

 client := &amp;amp;http.Client{}
 resp, err := client.Do(req)
 if err != nil {
 return nil, fmt.Errorf(&amp;#34;Failed to request ScrapingBee: %s&amp;#34;, err)
 }
 defer resp.Body.Close()

 if resp.StatusCode != http.StatusOK {
 return nil, fmt.Errorf(&amp;#34;Error request response with status code %d&amp;#34;, resp.StatusCode)
 }

 bodyBytes, err := io.ReadAll(resp.Body)

 file, err := os.Create(file_path)
 if err != nil {
 return nil, fmt.Errorf(&amp;#34;Couldn&amp;#39;t create the file &amp;#34;, err)
 }

 l, err := file.Write(bodyBytes) // Write content to the file.
 if err != nil {
 file.Close()
 return nil, fmt.Errorf(&amp;#34;Couldn&amp;#39;t write content to the file &amp;#34;, err)
 }
 err = file.Close()
 if err != nil {
 return nil, fmt.Errorf(&amp;#34;Couldn&amp;#39;t close the file &amp;#34;, err)
 }

 return l, nil
}

func main() {

 MAX_RETRIES := 5 // Set a maximum number of retries

 target_url := &amp;#34;https://www.scrapingbee.com&amp;#34;

 for i := 0; i &amp;lt; MAX_RETRIES; i++ {
 saved_screenshot, err := save_page_to_html(target_url,&amp;#34;./scrapingbee.html&amp;#34;)
 if err != nil {
 fmt.Println(err)
 fmt.Println(&amp;#34;Retrying...&amp;#34;)
 } else {
 fmt.Println(&amp;#34;Done!&amp;#34;, saved_screenshot)
 break
 }
 }

}
&lt;/code>&lt;/pre></description></item><item><title>Retry failed requests in PHP</title><link>https://www.scrapingbee.com/tutorials/retry-failed-requests-in-php/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/retry-failed-requests-in-php/</guid><description>&lt;p>For most websites, your first requests will almost always be successful; however, it’s inevitable that some of them will fail. For these failed requests, the API will return a 500 status code and won’t charge you for the request.&lt;/p>
&lt;p>In this case, we can make our code retry the request until it succeeds or until we reach a maximum number of retries:&lt;/p>
&lt;pre tabindex="0">&lt;code>&amp;lt;?php

// Get cURL resource
$ch = curl_init();

// Set base url &amp;amp; API key
$BASE_URL = &amp;#34;https://app.scrapingbee.com/api/v1/?&amp;#34;;
$API_KEY = &amp;#34;YOUR-API-KEY&amp;#34;;

// Set max retries:
$MAX_RETRIES = 5;

// Set parameters
$parameters = array(
 &amp;#39;api_key&amp;#39; =&amp;gt; $API_KEY,
 &amp;#39;url&amp;#39; =&amp;gt; &amp;#39;https://www.scrapingbee.com&amp;#39; // The URL to scrape
);
// Building the URL query
$query = http_build_query($parameters);

// Set the URL for cURL
curl_setopt($ch, CURLOPT_URL, $BASE_URL.$query);

// Set method
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, &amp;#39;GET&amp;#39;);

// Return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

for ($i = 0; $i &amp;lt; $MAX_RETRIES; $i++) {

 // Send the request and save response to $response
 $response = curl_exec($ch);

 // Stop if fails
 if (!$response) {
 die(&amp;#39;Error: &amp;#34;&amp;#39; . curl_error($ch) . &amp;#39;&amp;#34; - Code: &amp;#39; . curl_errno($ch));
 }

 $status_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
 echo &amp;#39;HTTP Status Code: &amp;#39; . $status_code . PHP_EOL;

 // If it&amp;#39;s a successful request (200 or 404 status code):
 if (in_array($status_code, array(200, 404))) {
 echo &amp;#39;Response Body: &amp;#39; . $response . PHP_EOL;
 break;
 } else {
 echo &amp;#39;Retrying...&amp;#39;;
 }

}

// Close curl resource to free up system resources
curl_close($ch);
?&amp;gt;
&lt;/code>&lt;/pre></description></item><item><title>Retry failed requests in Python</title><link>https://www.scrapingbee.com/tutorials/retry-failed-requests-in-python/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/retry-failed-requests-in-python/</guid><description>&lt;p>For most websites, your first requests will almost always be successful; however, it’s inevitable that some of them will fail. For these failed requests, the API will return a 500 status code and won’t charge you for the request.&lt;/p>
&lt;p>In this case, we can make our code retry the request until it succeeds or until we reach a maximum number of retries:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> scrapingbee &lt;span style="color:#f92672">import&lt;/span> ScrapingBeeClient &lt;span style="color:#75715e"># Importing SPB&amp;#39;s client&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>client &lt;span style="color:#f92672">=&lt;/span> ScrapingBeeClient(api_key&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;YOUR-API-KEY&amp;#39;&lt;/span>) &lt;span style="color:#75715e"># Initialize the client with your API Key, and using screenshot_full_page parameter to take a screenshot!&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>MAX_RETRIES &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#ae81ff">5&lt;/span> &lt;span style="color:#75715e"># Setting the maximum number of retries if we have failed requests to 5.&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">for&lt;/span> _ &lt;span style="color:#f92672">in&lt;/span> range(MAX_RETRIES):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> response &lt;span style="color:#f92672">=&lt;/span> client&lt;span style="color:#f92672">.&lt;/span>get(&lt;span style="color:#e6db74">&amp;#34;http://scrapingbee.com/blog&amp;#34;&lt;/span>, params&lt;span style="color:#f92672">=&lt;/span>{&lt;span style="color:#e6db74">&amp;#39;screenshot&amp;#39;&lt;/span>: &lt;span style="color:#66d9ef">True&lt;/span>}) &lt;span style="color:#75715e"># Scrape!&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">if&lt;/span> response&lt;span style="color:#f92672">.&lt;/span>ok: &lt;span style="color:#75715e"># If we get a successful request&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">with&lt;/span> open(&lt;span style="color:#e6db74">&amp;#34;./screenshot.png&amp;#34;&lt;/span>, &lt;span style="color:#e6db74">&amp;#34;wb&amp;#34;&lt;/span>) &lt;span style="color:#66d9ef">as&lt;/span> f:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> f&lt;span style="color:#f92672">.&lt;/span>write(response&lt;span style="color:#f92672">.&lt;/span>content) &lt;span style="color:#75715e"># Save the screenshot in the file &amp;#34;screenshot.png&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">break&lt;/span> &lt;span style="color:#75715e"># Then get out of the retry loop&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">else&lt;/span>: &lt;span style="color:#75715e"># If we get a failed request, then we continue the loop&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(response&lt;span style="color:#f92672">.&lt;/span>content)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>Retry failed requests in Ruby</title><link>https://www.scrapingbee.com/tutorials/retry-failed-requests-in-ruby/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/retry-failed-requests-in-ruby/</guid><description>&lt;p>For most websites, your first requests will always be successful, however, it’s inevitable that some of them will fail. For these failed requests, the API will return a 500 status code and won’t charge you for the request.&lt;/p>
&lt;p>In this case, we can make our code retry the request until it succeeds or we reach a maximum number of retries:&lt;/p>
&lt;pre tabindex="0">&lt;code>require &amp;#39;net/http&amp;#39;
require &amp;#39;net/https&amp;#39;
require &amp;#39;addressable/uri&amp;#39;

# Classic (GET)
def send_request(user_url)
 uri = Addressable::URI.parse(&amp;#34;https://app.scrapingbee.com/api/v1/&amp;#34;)
 api_key = &amp;#34;YOUR-API-KEY&amp;#34;
 uri.query_values = {
 &amp;#39;api_key&amp;#39; =&amp;gt; api_key,
 &amp;#39;url&amp;#39; =&amp;gt; user_url
 }
 uri = URI(uri.to_s) # Convert the Addressable URI to a standard URI object

 # Create client
 http = Net::HTTP.new(uri.host, uri.port)
 http.use_ssl = true
 http.verify_mode = OpenSSL::SSL::VERIFY_PEER

 # Create Request
 req = Net::HTTP::Get.new(uri)

 # Fetch Request
 res = http.request(req)

 # Return Response
 return res
rescue StandardError =&amp;gt; e
 puts &amp;#34;HTTP Request failed (#{ e.message })&amp;#34;
end

max_retries = 5
for a in 1..max_retries do
 request = send_request(&amp;#34;https://scrapingbee.com&amp;#34;)
 if request.nil? || ![200, 404].include?(request.code.to_i) # the response code is a String, so compare it as an Integer
 puts &amp;#34;Request failed - Status Code: #{ request&amp;amp;.code }&amp;#34;
 puts &amp;#34;Retrying...&amp;#34;
 else
 puts &amp;#34;Successful request - Status Code: #{ request.code }&amp;#34;
 puts request.body
 break
 end
end
&lt;/code>&lt;/pre>&lt;p> &lt;/p></description></item><item><title>Getting started with ScrapingBee and C#</title><link>https://www.scrapingbee.com/tutorials/getting-started-with-scrapingbee-and-c/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/getting-started-with-scrapingbee-and-c/</guid><description>&lt;p>In this tutorial, we will see how you can use ScrapingBee’s API with C#, and use it to scrape web pages. As such, we will cover these topics:&lt;/p>
&lt;ul>
&lt;li>General structure of an API request&lt;/li>
&lt;li>Create your first API request.&lt;/li>
&lt;/ul>
&lt;p>Let’s get started!&lt;/p>
&lt;h3 id="1-general-structure-of-an-api-request">1. General structure of an API request&lt;/h3>
&lt;p>The general structure of an API request made in C# will always look like this:&lt;/p>
&lt;pre tabindex="0">&lt;code>using System;
using System.IO;
using System.Net;
using System.Web;
namespace test {
 class test{

 private static string BASE_URL = @&amp;#34;https://app.scrapingbee.com/api/v1/&amp;#34;;
 private static string API_KEY = &amp;#34;YOUR-API-KEY&amp;#34;;

 public static string Get(string url)
 {
 string uri = BASE_URL + &amp;#34;?api_key=&amp;#34; + API_KEY + &amp;#34;&amp;amp;url=&amp;#34; + HttpUtility.UrlEncode(url); // URL-encode the target URL so its query parameters survive
 HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
 request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

 using(HttpWebResponse response = (HttpWebResponse)request.GetResponse())
 using(Stream stream = response.GetResponseStream())
 using(StreamReader reader = new StreamReader(stream))
 {
 return reader.ReadToEnd();
 }
 }
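 // Example usage (sketch): fetch a page and print the returned HTML.
 // Note: this Main method and the example URL are illustrative, not part of the API.
 static void Main(string[] args)
 {
 string html = Get(&amp;#34;https://example.com&amp;#34;);
 Console.WriteLine(html);
 }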
 }
}
&lt;/code>&lt;/pre>&lt;p>And you can do whatever you want with the response variable! For example:&lt;/p></description></item><item><title>Getting started with ScrapingBee and Go</title><link>https://www.scrapingbee.com/tutorials/getting-started-with-scrapingbee-and-go/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/getting-started-with-scrapingbee-and-go/</guid><description>&lt;p>In this tutorial, we will see how you can use ScrapingBee’s API with GoLang, and use it to scrape web pages. As such, we will cover these topics:&lt;/p>
&lt;ul>
&lt;li>General structure of an API request&lt;/li>
&lt;li>Create your first API request.&lt;/li>
&lt;/ul>
&lt;p>Let’s get started!&lt;/p>
&lt;h3 id="1-general-structure-of-an-api-request">1. General structure of an API request&lt;/h3>
&lt;p>The general structure of an API request made in Go will always look like this:&lt;/p>
&lt;pre tabindex="0">&lt;code>package main

import (
 &amp;#34;fmt&amp;#34;
 &amp;#34;io/ioutil&amp;#34;
 &amp;#34;net/http&amp;#34;
 &amp;#34;net/url&amp;#34;
)
func get_request() *http.Response {
 // Create client
 client := &amp;amp;http.Client{}

 my_url := url.QueryEscape(&amp;#34;YOUR-URL&amp;#34;) // Encoding the URL
 // Create request
 req, err := http.NewRequest(&amp;#34;GET&amp;#34;, &amp;#34;https://app.scrapingbee.com/api/v1/?api_key=YOUR-API-KEY&amp;amp;url=&amp;#34;+my_url, nil) // Create the request

 if err != nil {
 fmt.Println(err)
 }

 // Fetch Request
 resp, err := client.Do(req)

 if err != nil {
 fmt.Println(&amp;#34;Failure : &amp;#34;, err)
 }

 return resp // Return the response
}
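
// Example usage (sketch): read and print the response body.
// Assumes the request above succeeded; add error handling as needed.
func main() {
 resp := get_request()
 defer resp.Body.Close()

 body, err := ioutil.ReadAll(resp.Body)
 if err != nil {
 fmt.Println(&amp;#34;Read error: &amp;#34;, err)
 return
 }
 fmt.Println(string(body))
}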
&lt;/code>&lt;/pre>&lt;p>And you can do whatever you want with the response variable! For example:&lt;/p></description></item><item><title>Getting started with ScrapingBee and PHP</title><link>https://www.scrapingbee.com/tutorials/getting-started-with-scrapingbee-and-php/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/getting-started-with-scrapingbee-and-php/</guid><description>&lt;p>In this tutorial, we will see how you can use ScrapingBee’s API with PHP, and use it to scrape web pages. As such, we will cover these topics:&lt;/p>
&lt;ul>
&lt;li>General structure of an API request&lt;/li>
&lt;li>Create your first API request.&lt;/li>
&lt;/ul>
&lt;p>Let’s get started!&lt;/p>
&lt;h3 id="1-general-structure-of-an-api-request">1. General structure of an API request&lt;/h3>
&lt;p>The general structure of an API request made in PHP will always look like this:&lt;/p>
&lt;pre tabindex="0">&lt;code>
&amp;lt;?php

// Get cURL resource
$ch = curl_init();

// Set base url &amp;amp; API key
$BASE_URL = &amp;#34;https://app.scrapingbee.com/api/v1/?&amp;#34;;
$API_KEY = &amp;#34;YOUR-API-KEY&amp;#34;;

// Set parameters
$parameters = array(
 &amp;#39;api_key&amp;#39; =&amp;gt; $API_KEY,
 &amp;#39;url&amp;#39; =&amp;gt; &amp;#39;YOUR-URL&amp;#39; // The URL to scrape
);
// Building the URL query
$query = http_build_query($parameters);

// Set the URL for cURL
curl_setopt($ch, CURLOPT_URL, $BASE_URL.$query);

// Set method
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, &amp;#39;GET&amp;#39;);

// Return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

// Send the request and save response to $response
$response = curl_exec($ch);

// Stop if fails
if (!$response) {
 die(&amp;#39;Error: &amp;#34;&amp;#39; . curl_error($ch) . &amp;#39;&amp;#34; - Code: &amp;#39; . curl_errno($ch));
}

// Do what you want with the response here

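// For example (sketch), print the returned HTML:
echo $response . PHP_EOL;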
// Close curl resource to free up system resources
curl_close($ch);
?&amp;gt;
&lt;/code>&lt;/pre>&lt;p>And you can do whatever you want with the response variable! For example:&lt;/p></description></item><item><title>Getting started with ScrapingBee and Ruby</title><link>https://www.scrapingbee.com/tutorials/getting-started-with-scrapingbee-and-ruby/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/getting-started-with-scrapingbee-and-ruby/</guid><description>&lt;p>In this tutorial, we will see how you can use ScrapingBee’s API with Ruby, and use it to scrape web pages. As such, we will cover these topics:&lt;/p>
&lt;ul>
&lt;li>General structure of an API request&lt;/li>
&lt;li>Create your first API request.&lt;/li>
&lt;/ul>
&lt;p>Let’s get started!&lt;/p>
&lt;h3 id="1-general-structure-of-an-api-request">1. General structure of an API request&lt;/h3>
&lt;p>The general structure of an API request made in Ruby will always look like this:&lt;/p>
&lt;pre tabindex="0">&lt;code>require &amp;#39;net/http&amp;#39;
require &amp;#39;net/https&amp;#39;

# Classic (GET)
def send_request
 api_key = &amp;#34;YOUR-API-KEY&amp;#34;
 user_url = &amp;#34;YOUR-URL&amp;#34;

 uri = URI(&amp;#39;https://app.scrapingbee.com/api/v1/?api_key=&amp;#39;+api_key+&amp;#39;&amp;amp;url=&amp;#39;+user_url)

 # Create client
 http = Net::HTTP.new(uri.host, uri.port)
 http.use_ssl = true
 http.verify_mode = OpenSSL::SSL::VERIFY_PEER

 # Create Request
 req = Net::HTTP::Get.new(uri)

 # Fetch Request
 res = http.request(req)

 # Return Response
 return res
rescue StandardError =&amp;gt; e
 puts &amp;#34;HTTP Request failed (#{ e.message })&amp;#34;
end
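
# Example usage (sketch): call the helper and print the result
response = send_request
if response
 puts &amp;#34;Status code: #{ response.code }&amp;#34;
 puts response.body
end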
&lt;/code>&lt;/pre>&lt;p>And you can do whatever you want with the response variable! For example:&lt;/p></description></item><item><title>Getting started with ScrapingBee's NodeJS SDK</title><link>https://www.scrapingbee.com/tutorials/getting-started-with-scrapingbees-nodejs-sdk/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/getting-started-with-scrapingbees-nodejs-sdk/</guid><description>&lt;p>In this tutorial, we will see how you can integrate ScrapingBee’s API with NodeJS using our Software Development Kit (SDK), and use it to scrape web pages. As such, we will cover these topics:&lt;/p>
&lt;ul>
&lt;li>Install ScrapingBee’s NodeJS SDK&lt;/li>
&lt;li>Create your first API request.&lt;/li>
&lt;/ul>
&lt;p>Let’s get started!&lt;/p>
&lt;h3 id="1-install-the-sdk">1. Install the SDK&lt;/h3>
&lt;p>Before using the SDK, we first have to install it, which we can do with this command: &lt;code>npm install scrapingbee&lt;/code>.&lt;/p></description></item><item><title>Getting started with ScrapingBee's Python SDK</title><link>https://www.scrapingbee.com/tutorials/getting-started-with-scrapingbees-python-sdk/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/getting-started-with-scrapingbees-python-sdk/</guid><description>&lt;p>In this tutorial, we will see how you can integrate ScrapingBee’s API with Python using our Software Development Kit (SDK), and use it to scrape web pages. As such, we will cover these topics:&lt;/p>
&lt;ul>
&lt;li>Install ScrapingBee’s Python SDK&lt;/li>
&lt;li>Create your first API request.&lt;/li>
&lt;/ul>
&lt;p>Let's get started!&lt;/p>
&lt;h3 id="1-install-the-sdk">1. Install the SDK&lt;/h3>
&lt;p>Before using the SDK, we first have to install it, which we can do with this command:&lt;/p>
&lt;p>&lt;code>pip install scrapingbee&lt;/code>&lt;/p></description></item><item><title>How to find all URLs on a domain's website (multiple methods)</title><link>https://www.scrapingbee.com/blog/how-to-find-all-urls-on-a-domains-website-multiple-methods/</link><pubDate>Sun, 23 Nov 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-find-all-urls-on-a-domains-website-multiple-methods/</guid><description>&lt;p>Finding all the URLs on a website is one of the most vital tasks in any web-scraping workflow. In this tutorial, we'll walk through multiple ways to find all URLs on a domain: from using Google search tricks, to exploring pro-level SEO tools like ScreamingFrog, and even crafting a Python script to pull URLs at scale from a sitemap. Don't worry, we've got you covered on building a clean list of URLs to scrape (and as a bonus, we'll even show you how to grab some data along the way).&lt;/p></description></item><item><title>The best Python HTTP clients for web scraping</title><link>https://www.scrapingbee.com/blog/best-python-http-clients/</link><pubDate>Sat, 15 Nov 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/best-python-http-clients/</guid><description>&lt;p>Alright, let's set the stage. When you start looking for the &lt;strong>best Python HTTP clients&lt;/strong> for web scraping, you quickly realize the ecosystem is absolutely overflowing. A quick Github search pulls up more than &lt;em>1,800 results&lt;/em>, which is enough to make anyone go: &amp;quot;bro, what the hell am I even looking at?&amp;quot;&lt;/p>
&lt;p>And yeah, choosing the right one depends on your setup more than people admit. Scraping on a single machine? Whole cluster of hungry workers? Keeping things dead simple? Or chasing raw speed like your scraper is training for the Olympics? A tiny web app pinging a microservice once in a while needs a totally different tool than a script hammering endpoints all day long. Add to that the classic concern: &amp;quot;will this library still exist six months from now, or will it vanish like half of my side projects?&amp;quot;&lt;/p></description></item><item><title>How to download an image with Python?</title><link>https://www.scrapingbee.com/blog/download-image-python/</link><pubDate>Fri, 14 Nov 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/download-image-python/</guid><description>&lt;p>If you've ever tried to Python download image from URL, you already know the theory looks stupidly simple: call &lt;code>requests.get()&lt;/code> and boom — image saved. Except that's not how the real world usually works. Sites block bots, images hide behind JavaScript, redirects go in circles, and bulk downloads crumble if you're not streaming, retrying, or handling files properly.&lt;/p>
&lt;p>This guide takes the actually useful route: how to stream images safely, name files without creating a junkyard, avoid duplicates, scale to thousands of downloads, and bring in ScrapingBee when a site decides to get spicy. By the end, you'll have a toolkit that works on real websites, not toy examples.&lt;/p></description></item><item><title>How to send a POST with Python Requests?</title><link>https://www.scrapingbee.com/blog/how-to-send-post-python-requests/</link><pubDate>Tue, 11 Nov 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-send-post-python-requests/</guid><description>&lt;p>When you're working with APIs or automating web-related tasks, sooner or later you'll need to send data instead of just fetching it. That's where a &lt;strong>POST request in Python&lt;/strong> comes in. It's the basic move for things like logging into a service, submitting a web form, or sending JSON to an API endpoint.&lt;/p>
&lt;p>Using the &lt;code>requests&lt;/code> library keeps things clean and human-friendly. No browser automation, no Selenium gymnastics, no pretending to click buttons. You just send a POST request in Python, wait for the response, and continue on. It's readable, dependable, and more or less the default way most developers handle HTTP in Python these days.&lt;/p></description></item><item><title>How to use a proxy with HttpClient in C#</title><link>https://www.scrapingbee.com/blog/csharp-httpclient-proxy/</link><pubDate>Sun, 09 Nov 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/csharp-httpclient-proxy/</guid><description>&lt;p>In this article, we'll walk through how to use a C# HttpClient proxy. HttpClient is built into .NET and supports async by default, so it's the standard way to send requests through a proxy.&lt;/p>
&lt;p>Developers often use proxies to stay anonymous, avoid IP blocks, or just control where the traffic goes. Whatever your reason, by the end of this article you'll know how to work with both authenticated and unauthenticated proxies in HttpClient.&lt;/p></description></item><item><title>Web scraping in C#: From basics to production-ready code (2025)</title><link>https://www.scrapingbee.com/blog/web-scraping-csharp/</link><pubDate>Fri, 31 Oct 2025 10:22:27 +0200</pubDate><guid>https://www.scrapingbee.com/blog/web-scraping-csharp/</guid><description>&lt;p>So, you wanna do &lt;em>C# web scraping&lt;/em> without losing your sanity? This guide's got you! We'll go from zero to a working scraper that actually does something useful: fetching real HTML, parsing it cleanly, and saving the data to a nice CSV file.&lt;/p>
&lt;p>You'll learn how to use HtmlAgilityPack for parsing, CsvHelper for export, and ScrapingBee as your all-in-one backend that handles headless browsers, proxies, and JavaScript. Yeah, all the messy stuff nobody wants to deal with manually.&lt;/p></description></item><item><title>Dynamic Web Page Scraping With Python</title><link>https://www.scrapingbee.com/blog/web-scraping-dynamic-content/</link><pubDate>Sun, 26 Oct 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/web-scraping-dynamic-content/</guid><description>&lt;p>Modern websites love to render content in the browser through dynamic and interactive JavaScript elements. However, because of that, static scrapers and parsers that work so well with Python become ineffective as they miss prices, reviews, and stock states that appear after client-side rendering.&lt;/p>
&lt;p>To reach that information, a newer generation of data collection tools approaches dynamic web scraping with Python through headless browsers that can click JavaScript elements on the site. Even then, mimicking real user behavior and tuning the connection until it opens access to our data source requires a lot of technical proficiency, even with tools like Selenium or Puppeteer.&lt;/p></description></item><item><title>Web crawling with Python made easy: From setup to first scrape</title><link>https://www.scrapingbee.com/blog/crawling-python/</link><pubDate>Sun, 26 Oct 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/crawling-python/</guid><description>&lt;p>&lt;strong>Web crawling with Python&lt;/strong> sounds fancy, but it's really just teaching your computer how to browse the web for you. Instead of clicking links and copying data by hand, you write a script that does it automatically: visiting pages, collecting info, and moving on to the next one.&lt;/p>
&lt;p>In this guide, we'll go step by step through the whole process. We'll start from a tiny script using &lt;code>requests&lt;/code> and &lt;code>BeautifulSoup&lt;/code>, then level up to a scalable crawler built with Scrapy. You'll also see how to clean your data, follow links safely, and use ScrapingBee to handle tricky sites with JavaScript or anti-bot rules.&lt;/p></description></item><item><title>Guide to Puppeteer Scraping for Efficient Data Extraction</title><link>https://www.scrapingbee.com/blog/puppeteer-scraping/</link><pubDate>Sat, 25 Oct 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/puppeteer-scraping/</guid><description>&lt;p>Puppeteer scraping lets you automate real browsers to open tabs, visit desired web pages, and extract public data. But how do you use this Node.js library without prior experience?&lt;/p>
&lt;p>In this guide, we will show you how to set up Puppeteer, navigate pages, extract data with $eval/$$eval/XPath, paginate, and export results. You’ll also see where Puppeteer hits limits at scale and how our HTML API unlocks consistent access to protected websites with the ability to rotate IP addresses and bypass anti-bot systems. Stay tuned, and you will have a working Puppeteer scraper in just a few minutes!&lt;/p></description></item><item><title>Price Scraper With ScrapingBee</title><link>https://www.scrapingbee.com/blog/price-scraper/</link><pubDate>Fri, 24 Oct 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/price-scraper/</guid><description>&lt;p>Building a multi-functional price scraper is one of the best ways to extract data from competitor platforms and study their pricing strategies. Because most e-commerce businesses use automated connections for competitive analysis, finding a reliable way to access website data and study market trends is one of the best ways to outshine competitors.&lt;/p>
&lt;p>However, researching and analyzing data takes a lot of time, so having the best tools for scraping prices provides a big advantage. In this guide, we will show you how to access web data and start scraping websites with our intuitive HTML API. Stick around to build your first price scraping tool in just a few minutes!&lt;/p></description></item><item><title>API for dummies: Start building your first API today</title><link>https://www.scrapingbee.com/blog/api-for-dummies-learning-api/</link><pubDate>Thu, 23 Oct 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/api-for-dummies-learning-api/</guid><description>&lt;p>If you've been hunting for an easy &lt;strong>API for dummies guide&lt;/strong> that finally explains what all the fuss is about, you're in the right place. Ever wondered how your favorite apps and websites manage to talk to each other so smoothly? That's where APIs come in.&lt;/p>
&lt;p>API stands for Application Programming Interface, but don't let that technical name scare you off. In plain English, an API is like a bridge that lets different software systems exchange data or use each other's features without needing to know what's happening behind the scenes.&lt;/p></description></item><item><title>Web Scraping With LangChain &amp; ScrapingBee</title><link>https://www.scrapingbee.com/blog/langchain-web-scraper/</link><pubDate>Thu, 23 Oct 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/langchain-web-scraper/</guid><description>&lt;p>Having a Langchain scraper enables developers to build powerful data pipelines that start with real-time data extraction and end with structured outputs for tasks, like embeddings and retrieval-augmented generation (RAG). To accommodate these benefits, our HTML API simplifies the road towards desired public content via JavaScript rendering, anti-bot bypassing, and content cleanup—so LangChain can process the result into usable text.&lt;/p>
&lt;p>In this guide, we will cover the steps and integration details that will help us combine LangChain with our Python SDK, combining these two tools in a Python project. Let's get straight to it!&lt;/p></description></item><item><title>HTML Web Scraping Tutorial</title><link>https://www.scrapingbee.com/blog/html-scraping/</link><pubDate>Wed, 22 Oct 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/html-scraping/</guid><description>&lt;p>Over the last two decades, HTML scraping has transformed how we approach market research. While the internet continues to reimagine how we extract and analyze information, we have many different ways to scrape HTML, all of which are different in their approach and complexity.&lt;/p>
&lt;p>In this tutorial, we will show how to combine the basics of traditional HTML data collection with the powerful extraction capabilities of our &lt;a href="https://www.scrapingbee.com" target="_blank" >scraping API&lt;/a>. This approach will help you create a clear and consistent method for automated data extractions. Let's dive in!&lt;/p></description></item><item><title>Cloudflare Scraper: How to Bypass Cloudflare With ScrapingBee API</title><link>https://www.scrapingbee.com/blog/cloudflare-scraper/</link><pubDate>Tue, 21 Oct 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/cloudflare-scraper/</guid><description>&lt;p>Having an effective Cloudflare scraper opens a whole new world of public data that you can extract with automated connections. Because basic scrapers fail to utilize dynamic fingerprinting methods and proxy rotation, they cannot access many protected platforms due to rate limits, IP blocks, and CAPTCHA challenges.&lt;/p>
&lt;p>In this guide, we try to help upcoming businesses and freelancers to reliably fetch pages protected by Cloudflare using our beginner-friendly HTML API. Here, we will explain the common JavaScript rendering challenges, device fingerprinting issues, and how our Python SDK resolves them under the hood through the provided API parameters. Follow the steps to build a small, testable proof of concept before scaling.&lt;/p></description></item><item><title>Automated Web Scraping - Benefits and Tips</title><link>https://www.scrapingbee.com/blog/automated-web-scraping/</link><pubDate>Mon, 20 Oct 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/automated-web-scraping/</guid><description>&lt;p>Looking for ways to automate web scraping tools to quickly collect public data online? In the data-driven world, manual aggregation methods cannot compete with the speed of automated growth. Manual scraping is way too slow, error-prone, and not scalable.&lt;/p>
&lt;p>Automated web scraping solutions remove the need for monotonous and inefficient tasks, allowing our bots and APIs to do what they do best – execute a recurring set of instructions at far greater speeds. In this guide, we will discuss the necessity of automated connections for data extraction and include some actionable tips that will get you started without prior programming knowledge. Let's get to work!&lt;/p></description></item><item><title>How to master Selenium web scraping in 2025</title><link>https://www.scrapingbee.com/blog/selenium-python/</link><pubDate>Sat, 18 Oct 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/selenium-python/</guid><description>&lt;p>&lt;strong>Selenium web scraping&lt;/strong> is still one of the most dependable ways to extract data from dynamic, JavaScript-heavy websites. In 2025, it's smoother and faster than ever.&lt;/p>
&lt;p>Selenium is a browser automation toolkit with bindings for all major programming languages, including Python, which we'll focus on here. It talks to browsers through the WebDriver protocol, giving you control over Chrome, Firefox, Safari, or even remote setups. Originally built for testing, Selenium has grown into a full automation tool that can click, type, scroll, and extract data just like a real user.&lt;/p></description></item><item><title>How to parse HTML in Python: A step-by-step guide for beginners</title><link>https://www.scrapingbee.com/blog/python-html-parsers/</link><pubDate>Thu, 16 Oct 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/python-html-parsers/</guid><description>&lt;p>If you've ever tried to pull data from a website (prices, titles, reviews, links, whatever) you've probably hit that wall called &lt;strong>how to parse HTML in Python&lt;/strong>. The web runs on HTML, and turning messy markup into clean, structured data is one of those rites of passage every dev goes through sooner or later.&lt;/p>
&lt;p>This guide walks you through the whole thing, step by step: fetching pages, parsing them properly, and doing it in a way that won't make websites hate you. We'll start simple, then jump into a real-world setup using ScrapingBee, which quietly handles the messy stuff like JavaScript rendering, IP rotation, and anti-bot headaches.&lt;/p></description></item><item><title>Scrapy vs Selenium: Which one to choose</title><link>https://www.scrapingbee.com/blog/scrapy-vs-selenium/</link><pubDate>Thu, 09 Oct 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/scrapy-vs-selenium/</guid><description>&lt;p>The Scrapy vs Selenium debate has been ongoing in the web scraping community for years. Both tools have carved out their own territories in the world of data extraction and web automation, but choosing between them can feel like picking between a race car and a Swiss Army knife: they’re both excellent, just for different reasons.&lt;/p>
&lt;p>If you’ve ever found yourself staring at a website wondering how to extract its data efficiently, you’ve probably encountered these two powerhouses. Scrapy stands as the world’s most popular open-source web scraping framework, while Selenium has established itself as the go-to solution for browser automation and testing. But which one should you reach for when your next project demands results?&lt;/p></description></item><item><title>BeautifulSoup tutorial: Scraping web pages with Python</title><link>https://www.scrapingbee.com/blog/python-web-scraping-beautiful-soup/</link><pubDate>Wed, 08 Oct 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/python-web-scraping-beautiful-soup/</guid><description>&lt;p>The internet is an endless source of data, and for many data-driven tasks, accessing this information is critical. Thus, the demand for web scraping has risen exponentially in recent years, becoming an important tool for data analysts, machine learning developers, and businesses alike. Also, Python has become the most popular programming language for this purpose.&lt;/p>
&lt;p>In this detailed tutorial, you'll learn how to access the data using popular libraries such as Requests and Beautiful Soup with CSS selectors.&lt;/p></description></item><item><title>Mastering the Python curl request: A practical guide for developers</title><link>https://www.scrapingbee.com/blog/python-curl/</link><pubDate>Wed, 08 Oct 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/python-curl/</guid><description>&lt;p>Mastering the Python curl request is one of the fastest ways to turn API docs or browser network calls into working code. Instead of rewriting everything by hand, you can drop curl straight into Python, or translate it into Requests or PycURL for cleaner, long-term projects.&lt;/p>
&lt;p>In this guide, we'll show practical ways to run curl in Python, when to use each method (subprocess, PycURL, Requests), and how ScrapingBee improves reliability with proxies and optional JavaScript rendering, so you can ship scrapers that actually work.&lt;/p></description></item><item><title>Scraping with Nodriver: Step by Step Tutorial with Examples</title><link>https://www.scrapingbee.com/blog/nodriver-tutorial/</link><pubDate>Wed, 08 Oct 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/nodriver-tutorial/</guid><description>&lt;p>If you've used Python &lt;a href="https://www.scrapingbee.com/blog/selenium-python/" target="_blank" >Selenium for web scraping&lt;/a>, you're familiar with its ability to extract data from websites. However, the default webdriver (ChromeDriver) often struggles to bypass anti-bot mechanisms. As a solution, you can use &lt;a href="https://www.scrapingbee.com/blog/undetected-chromedriver-python-tutorial-avoiding-bot-detection/" target="_blank" >undetected_chromedriver&lt;/a> to bypass some of today's most sophisticated anti-bot systems, including those from Cloudflare and Akamai.&lt;/p>
&lt;p>However, it's important to note that undetected_chromedriver has limitations against advanced anti-bot systems. This is where &lt;strong>Nodriver&lt;/strong>, its official successor, comes in.&lt;/p></description></item><item><title>Web Scraping vs API: What’s the Difference?</title><link>https://www.scrapingbee.com/blog/api-vs-web-scraping/</link><pubDate>Wed, 08 Oct 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/api-vs-web-scraping/</guid><description>&lt;p>Ever found yourself staring at a website, desperately wanting to extract all that data, but wondering whether you should build a scraper or get an API? The web scraping vs API debate is one of the most common questions in data extraction. Honestly, it’s a fair question that deserves a proper answer.&lt;/p>
&lt;p>Both approaches have their place in the modern data landscape, but understanding the difference between web scraping and API methods can save you time, money, and countless headaches. In this article, I'll help you find the approach that works best for you.&lt;/p></description></item><item><title>Best Language for Web Scraping</title><link>https://www.scrapingbee.com/blog/best-language-for-web-scraping/</link><pubDate>Tue, 07 Oct 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/best-language-for-web-scraping/</guid><description>&lt;p>Ever stared at a data-rich website and wondered how to pull its data out cleanly and fast? To accomplish this mission, you need to pick the best language for web scraping. But the process can feel a bit confusing. Python’s hype, JavaScript’s ubiquity, and a dozen other languages make it hard to pick the right one.&lt;/p>
&lt;p>After years building scrapers, I’ve watched teams burn time by matching the wrong tool to the job. Today’s web is trickier: JavaScript-heavy UIs, dynamic rendering, rate limits, and sophisticated anti-bot systems. Your stack needs to navigate headless browsers, async flows, and resilience, without turning maintenance into a grind.&lt;/p></description></item><item><title>How to bypass error 1005 'access denied, you have been banned' when scraping</title><link>https://www.scrapingbee.com/blog/bypass-error-1005-access-denied-you-have-been-banned/</link><pubDate>Tue, 07 Oct 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/bypass-error-1005-access-denied-you-have-been-banned/</guid><description>&lt;p>When scraping websites protected by Cloudflare, encountering Error 1005 — &amp;quot;Access Denied, You Have Been Banned&amp;quot; — is a common challenge. This error signifies that your IP address has been blocked, usually due to Cloudflare's security mechanisms that aim to prevent scraping and malicious activities. However, there are various techniques you can use to bypass this error and continue your scraping operations.&lt;/p>
&lt;p>In this guide, we'll focus on specific strategies and tools to bypass Cloudflare Error 1005, helping you to scrape websites efficiently without getting blocked.&lt;/p></description></item><item><title>How To Build a Real Estate Web Scraper</title><link>https://www.scrapingbee.com/blog/real-estate-web-scraping/</link><pubDate>Mon, 06 Oct 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/real-estate-web-scraping/</guid><description>&lt;p>The real estate market moves fast. Property listings appear and disappear within hours, prices fluctuate based on market conditions, and tracking availability across multiple platforms manually becomes an impossible task. For developers, investors, and real estate agents who need to stay ahead of market trends, building a real estate web scraper offers the solution to automate data collection from sites like Redfin, Idealista, or &lt;a href="http://Apartments.com" target="_blank" >Apartments.com&lt;/a>. Instead of spending hours on manual data entry, you can focus on analyzing insights and making informed decisions based on fresh, accurate market data.&lt;/p></description></item><item><title>How to use undetected_chromedriver (plus working alternatives)</title><link>https://www.scrapingbee.com/blog/undetected-chromedriver-python-tutorial-avoiding-bot-detection/</link><pubDate>Mon, 06 Oct 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/undetected-chromedriver-python-tutorial-avoiding-bot-detection/</guid><description>&lt;p>If you've used &lt;a href="https://www.scrapingbee.com/blog/selenium-python/" target="_blank" >Python Selenium for web scraping&lt;/a>, you're familiar with its ability to extract data from websites. However, the default webdriver (ChromeDriver) often struggles to bypass the anti-bot mechanisms websites use to detect and block scrapers. 
With undetected_chromedriver, you can bypass some of today's most sophisticated anti-bot mechanisms, including those from Cloudflare, Akamai, and DataDome.&lt;/p>
&lt;p>In this blog post, we’ll guide you on how to make your Selenium web scraper less detectable using undetected_chromedriver. We’ll cover its usage with proxies and user agents to enhance its effectiveness and troubleshoot common errors. Furthermore, we’ll discuss the limitations of undetected_chromedriver and suggest better alternatives.&lt;/p></description></item><item><title>How to Scrape Images from a Website with ScrapingBee</title><link>https://www.scrapingbee.com/blog/how-to-scrape-images-from-website/</link><pubDate>Sun, 05 Oct 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-images-from-website/</guid><description>&lt;p>Learning how to scrape images from website sources is a skill that can unlock various benefits. Whether you’re extracting product photos for competitive analysis, building datasets or gathering visual content for machine learning projects you need to know how to scrape.&lt;/p>
&lt;p>In this article, I'll walk you through the process of building a website image scraper. But don't worry, you won't have to code everything from scratch. ScrapingBee’s web scraping API lets you automate content collection with minimal technical knowledge. The best part? It has built-in technical infrastructure, so you don't need to think about proxies, JavaScript rendering, or other difficulties. Let me show you exactly how it works.&lt;/p></description></item><item><title>Top Web Scraping Challenges in 2025</title><link>https://www.scrapingbee.com/blog/web-scraping-challenges/</link><pubDate>Sun, 05 Oct 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/web-scraping-challenges/</guid><description>&lt;p>Top web scraping challenges have evolved dramatically from the simple days of parsing static HTML. I’ve been building scrapers for years, and let me tell you – even simple tasks have turned into a complex chess match between developers and websites. From sophisticated CAPTCHAs to JavaScript, the obstacles continue to multiply.&lt;/p>
&lt;p>In this article, I’ll break down the major hurdles you’ll face when scraping data in 2025 and show you how ScrapingBee can help you jump over these barriers without breaking a sweat. Whether you’re dealing with IP blocks, dynamic content, or legal concerns, there’s a solution that doesn’t involve spending weeks building complex infrastructure.&lt;/p></description></item><item><title>7 Best SERP APIs in 2025</title><link>https://www.scrapingbee.com/blog/best-serp-apis/</link><pubDate>Sat, 04 Oct 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/best-serp-apis/</guid><description>&lt;p>Looking for the best SERP API in 2025? You've come to the right place. In my experience working with various search engine data projects, choosing the right API can make or break your entire operation. Some search scraping APIs can be frustrating, as they often yield inconsistent data. Others extract data so smoothly you’ll wonder how they make web scraping so easy.&lt;/p>
&lt;p>The search engine API market has evolved significantly in 2025, with new players entering the field and established providers upgrading their infrastructure. Whether you’re tracking competitor rankings, building local SEO presence, or feeding data into machine learning models, there’s never been more choice – or more confusion about which provider to pick.&lt;/p></description></item><item><title>Best Web Scraping Services in USA</title><link>https://www.scrapingbee.com/blog/best-web-scraping-services/</link><pubDate>Sat, 04 Oct 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/best-web-scraping-services/</guid><description>&lt;p>The best web scraping services in the USA have made manual data collection obsolete. Today, web scrapers have become essential infrastructure for modern businesses, transforming what used to be weeks of work into automated processes that run with a few lines of code.&lt;/p>
&lt;p>However, the scraping landscape has evolved dramatically, and what worked in 2020 barely scratches the surface today. Modern websites throw everything at automated data collectors: sophisticated JavaScript rendering, complex bot detection systems, CAPTCHAs, and dynamic content that loads through multiple API calls. Reliable &lt;strong>web scraping services&lt;/strong> know how to adapt.&lt;/p></description></item><item><title>How to set up Axios proxy: A step-by-step guide for Node.js</title><link>https://www.scrapingbee.com/blog/nodejs-axios-proxy/</link><pubDate>Fri, 03 Oct 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/nodejs-axios-proxy/</guid><description>&lt;p>If you've ever tried to send requests through a proxy in Node.js, chances are you've searched for &lt;strong>how to set up an Axios proxy&lt;/strong>. Whether you're scraping the web, checking geo-restricted content, or just hiding your real IP, proxies are a common part of the toolkit.&lt;/p>
&lt;p>This guide walks through the essentials of using Axios with proxies:&lt;/p>
&lt;ul>
&lt;li>setting up a basic proxy,&lt;/li>
&lt;li>adding username/password authentication,&lt;/li>
&lt;li>rotating proxies to avoid bans,&lt;/li>
&lt;li>working with SOCKS5,&lt;/li>
&lt;li>plus a few fixes for common errors.&lt;/li>
&lt;/ul>
&lt;p>We'll also cover where a service like ScrapingBee can save you time if you don't want to manage proxies yourself.&lt;/p></description></item><item><title>Is Web Scraping Legal? Key Insights and Guidelines You Need to Know</title><link>https://www.scrapingbee.com/blog/is-web-scraping-legal/</link><pubDate>Fri, 03 Oct 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/is-web-scraping-legal/</guid><description>&lt;p>Web scraping raises a lot of questions, but “is web scraping legal” is the one I hear the most. The legality of web scraping depends on three critical factors: what data you’re collecting, how you’re collecting it, and where you’re operating. Think of it like driving a car: the act itself isn’t illegal, but speeding, running red lights, or driving without a license can land you in serious trouble.&lt;/p>
&lt;p>This guide breaks down the complex world of web scraping legality across different jurisdictions. We’ll explore key laws including privacy regulations, copyright protections, terms of service agreements, and anti-hacking statutes. You’ll also discover ethical best practices that keep your data collection projects on the right side of the law.&lt;/p></description></item><item><title>Search Engine Scraping Tutorial With ScrapingBee</title><link>https://www.scrapingbee.com/blog/search-engine-scraping/</link><pubDate>Fri, 03 Oct 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/search-engine-scraping/</guid><description>&lt;p>Search engine scraping has become an essential method for many businesses, digital marketers, and researchers to gather information. It is an excellent data extraction method when you need to analyze a large number of competitor websites. With web scraping, you can extract information on market trends and make informed decisions on pricing strategies using the data extracted from SERPs.&lt;/p>
&lt;p>In this tutorial, I’ll show you how to perform search engine scraping safely and efficiently using ScrapingBee’s web data extraction tool. You’ll learn how to extract structured data from major search engines like Google and Bing without worrying about getting blocked, managing proxies, or dealing with CAPTCHAs. Let's dive in!&lt;/p></description></item><item><title>How to Scrape Baidu: Step-by-Step Guide</title><link>https://www.scrapingbee.com/blog/how-to-scrape-baidu/</link><pubDate>Thu, 02 Oct 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-baidu/</guid><description>&lt;p>Want to learn how to scrape Baidu? As China’s largest search engine, Baidu is an attractive target for web scraping because it is similar to Google in function but tailored for local regulations. For those wanting to tap into China's digital ecosystem, it is the best source of public data that displays relevant, location-based search trends, plus everything you need to conduct market research.&lt;/p>
&lt;p>This guide will teach you how to extract information from Baidu HTML code with the most beginner-friendly solution – our Scraping API and Python SDK. Dynamically loaded pages build their structured data with the help of JavaScript, while rate-limiting and bot detection tools try to prevent automated data parsing on the platform.&lt;/p></description></item><item><title>Python Web Scraping Stock Price With ScrapingBee</title><link>https://www.scrapingbee.com/blog/python-web-scraping-stock-price/</link><pubDate>Thu, 02 Oct 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/python-web-scraping-stock-price/</guid><description>&lt;p>Python web scraping stock price techniques have become essential for traders and financial analysts who need near real-time market data analysis without paying thousands for premium API access.&lt;/p>
&lt;p>Becoming a pro at scraping stock market data allows you to build a personal investment dashboard for real-time stock data monitoring. It also helps you extract data for market research or develop a trading algorithm. Whatever you decide to use the data for, having direct access to stock prices gives you an edge.&lt;/p></description></item><item><title>How to Scrape Booking.com: Step-by-Step Tutorial</title><link>https://www.scrapingbee.com/blog/how-to-scrape-booking-com/</link><pubDate>Wed, 01 Oct 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-booking-com/</guid><description>&lt;p>Booking.com is one of the biggest travel platforms, and a go-to choice for millions of users planning their trips and vacations. By accessing the platform using automated tools, we can collect hotel data, including names, ratings, prices, and locations, for research or comparison purposes.&lt;/p>
&lt;p>However, the platform’s strict anti-bot systems make direct extraction nearly impossible. Fortunately, our API and Python tooling eliminate these challenges by providing automatic JavaScript execution, proxy rotation, and CAPTCHA-resistant browsing.&lt;/p></description></item><item><title>Web Scraping without getting blocked (2025 Solutions)</title><link>https://www.scrapingbee.com/blog/web-scraping-without-getting-blocked/</link><pubDate>Wed, 01 Oct 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/web-scraping-without-getting-blocked/</guid><description>&lt;p>&lt;strong>Web scraping&lt;/strong>, or &lt;strong>crawling&lt;/strong>, is the process of fetching data from a third-party website by downloading and parsing the HTML code to extract the data you need.&lt;/p>
&lt;blockquote>
&lt;p>&lt;em>&amp;quot;But why don't you use the API for this?&amp;quot;&lt;/em>&lt;/p>
&lt;/blockquote>
&lt;p>Not every website offers an API, and those that do might not expose all the information you need. Therefore, scraping often becomes the only viable solution to extract website data.&lt;/p>
&lt;p>There are numerous use cases for web scraping:&lt;/p></description></item><item><title>7 Best Web Scraping Tools Ranked (2025)</title><link>https://www.scrapingbee.com/blog/web-scraping-tools/</link><pubDate>Tue, 30 Sep 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/web-scraping-tools/</guid><description>&lt;p>If you're looking for the &lt;strong>best web scraping tools in 2025&lt;/strong>, you'll quickly see there are a lot of choices. Some are simple libraries, others are full SaaS platforms. Each promises speed, scale, or AI magic, but not every tool will fit your project.&lt;/p>
&lt;p>That's why we put together this ranked list. Below, you'll find the top web scraping tools of 2025, with clear breakdowns of features, pros, cons, and pricing. Whether you want a reliable service like ScrapingBee or a free open-source option, you'll see what works best for your needs.&lt;/p></description></item><item><title>Web Scraping With Linux And Bash</title><link>https://www.scrapingbee.com/blog/web-scraping-with-linux-and-bash/</link><pubDate>Mon, 29 Sep 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/web-scraping-with-linux-and-bash/</guid><description>&lt;p>Please brace yourselves, we'll be going deep into the world of Unix command lines and shells today, as we are finding out more about how to use the Bash for scraping websites.&lt;/p>
&lt;p>&lt;em>Let's fasten our seatbelts and jump right in&lt;/em> 🏁&lt;/p>
&lt;h2 id="why-scraping-with-bash">Why Scraping With Bash?&lt;/h2>
&lt;p>If you happen to have already read a few of our other articles (e.g. &lt;a href="https://www.scrapingbee.com/blog/web-scraping-101-with-python/" >web scraping in Python&lt;/a> or &lt;a href="https://www.scrapingbee.com/blog/introduction-to-chrome-headless/" >using Chrome from Java&lt;/a>), you'll probably already be familiar with the level of convenience those high-level languages provide when it comes to crawling and scraping the web. And, while there are plenty of examples of full-fledged applications written in Bash (e.g. an entire &lt;a href="http://nanoblogger.sourceforge.net/" target="_blank" >web CMS&lt;/a>, an &lt;a href="https://lists.gnu.org/archive/html/bug-bash/2001-02/msg00054.html" target="_blank" >Intel assembler&lt;/a>, a &lt;a href="https://testssl.sh/" target="_blank" >TLS validator&lt;/a>, a full &lt;a href="https://github.com/dzove855/Bash-web-server" target="_blank" >web server&lt;/a>), probably few people will argue that Bash scripts are the &lt;em>most ideal&lt;/em> environment for large, complex programs. So the question of why somebody would use Bash for scraping is not completely out of the blue and may well be justified.&lt;/p></description></item><item><title>How to bypass reCAPTCHA &amp; hCaptcha when web scraping</title><link>https://www.scrapingbee.com/blog/how-to-bypass-recaptcha-and-hcaptcha-when-web-scraping/</link><pubDate>Sun, 28 Sep 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-bypass-recaptcha-and-hcaptcha-when-web-scraping/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>CAPTCHA - &lt;strong>C&lt;/strong>ompletely &lt;strong>A&lt;/strong>utomated &lt;strong>P&lt;/strong>ublic &lt;strong>T&lt;/strong>uring test to tell &lt;strong>C&lt;/strong>omputers and &lt;strong>H&lt;/strong>umans &lt;strong>A&lt;/strong>part! All these little tasks and riddles you need to solve before a site lets you proceed to the actual content.&lt;/p>
&lt;blockquote>
&lt;p>💡 Want to skip ahead and try to avoid CAPTCHAs?&lt;/p>
&lt;p>At &lt;a href="https://www.scrapingbee.com" target="_blank" >ScrapingBee&lt;/a>, it is our goal to provide you with the right tools to avoid triggering CAPTCHAs in the first place. Our &lt;a href="https://www.scrapingbee.com/" target="_blank" >web scraping API&lt;/a> has been carefully tuned so that your requests are unlikely to get stopped by a CAPTCHA, give it a go.&lt;/p></description></item><item><title>How to scrape all text from a website for LLM training</title><link>https://www.scrapingbee.com/blog/how-to-scrape-all-text-from-a-website-for-llm-ai-training/</link><pubDate>Sat, 27 Sep 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-all-text-from-a-website-for-llm-ai-training/</guid><description>&lt;p>Artificial Intelligence (AI) is rapidly becoming a part of everyday life, and with it, the demand for training custom models has increased. Many people these days would like to train their very own... AI, not dragon, duh! One crucial step in training any language model (LLM) is gathering a significant amount of text data. In this article, I'll show you how to collect text data from all pages of a website using web scraping techniques. We'll build a custom Python script to automate this process, making it easy to gather the data you need for your model training.&lt;/p></description></item><item><title>How to scrape data from a website to Excel</title><link>https://www.scrapingbee.com/blog/how-to-web-scrape-in-excel/</link><pubDate>Fri, 26 Sep 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-web-scrape-in-excel/</guid><description>&lt;p>Collecting data from websites and organizing it into a structured format like Excel can be super handy. Maybe you're building reports, doing research, or just want a neat spreadsheet with all the info you need. But copying and pasting manually? That's a time sink no one enjoys. 
In this guide, we'll discuss a few ways to scrape data from websites and save it directly into Excel.&lt;/p>
&lt;p>Together we'll talk about methods for both non-techies and devs, using everything from built-in Excel tools to coding your own solutions with Python. By the end, you'll have a clear picture of which method fits your needs best.&lt;/p></description></item><item><title>Scrapegraph AI Tutorial; Scrape websites easily with LLaMA AI</title><link>https://www.scrapingbee.com/blog/scrapegraph-ai-tutorial-scrape-websites-easily-with-llama-ai/</link><pubDate>Thu, 25 Sep 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/scrapegraph-ai-tutorial-scrape-websites-easily-with-llama-ai/</guid><description>&lt;p>&lt;strong>Artificial intelligence&lt;/strong> is everywhere in tech these days, and it's wild how it's become a go-to tool, for example, in stuff like web scraping. Let's dive into how Scrapegraph AI can totally simplify your scraping game. Just tell it what you need in simple English, and watch it work its magic.&lt;/p>
&lt;p>I'm going to show you how to get Scrapegraph AI up and running, how to set up a language model, how to process JSON, scrape websites, use different AI models, and even turn your data into audio. Sounds like a lot, but it's easier than you think, and I'll walk you through it step by step.&lt;/p></description></item><item><title>Python wget: Automate file downloads with 3 simple commands</title><link>https://www.scrapingbee.com/blog/python-wget/</link><pubDate>Wed, 24 Sep 2025 09:10:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/python-wget/</guid><description>&lt;p>If you've ever needed to grab files in bulk, you know the pain of clicking download links one by one. That's where combining &lt;strong>Python and wget&lt;/strong> shines. Instead of re-implementing HTTP requests yourself, you can call the battle-tested &lt;code>wget&lt;/code> tool straight from a Python script and let it handle the heavy lifting.&lt;/p>
&lt;p>In this guide, we'll set up &lt;code>wget&lt;/code>, explain how to run it from Python using subprocess, and walk through three copy-paste commands that cover almost everything you'll ever need: downloading a file, saving it with a custom name or folder, and resuming interrupted transfers. Let's get started!&lt;/p></description></item><item><title>How to use a proxy with Python Requests?</title><link>https://www.scrapingbee.com/blog/python-requests-proxy/</link><pubDate>Tue, 23 Sep 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/python-requests-proxy/</guid><description>&lt;p>If you've ever messed around with scraping or automating requests in Python, you've probably run into the usual roadblocks. One minute everything's smooth, the next you're getting captchas, random 403 errors, or just radio silence from the site. That's usually the internet's polite way of saying: &lt;em>&amp;quot;Hey buddy, slow down.&amp;quot;&lt;/em> This is where proxies save the day. Setting up a Python Requests proxy, you can mask your real IP, spread your traffic over different addresses, and even slip past geo-restrictions that would normally block you.&lt;/p></description></item><item><title>Using CSS Selectors for Web Scraping</title><link>https://www.scrapingbee.com/blog/using-css-selectors-for-web-scraping/</link><pubDate>Mon, 22 Sep 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/using-css-selectors-for-web-scraping/</guid><description>&lt;p>In today's article we are going to take a closer look at CSS selectors, where they originated from, and how they can help you in extracting data when scraping the web.&lt;/p>
&lt;blockquote>
&lt;p>ℹ️ If you already read the article &amp;quot;&lt;a href="https://www.scrapingbee.com/blog/practical-xpath-for-web-scraping/" >Practical XPath for Web Scraping&lt;/a>&amp;quot;, you'll probably recognize more than just a few similarities, and that is because XPath expressions and CSS selectors actually are quite similar in the way they are being used in data extraction.&lt;/p></description></item><item><title>Practical XPath for Web Scraping</title><link>https://www.scrapingbee.com/blog/practical-xpath-for-web-scraping/</link><pubDate>Sun, 21 Sep 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/practical-xpath-for-web-scraping/</guid><description>&lt;p>XPath is a technology that uses path expressions to select nodes or node-sets in an XML document (or in our case an HTML document). Even if XPath is not a programming language in itself, it allows you to write an expression which can directly point to a specific HTML element, or even tag attribute, without the need to manually iterate over any element lists.&lt;/p>
&lt;p>It looks like the perfect tool for web scraping right? At &lt;a href="https://www.scrapingbee.com" target="_blank" >ScrapingBee&lt;/a> we love XPath! ❤️&lt;/p></description></item><item><title>Getting Started with chromedp</title><link>https://www.scrapingbee.com/blog/getting-started-with-chromedp/</link><pubDate>Sat, 20 Sep 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/getting-started-with-chromedp/</guid><description>&lt;p>&lt;a href="https://pkg.go.dev/github.com/chromedp/chromedp" target="_blank" >chromedp&lt;/a> is a Go library for interacting with a headless Chrome or Chromium browser.&lt;/p>
&lt;p>The &lt;code>chromedp&lt;/code> package provides an API that makes controlling Chrome and Chromium browsers simple and expressive, allowing you to automate interactions with websites such as navigating to pages, filling out forms, clicking elements, and extracting data. It's useful for simplifying &lt;a href="https://www.scrapingbee.com/" target="_blank" >web scraping&lt;/a> as well as testing, performance monitoring, and developing browser extensions.&lt;/p>
&lt;p>This article provides an overview of chromedp's advanced features and shows you how to use it for web scraping.&lt;/p></description></item><item><title>Playwright for Python Web Scraping Tutorial with Examples</title><link>https://www.scrapingbee.com/blog/playwright-for-python-web-scraping/</link><pubDate>Fri, 19 Sep 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/playwright-for-python-web-scraping/</guid><description>&lt;p>Web scraping is a powerful tool for gathering data from websites, and Playwright is one of the best tools out there to get the job done. In this tutorial, I'll walk you through &lt;strong>how to scrape with Playwright for Python&lt;/strong>. We'll start with the basics and gradually move to more advanced techniques, ensuring you have a solid grasp of the entire process. Whether you're new to web scraping or looking to refine your skills, this guide will help you use Playwright for Python effectively to extract data from the web.&lt;/p></description></item><item><title>Generating Random IPs to Use for Scraping</title><link>https://www.scrapingbee.com/blog/generating-random-ips-to-use-for-scraping/</link><pubDate>Thu, 18 Sep 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/generating-random-ips-to-use-for-scraping/</guid><description>&lt;p>Web scraping uses automated software tools or scripts to extract and parse data from websites into structured formats for storage or processing. Many data-driven initiatives—including business intelligence, sentiment analysis, and predictive analytics—rely on web scraping as a method for gathering information.&lt;/p>
&lt;p>However, some websites have implemented anti-scraping measures as a precaution against the misuse of content and breaches of privacy. One such measure is IP blocking, where IPs with known bot patterns or activities are automatically blocked. Another tactic is rate limiting, which restricts the volume of requests that a single IP address can make within a specific time frame.&lt;/p></description></item><item><title>Using jQuery to Parse HTML and Extract Data</title><link>https://www.scrapingbee.com/blog/html-parsing-jquery/</link><pubDate>Wed, 17 Sep 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/html-parsing-jquery/</guid><description>&lt;p>Your web page may sometimes need to use information from other web pages that do not provide an API. For instance, you may need to fetch stock price information from a web page in real time and display it in a widget of your web page. However, some of the stock price aggregation websites don’t provide APIs.&lt;/p>
&lt;p>In such cases, you need to retrieve the source HTML of the web page and manually find the information you need. This process of retrieving and manually parsing HTML to find specific information is known as &lt;a href="https://en.wikipedia.org/wiki/Web_scraping" target="_blank" >web scraping&lt;/a>.&lt;/p></description></item><item><title>Getting Started with Apache Nutch</title><link>https://www.scrapingbee.com/blog/getting-started-with-apache-nutch/</link><pubDate>Tue, 16 Sep 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/getting-started-with-apache-nutch/</guid><description>&lt;p>Web crawling is often confused with &lt;a href="https://www.scrapingbee.com/" target="_blank" >web scraping&lt;/a>, which is simply extracting specific data from web pages. A &lt;a href="https://www.scrapingbee.com/blog/crawling-python/" target="_blank" >web crawler&lt;/a> is an automated program that helps you find and catalog relevant data sources.&lt;/p>
&lt;p>Typically, a crawler first makes requests to a list of known web addresses and, from their content, identifies other relevant links. It adds these new URLs to a queue, iteratively takes them out, and repeats the process until the queue is empty. The crawler stores the extracted data—like web page content, meta tags, and links—in a database.&lt;/p></description></item><item><title>Web Scraping with Visual Basic</title><link>https://www.scrapingbee.com/blog/web-scraping-with-visual-basic/</link><pubDate>Mon, 15 Sep 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/web-scraping-with-visual-basic/</guid><description>&lt;p>In this tutorial, you will learn how to &lt;a href="https://www.scrapingbee.com/blog/what-is-web-scraping/" target="_blank" >scrape websites&lt;/a> using Visual Basic.&lt;/p>
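The crawl loop described above (seed address, link extraction, URL queue, repeat until empty) can be sketched in a few lines of Python. The in-memory PAGES dictionary below is purely illustrative and stands in for real HTTP fetches, which a crawler like Nutch would perform over the network:

```python
from collections import deque
import re

# Toy in-memory "web": URL to HTML body. An assumption for illustration;
# a real crawler would download these pages over HTTP instead.
PAGES = {
    "https://example.com/": 'See href="https://example.com/a" and href="https://example.com/b"',
    "https://example.com/a": 'Back to href="https://example.com/"',
    "https://example.com/b": 'Also href="https://example.com/a"',
}

def crawl(seed):
    queue = deque([seed])  # known addresses waiting to be visited
    seen = {seed}          # avoid re-queueing the same URL twice
    store = {}             # stands in for the crawler's database
    while queue:           # repeat until the queue is empty
        url = queue.popleft()
        html = PAGES.get(url, "")
        store[url] = html  # store the extracted page content
        # identify other relevant links and add new ones to the queue
        for link in re.findall(r'href="([^"]+)"', html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return store

crawled = crawl("https://example.com/")
```

This is a breadth-first traversal; the seen set is what keeps the loop from revisiting pages that link back to each other.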
&lt;p>Don't worry—you won't be using any actual scrapers or metal tools. You'll just be using some good old-fashioned code. But you might be surprised at just how messy code can get when you're dealing with web scraping!&lt;/p>
&lt;p>You will start by scraping a static HTML page with an HTTP client library and parsing the result with an HTML parsing library. Then, you will move on to scraping dynamic websites using Puppeteer, a headless browser library. The tutorial also covers basic web scraping techniques, such as using CSS selectors to extract data from HTML pages.&lt;/p></description></item><item><title>Web Scraping with Html Agility Pack</title><link>https://www.scrapingbee.com/blog/html-agility-pack/</link><pubDate>Sun, 14 Sep 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/html-agility-pack/</guid><description>&lt;p>For any project that pulls content from the web in C# and parses it to a usable format, you will most likely find the HTML Agility Pack. The Agility Pack is standard for &lt;a href="https://www.scrapingbee.com/blog/csharp-html-parser/" target="_blank" >parsing HTML content in C#&lt;/a>, because it has several methods and properties that conveniently work with the DOM. Instead of writing your own parsing engine, the HTML Agility Pack has everything you need to find specific DOM elements, traverse through child and parent nodes, and retrieve text and properties (e.g., HREF links) within specified elements.&lt;/p></description></item><item><title>How to Scrape TikTok: Scrape Profile Stats and Videos</title><link>https://www.scrapingbee.com/blog/how-to-scrape-tiktok/</link><pubDate>Sat, 13 Sep 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-tiktok/</guid><description>&lt;p>Are you a data analyst thirsty for social media insights and trends? A Python developer looking for a practical social media scraping project? Maybe you're a social media manager tracking metrics or a content creator wanting to download and analyze your TikTok data? If any of these describe you, you're in the right place!&lt;/p>
&lt;p>&lt;a href="https://www.tiktok.com/" target="_blank" >TikTok&lt;/a>, the social media juggernaut, has taken the world by storm. TikTok's global success is reflected in its numbers:&lt;/p></description></item><item><title>Getting Started with RSelenium</title><link>https://www.scrapingbee.com/blog/getting-started-with-rselenium/</link><pubDate>Fri, 12 Sep 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/getting-started-with-rselenium/</guid><description>&lt;p>The value of unstructured data has never been more prominent than with the recent breakthrough of large language models such as &lt;a href="https://www.scrapingbee.com/features/chatgpt/" target="_blank" >ChatGPT&lt;/a> and Google Bard. Your organization can also capitalize on this success by building your own expert models. And what better way to collect droves of unstructured data than by scraping it?&lt;/p>
&lt;p>This article outlines how to scrape the web using R and a package known as &lt;em>RSelenium&lt;/em>. RSelenium is a binding for the Selenium WebDriver, a popular web scraping tool with unmatched versatility. Selenium's interaction capabilities let you manipulate a web page before scraping its contents. This makes it one of the most popular web scraping frameworks.&lt;/p></description></item><item><title>Web Scraping in Golang Tutorial With Quick Start Examples</title><link>https://www.scrapingbee.com/blog/web-scraping-go/</link><pubDate>Thu, 11 Sep 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/web-scraping-go/</guid><description>&lt;p>In this article, you will learn how to create a simple web scraper using &lt;a href="https://golang.org/" target="_blank" >Go&lt;/a>.&lt;/p>
&lt;p>Robert Griesemer, Rob Pike, and Ken Thompson created the Go programming language at Google, and it has been in the market since 2009. Go, also known as Golang, has many brilliant features. Getting started with Go is fast and straightforward. As a result, this comparatively newer language is gaining a lot of traction in the developer world.&lt;/p></description></item><item><title>Web Scraping with node-fetch</title><link>https://www.scrapingbee.com/blog/node-fetch/</link><pubDate>Wed, 10 Sep 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/node-fetch/</guid><description>&lt;p>The introduction of the &lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API" target="_blank" >Fetch API&lt;/a> changed how JavaScript developers make HTTP calls. This means that developers no longer have to download third-party packages just to make an HTTP request. While that is great news for frontend developers, as &lt;code>fetch&lt;/code> can only be used in the browser, backend developers still had to rely on other third-party packages. Then &lt;code>node-fetch&lt;/code> came along, aiming to provide the same fetch API that browsers support. In this article, we will take a look at how &lt;code>node-fetch&lt;/code> can be used to help you scrape the web!&lt;/p>
&lt;p>It's time to resort to good old web scraping, the automated process to parse and extract data from the HTML source code of a website.&lt;/p></description></item><item><title>How to extract data from a website? Ultimate guide to pull data from any website</title><link>https://www.scrapingbee.com/blog/how-to-extract-data-from-a-website/</link><pubDate>Mon, 08 Sep 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-extract-data-from-a-website/</guid><description>&lt;p>The web is becoming an incredible data source. There are more and more data available online, from user-generated content on social media and forums, E-commerce websites, real-estate websites or media outlets... Many businesses are built on this web data, or highly depend on it.&lt;/p>
&lt;p>Manually extracting data from a website and copy/pasting it to a spreadsheet is an error-prone and time-consuming process. If you need to scrape millions of pages, it's not possible to do it manually, so you should automate it.&lt;/p></description></item><item><title>How to Scrape Amazon Prices with ScrapingBee</title><link>https://www.scrapingbee.com/blog/how-to-scrape-amazon-prices/</link><pubDate>Sun, 07 Sep 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-amazon-prices/</guid><description>&lt;p>Learning how to scrape Amazon prices is a great way to access real-time product data for market research, competitor analysis, and price tracking. However, as the biggest retailer in the world, Amazon imposes many scraping restrictions to keep automated connections away from its sensitive price intelligence.&lt;/p>
&lt;p>The Amazon page uses dynamic JavaScript elements, aggressive anti-bot systems, and geo-based restrictions that make it difficult to extract price data. This tutorial will show you how to extract Amazon product prices with Python and our powerful API, because not every web scraper can handle data from Amazon.&lt;/p></description></item><item><title>How to Scrape Yahoo: Step-by-Step Tutorial</title><link>https://www.scrapingbee.com/blog/how-to-scrape-yahoo/</link><pubDate>Sat, 06 Sep 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-yahoo/</guid><description>&lt;p>Scraping Yahoo search results and finance data is a powerful way to collect real-time insights on market trends, stock performance, and company profiles. With ScrapingBee, you can extract this information easily — even from JavaScript-heavy pages that typically block traditional scrapers.&lt;/p>
&lt;p>Yahoo’s dynamic content and anti-bot protections make it difficult to scrape using basic tools. But ScrapingBee handles these challenges out of the box. Our API automatically renders JavaScript, rotates proxies, and bypasses bot detection to deliver clean, structured data from both Yahoo Search and Yahoo Finance.&lt;/p></description></item><item><title>How to Scrape Yellow Pages with ScrapingBee</title><link>https://www.scrapingbee.com/blog/how-to-scrape-yellow-pages/</link><pubDate>Fri, 05 Sep 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-yellow-pages/</guid><description>&lt;p>Learning how to scrape Yellow Pages can unlock access to a rich database of business listings. With minimal technical knowledge, our approach to scraping HTML content extracts data that you can use for lead generation, market research, or local SEO.&lt;/p>
&lt;p>Like most data-rich online platforms, Yellow Pages serves JavaScript-rendered content and deploys anti-scraping measures, which often stop traditional scraping efforts. Our HTML API is built to extract that data while automatically handling these restrictions, rendering dynamic content and applying smart proxy rotation to ensure consistent access with minimal coding skills.&lt;/p></description></item><item><title>How to Scrape Google Finance Using Python and ScrapingBee</title><link>https://www.scrapingbee.com/blog/how-to-scrape-google-finance/</link><pubDate>Thu, 04 Sep 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-google-finance/</guid><description>&lt;p>Learning how to scrape Google Finance gives you access to real-time stock prices, company performance data, and other financial metrics. However, scraping stock information isn’t always simple, especially on platforms that receive so much traffic: dynamic JavaScript elements, frequent layout changes, and IP restrictions make it difficult for automated scrapers to extract consistent data.&lt;/p>
&lt;p>This tutorial will teach you how to build a Google Finance scraper from scratch using Python and our versatile HTML API. We’ll cover everything – from setting up your environment to writing code that automatically handles JavaScript and any connection restrictions.&lt;/p></description></item><item><title>How to Scrape Pinterest: Full Tutorial with ScrapingBee</title><link>https://www.scrapingbee.com/blog/how-to-scrape-pinterest/</link><pubDate>Wed, 03 Sep 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-pinterest/</guid><description>&lt;p>In this tutorial, I’ll show you how to scrape Pinterest using ScrapingBee’s API. Whether you want to scrape Pinterest data for trending images, individual pins, Pinterest profiles, or entire boards, this guide explains how to build a web scraper that works.&lt;/p>
&lt;p>Scraping Pinterest can be tough. Its anti-bot protection often trips up typical web scrapers. That's why I prefer using ScrapingBee. With this tool, you won't need to run a headless browser or wait for page elements to load manually. You just plug in your API key, decide what data to collect, and extract Pinterest data with ease.&lt;/p></description></item><item><title>How to Scrape Glassdoor: Job Titles, Salaries, and Company Ratings</title><link>https://www.scrapingbee.com/blog/how-to-scrape-glassdoor/</link><pubDate>Tue, 02 Sep 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-glassdoor/</guid><description>&lt;p>Trying to learn how to scrape Glassdoor data? You're at the right place. In this guide, I’ll show you exactly how to extract job title descriptions, salaries, and company information using ScrapingBee’s powerful API.&lt;/p>
&lt;p>You may already know this – Glassdoor is a goldmine of information, but scraping it can be a challenging task. The site utilizes dynamic content loading and sophisticated bot protection. As a result, the Glassdoor website is out of reach for an average web scraper. I’ve spent countless hours battling these defenses with custom solutions with no luck.&lt;/p></description></item><item><title>How to Scrape Bing with ScrapingBee: Step-by-Step Guide</title><link>https://www.scrapingbee.com/blog/how-to-scrape-bing/</link><pubDate>Mon, 01 Sep 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-bing/</guid><description>&lt;p>Learning how to scrape Bing search results can feel like navigating a minefield of anti-bot measures and IP blocks. Microsoft's Bing search engine has sophisticated protection systems to detect traditional scraping attempts faster than you can debug your first request failure.&lt;/p>
&lt;p>That’s exactly why I use ScrapingBee. Instead of wrestling with proxy rotations, JavaScript rendering, and constantly changing anti-bot methods, this web scraper handles all the complexity. It allows you to scrape search results data without any technical issues.&lt;/p></description></item><item><title>How to Scrape TripAdvisor: Step-by-Step with ScrapingBee</title><link>https://www.scrapingbee.com/blog/how-to-scrape-tripadvisor/</link><pubDate>Sun, 31 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-tripadvisor/</guid><description>&lt;p>Want to learn how to scrape TripAdvisor? Tired of overpaying for your trips? As one of the biggest online travel platforms, it has tons of valuable information that can help you save money and enjoy your time abroad.&lt;/p>
&lt;p>Scraping TripAdvisor is a great way to keep an eye on price changes, customer sentiment, and other details that can impact your trips and vacations. In this tutorial, we will explain how to extract hotel names, prices, ratings, and reviews from TripAdvisor using our web scraping API with Python.&lt;/p></description></item><item><title>How to Scrape IMDb: Step-by-Step with ScrapingBee</title><link>https://www.scrapingbee.com/blog/how-to-scrape-imdb/</link><pubDate>Sat, 30 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-imdb/</guid><description>&lt;p>If you want to learn how to scrape IMDb data, you’re in the right place. This step-by-step tutorial shows you how to extract data, including movie details, ratings, actors, and review dates, using a Python script. You’ll see how to set up the required libraries, process the HTML content, and store your results in a CSV file for further analysis using ScrapingBee’s API.&lt;/p>
&lt;p>Why ScrapingBee? Here's the thing – if you want to scrape IMDb data, you need an infrastructure of proxies, JavaScript rendering, and other tools to avoid IP blocks. Scraping this website is particularly challenging due to its strict anti-scraping measures, with no exceptions. But setting up everything manually costs time and resources.&lt;/p></description></item><item><title>How to Scrape Etsy: Step-by-Step Guide</title><link>https://www.scrapingbee.com/blog/how-to-scrape-etsy/</link><pubDate>Fri, 29 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-etsy/</guid><description>&lt;p>In this guide, I'll teach you how to scrape Etsy, one of the most popular marketplaces for handmade and vintage items. If you've ever tried scraping Etsy before, you know it's not exactly a walk in the park. The website's anti-bot protections, such as CAPTCHA, IP address flagging, and constant updates, make web scraping Etsy product data a challenge.&lt;/p>
&lt;p>That’s why ScrapingBee's Etsy scraper is the best tool to get the job done. It's a reliable web scraper that helps you capture real-time data from Etsy listings. It's built to handle all the complex parts, like JavaScript rendering and proxy rotation. With our API at hand, you can focus on extracting the data you need: Etsy product titles, prices, shop names, and more.&lt;/p></description></item><item><title>Create a sitemap link extractor using ScrapingBee in N8N</title><link>https://www.scrapingbee.com/blog/create-a-sitemap-link-extractor-using-scrapingbee-in-n8n/</link><pubDate>Thu, 28 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/create-a-sitemap-link-extractor-using-scrapingbee-in-n8n/</guid><description>&lt;p>I want to scrape a website, but wait, how do I get the links?&lt;/p>
&lt;p>Good question! That's exactly what we are going to answer in this blog post.&lt;/p>
&lt;p>While there are multiple options for this, we are going with an easy route: extracting links from the sitemap!&lt;/p>
&lt;p>Most websites on the internet provide all of their links in a sitemap.xml or similar file. They create it to make it easier for search engines to find and index their pages.
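The idea above can be sketched in Python. This is a minimal sketch that parses an already-downloaded sitemap.xml document (fetching it over HTTP is left out) using the standard sitemaps.org namespace:

```python
import xml.etree.ElementTree as ET

# Standard namespace used by sitemap.xml files, per sitemaps.org
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_links(xml_text):
    """Return every URL advertised in a sitemap.xml document."""
    root = ET.fromstring(xml_text)
    # Each URL lives in a namespaced loc element under a url entry
    return [loc.text for loc in root.iter(NS + "loc")]
```

In a real workflow you would first download the sitemap (for example with requests or via a ScrapingBee request) and pass the response body to this function.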
&lt;/p></description></item><item><title>How to Scrape Indeed Job Listings with ScrapingBee</title><link>https://www.scrapingbee.com/blog/how-to-scrape-indeed/</link><pubDate>Thu, 28 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-indeed/</guid><description>&lt;p>In this guide, we'll dive into how to scrape Indeed job listings without getting blocked. The first time I tried to extract job data from this website, it was tricky. I thought a simple requests.get() would do the trick, but within minutes I was staring at a CAPTCHA wall. That’s when I realized I needed a proper web scraper with proxy rotation and headers baked in to scrape job listing data.&lt;/p></description></item><item><title>How to Scrape Wikipedia with ScrapingBee</title><link>https://www.scrapingbee.com/blog/how-to-scrape-wikipedia/</link><pubDate>Wed, 27 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-wikipedia/</guid><description>&lt;p>Ever wanted to extract valuable insights and data from one of the largest encyclopedias online? Then it is time to learn how to scrape Wikipedia pages! As one of the biggest treasuries of structured content, it is constantly reviewed and fact-checked by fellow users, and at the very least provides valuable insights and links to sources.&lt;/p>
&lt;p>Wikipedia has structured content but scraping can be tricky due to rate limiting, which restricts repeated connection requests to websites. Fortunately, our powerful tools can overcome these hurdles, ensuring efficient data extraction in a clean HTML or JSON format.&lt;/p></description></item><item><title>How to Scrape Craigslist: Step-by-Step Tutorial</title><link>https://www.scrapingbee.com/blog/how-to-scrape-craigslist/</link><pubDate>Tue, 26 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-craigslist/</guid><description>&lt;p>Have you ever tried learning how to scrape Craigslist and run into a wall of CAPTCHAs and IP blocks? Trust me, my first web scraping attempt was just as rocky.&lt;/p>
&lt;p>Craigslist is a gold mine of data. It contains everything from job ads, housing, and items for sale to various services. But it's not an easy nut to crack for scraping beginners.&lt;/p>
&lt;p>Just like in any other web scraping project, you won't get anywhere without proxy rotation, JavaScript rendering, and solving CAPTCHAs. Fortunately, ScrapingBee handles all of it on autopilot. I think of it as an automated scraping assistant that handles all the technicalities.&lt;/p></description></item><item><title>How to Scrape Google Images: A Step-by-Step Guide</title><link>https://www.scrapingbee.com/blog/how-to-scrape-google-images/</link><pubDate>Mon, 25 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-google-images/</guid><description>&lt;p>Welcome to a guide on how to scrape Google images. We’ll dive into the exact process of extracting image URLs, titles, and source links from Google Images search results. By the end of this guide, you'll be able to get all the image data from multiple search pages.&lt;/p>
&lt;p>Here's the catch, though: to scrape data, you'll need a reliable tool, such as ScrapingBee. Since Google Images implements strong anti-scraping measures, you won't be able to get images without a strong infrastructure.&lt;/p></description></item><item><title>How to Scrape Google Flights with Python and ScrapingBee</title><link>https://www.scrapingbee.com/blog/how-to-scrape-google-flights/</link><pubDate>Sun, 24 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-google-flights/</guid><description>&lt;p>As the key source of information on the internet, Google contains a lot of valuable public data. For many travelers, it is the main source for tracking flight prices, along with departure and arrival options for their trips.&lt;/p>
&lt;p>As you already know, automation plays a vital role here, as everyone wants an optimal setup to compare multiple airlines and their pricing strategies to save money. Even better, collecting data with your own Google Flights scraper saves a lot of time and provides consistent access to new deals.&lt;/p></description></item><item><title>How to Scrape Costco: Complete Step-by-Step Guide</title><link>https://www.scrapingbee.com/blog/how-to-scrape-costco/</link><pubDate>Sat, 23 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-costco/</guid><description>&lt;p>Learning how to scrape Costco can be incredibly valuable for gathering product information, monitoring prices, or conducting market research. In my experience, while there are several coding approaches to scraping Costco's website, our robust HTML API offers the most straightforward solution: it handles JavaScript rendering, proxy rotation, and the other key elements that tend to overcomplicate data extraction.&lt;/p>
&lt;p>In this guide, we will cover how you can extract data from retailers like Costco without getting blocked, dealing with JavaScript rendering, or managing proxies. Let's take a closer look at how you can use our powerful ScrapingBee HTML API with minimal coding knowledge and extract Costco's product data.&lt;/p></description></item><item><title>How to Scrape eBay: Step-by-Step Guide</title><link>https://www.scrapingbee.com/blog/how-to-scrape-ebay/</link><pubDate>Sat, 23 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-ebay/</guid><description>&lt;p>Learning how to scrape data from eBay efficiently requires the right tools and techniques. eBay’s complex structure and anti-scraping measures make it challenging to extract data reliably.&lt;/p>
&lt;p>In this guide, I’ll walk you through the entire process of setting up and running an eBay scraper that actually works. Whether you’re tracking prices, researching products, or gathering seller data, you’ll discover how to extract the information you need without getting blocked.&lt;/p></description></item><item><title>How to Scrape Expedia: Step-by-Step Guide</title><link>https://www.scrapingbee.com/blog/how-to-scrape-expedia/</link><pubDate>Sat, 23 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-expedia/</guid><description>&lt;p>Expedia scraping is a great strategy for tracking hotel prices and travel trends and for comparing deals with real-time data. It’s especially useful for building tools that rely on dynamic hotel details like location, rating, and pricing strategies, but accessing these platforms is a lot harder with automated tools.&lt;/p>
&lt;p>The main challenge is that Expedia loads its content using JavaScript, so simple scrapers can’t see the hotel listings without rendering the page. On top of that, the site often changes its layout and uses anti-bot measures like IP blocking.&lt;/p></description></item><item><title>How to Scrape Google Play: Step-by-Step Guide</title><link>https://www.scrapingbee.com/blog/how-to-scrape-google-play/</link><pubDate>Sat, 23 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-google-play/</guid><description>&lt;p>Want to extract app names, ratings, reviews, and install counts from Google Play? Scraping is one of the fastest ways to collect valuable mobile app data from the platform, but dynamic content and anti-bot systems make traditional scrapers unreliable.&lt;/p>
&lt;p>In this guide, we will teach you to scrape Google Play using Python and our beloved ScrapingBee API. Here you will find the basic necessities for your collection goals, helping you export data in clean, structured formats. Let’s make scraping simple and scalable!&lt;/p></description></item><item><title>How to Scrape Google Scholar with Python: A ScrapingBee Guide</title><link>https://www.scrapingbee.com/blog/how-to-scrape-google-scholar/</link><pubDate>Sat, 23 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-google-scholar/</guid><description>&lt;p>Did you know that learning how to scrape Google Scholar can supercharge your research papers? This search engine is a gold mine of citations and scholarly articles that you could be analyzing at scale with a web scraper. With a reliable scraping service like ScrapingBee and some basic Python, you can automate repetitive research tasks more efficiently.&lt;/p>
&lt;p>Why ScrapingBee, you may ask? Well, let’s get one thing straight – Google Scholar has tight anti-scraping measures. That means you need a reliable Google Scholar scraper that can handle IP bans, annoying CAPTCHAs, and &lt;a href="https://www.scrapingbee.com/features/javascript-scenario/" target="_blank" >JavaScript rendering&lt;/a>. Our web scraper is built with all these features, allowing you to scrape Google Scholar data without coding everything from scratch.&lt;/p></description></item><item><title>How to Scrape Home Depot: Complete Step-by-Step Guide</title><link>https://www.scrapingbee.com/blog/how-to-scrape-homedepot/</link><pubDate>Sat, 23 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-homedepot/</guid><description>&lt;p>Scraping Home Depot’s product data requires handling JavaScript rendering and potential anti-bot measures. With ScrapingBee’s API, you can extract product information from Home Depot without managing headless browsers, proxies, or CAPTCHAs.&lt;/p>
&lt;p>Simply set up a request with JavaScript rendering enabled, target the correct URLs, and extract structured data using your preferred HTML parser. Our API handles all the complex parts of web scraping, letting you focus on using the data. In this guide, we will explain how you can do the same, working with Python and our prolific ScrapingBee API!&lt;/p></description></item><item><title>How to Scrape Amazon Reviews With Python (2025)</title><link>https://www.scrapingbee.com/blog/how-to-scrape-amazon-reviews/</link><pubDate>Fri, 22 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-amazon-reviews/</guid><description>&lt;p>Amazon review scraping is a great way for other retailers to learn about customer wants and needs through one of the biggest retailers in e-commerce. However, many are discouraged from trying it due to the technical barrier of writing code.&lt;/p>
&lt;p>If you want to access Amazon product reviews in a user-friendly way, there is no better combo than working with our HTML API through Python and its many additional libraries that help extract data from product pages. In this guide, we will cover the basics of targeting local Amazon reviews, so follow along and soon you'll be able to test the service, guaranteeing a reliable web scraping experience.&lt;/p></description></item><item><title>How to Scrape Google Hotels: Step-by-Step Guide</title><link>https://www.scrapingbee.com/blog/how-to-scrape-google-hotels/</link><pubDate>Fri, 22 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-google-hotels/</guid><description>&lt;p>Learning how to scrape Google Hotels opens up opportunities to gain a competitive edge for your business. When you scrape this specialized search engine, you gain access to valuable pricing and availability data that can transform your competitive analysis. By using targeted scraping methods, you can collect all the hotel data that fuels market research, tracks pricing changes in real time, and supports strategic decisions.&lt;/p>
&lt;p>However, even experienced developers struggle to scrape Google Hotels without getting blocked. IP blocks, CAPTCHAs, and JavaScript rendering issues create significant hurdles when trying to extract hotel data. But don’t worry – our powerful &lt;a href="https://www.scrapingbee.com/" target="_blank" >web scraping API&lt;/a> helps you overcome these challenges.&lt;/p></description></item><item><title>How to Scrape Google Jobs: Step-by-Step Guide</title><link>https://www.scrapingbee.com/blog/how-to-scrape-google-jobs/</link><pubDate>Fri, 22 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-google-jobs/</guid><description>&lt;p>If you're looking for a straightforward way to scrape Google Jobs, you're in the right place. In this guide, we'll walk through the steps to extract job listings and related data in just minutes using ScrapingBee. Our powerful &lt;a href="https://www.scrapingbee.com/" target="_blank" >web scraping API&lt;/a> handles the toughest parts of the process for you: JavaScript rendering, proxy rotation, and CAPTCHA bypassing, providing the necessary tools for consistent and reliable data extraction.&lt;/p>
&lt;h2 id="quick-answer-tldr">Quick Answer (TL;DR)&lt;/h2>
&lt;p>To scrape Google Jobs with our HTML API, write a Python script that sends a &lt;code>GET&lt;/code> request to its endpoint with your target Google search URL. Our tools allow you to enable JavaScript rendering by adding a short &lt;code>render_js=true&lt;/code> parameter, which uses a headless browser to bypass bot restrictions and load Google's dynamic content. Add a BS4 parser to remove clutter and focus only on the HTML elements that carry relevant job data.&lt;/p></description></item><item><title>How to Scrape Google Maps: A Step-by-Step Guide</title><link>https://www.scrapingbee.com/blog/how-to-scrape-google-maps/</link><pubDate>Fri, 22 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-google-maps/</guid><description>&lt;p>Need business leads or location data from Google Maps but frustrated by constant CAPTCHAs, IP blocks, or unreliable scraping scripts? Scraping is one of the fastest ways to gather high-value information, but Google’s aggressive anti-bot measures turn large-scale data collection into a real challenge.&lt;/p>
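The Google Jobs TL;DR above (a GET request carrying the target search URL plus render_js=true) can be sketched in Python. This is a minimal sketch with a placeholder API key and an illustrative search URL; the request is only prepared here, not sent, so you can see how the query string is composed:

```python
import requests

params = {
    "api_key": "YOUR_API_KEY",  # placeholder, not a real key
    "url": "https://www.google.com/search?q=python+developer+jobs",  # example target
    "render_js": "true",        # ask for headless-browser rendering
}

# Prepare (without sending) a GET request to the ScrapingBee HTML API endpoint
req = requests.Request(
    "GET", "https://app.scrapingbee.com/api/v1/", params=params
).prepare()
# req.url now carries the encoded query string
```

To actually run it, send the prepared request with requests.Session().send(req) and feed response.text to BeautifulSoup to pull out the job elements.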
&lt;p>Access to business names, addresses, ratings, and phone numbers is too valuable to ignore, so users keep finding ways around Google’s automation blocks. But how exactly do they do it?&lt;/p></description></item><item><title>How to Scrape Google News: Step-by-Step Guide</title><link>https://www.scrapingbee.com/blog/how-to-scrape-google-news/</link><pubDate>Fri, 22 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-google-news/</guid><description>&lt;p>In this blog post, I'll show you how to scrape Google News by using Python and our Google News API, even if you're not a Python developer. You'll start with the straightforward RSS feed URL method to grab news headlines in structured XML. Then I'll show you how ScrapingBee’s web scraping API, our Google News API, and IP rotation can extract public data.&lt;/p>
&lt;p>By the end of this guide, you’ll have easy access to every news title you need without getting bogged down in complex infrastructure. Let's begin!&lt;/p></description></item><item><title>How to Scrape Google Shopping: A Step-by-Step Guide</title><link>https://www.scrapingbee.com/blog/how-to-scrape-google-shopping/</link><pubDate>Fri, 22 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-google-shopping/</guid><description>&lt;p>In this guide, we’ll dive into Google Shopping scraping techniques that actually work in 2025. If you’ve ever needed to extract product data, prices, or seller information from Google Shopping, you’re in the right place. Google Shopping scraping has become essential for businesses that need competitive pricing data. I’ve spent years refining these methods, and today I’ll show you how to use ScrapingBee to make this process straightforward and reliable.&lt;/p>
&lt;p>Colly provides a convenient and powerful set of tools for extracting data from websites, automating web interactions, and building web scrapers. In this article, you will gain some practical experience with &lt;a href="https://go-colly.org/" target="_blank" >Colly&lt;/a> and learn how to use it to scrape comments from &lt;a href="https://news.ycombinator.com/news" target="_blank" >Hacker News&lt;/a>.&lt;/p></description></item><item><title>Easy web scraping with Scrapy</title><link>https://www.scrapingbee.com/blog/web-scraping-with-scrapy/</link><pubDate>Wed, 20 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/web-scraping-with-scrapy/</guid><description>&lt;p>In the previous post about &lt;a href="https://www.scrapingbee.com/blog/web-scraping-101-with-python/" target="_blank" >Web Scraping with Python&lt;/a> we talked a bit about Scrapy. In this post we are going to dig a little bit deeper into it.&lt;/p>
&lt;p>Scrapy is a wonderful open source Python web scraping framework. It handles the most common use cases when doing web scraping at scale:&lt;/p>
&lt;ul>
&lt;li>Multithreading&lt;/li>
&lt;li>Crawling (going from link to link)&lt;/li>
&lt;li>Extracting the data&lt;/li>
&lt;li>Validating&lt;/li>
&lt;li>Saving to different formats / databases&lt;/li>
&lt;li>Many more&lt;/li>
&lt;/ul>
&lt;p>The main difference between Scrapy and other commonly used libraries, such as Requests / BeautifulSoup, is that it is opinionated, meaning it comes with a set of rules and conventions, which allow you to solve the usual web scraping problems in an elegant way.&lt;/p></description></item><item><title>How To Set Up a Rotating Proxy in Puppeteer</title><link>https://www.scrapingbee.com/blog/how-to-set-up-a-rotating-proxy-in-puppeteer/</link><pubDate>Tue, 19 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-set-up-a-rotating-proxy-in-puppeteer/</guid><description>&lt;p>&lt;a href="https://www.npmjs.com/package/puppeteer" target="_blank" >Puppeteer&lt;/a> is a popular headless browser used with Node.js for web scraping. However, even with Puppeteer, your IP can get blocked if your script is identified as a bot. That's where the Puppeteer proxy comes in.&lt;/p>
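A common way to rotate proxies with Puppeteer is to pass a different `--proxy-server` argument on each browser launch. A minimal sketch of the rotation logic; the proxy addresses are placeholders, and the Puppeteer call itself is shown only as a comment:

```javascript
// Round-robin rotation over a pool of proxies (placeholder addresses)
const proxies = [
  "http://111.111.111.111:8080",
  "http://222.222.222.222:8080",
  "http://333.333.333.333:8080",
];

let i = 0;
function nextProxy() {
  // Return the next proxy in the pool, wrapping around at the end
  const proxy = proxies[i % proxies.length];
  i += 1;
  return proxy;
}

// Usage with Puppeteer (not executed here):
// const browser = await puppeteer.launch({
//   args: [`--proxy-server=${nextProxy()}`],
// });
```

Launching a fresh browser per batch of requests with `nextProxy()` spreads traffic across the pool, making IP-based blocking less likely.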
&lt;p>A proxy acts as a middleman between the client and server. When a client makes a request through a proxy, the proxy forwards it to the server. This makes detecting and blocking your IP harder for the target site.&lt;/p></description></item><item><title>What to Do If Your IP Gets Banned While You're Scraping</title><link>https://www.scrapingbee.com/blog/what-to-do-if-your-ip-gets-banned-while-youre-scraping/</link><pubDate>Mon, 18 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/what-to-do-if-your-ip-gets-banned-while-youre-scraping/</guid><description>&lt;p>Web scraping is valuable for gathering information, studying markets, and understanding competition. But web scrapers often run into a problem: getting banned from websites.&lt;/p>
&lt;p>In most cases, it happens because the scrapers violate the website's terms of service (ToS) or generate so much traffic that they abuse the website's resources and prevent normal functioning. To protect itself, the website bans your IP from accessing its resources either temporarily or permanently.&lt;/p></description></item><item><title>How to scrape channel data from YouTube</title><link>https://www.scrapingbee.com/blog/web-scraping-youtube/</link><pubDate>Sun, 17 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/web-scraping-youtube/</guid><description>&lt;p>If you are an internet user, it is safe to assume that you are no stranger to YouTube. It is the hub for videos on the internet, and even back in 2020, 500 hours of videos were being uploaded to YouTube every minute! This has led to the accumulation of a ton of useful data on the platform. You can extract and make use of some of this data via the &lt;a href="https://developers.google.com/youtube/v3" target="_blank" >official YouTube API&lt;/a> but it is rate limited and doesn't contain all the data viewable on the website. In this tutorial, you will learn how you can scrape YouTube data using Selenium. This tutorial will specifically focus on extracting information about videos uploaded by a channel but the techniques are easily transferable to extracting search results and individual video data.&lt;/p>
&lt;a href="https://pptr.dev/" target="_blank" >Puppeteer&lt;/a> is an open source Node library that provides a high-level API to control Chrome or Chromium based browsers over the &lt;a href="https://chromedevtools.github.io/devtools-protocol/" target="_blank" >DevTools Protocol&lt;/a>. Every tasks that you can perform with a Chrome browser can be automated with Puppeteer. This makes Puppeteer an ideal tool for web scraping and test automation. In this article, we will go over everything you need to know about automating form submission with Puppeteer. We will discuss&lt;/p></description></item><item><title>How To Set Up A Rotating Proxy in Selenium with Python</title><link>https://www.scrapingbee.com/blog/how-to-set-up-a-rotating-proxy-in-selenium-with-python/</link><pubDate>Fri, 15 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-set-up-a-rotating-proxy-in-selenium-with-python/</guid><description>&lt;p>&lt;a href="https://pypi.org/project/selenium/" target="_blank" >Selenium&lt;/a> is a popular browser automation library that allows you to control headless browsers programmatically. However, even with Selenium, your script can still be identified as a bot and your IP address can be blocked. This is where Selenium proxies come in.&lt;/p>
&lt;p>A proxy acts as a middleman between the client and server. When a client makes a request through a proxy, the proxy forwards it to the server. This makes detecting and blocking your IP harder for the target site.&lt;/p></description></item><item><title>Web Scraping with Elixir</title><link>https://www.scrapingbee.com/blog/web-scraping-elixir/</link><pubDate>Thu, 14 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/web-scraping-elixir/</guid><description>&lt;p>Web scraping is the process of extracting data from a website. Scraping can be a powerful tool in a developer's arsenal when they're looking at problems like automation or investigation, or when they need to collect data from public websites that lack an API or provide limited access to the data.&lt;/p>
&lt;p>People and businesses from a myriad of different backgrounds use web scraping, and it's more common than people realize. In fact, if you've ever copy-pasted code from a website, you've performed the same function as a web scraper—albeit in a more limited fashion.&lt;/p></description></item><item><title>How to bypass cloudflare antibot protection at scale in 2025</title><link>https://www.scrapingbee.com/blog/how-to-bypass-cloudflare-antibot-protection-at-scale/</link><pubDate>Wed, 13 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-bypass-cloudflare-antibot-protection-at-scale/</guid><description>&lt;p>Over &lt;a href="https://backlinko.com/cloudflare-users#cloudfare-key-stats" target="_blank" >7.59 million&lt;/a> active websites use Cloudflare. The website you intend to scrape might be protected by it. Websites protected by services like Cloudflare can be challenging to scrape due to the various anti-bot measures they implement. If you've tried scraping such websites, you're likely already aware of the difficulty of bypassing Cloudflare's bot detection system.&lt;/p>
&lt;p>Bypassing Cloudflare becomes a near-necessity for large-scale projects or scraping popular websites. There are various methods to bypass Cloudflare, each with its pros and cons. In this guide, we'll explore each method in detail, allowing you to choose the one that best suits your needs.&lt;/p></description></item><item><title>How to use AI for automated price scraping?</title><link>https://www.scrapingbee.com/blog/ai-price-scraping/</link><pubDate>Tue, 12 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/ai-price-scraping/</guid><description>&lt;p>In order to perform price scraping, you need to know the CSS selector or the XPath for the target element. Therefore, if you are scraping thousands of websites, you need to manually figure out the selector for each of them. And if the page changes, you need to change that as well.&lt;/p>
&lt;p>Well, not anymore.&lt;/p>
&lt;p>Today, you are going to learn how to perform automated price scraping with AI. You are going to use the power of AI to automatically get the CSS selector of the elements you want to scrape, so that you can do it at scale.&lt;/p></description></item><item><title>Infinite Scroll with Puppeteer</title><link>https://www.scrapingbee.com/blog/infinite-scroll-puppeteer/</link><pubDate>Mon, 11 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/infinite-scroll-puppeteer/</guid><description>&lt;p>&lt;strong>&lt;a href="https://www.scrapingbee.com/" target="_blank" >Web scraping&lt;/a> is automating the process of data collection from the web.&lt;/strong> This usually means deploying a “crawler” that automatically searches the web and scrapes data from selected pages. Data collection through scraping can be much faster, eliminating the need for manual data-gathering, &lt;em>and maybe mandatory if the website has no provided API&lt;/em>. Scraping methods change based on the website's data display mechanisms.&lt;/p>
&lt;p>One way to display content is through a one-page website, also known as a single-page application. Single-page applications (SPA) have become a trend, and with the implementation of infinite scrolling techniques, programmers can develop SPA that allows users to scroll &lt;em>forever&lt;/em>. If you are an avid social media user, you have most likely experienced this feature before on platforms like Instagram, Twitter, Facebook, Pinterest, etc.&lt;/p></description></item><item><title>How to execute JavaScript with Scrapy?</title><link>https://www.scrapingbee.com/blog/scrapy-javascript/</link><pubDate>Sun, 10 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/scrapy-javascript/</guid><description>&lt;p>Most modern websites use a client-side JavaScript framework such as React, Vue or Angular. Scraping data from a dynamic website without server-side rendering often requires executing JavaScript code.&lt;/p>
&lt;p>I’ve scraped hundreds of sites, and I always use Scrapy. Scrapy is a popular Python web scraping framework. Compared to other Python scraping libraries, such as Beautiful Soup, Scrapy forces you to structure your code based on some best practices. In exchange, Scrapy takes care of concurrency, collecting stats, caching, retry logic, and much more.&lt;/p></description></item><item><title>The Best Ruby HTTP clients</title><link>https://www.scrapingbee.com/blog/best-ruby-http-clients/</link><pubDate>Sat, 09 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/best-ruby-http-clients/</guid><description>&lt;p>How does one choose the perfect HTTP Client? The Ruby ecosystem offers a wealth of gems to make an HTTP request. Some are pure Ruby, some are based on Ruby's native &lt;code>Net::HTTP&lt;/code>, and some are wrappers for existing libraries or Ruby bindings for libcurl. In this article, I will present the most popular gems by providing a short description and code snippets of making a request to the &lt;a href="https://icanhazdadjoke.com/" target="_blank" >Dad Jokes API&lt;/a>. The gems will be provided in the order from the most-downloaded one to the least. To conclude, I will compare them all in a table format and provide a quick summary, as well as guidance on which gem to choose.&lt;/p>
&lt;p>This Python browser automation library allows you to simulate user actions on a browser like the following:&lt;/p>
&lt;ul>
&lt;li>Filling out forms&lt;/li>
&lt;li>Submitting data&lt;/li>
&lt;li>Clicking buttons&lt;/li>
&lt;li>Navigating through pages&lt;/li>
&lt;/ul>
&lt;p>One of the key features of MechanicalSoup is that its stateful browser can retain state and track state changes between requests. This helps simplify browser automation scripts in complex use cases, such as handling forms and dynamic content. MechanicalSoup also comes prebundled with &lt;a href="https://pypi.org/project/beautifulsoup4/" target="_blank" >Beautiful Soup&lt;/a>, a popular Python library for parsing and manipulating web page content. Using MechanicalSoup and Beautiful Soup, you can write complex scraping scripts easily.&lt;/p></description></item><item><title>How to Easily Scrape Shopify Stores With AI</title><link>https://www.scrapingbee.com/blog/how-to-easily-scrape-shopify-stores-with-ai/</link><pubDate>Thu, 07 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-easily-scrape-shopify-stores-with-ai/</guid><description>&lt;p>Scraping Shopify stores can be a challenging task because each store uses a unique theme and layout, making traditional scrapers with rigid selectors unreliable. That’s why we'll be showing you how to leverage an &lt;a href="https://www.scrapingbee.com/features/ai-web-scraping-api/" target="_blank" >AI-powered web scraper&lt;/a> that easily adapts to any page structure, effortlessly extracting Shopify e-commerce data no matter how the store is designed.&lt;/p>
&lt;p>In this tutorial, we’ll be using our Python &lt;a href="https://www.scrapingbee.com/documentation/#getting-started" target="_blank" >ScrapingBee client&lt;/a> to scrape one of the most successful Shopify stores on the planet: &lt;a href="http://gymshark.com" target="_blank" >gymshark.com&lt;/a>, to obtain all the product page URLs and the corresponding product details from each product page. We’ve previously written blogs about scraping product listing pages &lt;a href="https://www.scrapingbee.com/blog/web-scraping-with-scrapy/#scraping-a-single-product" target="_blank" >using Scrapy&lt;/a> or &lt;a href="https://www.scrapingbee.com/blog/scraping-e-commerce-product-data/" target="_blank" >using schema.org metadata&lt;/a>. We’ll also be using &lt;a href="https://www.scrapingbee.com/documentation/#ai_query" target="_blank" >our AI query feature&lt;/a> to extract structured data from each product page without parsing any HTML. Please note that we’re using Python only for demonstration; this technique and our API will work with any programming language.&lt;/p></description></item><item><title>The 5 Best Free Proxy Lists for Web Scraping</title><link>https://www.scrapingbee.com/blog/best-free-proxy-list-web-scraping/</link><pubDate>Wed, 06 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/best-free-proxy-list-web-scraping/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>In this article, we will look at the &lt;strong>top five proxy list websites&lt;/strong> and perform a &lt;strong>benchmark&lt;/strong>.&lt;/p>
&lt;p>&lt;em>If you are in a hurry and wish to go straight to the results, &lt;a href="#benchmark" >click here&lt;/a>.&lt;/em>&lt;/p>
&lt;p>The idea is not only to talk about the different features they offer, but also to test the reliability with a real-world test. We will look at and &lt;strong>compare the response times, errors, and success rates&lt;/strong> on popular websites like &lt;strong>Google and Amazon&lt;/strong>.&lt;/p></description></item><item><title>How to scrape data from idealista</title><link>https://www.scrapingbee.com/blog/web-scraping-idealista/</link><pubDate>Tue, 05 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/web-scraping-idealista/</guid><description>&lt;p>Idealista is a very famous listing website that lists millions of properties for sale and/or rent. It is available in Spain, Portugal, and Italy. Such property listing websites are among the best ways to do market research, analyze market trends, and find a suitable place to buy. In this article, you will learn how to scrape data from idealista. The website uses anti-web scraping techniques and you will learn how to circumvent them as well.&lt;/p></description></item><item><title>How to use a Proxy with Ruby and Faraday</title><link>https://www.scrapingbee.com/blog/ruby-faraday-proxy/</link><pubDate>Mon, 04 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/ruby-faraday-proxy/</guid><description>&lt;h2 id="why-use-faraday">Why use Faraday?&lt;/h2>
&lt;p>&lt;a href="https://lostisland.github.io/faraday/" target="_blank" >Faraday&lt;/a> is a very famous and mature HTTP client library for Ruby. It uses an adapter-based approach which means you can swap out the underlying HTTP requests library without modifying the overarching Faraday code. By default, Faraday uses the &lt;a href="https://ruby-doc.org/stdlib-3.1.2/libdoc/net/http/rdoc/Net/HTTP.html" target="_blank" >&lt;code>Net::HTTP&lt;/code>&lt;/a> adapter but you can switch it out with &lt;a href="https://github.com/geemus/excon" target="_blank" >&lt;code>Excon&lt;/code>&lt;/a>, &lt;a href="https://github.com/typhoeus/typhoeus" target="_blank" >&lt;code>Typhoeus&lt;/code>&lt;/a>, &lt;a href="http://toland.github.io/patron/" target="_blank" >&lt;code>Patron&lt;/code>&lt;/a> or &lt;a href="https://github.com/igrigorik/em-http-request" target="_blank" >&lt;code>EventMachine&lt;/code>&lt;/a> without modifying more than a line or two of configuration code. This makes Faraday extremely flexible and relatively future-proof.&lt;/p></description></item><item><title>How to Set Up a Proxy Server with Apache</title><link>https://www.scrapingbee.com/blog/how-to-set-up-a-proxy-server-with-apache/</link><pubDate>Sun, 03 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-set-up-a-proxy-server-with-apache/</guid><description>&lt;p>A proxy server is an intermediate server between a client and another server. The client sends the requests to the proxy server, which then passes them to the destination server. The destination server sends the response to the proxy server, and it forwards this to the client.&lt;/p>
&lt;p>In the world of web scraping, using a proxy server is common for the following reasons:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Privacy:&lt;/strong> A proxy server hides the IP address of the scraper, providing a layer of privacy.&lt;/li>
&lt;li>&lt;strong>Avoiding IP bans:&lt;/strong> A proxy server can be used to circumvent IP bans. If the target website blocks the IP address of the proxy server, you can simply use a different proxy server.&lt;/li>
&lt;li>&lt;strong>Circumventing geoblocking:&lt;/strong> By connecting to a proxy server situated in a certain region, you can circumvent geoblocking. For instance, if the content you need is available only in the US, you can connect to a proxy server in the US and scrape as much as you want.&lt;/li>
&lt;/ul>
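For reference, turning Apache into a forward proxy comes down to a handful of directives. A minimal sketch of the configuration; module paths and the allowed client range vary by system and are placeholders here:

```apache
# Load the proxy modules (paths vary by distribution)
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so

Listen 8080
<VirtualHost *:8080>
    # Turn Apache into a forward proxy
    ProxyRequests On
    ProxyVia On

    # Restrict access so the proxy is not open to the whole internet
    <Proxy "*">
        Require ip 192.0.2.0/24
    </Proxy>
</VirtualHost>
```

The access restriction matters: an unrestricted `ProxyRequests On` creates an open proxy that anyone can abuse.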
&lt;p>In this article, you'll learn how to set up your own proxy server and use it to scrape websites. There are many ways to create a DIY proxy server, such as using &lt;a href="https://httpd.apache.org/" target="_blank" >Apache&lt;/a> or &lt;a href="https://www.nginx.com/" target="_blank" >Nginx&lt;/a> as proxy servers or using dedicated proxy tools like &lt;a href="https://www.squid-cache.org/" target="_blank" >Squid&lt;/a>. In this article, you'll use Apache.&lt;/p></description></item><item><title>What is a Headless Browser: Top 8 Options for 2025 [Pros vs. Cons]</title><link>https://www.scrapingbee.com/blog/what-is-a-headless-browser-best-solutions-for-web-scraping-at-scale/</link><pubDate>Sat, 02 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/what-is-a-headless-browser-best-solutions-for-web-scraping-at-scale/</guid><description>&lt;p>Imagine a world where web browsers work tirelessly behind the scenes, navigating websites, filling forms, and capturing data without ever showing a single pixel on a screen. I welcome you to the realm of headless browsers - the unsung heroes of web automation and testing!&lt;/p>
&lt;p>In today's digital landscape, where web applications grow increasingly complex and data-driven decision-making reigns supreme, headless browsers have emerged as indispensable tools for developers, quality assurance (QA) engineers, and data enthusiasts alike. They're the Swiss Army knives of the web, capable of slicing through mundane tasks, carving out efficiencies, and sculpting robust testing environments.&lt;/p></description></item><item><title>Puppeteer Stealth Tutorial; How to Set Up &amp; Use (+ Working Alternatives)</title><link>https://www.scrapingbee.com/blog/puppeteer-stealth-tutorial-with-examples/</link><pubDate>Fri, 01 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/puppeteer-stealth-tutorial-with-examples/</guid><description>&lt;p>Puppeteer is a robust headless browser library created mainly to automate user interactions. However, it can be easily detected and blocked by anti-scraping measures due to its lack of built-in stealth capabilities. This is where Puppeteer Extra comes in, offering plugins like Stealth to address this limitation.&lt;/p>
&lt;p>This tutorial will explore how to utilize Puppeteer Stealth to attempt to evade detection while scraping websites effectively. We also cover solutions and alternatives for bypassing the latest cutting-edge anti-bot tech which Puppeteer Stealth sometimes struggles to evade.&lt;/p></description></item><item><title>Getting Started with Goutte</title><link>https://www.scrapingbee.com/blog/getting-started-with-goutte/</link><pubDate>Thu, 31 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/getting-started-with-goutte/</guid><description>&lt;p>While &lt;a href="https://www.scrapingbee.com/tutorials/how-to-log-in-to-a-website-using-scrapingbee-with-nodejs/" >Node.js&lt;/a> and &lt;a href="https://www.scrapingbee.com/blog/crawling-python/" >Python&lt;/a> dominate the web scraping landscape, Goutte is the go-to choice for PHP developers. It's a powerful library that provides a simple yet efficient solution to automatically extract data from websites.&lt;/p>
&lt;p>Whether you're a beginner or an experienced developer, Goutte allows you to effortlessly scrape data from websites and seamlessly display it on the frontend directly from your PHP scripts. Goutte also ensures that the scraping process doesn't compromise loading time or consume excessive backend resources such as RAM, making it an optimal choice for PHP-based scraping tasks.&lt;/p></description></item><item><title>Crawl4AI - a hands-on guide to AI-friendly web crawling</title><link>https://www.scrapingbee.com/blog/crawl4ai/</link><pubDate>Wed, 30 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/crawl4ai/</guid><description>&lt;p>If you're building stuff with large language models or AI agents, chances are you'll need web data. And that means writing a crawler, ideally something fast, flexible, and not a total pain to set up. Like, we probably don't want to spend countless hours trying to run a simple &amp;quot;hello world&amp;quot; app. That's where &lt;strong>Crawl4AI&lt;/strong> comes in.&lt;/p>
&lt;p>Crawl4AI is an open-source crawler made by devs, for devs. It gives you control, speed, structured output, and enough room to do serious things without getting buried in boilerplate.&lt;/p></description></item><item><title>How to use asyncio to scrape websites with Python</title><link>https://www.scrapingbee.com/blog/async-scraping-in-python/</link><pubDate>Wed, 30 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/async-scraping-in-python/</guid><description>&lt;p>In this article, we'll take a look at how you can use Python and its coroutines, with their &lt;code>async&lt;/code>/&lt;code>await&lt;/code> syntax, to efficiently scrape websites, without having to go all-in on threads 🧵 and semaphores 🚦. For this purpose, we'll check out &lt;a href="https://docs.python.org/3/library/asyncio.html" target="_blank" >asyncio&lt;/a>, along with the asynchronous HTTP library &lt;a href="https://docs.aiohttp.org" target="_blank" >aiohttp&lt;/a>.&lt;/p>
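The shape of the asyncio approach in miniature, before the deep dive: the article swaps the simulated `fetch` below for a real aiohttp request, so treat this purely as a concurrency sketch with placeholder URLs:

```python
import asyncio

async def fetch(url):
    # Stand-in for an aiohttp request; we only simulate I/O latency here
    await asyncio.sleep(0.01)
    return f"<html>content of {url}</html>"

async def main(urls):
    # Schedule all fetches concurrently and collect results in order
    return await asyncio.gather(*(fetch(u) for u in urls))

pages = asyncio.run(main(["https://example.com/a", "https://example.com/b"]))
```

Because the coroutines yield during I/O, the total wall-clock time is roughly that of the slowest request rather than the sum of all of them.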
&lt;h2 id="what-is-asyncio">What is asyncio?&lt;/h2>
&lt;p>&lt;a href="https://docs.python.org/3/library/asyncio.html" target="_blank" >asyncio&lt;/a> is part of Python's standard library (yay, no additional dependency to manage 🥳) which enables the implementation of concurrency using the same asynchronous patterns you may already know from JavaScript and other languages: &lt;code>async&lt;/code> and &lt;code>await&lt;/code>&lt;/p></description></item><item><title>How to scrape data from Twitter.com</title><link>https://www.scrapingbee.com/blog/web-scraping-twitter/</link><pubDate>Tue, 29 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/web-scraping-twitter/</guid><description>&lt;p>Twitter is a gold mine for data. It started as a micro-blogging website and has quickly grown to become the favorite hangout spot for millions of people. Twitter provides access to most of its data via its official API but sometimes that is not enough.&lt;/p>
&lt;p>Web scraping provides some advantages over using the official API. For example, Twitter's API is rate-limited and you need to wait for a while before Twitter approves your application request and lets you access its data but this is not the case with web scraping.&lt;/p></description></item><item><title>10 Tips on How to make Python's Beautiful Soup faster when scraping</title><link>https://www.scrapingbee.com/blog/how-to-make-pythons-beautiful-soup-faster-performance/</link><pubDate>Mon, 28 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-make-pythons-beautiful-soup-faster-performance/</guid><description>&lt;p>Beautiful Soup is super easy to use for parsing HTML and is hugely popular. However, if you're extracting a gigantic amount of data from tons of scraped pages it can slow to a crawl if not properly optimized.&lt;/p>
&lt;p>In this tutorial, I'll show you 10 expert-level tips and tricks for transforming Beautiful Soup into a blazing-fast data-extracting beast and how to optimize your scraping process to be as fast as lightning.&lt;/p></description></item><item><title>How to Parse HTML in Ruby with Nokogiri?</title><link>https://www.scrapingbee.com/blog/parse-html-nokogiri/</link><pubDate>Sun, 27 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/parse-html-nokogiri/</guid><description>&lt;p>APIs are the cornerstone of the modern internet as they enable different services to communicate with each other. With APIs, you can gather information from different sources and use different services. However, not all services provide an API for you to consume. Even if an API is offered, it might be limited in comparison to a service’s web application(s). Thankfully, you can use web scraping to overcome these limitations. &lt;em>Web scraping&lt;/em> refers to the practice of extracting data from the HTML source of the web page. That is, instead of communicating with a server through APIs, web scraping lets you extract information directly from the web page itself.&lt;/p></description></item><item><title>How to Parse HTML with Regex</title><link>https://www.scrapingbee.com/blog/parse-html-regex/</link><pubDate>Sat, 26 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/parse-html-regex/</guid><description>&lt;p>The amount of information available on the internet for human consumption is &lt;a href="https://siteefy.com/how-many-websites-are-there/" target="_blank" >astounding&lt;/a>. However, if this data doesn't come in the form of a specialized REST API, it can be challenging to access programmatically. The technique of gathering and processing raw data from the internet is known as &lt;em>web scraping&lt;/em>. There are several uses for web scraping in software development. 
Data collected through web scraping can be applied in market research, lead generation, competitive intelligence, product pricing comparison, monitoring consumer sentiment, brand audits, AI and machine learning, creating a job board, and more.&lt;/p></description></item><item><title>Getting Started with HtmlUnit</title><link>https://www.scrapingbee.com/blog/getting-started-with-htmlunit/</link><pubDate>Fri, 25 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/getting-started-with-htmlunit/</guid><description>&lt;p>&lt;a href="https://sourceforge.net/projects/htmlunit/" target="_blank" >HtmlUnit&lt;/a> is a GUI-less browser for Java that can execute JavaScript and perform AJAX calls.&lt;/p>
&lt;p>Although primarily used to automate testing, HtmlUnit is a great choice for scraping static and dynamic pages alike because of its ability to manipulate web pages on a high level, such as clicking on buttons, submitting forms, providing input, and so forth. HtmlUnit supports the W3C DOM standard, &lt;a href="https://www.scrapingbee.com/blog/using-css-selectors-for-web-scraping/" >CSS selectors&lt;/a>, and &lt;a href="https://www.scrapingbee.com/blog/practical-xpath-for-web-scraping/" >XPath selectors&lt;/a>, and it can simulate the Firefox, Chrome, and Internet Explorer browsers, which makes web scraping easier.&lt;/p></description></item><item><title>How to Build a News Crawler with the ScrapingBee API</title><link>https://www.scrapingbee.com/blog/how-to-build-a-news-crawler-with-the-scrapingbee-api/</link><pubDate>Thu, 24 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-build-a-news-crawler-with-the-scrapingbee-api/</guid><description>&lt;p>Imagine you're a developer who needs to keep track of the latest news from multiple sources for a project you're working on. Instead of manually visiting each news website and checking for updates, you want to automate this process to save time and effort. You need a &lt;a href="https://www.scrapingbee.com/scrapers/google-news-scraper-api/" target="_blank" >news crawler&lt;/a>.&lt;/p>
&lt;p>In this article, you'll see how easy it can be to build a news crawler using Python Flask and the &lt;a href="https://www.scrapingbee.com/" target="_blank" >ScrapingBee API&lt;/a>. You'll learn how to set up ScrapingBee, implement crawling logic, and display the extracted news on a web page.&lt;/p></description></item><item><title>How to scrape Google search results data in Python easily</title><link>https://www.scrapingbee.com/blog/how-to-scrape-google-search-results-data-in-python-easily/</link><pubDate>Wed, 23 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-google-search-results-data-in-python-easily/</guid><description>&lt;p>&lt;strong>Google search engine results pages (SERPs)&lt;/strong> can provide a lot of important data for you and your business, but you most likely wouldn't want to scrape it manually. After all, there might be multiple queries you're interested in, and the corresponding results should be monitored on a regular basis. This is where automated scraping comes into play: you write a script that processes the results for you or use a dedicated tool to do all the heavy lifting.&lt;/p></description></item><item><title>How to Web Scrape Yelp.com</title><link>https://www.scrapingbee.com/blog/web-scraping-yelp/</link><pubDate>Tue, 22 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/web-scraping-yelp/</guid><description>&lt;p>With more than 199 million reviews of businesses worldwide, Yelp is one of the biggest websites for crowd-sourced reviews. In this article, you will learn how to scrape data from Yelp's search results and individual restaurant pages. You will be learning about the different Python libraries that can be used for web scraping and the techniques to use them effectively.&lt;/p>
&lt;p>If you have never heard about Yelp before, it is an American company that crowd-sources reviews for local businesses. They started as a reviews company for restaurants and food businesses but have lately been branching out to cover additional industries as well. Yelp reviews are very important for food businesses as they directly affect their revenues. A restaurant owner told &lt;a href="https://hbswk.hbs.edu/item/the-yelp-factor-are-consumer-reviews-good-for-business" target="_blank" >Harvard Business Review&lt;/a>:&lt;/p></description></item><item><title>Python Web Scraping: Full Tutorial With Examples (2025)</title><link>https://www.scrapingbee.com/blog/web-scraping-101-with-python/</link><pubDate>Tue, 22 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/web-scraping-101-with-python/</guid><description>&lt;p>Have you ever wondered how to scrape data from any website automatically? Or how some websites and web applications can extract and display data so seamlessly from other sites in real-time? Whether you want to collect and track prices from e-commerce sites, gather news articles and research data, or monitor social media trends, web scraping is the tool you need.&lt;/p>
&lt;p>In this tutorial, we'll explore the world of web scraping with Python, guiding you from the basics for beginners to advanced techniques for web scraping experts. In my experience, Python is an excellent tool for automating data extraction from websites and one of the most powerful and versatile languages for web scraping, thanks to its vast array of libraries and frameworks.&lt;/p></description></item><item><title>Scraping single page applications with Python.</title><link>https://www.scrapingbee.com/blog/scraping-single-page-applications/</link><pubDate>Mon, 21 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/scraping-single-page-applications/</guid><description>&lt;p>Dealing with a website that uses lots of Javascript to render its content can be tricky. These days, more and more sites are using frameworks like Angular, React, and Vue.js for their frontend.&lt;/p>
&lt;p>These frontend frameworks are complicated to deal with because they often use the newest features of the HTML5 API.&lt;/p>
&lt;p>The problem you will encounter is that your HTTP client will download the HTML and the JavaScript code but will not execute the JavaScript, so the webpage will never be fully rendered.&lt;/p></description></item><item><title>How to web scrape Zillow’s real estate data at scale</title><link>https://www.scrapingbee.com/blog/how-to-web-scrape-zillows-real-estate-data-at-scale-with-this-easy-zillow-scraper-in-python/</link><pubDate>Sun, 20 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-web-scrape-zillows-real-estate-data-at-scale-with-this-easy-zillow-scraper-in-python/</guid><description>&lt;p>If you're looking to buy or sell a house or other real estate property, Zillow is an excellent resource with &lt;a href="https://www.similarweb.com/website/zillow.com/#overview" target="_blank" >millions&lt;/a> of property listings and detailed market data.&lt;/p>
&lt;p>In addition to traditional real estate purposes, the data available on Zillow comes in handy for market analysis, tracking housing trends, or building a real estate application.&lt;/p>
&lt;p>This tutorial will guide you to effectively scrape Zillow's real estate data at scale using Python, BeautifulSoup, and the ScrapingBee API.&lt;/p></description></item><item><title>Block resources with Puppeteer</title><link>https://www.scrapingbee.com/blog/block-requests-puppeteer/</link><pubDate>Sat, 19 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/block-requests-puppeteer/</guid><description>&lt;p>In this article, we will take a look at how to block specific resources (HTTP requests, CSS, video, images) from loading in Puppeteer. Puppeteer is one of the most widely used tools for web scraping and automation. There are a couple of ways to block resources in Puppeteer, and below we will go over the various methods we can use to block or intercept specific network requests in our automation scripts.&lt;/p></description></item><item><title>Web Scraping Booking.com</title><link>https://www.scrapingbee.com/blog/web-scraping-booking/</link><pubDate>Fri, 18 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/web-scraping-booking/</guid><description>&lt;p>With more than 28 million listings, Booking.com is one of the biggest websites to look for a place to stay during your trip. If you are opening up a new hotel in an area, you might want to keep tabs on your competition and get notified when new properties open up. This can all be automated with the power of web scraping!
In this article, you will learn how to scrape data from the search results page of Booking.com using Python and Selenium and also handle pagination along the way.&lt;/p></description></item><item><title>How to read and parse JSON data with Python</title><link>https://www.scrapingbee.com/blog/how-to-read-and-parse-json-data-with-python/</link><pubDate>Thu, 17 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-read-and-parse-json-data-with-python/</guid><description>&lt;p>JSON, or JavaScript Object Notation, is a popular data interchange format that has become a staple in modern web development. If you're a programmer, chances are you've come across JSON in one form or another. It's widely used in REST APIs, single-page applications, and other modern web technologies to transmit data between a server and a client, or between different parts of a client-side application. JSON is lightweight, easy to read, and simple to use, making it an ideal choice for developers looking to transmit data quickly and efficiently.&lt;/p></description></item><item><title>Web Scraping with Groovy</title><link>https://www.scrapingbee.com/blog/web-scraping-with-groovy/</link><pubDate>Wed, 16 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/web-scraping-with-groovy/</guid><description>&lt;p>&lt;a href="https://groovy-lang.org" target="_blank" >Groovy&lt;/a> has been around for quite a while and has established itself as a reliable scripting language for tasks where you'd like to use the full power of Java and the JVM, but without all its verbosity.&lt;/p>
&lt;p>While typical use cases are often build pipelines or automated testing, it works equally well for anything related to data extraction and web scraping. And that's precisely what we are going to check out in this article. &lt;strong>Let's fasten our seatbelts and dive right into web scraping and handling HTTP requests with Groovy.&lt;/strong>&lt;/p></description></item><item><title>Ruby HTML and XML Parsers</title><link>https://www.scrapingbee.com/blog/ruby-html-parser/</link><pubDate>Tue, 15 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/ruby-html-parser/</guid><description>
&lt;p>Extracting data from the web—that is, web scraping—typically requires reading and processing content from HTML and XML documents. &lt;em>Parsers&lt;/em> are software tools that facilitate this scraping of web pages.&lt;/p>
&lt;p>The Ruby developer community offers some fantastic HTML and XML parsers that can serve all your web scraping needs—there are a lot of options out there. In choosing which to go with, you might consider the following criteria:&lt;/p></description></item><item><title>Crawlee for Python Tutorial with Examples</title><link>https://www.scrapingbee.com/blog/crawlee-for-python-tutorial-with-examples/</link><pubDate>Mon, 14 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/crawlee-for-python-tutorial-with-examples/</guid><description>&lt;p>Crawlee is a brand new, free &amp;amp; open-source (FOSS) web scraping library built by the folks at APIFY. While it is available for both Node.js and Python, we'll be looking at the Python library in this brief guide. It's barely been a few weeks since its release and the library has already amassed about 2800 stars on GitHub! Let's see what it's all about and why it got all those stars.&lt;/p></description></item><item><title>N8N No-Code Web Scraping Made Simple with AI-Powered Data Extraction</title><link>https://www.scrapingbee.com/blog/n8n-no-code-web-scraping/</link><pubDate>Mon, 14 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/n8n-no-code-web-scraping/</guid><description>&lt;p>Are you a marketer tracking competitor prices? A content creator monitoring trending topics? Maybe you're a small business owner researching leads or a data analyst gathering insights from websites? If any of these describe you, you're in the right place!&lt;/p>
&lt;p>No-code platforms like &lt;a href="https://n8n.io/" target="_blank" >n8n&lt;/a> are changing how we handle repetitive data collection tasks. What used to require hiring developers or spending hours on manual copying can now be automated with visual workflows in minutes.&lt;/p></description></item><item><title>Node-unblocker for Web Scraping</title><link>https://www.scrapingbee.com/blog/node-unblocker/</link><pubDate>Sun, 13 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/node-unblocker/</guid><description>&lt;p>Web proxies help you keep your privacy and get around various restrictions while browsing the web. They hide your details, such as the request origin or IP address, and with additional software can even bypass things like rate limits.&lt;/p>
&lt;p>&lt;a href="https://github.com/nfriedly/node-unblocker" target="_blank" >node-unblocker&lt;/a> is one such web proxy that includes a form of Node.js library. You can use it for web scraping and accessing geo-restricted content, as well as other functions.&lt;/p>
&lt;p>In this article, you’ll learn how to implement and use node-unblocker. You’ll also see its pros, cons, and limitations as compared to a managed service like ScrapingBee.&lt;/p></description></item><item><title>How to Log in to Almost Any Website</title><link>https://www.scrapingbee.com/blog/how-to-log-in-to-almost-any-websites/</link><pubDate>Sat, 12 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-log-in-to-almost-any-websites/</guid><description>&lt;p>In the first article about &lt;a href="https://www.scrapingbee.com/blog/introduction-to-web-scraping-with-java/" >Java web scraping&lt;/a>, I showed how to extract data from the Craigslist website.
But what if the data you want, or the action you want to carry out on a website, requires authentication?&lt;/p>
&lt;p>In this short tutorial I will show you how to make a generic method that can handle most authentication forms.&lt;/p>
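The original series uses Java, but the idea behind such a generic method is language-agnostic: fetch the login page, collect every hidden input the form pre-fills (including any CSRF token), merge in your credentials, and POST the result to the form's action URL. Here is a minimal sketch of the payload-building step in Python, using only the standard library; the form markup and field names below are made-up examples, not taken from any real site:

```python
from html.parser import HTMLParser

class HiddenInputCollector(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> fields."""
    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and a.get("type") == "hidden" and a.get("name"):
            self.fields[a["name"]] = a.get("value", "")

def build_login_payload(form_html, credentials):
    """Merge the form's hidden inputs (e.g. a CSRF token) with the
    user's credentials, ready to POST back to the form's action URL."""
    collector = HiddenInputCollector()
    collector.feed(form_html)
    return {**collector.fields, **credentials}

form = ('<form action="/login">'
        '<input type="hidden" name="csrf" value="abc123">'
        '<input type="text" name="user">'
        '<input type="password" name="pass"></form>')
payload = build_login_payload(form, {"user": "alice", "pass": "s3cret"})
# payload == {"csrf": "abc123", "user": "alice", "pass": "s3cret"}
```

Because the hidden inputs are copied wholesale, the same function works for most login forms regardless of what the site names its token.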
&lt;img src="https://www.scrapingbee.com/blog/how-to-log-in-to-almost-any-websites/cover.png" width="1456" height="727" alt='cover image'>
&lt;h3 id="authentication-mechanism">Authentication mechanism&lt;/h3>
&lt;p>There are many different authentication mechanisms, the most frequent being a login form, sometimes with a &lt;a href="https://en.wikipedia.org/wiki/Cross-site_request_forgery#Forging_login_requests" target="_blank" >CSRF token&lt;/a> as a hidden input.&lt;/p></description></item><item><title>7 Best Python Web Scraping Libraries for 2025</title><link>https://www.scrapingbee.com/blog/best-python-web-scraping-libraries/</link><pubDate>Fri, 11 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/best-python-web-scraping-libraries/</guid><description>&lt;p>In this tutorial, I will show you some of the best Python web scraping libraries. Web scraping is often way more challenging than it initially seems due to session handling, cookies, dynamically loaded content, JavaScript execution, and even anti-scraping measures (for example, CAPTCHA, IP blocking, and rate limiting).&lt;/p>
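To make the "session handling" challenge concrete: with only the standard library you have to wire up cookie persistence yourself before consecutive requests behave like one browser session. A rough sketch of that plumbing (the User-Agent string and the commented-out URLs are placeholders, not part of any particular tutorial):

```python
import http.cookiejar
import urllib.request

# A cookie jar plus a custom opener acts as a bare-bones "session":
# cookies set by one response are replayed on subsequent requests.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
opener.addheaders = [("User-Agent", "Mozilla/5.0 (compatible; demo-scraper)")]

# opener.open("https://example.com/login")    # response cookies land in `jar`
# opener.open("https://example.com/account")  # ...and are sent back here
```

Dedicated scraping libraries bundle this (plus retries, proxies, and often JavaScript rendering) behind a much friendlier interface.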
&lt;p>This is where advanced web scraping libraries come in handy. They abstract away the complexity of web scraping, allowing you to focus on data extraction. Picking the right one can set you up for success.&lt;/p></description></item><item><title>XPath/CSS Cheat Sheet</title><link>https://www.scrapingbee.com/blog/xpath-css-cheat-sheet/</link><pubDate>Thu, 10 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/xpath-css-cheat-sheet/</guid><description>&lt;p>This cheat sheet provides a comprehensive overview of XPath and CSS selectors. It includes the most commonly used selectors and functions, along with examples to help you understand how they work.&lt;/p>
&lt;p>This cheat sheet is available to download as a &lt;a href="cheatsheet.pdf" >PDF file&lt;/a>.&lt;/p>
&lt;blockquote>
&lt;p>Sign up for &lt;a href="https://app.scrapingbee.com/account/register" target="_blank" >1000 free web scraping API credits&lt;/a> and try these selectors for free.&lt;/p>
&lt;/blockquote>
&lt;h2 id="how-to-copy-an-xpath-selector-from-chrome-dev-tools">How to copy an XPath selector from Chrome Dev Tools&lt;/h2>
&lt;ol>
&lt;li>Open Chrome Dev Tools (press F12 key or right-click on the webpage and select &amp;quot;Inspect&amp;quot;)&lt;/li>
&lt;li>Use the element selector tool to highlight the element you want to scrape&lt;/li>
&lt;li>Right-click the highlighted element in the Dev Tools panel&lt;/li>
&lt;li>Select &amp;quot;Copy&amp;quot; and then &amp;quot;Copy XPath&amp;quot;&lt;/li>
&lt;li>Paste the XPath expression into the code&lt;/li>
&lt;/ol>
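Once copied, the expression can be dropped straight into whatever parsing library you use. As a small illustration, Python's standard-library ElementTree understands a useful subset of XPath; full XPath 1.0 (e.g. &lt;code>contains()&lt;/code> or &lt;code>text()&lt;/code>) needs a library such as lxml. The document below is a toy example, not taken from the cheat sheet:

```python
import xml.etree.ElementTree as ET

# Chrome hands you something like //*[@id="main"]/span; the same idea
# run against a small, well-formed document:
html = "<html><body><div id='main'><span class='price'>19.99</span></div></body></html>"
root = ET.fromstring(html)
price = root.find(".//div[@id='main']/span")
print(price.text)  # 19.99
```

Note that ElementTree expects well-formed markup; for real-world HTML you would feed the copied XPath to a forgiving parser instead.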
&lt;p>&lt;img src="copying-xpath-from-chrome-dev-tools.gif" alt="Using Chrome developer tools to copy Target XPath">&lt;/p></description></item><item><title>Using Watir to automate web browsers with Ruby</title><link>https://www.scrapingbee.com/blog/scraping-watir-ruby/</link><pubDate>Wed, 09 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/scraping-watir-ruby/</guid><description>&lt;p>For years, it’s been possible to automate simple tasks on a computer when those tasks have been executed using the command line. This is known as &lt;em>scripting&lt;/em>. A bigger challenge, however, is to control the browser since a GUI introduces a lot more variability in how elements act.&lt;/p>
&lt;p>&lt;em>Browser automation&lt;/em> describes the process of programmatically performing certain actions in the browser (or handing these actions over to robots) that might otherwise be quite tedious or repetitive to perform manually.&lt;/p></description></item><item><title>Charles proxy for web scraping</title><link>https://www.scrapingbee.com/blog/charles-proxy/</link><pubDate>Tue, 08 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/charles-proxy/</guid><description>&lt;p>Charles proxy is an HTTP debugging proxy that can inspect network calls and debug SSL traffic. With Charles, you are able to inspect requests/responses, headers, and cookies. Today we will see how to set up Charles and how we can use Charles proxy for web scraping. We will focus on extracting data from Javascript-heavy web pages and mobile applications. Charles sits between your applications and the internet:&lt;/p>
&lt;img src="https://www.scrapingbee.com/blog/charles-proxy/charles_drawing.png" width="759" height="419" alt='Charles proxy drawing'>

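Once Charles is running, routing a script's traffic through it is a one-line proxy setting. A minimal Python sketch using only the standard library, assuming Charles's default listener on localhost port 8888 (for HTTPS you also need to trust the Charles root certificate):

```python
import urllib.request

# Point both schemes at Charles (default port 8888) so every request
# made through this opener shows up in the Charles session window.
charles = urllib.request.ProxyHandler({
    "http": "http://127.0.0.1:8888",
    "https": "http://127.0.0.1:8888",
})
opener = urllib.request.build_opener(charles)

# html = opener.open("https://example.com").read()  # now visible in Charles
```

The same host/port pair works for any HTTP client or mobile device you point at the proxy.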
&lt;p>Charles is like the Chrome dev tools on steroids. It has many incredible features:&lt;/p></description></item><item><title>How to Download Files via cURL With Battle Ready Examples</title><link>https://www.scrapingbee.com/blog/how-download-files-via-curl-tutorial-with-examples/</link><pubDate>Mon, 07 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-download-files-via-curl-tutorial-with-examples/</guid><description>&lt;p>&lt;strong>Picture this:&lt;/strong> It's 3 AM, and you're staring at your terminal, trying to download hundreds of data files for tomorrow's analysis. Your mouse hand is cramping from all that right-click, &amp;quot;Save As&amp;quot; action, and you're thinking there has to be a better way. (Spoiler alert: there is, and you've just found it!)&lt;/p>
&lt;p>Welcome to the world of file downloads with &lt;a href="https://curl.se/" target="_blank" >cURL&lt;/a>, where what seems like command-line sorcery to many is about to become your new superpower. As an automation specialist who's orchestrated thousands of automated downloads, I've seen firsthand how cURL knowledge can transform tedious download tasks into elegant, automated solutions — from simple file transfers to complex authenticated downloads that would make even seasoned developers scratch their heads.&lt;/p></description></item><item><title>Web Scraping with JavaScript and Node.js</title><link>https://www.scrapingbee.com/blog/web-scraping-javascript/</link><pubDate>Mon, 07 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/web-scraping-javascript/</guid><description>&lt;p>JavaScript has become one of the most popular and widely used languages due to the massive improvements it has seen and the introduction of the runtime known as Node.js. Whether it's a web or mobile application, JavaScript now has the right tools. This article will explain how the vibrant ecosystem of Node.js allows you to efficiently scrape the web to meet most of your requirements.&lt;/p>
&lt;img src="https://www.scrapingbee.com/blog/web-scraping-javascript/cover.png" width="1200" height="628" alt='cover image'>
&lt;h3 id="prerequisites">Prerequisites&lt;/h3>
&lt;p>This post is primarily aimed at developers who have some level of experience with JavaScript. However, if you have a firm understanding of web scraping but have no experience with JavaScript, it may still serve as a light introduction to JavaScript. Still, having experience in the following fields will certainly help:&lt;/p></description></item><item><title>Using the Cheerio NPM Package for Web Scraping</title><link>https://www.scrapingbee.com/blog/cheerio-npm/</link><pubDate>Sun, 06 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/cheerio-npm/</guid><description>&lt;p>Have you ever manually copied data from a table on a website into an Excel spreadsheet so you could analyze it? If you have, then you know how tedious of a process it can be. Fortunately, there's a tool that allows you to easily scrape data from web pages using Node.js. You can use &lt;a href="https://cheerio.js.org/" target="_blank" >Cheerio&lt;/a> to collect data from just about any HTML. You can pull data out of HTML strings or crawl a website to collect product data.&lt;/p></description></item><item><title>A Guide To Web Scraping For Data Journalism</title><link>https://www.scrapingbee.com/blog/a-guide-to-web-scraping-for-data-journalism/</link><pubDate>Sat, 05 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/a-guide-to-web-scraping-for-data-journalism/</guid><description>&lt;p>Web scraping may not sound much like a traditional journalistic practice but, in fact, it is a valuable tool that can allow journalists to turn almost any website into a powerful source of data from which they can build and illustrate their stories.
Demand for these kinds of skills is on the increase, and this guide will explain some of the different techniques that can be used to gather data through web scraping and how it can be used to fuel incisive data journalism.&lt;/p></description></item><item><title>How to scrape data from realtor.com</title><link>https://www.scrapingbee.com/blog/web-scraping-realtor/</link><pubDate>Fri, 04 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/web-scraping-realtor/</guid><description>&lt;p>Realtor is the second biggest real estate listing website in the US and contains millions of properties. You will be missing out on saving money if you don't do market research on realtor before doing your next property purchase. To make use of the treasure trove of data available on realtor, it is necessary to scrape it. This tutorial will show you exactly how you can do that while bypassing the bot detection used by realtor.com.&lt;/p></description></item><item><title>Haskell Web Scraping</title><link>https://www.scrapingbee.com/blog/haskell-web-scraping/</link><pubDate>Thu, 03 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/haskell-web-scraping/</guid><description>&lt;p>Even though &lt;a href="https://www.scrapingbee.com/" target="_blank" >web scraping&lt;/a> is commonly done with languages like Python and JavaScript, a statically typed functional programming language like Haskell can provide extra benefits. Types make sure that your scripts do what you want them to do and that the data scraped conforms to your requirements.&lt;/p>
&lt;p>In this article, you'll learn how to do web scraping in Haskell with libraries such as &lt;a href="https://hackage.haskell.org/package/scalpel" target="_blank" >Scalpel&lt;/a> and &lt;a href="https://hackage.haskell.org/package/webdriver" target="_blank" >webdriver&lt;/a>.&lt;/p>
&lt;img src="https://www.scrapingbee.com/blog/haskell-web-scraping/cover.png" width="1200" height="628" alt='cover image'>
&lt;h2 id="basic-scraping">Basic Scraping&lt;/h2>
&lt;p>Scraping a static website can be done with any language that has libraries for an HTTP client and HTML parsing. Haskell is no different. It even has a dedicated high-level scraping library called &lt;a href="https://hackage.haskell.org/package/scalpel" target="_blank" >Scalpel&lt;/a>, which puts it above similar languages like Rust.&lt;/p></description></item><item><title>The 6 Best mobile and 4G proxy providers for web scraping</title><link>https://www.scrapingbee.com/blog/best-mobile-4g-proxy-provider-webscraping/</link><pubDate>Wed, 02 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/best-mobile-4g-proxy-provider-webscraping/</guid><description>&lt;p>In this article, we will look at the six best mobile and 4G proxy providers for web scraping. We will not only look at the different features they offer but also perform a real-world test that includes the performance, speed, and success and error rate on some of the most popular websites: Instagram, Google, &lt;a href="https://www.scrapingbee.com/features/amazon/" target="_blank" >Amazon&lt;/a> and the top 1,000 Alexa rank (the list of the most visited domains in the world).&lt;/p></description></item><item><title>How to Web Scrape Walmart.com</title><link>https://www.scrapingbee.com/blog/web-scraping-walmart/</link><pubDate>Tue, 01 Jul 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/web-scraping-walmart/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>In this article, you will learn how to &lt;a href="https://www.scrapingbee.com/features/walmart/" target="_blank" >scrape product information from Walmart&lt;/a>, the world's largest company by revenue (US $570 billion), and the world's largest private employer with 2.2 million employees.&lt;/p>
&lt;img src="https://www.scrapingbee.com/blog/web-scraping-walmart/cover.png" width="460" height="250" alt='cover image'>
&lt;p>You might want to scrape the product pages on Walmart to monitor stock levels for a particular item or to track product prices. This can be useful when a product is sold out on the website and you want to make sure you are notified as soon as the stock is replenished.&lt;br>&lt;br>In this article, you will learn:&lt;/p></description></item><item><title>How to use a proxy with node-fetch?</title><link>https://www.scrapingbee.com/blog/proxy-node-fetch/</link><pubDate>Mon, 30 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/proxy-node-fetch/</guid><description>&lt;h2 id="why-node-fetch">Why node-fetch?&lt;/h2>
&lt;p>&lt;a href="https://www.scrapingbee.com/blog/node-fetch/" target="_blank" >Node-fetch&lt;/a> is a popular HTTP client library, with around twenty million downloads per week; according to NPM, it is also one of the most downloaded NPM packages of all-time.&lt;/p>
&lt;p>Node-fetch's primary motivation was to bring a server-side equivalent of &lt;a href="https://developer.mozilla.org/fr/docs/Web/API/Fetch_API" target="_blank" >window.fetch&lt;/a>, the client-side API implemented in browsers, to Node.js.&lt;/p>
&lt;p>This API is primarily used to make asynchronous requests to load content on the browser side. However, on the server-side of things, there are many more use-cases.&lt;/p></description></item><item><title>What Is a Transparent Proxy?</title><link>https://www.scrapingbee.com/blog/what-is-a-transparent-proxy/</link><pubDate>Mon, 30 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/what-is-a-transparent-proxy/</guid><description>&lt;p>Whether you're an individual user seeking improved online privacy or a network administrator striving to optimize network performance and security for your organization, understanding the nuances of web proxies is crucial. Web proxies are web servers that act as a gateway between a client application and the server it needs to communicate with.&lt;/p>
&lt;p>One such proxy that plays a vital role in network management and cybersecurity is a transparent proxy. Transparent proxies are used to set up content filtering and caching, protect from common cybersecurity attacks such as DDoS, and facilitate network traffic management.&lt;/p></description></item><item><title>No-code web scraping</title><link>https://www.scrapingbee.com/blog/no-code-web-scraping/</link><pubDate>Sun, 29 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/no-code-web-scraping/</guid><description>&lt;p>You can create software without code.&lt;/p>
&lt;p>&lt;strong>It’s crazy, right?&lt;/strong>&lt;/p>
&lt;p>There are many tools that you can use to build fully functional software. They can do anything you want. Without code.&lt;/p>
&lt;p>You might be thinking to yourself, what if I need something complex, like a web scraper? That’s too much, right?&lt;/p>
&lt;p>To create a web scraper, you need to create a code block to &lt;strong>load the page&lt;/strong>. Then, you need another module &lt;strong>to parse it&lt;/strong>. Next, you build another block to deal with this &lt;strong>information and run actions&lt;/strong>. Also, you have to find ways to &lt;strong>deal with IP blocks&lt;/strong>. To make matters worse, you might need &lt;strong>to interact with the target page&lt;/strong>. Clicking buttons, waiting for elements, taking screenshots.&lt;/p></description></item><item><title>Study of Amazon’s Best Selling &amp; Most Read Book Charts Since 2017</title><link>https://www.scrapingbee.com/blog/study-of-amazons-best-selling-and-most-read-book-charts-since-2017/</link><pubDate>Sat, 28 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/study-of-amazons-best-selling-and-most-read-book-charts-since-2017/</guid><description>&lt;p>Amazon is most well known as an online shopping website, and among the tech folks for Amazon Web Services. However, it was initially started as an online bookstore. They are also well known for the Kindle eBook and the Audiobook experiences they offer.&lt;/p>
&lt;p>The extensive offerings in the literature space have given Amazon so much data about reading patterns on a global scale. They present this data by publishing 4 charts every week. These 4 charts are the most read and the most sold books in fiction and non-fiction categories in the USA.&lt;/p></description></item><item><title>OCaml Web Scraping</title><link>https://www.scrapingbee.com/blog/ocaml-web-scraping/</link><pubDate>Fri, 27 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/ocaml-web-scraping/</guid><description>&lt;p>&lt;a href="https://ocaml.org/" target="_blank" >OCaml&lt;/a> is a modern, type-safe, and expressive functional programming language. Even though it's less commonly used than popular languages like Python or Java, you can create powerful applications like &lt;a href="https://www.scrapingbee.com/blog/what-is-web-scraping/" target="_blank" >web scrapers&lt;/a> with it.&lt;/p>
&lt;p>In this article, you'll learn how to scrape static and dynamic websites with OCaml.&lt;/p>
&lt;p>To follow along, you'll need to have OCaml installed on your computer, OPAM initialized, and Dune installed. All of these steps are explained in the &lt;a href="https://ocaml.org/install" target="_blank" >official installation instructions&lt;/a>, so go ahead and set up the development environment before you continue.&lt;/p></description></item><item><title>Extract Job Listings, Details and Salaries from Indeed with ScrapingBee and Make.com</title><link>https://www.scrapingbee.com/blog/no-code-job-data-extraction/</link><pubDate>Thu, 26 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/no-code-job-data-extraction/</guid><description>&lt;p>Taking the time to read through target pages is usually not the best idea. It's too time-consuming and it's easy to miss important changes when you're scrolling through hundreds of pages. Therefore, learning how to perform updates automatically without the need for coding skills is crucial.&lt;/p>
&lt;p>In this tutorial, we will scrape jobs from &lt;a href="http://indeed.com/" target="_blank" >indeed.com&lt;/a>, one of the most popular job aggregator websites. Web scraping is an excellent tool for finding valuable information from a job listing database.&lt;/p></description></item><item><title>Guide to Choosing a Proxy API for Scraping</title><link>https://www.scrapingbee.com/blog/guide-to-choosing-a-proxy-for-scraping/</link><pubDate>Wed, 25 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/guide-to-choosing-a-proxy-for-scraping/</guid><description>&lt;p>You're in the thick of it, scraping the web to extract data pivotal to your core product. During this process, you quickly realize that websites deploy defense mechanisms against potential scrapers. For instance, if your server IP address keeps hitting a site for data, it might get flagged and subsequently banned.&lt;/p>
&lt;p>This is where a proxy API can help. A proxy API is like your Swiss Army knife for web scraping. It's designed to provide you with web scraping operations that are seamless, efficient, and most importantly, undetected.&lt;/p></description></item><item><title>Web Scraping with Objective C</title><link>https://www.scrapingbee.com/blog/web-scraping-with-objective-c/</link><pubDate>Tue, 24 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/web-scraping-with-objective-c/</guid><description>&lt;p>In this article, you’ll learn about the main tools and techniques for web scraping using Objective C for both static and dynamic web pages.&lt;/p>
&lt;p>This article assumes that you’re already familiar with Objective C and &lt;a href="https://developer.apple.com/documentation/xcode" target="_blank" >Xcode&lt;/a>, which will be used to create, compile, and run the projects on macOS—though you can easily change things to run on iOS if preferred.&lt;/p>
&lt;div class="img" style="background: url(data:image/jpeg;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAKCAIAAAA7N&amp;#43;mxAAAAyklEQVR4nKyRzUrEMBzE/0n6ZWitCr2KXkSLd0F8fxDBV/Ag4qVt0hpJmo/NsrC0pdtLy84tk/mRYRJ472Gr8GZyAe6VbmtxmjPGOOdmZjA9NJX4/eEAQLMkjI5X1lrGW8Z49yfu725vrq8QQnPYGvf13fmEGtb63TiEde7947NhrO&amp;#43;1lPLt9WWhNiH4UE5pCAOlzOAncVw&amp;#43;PRBM8vzyuXwcngUANF37XyjOZBiRosgQHkNV3RCCtTaUXmRpugyv1Vm/apX2AQAA//84slbCVoM2VAAAAABJRU5ErkJggg==); background-size: cover">
 &lt;svg width="1200" height="628" aria-hidden="true" style="background-color:white">&lt;/svg>
 &lt;img
 class="lazyload"
 data-sizes="auto"
 data-srcset=', /blog/web-scraping-with-objective-c/cover_hu7998418013526718218.png 1200w '
 data-src="https://www.scrapingbee.com/blog/web-scraping-with-objective-c/cover_hu7998418013526718218.png"
 width="1200" height="628"
 alt='cover image'>
 &lt;noscript>
 &lt;img
 loading="lazy"
 
 srcset=', /blog/web-scraping-with-objective-c/cover_hu7998418013526718218.png 1200w'
 src="https://www.scrapingbee.com/blog/web-scraping-with-objective-c/cover.png"
 width="1200" height="628"
 alt='cover image'>
 &lt;/noscript>
&lt;/div>

&lt;br>

&lt;h2 id="basic-scraping">Basic Scraping&lt;/h2>
&lt;p>First, let’s take a look at using Objective C to scrape a static web page from &lt;a href="https://en.wikipedia.org/wiki/Physics" target="_blank" >Wikipedia&lt;/a>:&lt;/p></description></item><item><title>Comparing Forward Proxies and Reverse Proxies</title><link>https://www.scrapingbee.com/blog/comparing-forward-proxies-and-reverse-proxies/</link><pubDate>Mon, 23 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/comparing-forward-proxies-and-reverse-proxies/</guid><description>&lt;p>In an age dominated by the internet, where data flows ceaselessly between devices and servers, proxies have grown to become an integral part of networks. Proxies play a vital role in the seamless exchange of information on the web.&lt;/p>
&lt;p>Proxies act as digital intermediaries, facilitating secure and efficient communication between your device and the destination server. There are two types, forward proxies and reverse proxies, each serving a distinct function.&lt;/p></description></item><item><title>Puppeteer Web Scraping Tutorial in Nodejs</title><link>https://www.scrapingbee.com/blog/puppeteer-web-scraping-tutorial-in-nodejs/</link><pubDate>Mon, 23 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/puppeteer-web-scraping-tutorial-in-nodejs/</guid><description>&lt;p>In this tutorial, we are going to take a look at &lt;a href="https://pptr.dev" target="_blank" >Puppeteer&lt;/a>, a JavaScript library developed by Google. Puppeteer provides a native automation interface for Chrome and Firefox, allowing you to launch a headless browser instance and take full control of websites, including taking screenshots, submitting forms, extracting data, and more. Let's dive right in with a real-world example. 🤿&lt;/p>
&lt;blockquote>
&lt;p>💡 If you are curious about the basics of web scraping in JavaScript, you may also be interested in &lt;a href="https://www.scrapingbee.com/blog/web-scraping-javascript/" >Web Scraping with JavaScript and Node.js&lt;/a>.&lt;/p>&lt;/blockquote></description></item><item><title>Top 5 Rotating and Residential Proxies for Web Scraping</title><link>https://www.scrapingbee.com/blog/rotating-proxies/</link><pubDate>Sun, 22 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/rotating-proxies/</guid><description>&lt;p>There are a lot of different proxy types you can use for web scraping, like residential proxies or data center proxies. A residential IP is an IP address that belongs to a real user, with a real Internet Service Provider (ISP). These IPs enable web requests to be seen as coming from real users, and they are much less likely to be blocked by third-party websites.&lt;/p>
&lt;p>A data center proxy is similar to a residential proxy, except that residential proxies are more trusted by websites. Data center proxies are the most common proxies available, which leads to one of their drawbacks: some websites can detect when you’re using a data center proxy and are more likely to block those IP addresses because many of them are used by bots.&lt;/p></description></item><item><title>Scrape Amazon products' price with no code</title><link>https://www.scrapingbee.com/blog/nocode-amazon/</link><pubDate>Sat, 21 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/nocode-amazon/</guid><description>&lt;p>It's safe to assume that many of us have bookmarked Amazon product pages from several retailers for a similar product to easily compare pricing.&lt;/p>
&lt;p>This article will guide you through scraping product information from &lt;a href="http://amazon.com/" target="_blank" >Amazon.com&lt;/a> so you never miss a great deal on a product. You will monitor similar product pages and compare the prices.&lt;/p>
&lt;p>This tutorial is designed so that you can follow along smoothly if you already know the basic concepts. Here's what we'll do:&lt;/p></description></item><item><title>Getting Started with Jaunt Java</title><link>https://www.scrapingbee.com/blog/getting-started-with-jaunt-java/</link><pubDate>Fri, 20 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/getting-started-with-jaunt-java/</guid><description>&lt;p>While Python and Node.js are popular platforms for writing scraping scripts, &lt;a href="https://jaunt-api.com/index.htm" target="_blank" >Jaunt&lt;/a> provides similar capabilities for Java.&lt;/p>
&lt;p>Jaunt is a Java library that provides web scraping, web automation, and JSON querying abilities. It relies on a light, headless browser to load websites and query their DOM. The only downside is that it doesn't support JavaScript—but for that, you can use &lt;a href="https://jauntium.com/index.htm" target="_blank" >Jauntium&lt;/a>, a Java browser automation framework developed and maintained by the same person behind Jaunt, Tom Cervenka.&lt;/p></description></item><item><title>Playwright vs Selenium: Which is the best Headless Browser</title><link>https://www.scrapingbee.com/blog/playwright-vs-selenium/</link><pubDate>Thu, 19 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/playwright-vs-selenium/</guid><description>&lt;p>For years Selenium has reigned as the undisputed champion of web automation, dominating the ring with its vast capabilities and developer loyalty. But now a formidable rival has risen, Playwright. This battle of the titans is set to determine which tool truly deserves the crown of web automation champion. Each contender brings its own unique strengths and strategies to the arena, but which will emerge victorious in the fight for web automation supremacy?&lt;/p></description></item><item><title>ScrapingBee is joining Oxylabs’ group</title><link>https://www.scrapingbee.com/blog/scrapingbee-acquisition/</link><pubDate>Thu, 19 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/scrapingbee-acquisition/</guid><description>&lt;p>Today, we’re incredibly proud and excited to announce that ScrapingBee has officially become part of Oxylabs’ group.&lt;/p>
&lt;p>Oxylabs’ company group already offers a variety of industry-leading proxy and data gathering solutions. Through this acquisition, they aim to strengthen their position as a market leader while helping elevate the web scraping industry as a whole.&lt;/p>
&lt;p>At ScrapingBee, our mission has always been to offer a transparent, easy-to-use, and high-performance web scraping solution.&lt;/p></description></item><item><title>Web Scraping Handling Ajax Website</title><link>https://www.scrapingbee.com/blog/web-scraping-handling-ajax-website/</link><pubDate>Wed, 18 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/web-scraping-handling-ajax-website/</guid><description>&lt;p>Today more and more websites are using Ajax for fancy user experiences, dynamic web pages, and many more good reasons.
Crawling Ajax-heavy websites can be tricky and painful, so we are going to look at some tricks to make it easier.&lt;/p>
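&lt;p>One classic trick, sketched below in Python purely for illustration (the article itself uses Java and HtmlUnit), is to skip the rendered page entirely and parse the JSON that the site's Ajax calls return; the payload here is invented.&lt;/p>

```python
import json

# Instead of rendering the page, call the JSON endpoint that the page's
# XHR requests hit and parse its response directly. This payload is a
# made-up example of what such an endpoint might return.
payload = '{"items": [{"name": "Widget", "price": 9.99}, {"name": "Gadget", "price": 19.99}]}'

data = json.loads(payload)
names = [item["name"] for item in data["items"]]
print(names)  # ['Widget', 'Gadget']
```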
&lt;div class="img" style="background: url(data:image/jpeg;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAKCAIAAAA7N&amp;#43;mxAAAAtElEQVR4nKRRXauCQBTcc1wvLl4QuQ9XK/z//6cepReJSgxFhfVrz&amp;#43;khiCU2CJu3M8wwwxnJzGItcLXTYTaGid50IXohpH10dWuaikkE/6n6VU9ed/0hv47hXxZCto0cycZQX5XerfyZmvp8sdNPlS5m1U08LOSuDQKYxP7Y5EWLiGCJQPoSvWGc9WTctdGDKEl3iwDAeJPYIiUhlqQXDv3A5uHTqR7fQlxlduGrne8BAAD//wYiTOstizjlAAAAAElFTkSuQmCC); background-size: cover">
 &lt;svg width="1349" height="674" aria-hidden="true" style="background-color:white">&lt;/svg>
 &lt;img
 class="lazyload"
 data-sizes="auto"
 data-srcset=', /blog/web-scraping-handling-ajax-website/cover_hu8880260185297834566.png 1200w '
 data-src="https://www.scrapingbee.com/blog/web-scraping-handling-ajax-website/cover_hu8880260185297834566.png"
 width="1349" height="674"
 alt='cover image'>
 &lt;noscript>
 &lt;img
 loading="lazy"
 
 srcset=', /blog/web-scraping-handling-ajax-website/cover_hu8880260185297834566.png 1200w'
 src="https://www.scrapingbee.com/blog/web-scraping-handling-ajax-website/cover.png"
 width="1349" height="674"
 alt='cover image'>
 &lt;/noscript>
&lt;/div>

&lt;br>

&lt;h2 id="prerequisite">Prerequisite&lt;/h2>
&lt;p>Before starting, please read my previous articles, &lt;a href="https://ksah.in/introduction-to-web-scraping-with-java/" target="_blank" >Introduction to Web Scraping With Java&lt;/a> and &lt;a href="https://ksah.in/how-to-log-in-to-almost-any-websites/" target="_blank" >Handling Authentication&lt;/a>, to learn how to set up your Java environment and get a basic understanding of HtmlUnit.
After reading this you should be a little bit more familiar with web scraping.&lt;/p></description></item><item><title>XPath vs CSS selectors</title><link>https://www.scrapingbee.com/blog/xpath-vs-css-selector/</link><pubDate>Tue, 17 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/xpath-vs-css-selector/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>If you have already browsed our &lt;a href="https://www.scrapingbee.com/blog/" >web scraping blog&lt;/a> a bit, you will probably have already come across our &lt;a href="https://www.scrapingbee.com/blog/practical-xpath-for-web-scraping/" >introduction to XPath expressions&lt;/a>, as well as our article on &lt;a href="https://www.scrapingbee.com/blog/using-css-selectors-for-web-scraping/" >using CSS selectors for web scraping&lt;/a> - if you haven't yet, highly recommended 👍. Quite a few good reads.&lt;/p>
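&lt;p>As a quick taste of the two selector styles, here is a sketch using Python's stdlib ElementTree, which supports a small XPath subset; the tiny DOM below is invented, and the equivalent CSS selector is noted in a comment.&lt;/p>

```python
import xml.etree.ElementTree as ET

# Build a tiny DOM programmatically (stand-in for a product listing page).
root = ET.Element("div")
for price in ["9.99", "14.50"]:
    item = ET.SubElement(root, "div", {"class": "product"})
    ET.SubElement(item, "span", {"class": "price"}).text = price

# XPath (the subset ElementTree supports); the equivalent CSS selector
# would be "div.product span.price".
prices = [el.text for el in root.findall(".//div[@class='product']/span[@class='price']")]
print(prices)  # ['9.99', '14.50']
```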
&lt;p>So you may already have a good idea of what they do and how they are used, but what might be missing - to complete the picture - is how they compare to each other. That's exactly what we are going to do in today's article.&lt;/p></description></item><item><title>What is data parsing?</title><link>https://www.scrapingbee.com/blog/data-parsing/</link><pubDate>Mon, 16 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/data-parsing/</guid><description>&lt;p>Data parsing is the process of taking data in one format and transforming it to another format. You'll find parsers used everywhere. They are commonly used in compilers when we need to parse computer code and generate machine code.&lt;/p>
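&lt;p>A minimal sketch of that definition, using only the Python standard library and invented records: parse CSV text into structured rows, then emit the same data in another format (JSON).&lt;/p>

```python
import csv
import io
import json

# Data parsing: take data in one format (CSV text) and transform it
# into another (JSON). The records are invented for illustration.
raw = "name,price\nmug,9.99\ncap,14.50\n"

rows = list(csv.DictReader(io.StringIO(raw)))  # parse CSV into dicts
as_json = json.dumps(rows)                     # re-serialize as JSON
print(as_json)  # [{"name": "mug", "price": "9.99"}, {"name": "cap", "price": "14.50"}]
```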
&lt;p>This happens all the time when developers write code that gets run on hardware. Parsers are also present in SQL engines. SQL engines parse a SQL query, execute it, and return the results.&lt;/p></description></item><item><title>Free AI Powered Proxy Scraper for Getting Fresh Public Proxies</title><link>https://www.scrapingbee.com/blog/free-ai-powered-proxy-scraper-for-getting-fresh-public-proxies/</link><pubDate>Sun, 15 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/free-ai-powered-proxy-scraper-for-getting-fresh-public-proxies/</guid><description>&lt;p>Proxies are your ultimate cheat code, helping you bypass the anti-scraping bosses guarding valuable data behind firewalls and restrictions. This guide shows you how to obtain free proxies with an &lt;a href="https://www.scrapingbee.com/features/ai-web-scraping-api/" target="_blank" >AI-powered scraper API&lt;/a>, saving you time and money while leveling up your scraping game like a pro.&lt;/p>
&lt;p>Free proxies are listed by several sources on the internet, and they usually allow us to filter by protocol type, country, and other parameters. &lt;a href="https://www.scrapingbee.com/blog/best-free-proxy-list-web-scraping/" target="_blank" >In a previous blog post, we looked at some of these sources and tested them for various quality parameters.&lt;/a> (In the context of proxies, quality would refer to whether the proxy actually works or not, and also the time it takes to complete a request.) In this tutorial we'll show you how to scrape fresh public proxies from any source and evaluate them to figure out which ones are working.&lt;/p></description></item><item><title>Google Ads Competitor Analysis: 4 Battle-Tested Methods</title><link>https://www.scrapingbee.com/blog/google-ads-competitor-analysis-system/</link><pubDate>Sat, 14 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/google-ads-competitor-analysis-system/</guid><description>&lt;p>You're reviewing your &lt;a href="https://ads.google.com/home/" target="_blank" >Google Ads dashboard&lt;/a> on a Monday morning, coffee in hand, when you notice your cost-per-click has mysteriously skyrocketed over the weekend. Your best-performing keywords are suddenly bleeding money, and your once-reliable ad positions are slipping. Sound familiar?&lt;/p>
&lt;p>In my years of experience with PPC campaigns and developing web scraping solutions, I've learned that in the high-stakes world of Google Ads, flying blind to your competitors' moves isn't just risky – it's expensive.&lt;/p></description></item><item><title>AI and the Art of Reddit Humor: Mapping Which Countries Joke the Most</title><link>https://www.scrapingbee.com/blog/global-subreddit-humor-analysis-with-ai/</link><pubDate>Fri, 13 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/global-subreddit-humor-analysis-with-ai/</guid><description>&lt;p>Making jokes on the internet is a fine art and Reddit users globally are working diligently to keep the dad jokes coming, because the only thing better than winning an internet argument is winning an internet upvote contest with a punchline your dad would be proud of.&lt;/p>
&lt;p>In fact, Reddit's vast reservoir of dad jokes may just be the secret ingredient that helped it reach a staggering $6.4 billion valuation at its recent IPO. Who knew that jokes your dad repeats at every family gathering could be worth their weight in Reddit Gold? But which country attempts to make the highest proportion of jokes in their comment sections?&lt;/p></description></item><item><title>The 11 best web scraping subreddits</title><link>https://www.scrapingbee.com/blog/11-best-subreddits-for-webscraping/</link><pubDate>Thu, 12 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/11-best-subreddits-for-webscraping/</guid><description>&lt;p>Web scraping is an essential skill for data analysts and developers who want to extract data from websites. However, finding reliable sources to learn and discuss web scraping techniques can be challenging. Fortunately, several subreddits on Reddit are dedicated to web scraping, data analysis, and programming-related discussion.&lt;/p>
&lt;p>In this article, we'll explore the 11 best subreddits for web scraping and share why each of these subreddits might be useful for you on your web scraping journey.&lt;/p></description></item><item><title>What are ISP proxies?</title><link>https://www.scrapingbee.com/blog/isp-proxy/</link><pubDate>Wed, 11 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/isp-proxy/</guid><description>&lt;p>Proxies, intermediary servers that route your internet traffic, usually fall into three categories: datacenter, residential, and ISP. By definition, ISP proxies are affiliated with an internet service provider, but in fact, it’s easier to see them as a combination of datacenter and residential proxies.&lt;/p>
&lt;p>Let’s take a closer look at ISP proxies and see how they’re particularly useful for web scraping.&lt;/p>
&lt;div class="img" style="background: url(data:image/jpeg;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAKCAIAAAA7N&amp;#43;mxAAAA20lEQVR4nGL5//8/A7mAiWydWDT/&amp;#43;/fv9&amp;#43;8/mOo&amp;#43;fPr64tW7f3/&amp;#43;Ydf879//9&amp;#43;8/PHn6/NyFS2/fvvv3D0Xd3ksXN5w/9evHb2RBFjjr//9/Z85deP323aePn9&amp;#43;9e&amp;#43;/iZM/EBDX63cfPrxh&amp;#43;fGT/&amp;#43;&amp;#43;HHN3FuNkZGRnSbmZmZDfR1///7z8fHq6OtwcoKNffnz98rDh/&amp;#43;8P0rMxPT4mMHPnz8CtfCiBzaHz9&amp;#43;&amp;#43;vHzJxMTExMjo7CwEMxF/28/eibEy8POyvr49Rt1eWlmZmYsmkkFVI0qkgAgAAD//5nTaFicH1ejAAAAAElFTkSuQmCC); background-size: cover">
 &lt;svg width="1200" height="628" aria-hidden="true" style="background-color:white">&lt;/svg>
 &lt;img
 class="lazyload"
 data-sizes="auto"
 data-srcset=', /blog/isp-proxy/cover_hu18047000366882754256.png 1200w '
 data-src="https://www.scrapingbee.com/blog/isp-proxy/cover_hu18047000366882754256.png"
 width="1200" height="628"
 alt='cover image'>
 &lt;noscript>
 &lt;img
 loading="lazy"
 
 srcset=', /blog/isp-proxy/cover_hu18047000366882754256.png 1200w'
 src="https://www.scrapingbee.com/blog/isp-proxy/cover.png"
 width="1200" height="628"
 alt='cover image'>
 &lt;/noscript>
&lt;/div>

&lt;br>

&lt;h2 id="what-are-isp-proxies">What are ISP Proxies?&lt;/h2>
&lt;p>ISP proxies are residential proxies hosted in a data center. With ISP proxies, you get the benefits of data center network speed and the strong reputation of residential IPs.&lt;/p></description></item><item><title>Ultimate Git and GitHub Tutorial with Examples</title><link>https://www.scrapingbee.com/blog/ultimate-git-and-github-commands-tutorial-with-examples/</link><pubDate>Tue, 10 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/ultimate-git-and-github-commands-tutorial-with-examples/</guid><description>&lt;p>In software development, &lt;strong>Git and GitHub&lt;/strong> have become essential tools for managing and collaborating on code. In this guide, we'll learn how to use Git, a powerful version control system, and GitHub, the leading platform for hosting and sharing Git repositories.&lt;/p>
&lt;p>We will start by discussing Git and its most important terms. We'll cover basic Git commands and approaches and then move on to GitHub. Finally, we'll explore commands to work with GitHub repositories and answer some common questions. By the end of this article, you'll be familiar with both Git and GitHub and all the standard approaches. So, let's get started!&lt;/p></description></item><item><title>Serverless Web Scraping With AWS Lambda and Java</title><link>https://www.scrapingbee.com/blog/serverless-web-scraping-with-aws-lambda-and-java/</link><pubDate>Mon, 09 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/serverless-web-scraping-with-aws-lambda-and-java/</guid><description>&lt;p>Serverless is a term referring to the execution of code inside ephemeral containers (Function as a Service, or FaaS). It was a hot topic in 2019: after the “micro-service” hype came the “nano-services”!&lt;/p>
&lt;p>Cloud functions can be triggered by different things such as:&lt;/p>
&lt;ul>
&lt;li>An HTTP call to a REST API&lt;/li>
&lt;li>A job in a message queue&lt;/li>
&lt;li>A log&lt;/li>
&lt;li>IOT event&lt;/li>
&lt;/ul>
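&lt;p>A minimal sketch of such a scraping function handler, shown in Python for brevity even though the article itself uses Java; the event shape and field names are illustrative, not any specific cloud provider's API.&lt;/p>

```python
import json
from html.parser import HTMLParser
from urllib.request import urlopen

class TitleParser(HTMLParser):
    """Collects the text of the first title tag it sees."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self.in_title = True

    def handle_data(self, data):
        if self.in_title:
            self.title = data.strip()
            self.in_title = False

def handler(event, context=None):
    """Hypothetical FaaS entry point: one URL in, one scraped title out."""
    html = urlopen(event["url"]).read().decode("utf-8", errors="replace")
    parser = TitleParser()
    parser.feed(html)
    return {"statusCode": 200, "body": json.dumps({"title": parser.title})}
```

Each invocation handles a single page, which is what makes fan-out across hundreds of parallel function instances straightforward.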
&lt;p>Cloud functions are a really good fit for web scraping tasks, for many reasons. Web scraping is I/O bound: most of the time is spent waiting for HTTP responses, so we don’t need high-end CPU servers. Cloud functions are cheap (the first 1M requests are free, then $0.20 per million requests) and easy to set up. They are also a good fit for parallel scraping: we can create hundreds or thousands of functions at the same time for large-scale scraping.&lt;/p></description></item><item><title>The Best JavaScript Web Scraping Libraries</title><link>https://www.scrapingbee.com/blog/best-javascript-web-scraping-libraries/</link><pubDate>Mon, 09 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/best-javascript-web-scraping-libraries/</guid><description>&lt;p>Ever need to pull data from websites – things like product details, news articles, or even just prices? Web scraping is your go-to, and luckily, JavaScript offers some nice tools for the job. Whether you're facing a simple HTML page or a dynamic interactive site, there's a library out there that can handle it.&lt;/p>
&lt;p>In this guide we'll dive into the best JavaScript web scraping tools that people are actually using in 2025. For each one, you'll get: a brief overview, a code snippet to get you started, as well as pros and cons.&lt;/p></description></item><item><title>Mapping the Funniest US States on Reddit using AI</title><link>https://www.scrapingbee.com/blog/funniest-us-states-on-reddit/</link><pubDate>Sun, 08 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/funniest-us-states-on-reddit/</guid><description>&lt;p>Reddit is a unique social media platform that works on upvotes rather than likes and followers. Needless to say, jokes are very important contributors to Reddit's upvote economy. To add to this, most users use the platform anonymously and miss no opportunity to crack a dad joke whenever they can.&lt;/p>
&lt;p>In a previous article, we analyzed and ranked country &lt;a href="https://www.scrapingbee.com/blog/global-subreddit-humor-analysis-with-ai/" target="_blank" >subreddits for humorous comments&lt;/a>. The USA was one of the top countries in terms of the percentage of attempted jokes. In this article, we drill down further and repeat the same analysis across the states of the USA. For each state, we obtained all the comments from the top 50 threads of this year. Then we ran the top-level comments through AI (Mistral 7B) to classify them as &amp;quot;joke&amp;quot; or &amp;quot;not joke&amp;quot;, with the thread topic in context.&lt;/p></description></item><item><title>Web Scraping vs Web Crawling: Ultimate Guide</title><link>https://www.scrapingbee.com/blog/scraping-vs-crawling/</link><pubDate>Sat, 07 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/scraping-vs-crawling/</guid><description>&lt;p>There are many ways that businesses and individuals can gather information about their customers and web crawling and web scraping are some of the most common approaches. You'll hear these terms used interchangeably, but they are &lt;em>not&lt;/em> the same thing.&lt;/p>
&lt;p>In this article, we'll go over the differences between web scraping and web crawling and how they relate to each other. We will also cover some use cases for both approaches and tools you can use.&lt;/p></description></item><item><title>An Automatic Bill Downloader in Java</title><link>https://www.scrapingbee.com/blog/an-automatic-bill-downloader-in-java/</link><pubDate>Fri, 06 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/an-automatic-bill-downloader-in-java/</guid><description>&lt;p>In this article, I am going to show how to download bills (or any other file) from a website with HtmlUnit.&lt;/p>
&lt;p>I suggest you read these articles first: Introduction of &lt;a href="https://www.scrapingbee.com/blog/introduction-to-web-scraping-with-java/" >how to do web scraping with Java&lt;/a> and &lt;a href="https://www.scrapingbee.com/blog/how-to-log-in-to-almost-any-websites/" >Autologin&lt;/a>&lt;/p>
&lt;p>Since I am hosting this blog on &lt;a href="https://m.do.co/c/0e940b26444e" target="_blank" >Digital Ocean&lt;/a> (10$ in credit if you sign up via this link), I will show you how to write a bot to automatically download every bill you have.&lt;/p></description></item><item><title>What are datacenter proxies?</title><link>https://www.scrapingbee.com/blog/datacenter-proxies/</link><pubDate>Thu, 05 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/datacenter-proxies/</guid><description>&lt;p>A datacenter proxy is a proxy service that offers quick internet access and a better user experience. As they’re not affiliated with an ISP, they will hide your real IP address, which means the website won’t be able to identify the user’s real IP address, enabling the user to access the website anonymously. That’s beneficial in a number of scenarios, like accessing all the information on a website hosted in a country whose servers may hide certain information, getting around a server block, or when you need high bandwidth without network lag.&lt;/p></description></item><item><title>No-code competitor monitoring with ScrapingBee and Integromat</title><link>https://www.scrapingbee.com/blog/no-code-competitor-monitoring/</link><pubDate>Wed, 04 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/no-code-competitor-monitoring/</guid><description>&lt;p>Competitor analysis is a vital task in big or small companies. It allows you to confirm market needs by looking at what competitors are offering. At the same time, it allows you to build better products and impress potential customers by fixing what is wrong with the current options.&lt;/p>
&lt;p>Of course, a company should focus on its own products. But you can’t just ignore what is happening out there. You can find amazing insights in data gathered from competitors, suppliers, and customers.&lt;/p></description></item><item><title>Send stock prices update to Slack with Make and ScrapingBee</title><link>https://www.scrapingbee.com/blog/no-code-stock-price-slack/</link><pubDate>Tue, 03 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/no-code-stock-price-slack/</guid><description>&lt;p>It is unlikely that you will always be on top of your investments if you do not study your stock's price movements. The good news is that there are plenty of online resources available to you that allow you to monitor the financial health of a company whose shares you own, and to evaluate the stock's performance.&lt;/p>
&lt;p>&lt;a href="https://finance.yahoo.com/" target="_blank" >Yahoo Finance&lt;/a> supplies an up-to-date news feed of financial news from some of the most trusted sources online, as well as offering a comprehensive look at stocks and funds.&lt;/p></description></item><item><title>urllib3 vs. Requests: Which HTTP Client is Best for Python?</title><link>https://www.scrapingbee.com/blog/urllib3-vs-requests/</link><pubDate>Tue, 03 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/urllib3-vs-requests/</guid><description>&lt;p>Python is one of the most widely used programming languages for web scraping, and a large chunk of any web scraping task is sending HTTP requests. urllib3 and Requests are the most commonly used packages for this purpose. Naturally, the next question is which one do you use?&lt;/p>
&lt;p>In this blog, we briefly introduce both packages, highlighting the differences between urllib3 and Requests, and discuss which one of them is best suited for different scenarios.&lt;/p></description></item><item><title>Topic Analysis of US State Subreddits Using gpt-4o-mini</title><link>https://www.scrapingbee.com/blog/topic-analysis-of-us-state-subreddits-using-ai/</link><pubDate>Mon, 02 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/topic-analysis-of-us-state-subreddits-using-ai/</guid><description>&lt;p>Ever wondered what people across the United States are talking about online? Reddit, often dubbed &amp;quot;the front page of the internet,&amp;quot; offers a treasure trove of conversations, and each state has its own dedicated subreddit reflecting local interests. But what exactly are these state-based communities discussing the most?&lt;/p>
&lt;p>In total, we looked at 50,947 threads from the different states of the USA. We used the “year” filter and the “top” sort on Reddit. We first made a word cloud consisting of the commonly occurring words in the thread topics. Based on this preliminary analysis, we made 8 categories, including an “others” category which we excluded from visualizations. We asked gpt-4o-mini to go over each topic and classify them into one of those. The 8 categories we used are as follows:&lt;/p></description></item><item><title>Shades of Success: The Trending E-commerce Colours of 2024</title><link>https://www.scrapingbee.com/blog/shades-of-success-e-commerce-trending-colours/</link><pubDate>Sun, 01 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/shades-of-success-e-commerce-trending-colours/</guid><description>&lt;div class="img" style="background: url(data:image/jpeg;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAKCAIAAAA7N&amp;#43;mxAAABTklEQVR4nJyQT05bMRCHf&amp;#43;Pxn5e&amp;#43;NmnVKlGlNq3aVaUegBPAHdhzB07AHdhwCU6BxJ4FS3YsQvLyFF6en2eQkyAlKNnghT2y5/PMfFZvfuG9y&amp;#43;SNVidhE9PWO21SDsCEttXJVCQBjlQgAljKDGNey9NMcuLrTQ6Y1gUsmO4f4tV1/f0bl6WpKkmio6/8/4&amp;#43;7vWuDozaqs2QtCk&amp;#43;LRkXQJT0&amp;#43;6v0dO4ukP4Z8elJ6R1Uty6iAGsLHHv37bX8ObbWQSSVsUHjjHRmDZauD0kCVsjDCupPUpPNLLGLpvYudGNIYk7WcUud8KEx1cSZcMBQQhYC2bSswnWPeFELCAlVthYKj2MXYYTiQz592bFrsqv3Sp37ZKcRAQaRKzIhdHjX41feH4FV1ZY5182H2XGj7mCQLG49sCPSG3AdnnoJtBr2EwJKbh&amp;#43;U9JICXAAAA///2EZVHGzcKpAAAAABJRU5ErkJggg==); background-size: cover">
 &lt;svg width="2309" height="1157" aria-hidden="true" style="background-color:white">&lt;/svg>
 &lt;img
 class="lazyload"
 data-sizes="auto"
 data-srcset=', /blog/shades-of-success-e-commerce-trending-colours/cover_hu372721771777939466.png 1500w '
 data-src="https://www.scrapingbee.com/blog/shades-of-success-e-commerce-trending-colours/cover_hu372721771777939466.png"
 width="2309" height="1157"
 alt='cover image'>
 &lt;noscript>
 &lt;img
 loading="lazy"
 
 srcset=', /blog/shades-of-success-e-commerce-trending-colours/cover_hu372721771777939466.png 1500w'
 src="https://www.scrapingbee.com/blog/shades-of-success-e-commerce-trending-colours/cover.png"
 width="2309" height="1157"
 alt='cover image'>
 &lt;/noscript>
&lt;/div>

&lt;br>

&lt;p>As consumers, we love nothing more than jumping aboard a new micro trend
or aesthetic, and platforms such as Pinterest and TikTok have made it
easier than ever before to keep up with all the latest trends.&lt;/p>
&lt;p>Colour is at the heart of every fashion, interior and style trend, but
in the fast-paced world of 2024, colour is so much more than pastel
tones and monochrome palettes. It's no surprise that the likes of Dulux
and Pantone release an annual 'colour of the year'.&lt;/p></description></item><item><title>Minimum Advertised Price Monitoring with ScrapingBee</title><link>https://www.scrapingbee.com/blog/minimum-advertised-price-monitoring-with-scrapingbee/</link><pubDate>Sat, 31 May 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/minimum-advertised-price-monitoring-with-scrapingbee/</guid><description>&lt;h1 id="minimum-advertised-price-monitoring-with-scrapingbee">Minimum Advertised Price Monitoring with ScrapingBee&lt;/h1>
&lt;p>To uphold their brand image and protect profits, it's crucial for manufacturers to routinely monitor the advertised prices of their products. Minimum advertised price (MAP) monitoring helps brands check whether retailers are advertising their products below the minimum price set by the brand. This can prevent retailers from competing on product price, which can lead to a harmful race to the bottom. It also helps brands identify violations and enforce their MAP policies. For instance, if a brand sets a MAP of $100 for a new cosmetic product, MAP monitoring would enable the company to identify and take action against retailers who advertise it for less than $100.&lt;/p></description></item><item><title>What is HTTP?</title><link>https://www.scrapingbee.com/blog/what-is-http/</link><pubDate>Fri, 30 May 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/what-is-http/</guid><description>&lt;p>Your browser uses it, as does your REST API. It connects you to your favorite restaurant whenever you order food online. It's built into your IoT gadget and allows you to unlock doors and adjust your living room temperature when you are on the other side of the planet. And it's even used to occasionally tunnel other protocols: &lt;strong>HTTP&lt;/strong>.&lt;/p>
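Under the hood, an HTTP/1.1 request is just structured text sent over a socket. This little Python sketch (with a made-up host and user agent) builds a request by hand and parses it back apart to show the anatomy:

```python
# An HTTP/1.1 request is plain text: a request line, header lines,
# and a blank line separating the head from the (optional) body.
raw_request = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "User-Agent: demo-client/1.0\r\n"
    "\r\n"
)

# Split the head from the body at the blank line.
head, _, body = raw_request.partition("\r\n\r\n")
# The first line is the request line; the rest are headers.
request_line, *header_lines = head.split("\r\n")
method, path, version = request_line.split(" ")
headers = dict(line.split(": ", 1) for line in header_lines)

print(method, path, version)   # GET /index.html HTTP/1.1
print(headers["Host"])         # www.example.com
```

The response coming back has the same shape, except the first line is a status line such as `HTTP/1.1 200 OK`.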
&lt;p>But what exactly is HTTP? What does it do and how does it work? If you've already read some of our other articles (e.g. &lt;a href="https://www.scrapingbee.com/blog/web-scraping-php/#1-http-requests" >Web Scraping with PHP&lt;/a>), you'll have already come across some details, but today we really want to go in-depth into what HTTP is.&lt;/p></description></item><item><title>Are Product Hunt's featured products still online today?</title><link>https://www.scrapingbee.com/blog/producthunt-cemetery/</link><pubDate>Thu, 29 May 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/producthunt-cemetery/</guid><description>&lt;p>Releasing any new product these days is a competitive business. Mountains of new products appear daily, complete with well-produced intro videos, every new competitor bearing a striking resemblance to one another. But how many of the products of the past stood out from the crowd and remain online today?&lt;/p>
&lt;p>In this article, I'll show how to query the Product Hunt API to collect data. We collected information on all the featured products from Product Hunt's 8-year history to determine how many of them still exist online or have disappeared into the tech wilderness. Along the way, we'll also discover other interesting insights into the dataset.&lt;/p></description></item><item><title>'JMAP (YC S10) Linux Inside is hiring': the quest for the best Hacker News title</title><link>https://www.scrapingbee.com/blog/quest-best-hacker-news-title/</link><pubDate>Wed, 28 May 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/quest-best-hacker-news-title/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>For those of you who don't know, &lt;a href="https://news.ycombinator.com" target="_blank" >Hacker News&lt;/a> is a successful social news website focusing on computer science and entrepreneurship, visited by more than 10m people per month (source: SimilarWeb).&lt;/p>
&lt;p>Founded by Paul Graham, it works similarly to Reddit: users submit content which can be upvoted by the community.
The most upvoted content, mostly links, then reaches the front page, resulting in tens of thousands of visits for the lucky website.&lt;/p></description></item><item><title>A JavaScript Developer's Guide to curl</title><link>https://www.scrapingbee.com/blog/a-javascript-developers-guide-to-curl/</link><pubDate>Tue, 27 May 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/a-javascript-developers-guide-to-curl/</guid><description>&lt;p>curl, short for &lt;em>client URL&lt;/em>, is a command line tool for transferring data over various protocols, including HTTP and HTTPS. It's available on many platforms, where it's often installed by default, making it a popular tool for testing network requests, &lt;a href="https://www.scrapingbee.com/blog/what-is-web-scraping/" target="_blank" >scraping the web&lt;/a>, and downloading resources.&lt;/p>
&lt;p>curl's powerful, versatile feature set, combined with a simple CLI, makes it a go-to choice for many developers, and countless guides and API docs include curl-based examples. It's no wonder developers would like to use curl alongside a scripting language like JavaScript for demanding workflows like web scraping.&lt;/p>
&lt;p>&lt;em>You don't have to give us your email to download the eBook, because like you, we hate that&lt;/em>: &lt;a href="https://www.scrapingbee.com/download/webscrapinghandbook.pdf" >DIRECT PDF VERSION&lt;/a>.&lt;/p>
&lt;p>Feel free to distribute it, but &lt;em>&lt;strong>please include a link to the original content (this page)&lt;/strong>&lt;/em>.&lt;/p>
&lt;hr>
&lt;p>Web scraping or crawling is the act of fetching data from a third-party website by downloading and parsing the HTML code to extract the data you want. It can be done manually, but generally this term refers to the automated process of downloading the HTML content of a page, parsing/extracting the data, and saving it into a database for further analysis or use.&lt;/p></description></item><item><title>Can you get SOCKS5 for free?</title><link>https://www.scrapingbee.com/webscraping-questions/proxy/can-you-get-socks5-for-free/</link><pubDate>Fri, 23 May 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/proxy/can-you-get-socks5-for-free/</guid><description>&lt;h2 id="what-is-the-socks5-protocol">What Is The SOCKS5 Protocol?&lt;/h2>
&lt;p>SOCKS is an internet protocol used for proxies, i.e. to enable a client and a server machine to communicate over the internet without knowing each other, by means of an intermediary proxy server. SOCKS5 is the most recent version of this protocol, designed to be an upgrade to its predecessors SOCKS4 and SOCKS4a. SOCKS5 offers authentication support and includes support for IPv6 and UDP.&lt;/p>
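To make this concrete, here is a hedged Python sketch of routing traffic from the popular Requests library through a SOCKS5 proxy. It requires the `requests[socks]` extra (`pip install "requests[socks]"`), and the host, port, and credentials below are placeholders:

```python
# Sketch: pointing Requests at a SOCKS5 proxy.
# The proxy URL here is a placeholder -- substitute your own endpoint.
import requests

proxies = {
    "http": "socks5://user:pass@proxy.example.com:1080",
    "https": "socks5://user:pass@proxy.example.com:1080",
}

# With a real proxy, requests made like this would be routed through it:
# requests.get("https://httpbin.org/ip", proxies=proxies)
```

Using the `socks5h://` scheme instead tells PySocks to resolve DNS on the proxy side, which avoids leaking hostnames to your local resolver.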
&lt;h2 id="common-use-cases-for-socks5-proxies">Common Use Cases For SOCKS5 Proxies&lt;/h2>
&lt;p>In the world of web scraping, the most common use case for a proxy is to mask the IP address of the client making the HTTP request to the website being scraped. This could be useful for privacy reasons, to bypass geographical restrictions, or to make requests from multiple IP addresses using multiple proxies to bypass IP-based rate limiting.&lt;/p></description></item><item><title>How to find elements by CSS selector in Selenium?</title><link>https://www.scrapingbee.com/webscraping-questions/selenium/how-to-find-elements-css-selector-selenium/</link><pubDate>Fri, 23 May 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/selenium/how-to-find-elements-css-selector-selenium/</guid><description>&lt;p>Selenium is a popular browser automation framework that is also used for scraping data using headless browsers. While using Selenium, one of the most common tasks is using CSS selectors to select particular HTML elements to interact with or extract data from.&lt;/p>
&lt;h2 id="using-browser-developer-tools-to-find-css-selectors">Using Browser Developer Tools To Find CSS Selectors&lt;/h2>
&lt;p>To scrape content or fill in forms using Selenium, we first need to know the CSS selector of the HTML element we'll be working with. To find the CSS selector, we need to go through the HTML structure of the web page, which could be confusing and cumbersome. Most modern browsers provide developer tools to make this easier.&lt;/p></description></item><item><title>How to parse a JSON file in JavaScript?</title><link>https://www.scrapingbee.com/webscraping-questions/json/how-to-parse-a-json-file-in-javascript/</link><pubDate>Fri, 23 May 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/json/how-to-parse-a-json-file-in-javascript/</guid><description>&lt;h2 id="what-is-json-and-why-parse-it">What Is JSON And Why Parse It?&lt;/h2>
&lt;p>JSON stands for &amp;quot;JavaScript Object Notation&amp;quot;. It's one of the most popular formats used for storing and sharing data containing key-value pairs, which may also be nested or in a list. For many applications that work with data, including web scraping, it is important to be able to write and parse data in the JSON format.&lt;/p>
&lt;p>Here is a sample JSON string:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-json" data-lang="json">&lt;span style="display:flex;">&lt;span>{
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;#34;name&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;John Doe&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;#34;age&amp;#34;&lt;/span>: &lt;span style="color:#ae81ff">32&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;#34;address&amp;#34;&lt;/span>: {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;#34;street&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;123 Main St&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;#34;city&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;Anytown&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;#34;state&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;CA&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="how-to-read-a-json-file">How To Read A JSON File?&lt;/h2>
&lt;p>In JavaScript, you can parse a JSON string using the &lt;a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/parse" target="_blank" >&lt;code>JSON.parse()&lt;/code>&lt;/a> method. A JSON file is essentially a text file containing a JSON string. Therefore, to read a JSON file, you first need to read the file as a string and then parse it into an object that contains key-value pairs.&lt;/p></description></item><item><title>How to use CSS Selectors in Python?</title><link>https://www.scrapingbee.com/webscraping-questions/css_selectors/how-to-use-css-selectors-in-python/</link><pubDate>Fri, 23 May 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/css_selectors/how-to-use-css-selectors-in-python/</guid><description>&lt;h2 id="what-are-css-selectors">What Are CSS Selectors?&lt;/h2>
&lt;p>CSS selectors are patterns that are used to reference HTML elements, primarily for the purpose of styling them using CSS. Over the years, they've evolved into one of the key ways to select and manipulate HTML elements using in-browser JavaScript and other programming languages such as Python.&lt;/p>
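As a quick taste, here is a minimal sketch using BeautifulSoup's `select()` and `select_one()` methods (this assumes `beautifulsoup4` is installed; the HTML snippet is made up for illustration):

```python
# Selecting elements with CSS selectors in Python via BeautifulSoup.
from bs4 import BeautifulSoup

html = """
<ul>
  <li class="item">Apple</li>
  <li class="item featured">Banana</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# select() returns every element matching the selector.
names = [li.get_text() for li in soup.select("li.item")]
# select_one() returns just the first match.
featured = soup.select_one("li.featured").get_text()

print(names)     # ['Apple', 'Banana']
print(featured)  # Banana
```

The same selector strings work across tools, so a selector you test in the browser console can usually be dropped straight into Python code like this.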
&lt;h2 id="why-use-css-selectors-in-python">Why Use CSS Selectors in Python?&lt;/h2>
&lt;p>In Python, CSS selectors are primarily used to select one or more HTML elements while working with web pages, usually for scraping and browser automation.&lt;/p></description></item><item><title>How to wait for page to load in Playwright?</title><link>https://www.scrapingbee.com/webscraping-questions/playwright/how-to-wait-for-page-to-load-in-playwright/</link><pubDate>Fri, 23 May 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/playwright/how-to-wait-for-page-to-load-in-playwright/</guid><description>&lt;p>Websites that render using JavaScript work in many different ways. Hence, waiting for the page to load might mean different things based on what we're looking to do. Sometimes the elements we need will appear on the first render, sometimes an app shell will load first and then the content. Sometimes we may even have to interact (click or scroll). Let's look at the different methods to wait in Playwright, so you can use the one that best works for your task.&lt;/p></description></item><item><title>What is Screen Scraping and How To Do It With Examples</title><link>https://www.scrapingbee.com/blog/screen-scraping-with-scrapingbee/</link><pubDate>Mon, 19 May 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/screen-scraping-with-scrapingbee/</guid><description>&lt;h2 id="what-is-screen-scraping">What is Screen Scraping?&lt;/h2>
&lt;p>The easiest way to get data from another program is to use a dedicated API (Application Programming Interface), but not all programs provide one. In fact, most programs don't.&lt;/p>
&lt;p>If there's no API provided, you can still get data from a program by using screen scraping, which is the process of capturing data from the screen output of a program.&lt;/p>
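For instance, parsing another program's terminal output is one of the simplest forms of screen scraping. A minimal Python sketch (using a stand-in child process that just prints a line) might look like this:

```python
# Screen scraping a program's terminal output: capture stdout, then
# pull the data out of the text with a regular expression.
import re
import subprocess
import sys

# Stand-in "program" whose screen output we want to scrape.
child = subprocess.run(
    [sys.executable, "-c", "print('CPU load: 42%')"],
    capture_output=True, text=True,
)

match = re.search(r"CPU load: (\d+)%", child.stdout)
load = int(match.group(1))
print(load)  # 42
```

The same capture-then-parse pattern applies whether the "screen" is a terminal, a log stream, or the rendered HTML of a web page.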
&lt;p>This can take all kinds of forms, ranging from parsing terminal output to reading text off screenshots, with the most common being classic web scraping.&lt;/p></description></item><item><title>Web Scraping with R Tutorial: Scraping BrickEconomy.com</title><link>https://www.scrapingbee.com/blog/web-scraping-r/</link><pubDate>Mon, 12 May 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/web-scraping-r/</guid><description>&lt;p>In this tutorial we'll cover everything you need to know about web scraping using the &lt;a href="https://www.r-project.org/about.html" target="_blank" >R programming language&lt;/a>. We'll explore the ecosystem of R packages for web scraping, build complete scrapers for real-world datasets, tackle common challenges like JavaScript rendering and pagination, and even analyze our findings with some data science magic. Let's get started!&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>Hearing web scraping for the first time?&lt;/strong> Take a quick detour to our &lt;a href="https://www.scrapingbee.com/blog/what-is-web-scraping/" target="_blank" >Web Scraping Fundamentals guide&lt;/a>. It covers all the basics, history, and common use cases that will help you build a solid foundation before diving into R-specific implementations. If you're a newbie, consider it your must-read primer!&lt;/p></description></item><item><title>ChatGPT Scraping - How to Vibe Scrape with ChatGPT</title><link>https://www.scrapingbee.com/blog/chatgpt-scraping/</link><pubDate>Fri, 09 May 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/chatgpt-scraping/</guid><description>&lt;p>LLMs such as ChatGPT have changed how developers write, review, and test code. The biggest testament to this is the rise of the term &amp;quot;Vibe coding&amp;quot;, which was coined by &lt;a href="https://x.com/karpathy/status/1886192184808149383" target="_blank" >Andrej Karpathy in an X post&lt;/a>. To quote the post:&lt;/p>
&lt;blockquote>
&lt;p>There's a new kind of coding I call &amp;quot;vibe coding&amp;quot;, where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like &amp;quot;decrease the padding on the sidebar by half&amp;quot; because I'm too lazy to find it. I &amp;quot;Accept All&amp;quot; always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I'd have to really read through it for a while. Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away. It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.
~ Andrej Karpathy on X&lt;/p></description></item><item><title>How to bypass PerimeterX anti-bot system in 2025</title><link>https://www.scrapingbee.com/blog/how-to-bypass-perimeterx-anti-bot-system/</link><pubDate>Tue, 06 May 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-bypass-perimeterx-anti-bot-system/</guid><description>&lt;p>Today we're continuing our adventures in the wondrous world of scraping and taking a look at how to bypass the PerimeterX anti-bot system using a few potential solutions. It's not the easiest task, but I'll try to explain what to watch out for and will cover some key details to keep in mind.&lt;/p>
&lt;p>So, let's get started!&lt;/p>
&lt;div class="img" style="background: url(data:image/jpeg;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAKCAIAAAA7N&amp;#43;mxAAAA9klEQVR4nMSRTUvDQBCG391E07W1KQGp3wfxYi/e/dl6EP0RgliEerFYtZqYmCAxm93OjtgWxItiL85hDsM8w8s8PjNj0ZILk99gR3&amp;#43;OMIfHd&amp;#43;n58UX6XPy8bezk7PpymCRfcJq8PT4VxurRfZ5nJQAmYmNdpbk2n91YJgLQvx2evA5Ob/p1ZQD4AOKXMq49P1obl2hl752oWV0NKMuFUqy1q63XbsmmWjk82OtuRKOgt7keNJbncCdUYaEf4mJ3p9tebQBQvX14npASzOxYSDHLGfj&amp;#43;Ubm9hRDTgZipYnJaW6WWIH/5P01ISCmn58S/ef4IAAD//6rVdKEU7Oz3AAAAAElFTkSuQmCC); background-size: cover">
 &lt;svg width="1200" height="628" aria-hidden="true" style="background-color:white">&lt;/svg>
 &lt;img
 class="lazyload"
 data-sizes="auto"
 data-srcset=', /blog/how-to-bypass-perimeterx-anti-bot-system/cover_hu8255072659563914781.png 1200w '
 data-src="https://www.scrapingbee.com/blog/how-to-bypass-perimeterx-anti-bot-system/cover_hu8255072659563914781.png"
 width="1200" height="628"
 alt='How to bypass PerimeterX'>
 &lt;noscript>
 &lt;img
 loading="lazy"
 
 srcset=', /blog/how-to-bypass-perimeterx-anti-bot-system/cover_hu8255072659563914781.png 1200w'
 src="https://www.scrapingbee.com/blog/how-to-bypass-perimeterx-anti-bot-system/cover.png"
 width="1200" height="628"
 alt='How to bypass PerimeterX'>
 &lt;/noscript>
&lt;/div>

&lt;br>

&lt;h2 id="what-is-perimeterx">What is PerimeterX?&lt;/h2>
&lt;p>I'm not sure about you, but to me the name &amp;quot;PerimeterX&amp;quot; sounds like it belongs to a secret military project or some evil AI. You could imagine it being announced by an overly dramatic voice-over and accompanied by &amp;quot;Ride of the Valkyries.&amp;quot; Jokes aside, PerimeterX (also known as &amp;quot;HUMAN&amp;quot; — kinda ironic, eh?) is a bot protection system used by some websites to detect and block automated traffic, including your scraping tools. You'll mostly find PerimeterX on large high-traffic sites like e-commerce, ticketing, login forms and any place where bots cause (financial) damage.&lt;/p></description></item><item><title>Playwright MCP - Scraping Smithery MCP database Tutorial with Cursor</title><link>https://www.scrapingbee.com/blog/playwright-mcp-web-scraping-smithery-tutorial-cursor/</link><pubDate>Mon, 28 Apr 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/playwright-mcp-web-scraping-smithery-tutorial-cursor/</guid><description>&lt;p>Humanity got itself a huge upgrade by embracing tool use a few million years ago and now AI is getting that upgrade. AI is now able to use various tools for you. For example, it can search the web, turn on your living room lamps, play Pokémon and of course &lt;a href="https://www.scrapingbee.com/blog/browseruse-how-to-use-ai-browser-automation-to-scrape/" target="_blank" >use browsers to scrape data&lt;/a>.&lt;/p>
&lt;p>A critical link in the interface between AI and software tools is the &lt;a href="https://modelcontextprotocol.io/introduction" target="_blank" >Model Context Protocol (MCP)&lt;/a>. It is an open protocol that defines how tools can expose their data and functionality to be used by AI models. It was introduced by &lt;a href="https://www.anthropic.com/news/model-context-protocol" target="_blank" >Anthropic in November 2024&lt;/a> and now the internet is full of MCP servers that can enable AI to do various things. Recently, OpenAI and Google have announced that they will be supporting MCP for ChatGPT and Gemini respectively. So it looks like MCP is going to be the industry standard.&lt;/p></description></item><item><title>Web Scraping with Perl</title><link>https://www.scrapingbee.com/blog/web-scraping-perl/</link><pubDate>Mon, 14 Apr 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/web-scraping-perl/</guid><description>&lt;p>Web scraping is a technique for retrieving data from web pages. While one could certainly load any site in their browser and copy-paste the relevant data manually, this hardly scales, and so web scraping is a task destined for automation. If you are curious &lt;a href="https://www.scrapingbee.com/blog/what-is-web-scraping/#web-scraping-use-cases" >why one would scrape the web&lt;/a>, you'll find a myriad of reasons for that:&lt;/p>
&lt;ul>
&lt;li>Generating leads for marketing&lt;/li>
&lt;li>Monitoring prices on a page (and purchase when the price drops low)&lt;/li>
&lt;li>Academic research&lt;/li>
&lt;li>&lt;a href="https://en.wikipedia.org/wiki/Arbitrage_betting" target="_blank" >Arbitrage betting&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Perl is universally considered the &amp;quot;Swiss Army knife of programming&amp;quot; and there is a good reason for that, as it particularly excels in text processing and handling of textual input of any sort. This makes it a perfect companion for web scraping, which is inherently text-centric.&lt;/p></description></item><item><title>Web Scraping with Ruby</title><link>https://www.scrapingbee.com/blog/web-scraping-ruby/</link><pubDate>Mon, 07 Apr 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/web-scraping-ruby/</guid><description>&lt;p>In this tutorial we're diving into the world of web scraping with Ruby. We'll explore powerful Gems like Faraday for HTTP requests, Nokogiri for parsing HTML, and browser automation with Selenium and Capybara. Along the way, we'll scrape real websites with some example scripts, handle dynamic Javascript content and even run headless browsers in parallel.&lt;/p>
&lt;p>By the end of this tutorial, you'll be equipped with the knowledge and practical patterns needed to start scraping data from websites — whether for fun, research, or building something cool.&lt;/p></description></item><item><title>Web Scraping with Scala - Easily Scrape and Parse HTML</title><link>https://www.scrapingbee.com/blog/web-scraping-scala/</link><pubDate>Mon, 24 Mar 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/web-scraping-scala/</guid><description>&lt;p>This tutorial explains how to use three technologies for &lt;a href="https://www.scrapingbee.com/" target="_blank" >web scraping&lt;/a> with Scala. The article first explains how to scrape a static HTML page with Scala using &lt;a href="https://www.scrapingbee.com/blog/java-parse-html-jsoup/" target="_blank" >jsoup&lt;/a> and &lt;a href="https://index.scala-lang.org/ruippeixotog/scala-scraper" target="_blank" >Scala Scraper&lt;/a>. Then, it explains how to scrape a dynamic HTML website with Scala using &lt;a href="https://www.selenium.dev/" target="_blank" >Selenium&lt;/a>.&lt;/p>
&lt;div class="img" style="background: url(data:image/jpeg;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAKCAIAAAA7N&amp;#43;mxAAAAy0lEQVR4nGL5//8/A7mAiWydWDT//f3394/fmOp&amp;#43;//799&amp;#43;9fNEEWhPTP3482nPj08IWkh6mEngJc/M&amp;#43;fP&amp;#43;8/fHz3/sPnz58V5eWEhAQZGRnRNf/7&amp;#43;&amp;#43;/j/RffXr/99&amp;#43;8fsvF//v49cersu/cffvz8&amp;#43;e3bD1trc&amp;#43;zO/vv//59/DExMjMiCHOzs2prqzMxMAvx8utoacGsZGBgY4aH95&amp;#43;ef16dusvNz86lJs3CwIut/8/YdMxPTr9&amp;#43;/uTg5eXl5sGgmA1A1qkgCgAAAAP//SHtWNrz7kB4AAAAASUVORK5CYII=); background-size: cover">
 &lt;svg width="1200" height="628" aria-hidden="true" style="background-color:white">&lt;/svg>
 &lt;img
 class="lazyload"
 data-sizes="auto"
 data-srcset=', /blog/web-scraping-scala/cover_hu18062781086090364684.png 1200w '
 data-src="https://www.scrapingbee.com/blog/web-scraping-scala/cover_hu18062781086090364684.png"
 width="1200" height="628"
 alt='cover image'>
 &lt;noscript>
 &lt;img
 loading="lazy"
 
 srcset=', /blog/web-scraping-scala/cover_hu18062781086090364684.png 1200w'
 src="https://www.scrapingbee.com/blog/web-scraping-scala/cover.png"
 width="1200" height="628"
 alt='cover image'>
 &lt;/noscript>
&lt;/div>

&lt;br>

&lt;blockquote>
&lt;p>💡 Interested in web scraping with Java? Check out our guide to the &lt;a href="https://www.scrapingbee.com/blog/best-java-web-scraping-libraries/" >best Java web scraping libraries&lt;/a>&lt;/p></description></item><item><title>BrowserUse: How to use AI Browser Automation to Scrape</title><link>https://www.scrapingbee.com/blog/browseruse-how-to-use-ai-browser-automation-to-scrape/</link><pubDate>Mon, 17 Mar 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/browseruse-how-to-use-ai-browser-automation-to-scrape/</guid><description>&lt;p>AI agents, AI agents everywhere. This is one of the most popular and quickly evolving technologies out there. I'm not sure about you, but to me it seems like everyone is trying to use AI for literally everything: collecting data, writing letters, booking hotels, and even shopping. While I still prefer doing many of these things manually, automating boring tasks seems really tempting. Thus, in this article, we're going to see how to automate browser interactions with the help of &lt;strong>BrowserUse&lt;/strong>.&lt;/p></description></item><item><title>Web Scraping in C++ with libxml2 and libcurl</title><link>https://www.scrapingbee.com/blog/web-scraping-c++/</link><pubDate>Tue, 11 Mar 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/web-scraping-c++/</guid><description>&lt;p>Web scraping is one of the rather important parts when it comes automated data extraction of web content. While languages like Python are commonly used, C++ offers significant advantages in performance and control. With its low-level memory management, speed, and ability to handle large-scale data efficiently, it is an excellent choice for web scraping tasks that demand high performance.&lt;/p>
&lt;p>In this article, we shall take a look at the advantages of developing our own custom web scraper in C++ and what its speed, resource efficiency, and scalability for complex scraping operations can bring to the table. You’ll learn how to implement a web scraper with the &lt;code>libcurl&lt;/code> and &lt;code>libxml2&lt;/code> libraries.&lt;/p></description></item><item><title>How to Scrape With Camoufox to Bypass Antibot Technology</title><link>https://www.scrapingbee.com/blog/how-to-scrape-with-camoufox-to-bypass-antibot-technology/</link><pubDate>Mon, 03 Mar 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-with-camoufox-to-bypass-antibot-technology/</guid><description>&lt;p>In a previous blog, &lt;a href="https://www.scrapingbee.com/blog/creepjs-browser-fingerprinting/" target="_blank" >we evaluated popular browser automation frameworks and patches developed for them to bypass CreepJS&lt;/a>, which is a browser fingerprinting tool that can detect headless browsers and stealth plugins. Of all the tools we tried, we found that &lt;a href="https://camoufox.com/" target="_blank" >Camoufox&lt;/a> scored the best, being indistinguishable from a real, human-operated browser. In this blog, we’ll see what it is, how it works, and try using it for some web scraping tasks.&lt;/p></description></item><item><title>Web Scraping in Rust with Reqwest, Scraper and Tokio</title><link>https://www.scrapingbee.com/blog/web-scraping-rust/</link><pubDate>Mon, 24 Feb 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/web-scraping-rust/</guid><description>&lt;p>In this Rust tutorial you'll learn how to create a basic web scraper by scraping the top ten movies list from IMDb. Rust is a language known for its speed and safety and we'll try two approaches: blocking IO and asynchronous IO with &lt;code>tokio&lt;/code>.&lt;/p>


















 
 








&lt;div class="img" style="background: url(data:image/jpeg;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAKCAIAAAA7N&amp;#43;mxAAAA1klEQVR4nGL5//8/A7mAiWyd6Jp//fp17ty5e/fuYar7/fv3379/0QRZ4KwfP34UFRV9/vyZlZU1NjbW0dERIv7nz5/3Hz6&amp;#43;f//h0&amp;#43;fPivJyQkKCjIyM6JofPnz48uXL&amp;#43;/fvi4qKHjlyBKH579&amp;#43;Tp8&amp;#43;&amp;#43;fff&amp;#43;589f377/sLUyx&amp;#43;JsSUlJJiYmHx8fISEhKSkpuDgHO7uWhjozM7MAP5&amp;#43;utgbcWgYGBkbk0H7z5s3JkyeFhYXNzc2RFb15&amp;#43;46ZmenXr99cXJy8PDzYNZMKqBdVpAJAAAAA///3iFTUWxwYqQAAAABJRU5ErkJggg==); background-size: cover">
 &lt;svg width="1200" height="628" aria-hidden="true" style="background-color:white">&lt;/svg>
 &lt;img
 class="lazyload"
 data-sizes="auto"
 data-srcset=', /blog/web-scraping-rust/cover_hu16995110867163926342.png 1200w '
 data-src="https://www.scrapingbee.com/blog/web-scraping-rust/cover_hu16995110867163926342.png"
 width="1200" height="628"
 alt='cover image'>
 &lt;noscript>
 &lt;img
 loading="lazy"
 
 srcset=', /blog/web-scraping-rust/cover_hu16995110867163926342.png 1200w'
 src="https://www.scrapingbee.com/blog/web-scraping-rust/cover.png"
 width="1200" height="628"
 alt='cover image'>
 &lt;/noscript>
&lt;/div>

&lt;br>

&lt;h2 id="implementing-a-web-scraper-in-rust">Implementing a Web Scraper in Rust&lt;/h2>
&lt;p>You’re going to set up a fully functioning web scraper in Rust. Your target for scraping will be &lt;a href="https://www.imdb.com/" target="_blank" >IMDb&lt;/a>, a database of movies, TV series, and other media.&lt;/p></description></item><item><title>Best 10 Java Web Scraping Libraries</title><link>https://www.scrapingbee.com/blog/best-java-web-scraping-libraries/</link><pubDate>Mon, 17 Feb 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/best-java-web-scraping-libraries/</guid><description>&lt;p>In this article, I will show you the most popular Java web scraping libraries and help you choose the right one. Web scraping is the process of extracting data from websites. At first sight, you might think that all you need is a standard HTTP client and basic programming skills, right?&lt;/p>
&lt;p>In theory, yes, but in practice you will quickly face challenges like session handling, cookies, dynamically loaded content and JavaScript execution, and even anti-scraping measures (for example, CAPTCHAs, IP blocking, and rate limiting).&lt;/p></description></item><item><title>How to make API calls using Python</title><link>https://www.scrapingbee.com/blog/how-to-make-python-api-calls/</link><pubDate>Mon, 17 Feb 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-make-python-api-calls/</guid><description>&lt;p>This tutorial will show you how to make HTTP API calls using Python. There are many ways to skin a cat, and there are multiple methods for making API calls in Python, but today we'll be demonstrating the &lt;code>requests&lt;/code> library, making API calls to the hugely popular &lt;a href="https://www.scrapingbee.com/features/chatgpt/" target="_blank" >OpenAI ChatGPT API&lt;/a>.&lt;/p>
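As a minimal sketch of the kind of call the tutorial builds toward, here is how such a request can be assembled with the requests library; the model name and the placeholder API key below are illustrative assumptions, not values from the tutorial:

```python
import json

API_URL = "https://api.openai.com/v1/chat/completions"  # documented endpoint

def build_request(api_key, prompt, model="gpt-4o-mini"):
    """Assemble the URL, headers, and JSON body for a chat completion call."""
    headers = {
        "Authorization": "Bearer " + api_key,
        "Content-Type": "application/json",
    }
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return API_URL, headers, json.dumps(payload)

if __name__ == "__main__":
    import requests  # third-party: pip install requests
    url, headers, body = build_request("YOUR_API_KEY", "Say hello in one word")
    response = requests.post(url, headers=headers, data=body, timeout=30)
    print(response.json())
```

Keeping the request assembly in its own function makes it easy to swap in a different model or endpoint later.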
&lt;p>We'll give you a demo of the more pragmatic approach and experiment with their dedicated Software Development Kit (SDK) so you can easily integrate AI into your project. We'll also explain how to make API requests to our Web Scraping API, which will give you the power to pull data from any website into your project.&lt;/p></description></item><item><title>How to Web Scrape Amazon with Python</title><link>https://www.scrapingbee.com/blog/web-scraping-amazon/</link><pubDate>Wed, 12 Feb 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/web-scraping-amazon/</guid><description>&lt;p>&lt;strong>Scraping Amazon&lt;/strong> can be tricky. I know the struggle. The site changes often, has built-in protections, and isn't exactly fond of being scraped. If you've ever tried going down this road, you've probably run into roadblocks in the form of CAPTCHAs or empty responses. This tutorial will show you how to scrape Amazon shopping results step by step, bypassing anti-scraping measures with code examples.&lt;/p>
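As a taste of the final step, once product fields have been parsed they can be written out with the standard-library csv module; the sample product rows below are hypothetical stand-ins for parsed results:

```python
import csv
import io

def products_to_csv(products):
    """Serialize scraped product dicts (name, price, link) to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["name", "price", "link"])
    writer.writeheader()
    writer.writerows(products)
    return buf.getvalue()

# Hypothetical rows standing in for parsed search results:
sample = [
    {"name": "USB-C Cable", "price": "9.99", "link": "https://example.com/p/1"},
]
csv_text = products_to_csv(sample)
```

Writing to an in-memory buffer keeps the function testable; in a real script you would pass a file handle instead.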
&lt;p>We'll demonstrate how to extract product details like names, prices, and links, and how to save this data into a CSV file easily. We'll also learn how to deal with common issues using proxies and other advanced tools. By the end, you'll have a working Python script and a full understanding of how all this ties together.&lt;/p></description></item><item><title>Web Scraping with PHP Tutorial with Example Scripts (2025)</title><link>https://www.scrapingbee.com/blog/web-scraping-php/</link><pubDate>Mon, 10 Feb 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/web-scraping-php/</guid><description>&lt;p>You might have seen one of our other tutorials on how to scrape websites, for example with &lt;a href="https://www.scrapingbee.com/blog/web-scraping-ruby/" target="_blank" >Ruby&lt;/a>, &lt;a href="https://www.scrapingbee.com/blog/web-scraping-javascript/" target="_blank" >JavaScript&lt;/a> or &lt;a href="https://www.scrapingbee.com/blog/web-scraping-101-with-python/" target="_blank" >Python&lt;/a>, and wondered: what about &lt;a href="https://w3techs.com/technologies/overview/programming_language" target="_blank" >the most widely used server-side programming language for websites&lt;/a>, which, at the same time, is &lt;a href="https://insights.stackoverflow.com/survey/2020#technology-most-loved-dreaded-and-wanted-languages-dreaded" target="_blank" >one of the most dreaded&lt;/a>? Wonder no more - today it's time for &lt;strong>PHP&lt;/strong> 🥳!&lt;/p>
&lt;p>Believe it or not, PHP and web scraping have much in common: just like PHP, web scraping can be used either in a quick and dirty way or in a more elaborate fashion and supported with the help of additional tools and services.&lt;/p></description></item><item><title>How to Scrape Job Postings with a Free AI Job Board Scraper</title><link>https://www.scrapingbee.com/blog/build-job-board-web-scraping/</link><pubDate>Mon, 03 Feb 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/build-job-board-web-scraping/</guid><description>&lt;p>The job market is a fiercely competitive place and getting an edge in your search can mean the difference between success and failure, so many tech-savvy job seekers turn to web-scraping job listings to get ahead of the competition, enabling them to see relevant new jobs as soon as they hit the market.&lt;/p>
&lt;p>Scraping job listings can be an invaluable tool for finding your next role, and in this tutorial, we’ll teach you how to use our AI-powered Web Scraping API to harvest job vacancies from any job board with ease.&lt;/p></description></item><item><title>How to Bypass CreepJS and Spoof Browser Fingerprinting</title><link>https://www.scrapingbee.com/blog/creepjs-browser-fingerprinting/</link><pubDate>Mon, 27 Jan 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/creepjs-browser-fingerprinting/</guid><description>&lt;p>&lt;a href="https://github.com/abrahamjuliot/creepjs" target="_blank" >CreepJS&lt;/a> is an open-source project designed to demonstrate vulnerabilities and leaks in extensions or browsers that users use to avoid being fingerprinted. It’s one of the newest projects in the browser fingerprinting scene, and it uses an advanced combination of techniques such as JavaScript tampering detection and finding inconsistencies between the detected user agent and the expected feature set.&lt;/p>
&lt;p>In this tutorial, we’ll see how the most popular headless browsers stack up against each other in an all-out battle to pass CreepJS’s “Headless” and “Stealth” detection scores.&lt;/p></description></item><item><title>What is Web Scraping? How to Scrape Data From Any Website</title><link>https://www.scrapingbee.com/blog/what-is-web-scraping-and-how-to-scrape-any-website-tutorial/</link><pubDate>Mon, 13 Jan 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/what-is-web-scraping-and-how-to-scrape-any-website-tutorial/</guid><description>&lt;p>Web Scraping can be one of the most challenging things to do on the internet. In this tutorial we’ll show you how to master Web Scraping and teach you how to extract data from any website at scale. We’ll give you prewritten code to get you started scraping data with ease.&lt;/p>
&lt;h2 id="what-is-web-scraping">What is Web Scraping?&lt;/h2>
&lt;p>Web scraping is the process of automatically extracting data from a website’s HTML. This can be done at scale to visit every page on the website and download the valuable data you need, storing it in a database for later use. For example, you could regularly scrape all the product prices from an e-commerce store to track price changes so your business can adjust its own prices to compete.&lt;/p></description></item><item><title>Scrapy Playwright Tutorial: How to Scrape Dynamic Websites</title><link>https://www.scrapingbee.com/blog/scrapy-playwright-tutorial/</link><pubDate>Mon, 06 Jan 2025 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/scrapy-playwright-tutorial/</guid><description>&lt;p>Playwright for Scrapy enables you to scrape JavaScript-heavy dynamic websites at scale, with advanced web scraping features out of the box.&lt;/p>
&lt;p>In this tutorial, we’ll show you the ins and outs of scraping using this popular browser automation library that was originally developed by Microsoft, combining it with Scrapy to extract the content you need with ease.&lt;/p>
&lt;p>We’ll cover tasks such as setting up your &lt;a href="https://www.scrapingbee.com/blog/web-scraping-101-with-python/" target="_blank" >Python&lt;/a> environment and inputting and submitting form data, all the way through to dealing with infinite scroll and scraping multiple pages.&lt;/p></description></item><item><title>Introduction to Web Scraping With Java</title><link>https://www.scrapingbee.com/blog/introduction-to-web-scraping-with-java/</link><pubDate>Mon, 25 Nov 2024 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/introduction-to-web-scraping-with-java/</guid><description>&lt;p>Is there a website from which you'd like to regularly scrape data in a structured fashion, but that site does not yet offer a standardised API, such as a JSON REST interface? Don't fret, &lt;em>web scraping with Java&lt;/em> comes to the rescue.&lt;/p>


















 
 








&lt;div class="img" style="background: url(data:image/jpeg;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAKCAIAAAA7N&amp;#43;mxAAAAz0lEQVR4nKSRzWqDQBSF585MC6IWpdSp2m778wKF0tcvhDxAFtkkcSkmEkcyoM5PcJFkAgbEnNUM9373HDjUGIOmCk8mB2CtjVJjs1D7I2rRlAV0HX1mXujbI87rgxDxKxuGjUG8KJwqf3zAZa4d3yW0zyWl/J/N26bd7sr3t/Tv9wdjPBAbAyyzarHeIwMAp/OUsuhltckqztMkPpNXzgDoiUXJl1KdDNMEk8tSGATfnx8A4HmubQZjqlJKEUL6h9bEch4F39JdPR8DAAD//2vnSrDfH3RwAAAAAElFTkSuQmCC); background-size: cover">
 &lt;svg width="1299" height="650" aria-hidden="true" style="background-color:white">&lt;/svg>
 &lt;img
 class="lazyload"
 data-sizes="auto"
 data-srcset=', /blog/introduction-to-web-scraping-with-java/cover_hu3813734125291400838.png 1200w '
 data-src="https://www.scrapingbee.com/blog/introduction-to-web-scraping-with-java/cover_hu3813734125291400838.png"
 width="1299" height="650"
 alt='cover image'>
 &lt;noscript>
 &lt;img
 loading="lazy"
 
 srcset=', /blog/introduction-to-web-scraping-with-java/cover_hu3813734125291400838.png 1200w'
 src="https://www.scrapingbee.com/blog/introduction-to-web-scraping-with-java/cover.png"
 width="1299" height="650"
 alt='cover image'>
 &lt;/noscript>
&lt;/div>

&lt;br>

&lt;blockquote>
&lt;p>💡 Interested in web scraping with Java? Check out our guide to the &lt;a href="https://www.scrapingbee.com/blog/best-java-web-scraping-libraries" >best Java web scraping libraries&lt;/a>.&lt;/p></description></item><item><title>How to scrape websites with cloudscraper (python example)</title><link>https://www.scrapingbee.com/blog/how-to-scrape-websites-with-cloudscraper-python-example/</link><pubDate>Thu, 07 Mar 2024 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-websites-with-cloudscraper-python-example/</guid><description>&lt;p>Over &lt;a href="https://backlinko.com/cloudflare-users#cloudfare-key-stats" target="_blank" >7.59 million&lt;/a> active websites use Cloudflare. There's a high chance that the website you intend to scrape might be protected by it. Websites protected by services like Cloudflare can be challenging to scrape due to the various anti-bot measures they implement. If you've tried scraping such websites, you're likely already aware of the difficulty in bypassing Cloudflare's bot detection system.&lt;/p>
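In its simplest form, Cloudscraper is a drop-in replacement for a requests session. The sketch below uses the browser-emulation options the library documents; the target URL is a placeholder assumption:

```python
def scraper_options(browser="chrome", platform="windows", mobile=False):
    """Browser-emulation options in the shape cloudscraper documents."""
    return {"browser": {"browser": browser, "platform": platform, "mobile": mobile}}

if __name__ == "__main__":
    import cloudscraper  # third-party: pip install cloudscraper
    scraper = cloudscraper.create_scraper(**scraper_options())
    response = scraper.get("https://example.com/")  # placeholder target
    print(response.status_code)
```

The scraper object exposes the familiar get/post interface, so existing requests-based code needs little change.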
&lt;p>In this article, you’ll learn how to use Cloudscraper, an open-source Python library, to scrape Cloudflare-protected websites. You’ll learn about some of the advanced features of Cloudscraper, such as CAPTCHA bypass and user-agent manipulation. Furthermore, we’ll discuss the limitations of Cloudscraper and suggest the most effective alternative method.&lt;/p></description></item><item><title>How to scrape emails from any website</title><link>https://www.scrapingbee.com/blog/how-to-scrape-emails-from-any-website-for-sales-prospecting/</link><pubDate>Mon, 04 Mar 2024 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-emails-from-any-website-for-sales-prospecting/</guid><description>&lt;p>With the seemingly endless variety of platforms for instant communication these days (Slack, Whatsapp, RCS, and not to forget social media) one could easily forget about the original type of electronic communication - &lt;strong>email&lt;/strong>. Despite regular claims that a new technology will replace email, it continues to thrive and the number of messages keeps &lt;a href="https://www.statista.com/statistics/456500/daily-number-of-e-mails-worldwide/" target="_blank" >going up by about four percent&lt;/a> every year. For that reason it may not be surprising that email is a crucial tool for most businesses. Be that for keeping in touch with existing customers, or reaching out to new ones. When done right, email campaigns can prove immensely effective.&lt;/p></description></item><item><title>How to Web Scrape Airbnb data (Easy Working Code Example)</title><link>https://www.scrapingbee.com/blog/how-to-web-scrape-airbnb-data/</link><pubDate>Fri, 01 Mar 2024 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-web-scrape-airbnb-data/</guid><description>&lt;p>Imagine searching for the perfect vacation rental, only to be overwhelmed by options. 
Or perhaps you're a host curious about how your Airbnb listing compares to others in your area.&lt;/p>
&lt;p>&lt;a href="https://www.airbnb.com/" target="_blank" >Airbnb&lt;/a> is a popular online marketplace that connects people seeking unique accommodations with hosts offering their homes. But with so many listings, finding the ideal one can be a challenge.&lt;/p>
&lt;p>This article will explore the easiest way to scrape Airbnb listings using Python, BeautifulSoup, and ScrapingBee.&lt;/p></description></item><item><title>How to scrape websites with Google Sheets</title><link>https://www.scrapingbee.com/blog/how-to-scrape-websites-with-google-sheets/</link><pubDate>Wed, 20 Dec 2023 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/how-to-scrape-websites-with-google-sheets/</guid><description>&lt;h2 id="using-google-sheets-for-scraping">Using Google Sheets for Scraping&lt;/h2>
&lt;p>Web scraping, the process of extracting data from websites, has evolved into an indispensable tool for all kinds of industries, from market research to content aggregation. While programming languages like Python are often the go-to choice for scraping, a surprisingly efficient and accessible alternative is Google Sheets.&lt;/p>
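For instance, pulling data into a sheet takes nothing more than a formula in a cell; the target URLs below are illustrative:

```
=IMPORTXML("https://www.scrapingbee.com/", "//h1")
=IMPORTHTML("https://en.wikipedia.org/wiki/Web_scraping", "table", 1)
```

The first formula extracts every element matching the XPath query; the second imports the first HTML table on the page.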
&lt;p>Google Sheets is primarily known as a versatile spreadsheet application for creating, editing, and organizing data. However, it also offers some powerful web scraping capabilities that make it an attractive option, especially for individuals and organizations with minimal coding experience. With functions such as &lt;a href="https://support.google.com/docs/answer/3093342?hl=en&amp;amp;ref_topic=9199554&amp;amp;sjid=7580732861875045213-AP" target="_blank" >IMPORTXML&lt;/a> and &lt;a href="https://support.google.com/docs/answer/3093339?sjid=7580732861875045213-AP" target="_blank" >IMPORTHTML&lt;/a> that allow you to extract data from websites without writing any code, you can use Google Sheets as a web scraping tool.&lt;/p></description></item><item><title>Guide to Scraping E-commerce Websites</title><link>https://www.scrapingbee.com/blog/guide-to-scraping-e-commerce-websites/</link><pubDate>Mon, 11 Dec 2023 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/guide-to-scraping-e-commerce-websites/</guid><description>&lt;p>Scraping e-commerce websites has become increasingly important for companies to gain a competitive edge in the digital marketplace. It provides access to vast amounts of product data quickly and efficiently. These sites often feature a multitude of products, prices, and customer reviews that can be difficult to review manually. When the data extraction process is automated, businesses can save time and resources while obtaining comprehensive and up-to-date information about their competitors' offerings, pricing strategies, and customer sentiment.&lt;/p></description></item><item><title>Web Scraping with Kotlin</title><link>https://www.scrapingbee.com/blog/web-scraping-kotlin/</link><pubDate>Mon, 03 Apr 2023 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/web-scraping-kotlin/</guid><description>&lt;p>Kotlin is a modern, cross-platform programming language. 
It's open source and developed by &lt;a href="https://www.jetbrains.com/" target="_blank" >JetBrains&lt;/a>, the team behind the &lt;a href="https://www.jetbrains.com/idea/" target="_blank" >IntelliJ&lt;/a>, &lt;a href="https://www.jetbrains.com/webstorm/" target="_blank" >WebStorm&lt;/a>, and &lt;a href="https://www.jetbrains.com/pycharm/" target="_blank" >PyCharm&lt;/a> IDEs. Kotlin is JVM-compatible and fully interoperable with Java. It has many features, including null safety, extension functions, higher-order functions, and coroutines.&lt;/p>
&lt;p>Released in 2011, the language has quickly risen to prominence and is currently used by &lt;a href="https://blog.jetbrains.com/kotlin/2021/09/the-actual-number-of-kotlin-developers-or-who-our-active-users-are" target="_blank" >over 5 million developers&lt;/a> across mobile, desktop, and backend. In 2017, Google made Kotlin a first-class language for developing Android apps.&lt;/p></description></item><item><title>Can I use XPath selectors in DOM Crawler?</title><link>https://www.scrapingbee.com/webscraping-questions/dom-crawler/can-i-use-xpath-selectors-in-dom-crawler/</link><pubDate>Fri, 24 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/dom-crawler/can-i-use-xpath-selectors-in-dom-crawler/</guid><description>&lt;p>Yes, you can use XPath selectors in &lt;a href="https://symfony.com/doc/current/components/dom_crawler.html" target="_blank" >DOM Crawler&lt;/a>. Here is some sample code that uses &lt;a href="https://docs.guzzlephp.org/en/stable/overview.html" target="_blank" >Guzzle&lt;/a> to load the &lt;a href="https://www.scrapingbee.com/" target="_blank" >ScrapingBee website&lt;/a> and then uses DOM Crawler's &lt;a href="https://symfony.com/doc/current/components/dom_crawler.html#node-filtering" target="_blank" >&lt;code>filterXPath&lt;/code> method&lt;/a> to extract and print the text content of the &lt;code>h1&lt;/code> tag:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-php" data-lang="php">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">use&lt;/span> &lt;span style="color:#a6e22e">Symfony\Component\DomCrawler\Crawler&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">use&lt;/span> &lt;span style="color:#a6e22e">GuzzleHttp\Client&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Create a client to make the HTTP request
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>$client &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">new&lt;/span> &lt;span style="color:#a6e22e">\GuzzleHttp\Client&lt;/span>();
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$response &lt;span style="color:#f92672">=&lt;/span> $client&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">get&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;https://www.scrapingbee.com/&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$html &lt;span style="color:#f92672">=&lt;/span> (&lt;span style="color:#a6e22e">string&lt;/span>) $response&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">getBody&lt;/span>();
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Load the HTML document
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>$crawler &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">new&lt;/span> &lt;span style="color:#a6e22e">Crawler&lt;/span>($html);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Find the first h1 element on the page
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>$h1 &lt;span style="color:#f92672">=&lt;/span> $crawler&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">filterXPath&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;//h1[1]&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Get the text content of the h1 element
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>$text &lt;span style="color:#f92672">=&lt;/span> $h1&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">text&lt;/span>();
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Print the text content
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">echo&lt;/span> $text; 
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Output: 
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// &amp;#34;Tired of getting blocked while scraping the web?&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>If you do not want to use Guzzle, take a look at this sample code that directly passes in an HTML string:&lt;/p></description></item><item><title>Does Guzzle use cURL?</title><link>https://www.scrapingbee.com/webscraping-questions/guzzle/does-guzzle-use-curl/</link><pubDate>Fri, 24 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/guzzle/does-guzzle-use-curl/</guid><description>&lt;p>Yes, &lt;a href="https://docs.guzzlephp.org/en/stable/index.html" target="_blank" >Guzzle&lt;/a> uses cURL as one of the underlying HTTP transport adapters. However, Guzzle supports multiple adapters, including cURL, PHP stream, and sockets, which can be used interchangeably depending on your needs. By default, Guzzle uses cURL as the preferred adapter, as it provides a robust and feature-rich API for sending HTTP requests and handling responses. However, Guzzle also provides an abstraction layer that allows developers to switch between adapters seamlessly, without having to modify their application code.&lt;/p></description></item><item><title>Handle Guzzle exception and get HTTP body?</title><link>https://www.scrapingbee.com/webscraping-questions/guzzle/handle-guzzle-exception-and-get-http-body/</link><pubDate>Fri, 24 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/guzzle/handle-guzzle-exception-and-get-http-body/</guid><description>&lt;p>You can easily handle &lt;a href="https://docs.guzzlephp.org/en/stable/quickstart.html#exceptions" target="_blank" >Guzzle exceptions&lt;/a> and get the HTTP body of the response (if it has any) by catching &lt;code>RequestException&lt;/code>. This is a higher-level exception that covers &lt;code>BadResponseException&lt;/code>, &lt;code>TooManyRedirectsException&lt;/code>, and a few related exceptions.&lt;/p>
&lt;p>Here is how the exceptions in Guzzle depend on each other:&lt;/p>
&lt;pre tabindex="0">&lt;code>. \RuntimeException
└── TransferException (implements GuzzleException)
    ├── ConnectException (implements NetworkExceptionInterface)
    └── RequestException
        ├── BadResponseException
        │   ├── ServerException
        │   └── ClientException
        └── TooManyRedirectsException
&lt;/code>&lt;/pre>&lt;p>Here is an example of how to handle the &lt;code>RequestException&lt;/code> in Guzzle and get the HTTP body (if there is one):&lt;/p></description></item><item><title>How do I do HTTP basic authentication with Guzzle?</title><link>https://www.scrapingbee.com/webscraping-questions/guzzle/how-do-i-do-http-basic-authentication-with-guzzle/</link><pubDate>Fri, 24 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/guzzle/how-do-i-do-http-basic-authentication-with-guzzle/</guid><description>&lt;p>You can easily do HTTP basic authentication with &lt;a href="https://docs.guzzlephp.org/en/stable/index.html" target="_blank" >Guzzle&lt;/a> by passing in an &lt;code>auth&lt;/code> array with the username and password as part of the options while creating the &lt;code>Client&lt;/code> object. Guzzle will make sure to use these authentication credentials with all the follow-up requests made by the &lt;code>$client&lt;/code>.&lt;/p>
&lt;p>Here is some sample code that uses an authentication endpoint at &lt;a href="https://httpbin.org/" target="_blank" >HTTP Bin&lt;/a> to demonstrate this:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-php" data-lang="php">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">use&lt;/span> &lt;span style="color:#a6e22e">GuzzleHttp\Client&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$client &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">new&lt;/span> &lt;span style="color:#a6e22e">Client&lt;/span>([
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#39;auth&amp;#39;&lt;/span> &lt;span style="color:#f92672">=&amp;gt;&lt;/span> [&lt;span style="color:#e6db74">&amp;#39;user&amp;#39;&lt;/span>, &lt;span style="color:#e6db74">&amp;#39;passwd&amp;#39;&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>]);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$response &lt;span style="color:#f92672">=&lt;/span> $client&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">get&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;https://httpbin.org/basic-auth/user/passwd&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$body &lt;span style="color:#f92672">=&lt;/span> $response&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">getBody&lt;/span>();
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">echo&lt;/span> $response&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">getStatusCode&lt;/span>() &lt;span style="color:#f92672">.&lt;/span> &lt;span style="color:#a6e22e">PHP_EOL&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">echo&lt;/span> $body;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Output:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// 200
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// {
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// &amp;#34;authenticated&amp;#34;: true,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// &amp;#34;user&amp;#34;: &amp;#34;user&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// }
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Alternatively, you can specify the &lt;code>auth&lt;/code> credentials on a per-request basis as well:&lt;/p></description></item><item><title>How do you handle client error in Guzzle?</title><link>https://www.scrapingbee.com/webscraping-questions/guzzle/how-do-you-handle-client-error-in-guzzle/</link><pubDate>Fri, 24 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/guzzle/how-do-you-handle-client-error-in-guzzle/</guid><description>&lt;p>You can easily handle client errors in &lt;a href="https://docs.guzzlephp.org/en/stable/index.html" target="_blank" >Guzzle&lt;/a> by catching the thrown &lt;a href="https://docs.guzzlephp.org/en/stable/quickstart.html#exceptions" target="_blank" >exceptions&lt;/a>. You can either catch &lt;code>RequestException&lt;/code>, which covers most exceptions, or catch the more specific &lt;code>ClientException&lt;/code>, which covers only client errors such as 4xx status codes.&lt;/p>
&lt;p>Here is an example of code that triggers a &lt;code>404 Not Found&lt;/code> error, which is handled by catching &lt;code>ClientException&lt;/code> in Guzzle:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-php" data-lang="php">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">use&lt;/span> &lt;span style="color:#a6e22e">GuzzleHttp\Client&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">use&lt;/span> &lt;span style="color:#a6e22e">GuzzleHttp\Exception\ClientException&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$client &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">new&lt;/span> &lt;span style="color:#a6e22e">Client&lt;/span>();
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">try&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> $response &lt;span style="color:#f92672">=&lt;/span> $client&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">get&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;https://httpbin.org/status/404&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e">// Process response normally...
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>} &lt;span style="color:#66d9ef">catch&lt;/span> (&lt;span style="color:#a6e22e">ClientException&lt;/span> $e) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e">// An exception was raised but there is an HTTP response body
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span> &lt;span style="color:#75715e">// with the exception (in case of 404 and similar errors)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span> $response &lt;span style="color:#f92672">=&lt;/span> $e&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">getResponse&lt;/span>();
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> $responseBodyAsString &lt;span style="color:#f92672">=&lt;/span> $response&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">getBody&lt;/span>()&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">getContents&lt;/span>();
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">echo&lt;/span> $response&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">getStatusCode&lt;/span>() &lt;span style="color:#f92672">.&lt;/span> &lt;span style="color:#a6e22e">PHP_EOL&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">echo&lt;/span> $responseBodyAsString;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Output:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// 404
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>You can read more about various exceptions thrown by Guzzle in the &lt;a href="https://docs.guzzlephp.org/en/stable/quickstart.html#exceptions" target="_blank" >official docs&lt;/a>.&lt;/p></description></item><item><title>How to find all links using DOM Crawler and PHP?</title><link>https://www.scrapingbee.com/webscraping-questions/dom-crawler/how-to-find-all-links-using-dom-crawler-and-php/</link><pubDate>Fri, 24 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/dom-crawler/how-to-find-all-links-using-dom-crawler-and-php/</guid><description>&lt;p>You can find all links using &lt;a href="https://symfony.com/doc/current/components/dom_crawler.html" target="_blank" >DOM Crawler&lt;/a> and PHP by making use of either the &lt;a href="https://symfony.com/doc/current/components/dom_crawler.html#node-filtering" target="_blank" >&lt;code>filter&lt;/code> or the &lt;code>filterXPath&lt;/code> method&lt;/a>. Below, you can find two code samples that demonstrate how to use either of these methods. The code uses &lt;a href="https://docs.guzzlephp.org/en/stable/overview.html" target="_blank" >Guzzle&lt;/a> to load the &lt;a href="https://www.scrapingbee.com/" target="_blank" >ScrapingBee website&lt;/a> so you may want to install that as well using Composer.&lt;/p>
&lt;p>This example code uses the &lt;code>filter&lt;/code> method:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-php" data-lang="php">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">use&lt;/span> &lt;span style="color:#a6e22e">Symfony\Component\DomCrawler\Crawler&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">use&lt;/span> &lt;span style="color:#a6e22e">GuzzleHttp\Client&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Create a client to make the HTTP request
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>$client &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">new&lt;/span> &lt;span style="color:#a6e22e">\GuzzleHttp\Client&lt;/span>();
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$response &lt;span style="color:#f92672">=&lt;/span> $client&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">get&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;https://www.scrapingbee.com/&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$html &lt;span style="color:#f92672">=&lt;/span> (&lt;span style="color:#a6e22e">string&lt;/span>) $response&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">getBody&lt;/span>();
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Load the HTML document
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>$crawler &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">new&lt;/span> &lt;span style="color:#a6e22e">Crawler&lt;/span>($html);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Find all links on the page
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>$links &lt;span style="color:#f92672">=&lt;/span> $crawler&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">filter&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;a&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Loop over the links and print their href attributes
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">foreach&lt;/span> ($links &lt;span style="color:#66d9ef">as&lt;/span> $link) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">echo&lt;/span> $link&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">getAttribute&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;href&amp;#39;&lt;/span>) &lt;span style="color:#f92672">.&lt;/span> &lt;span style="color:#a6e22e">PHP_EOL&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Output:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// /
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// https://app.scrapingbee.com/account/login
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// https://app.scrapingbee.com/account/register
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// /#pricing
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// /#faq
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// /blog/
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// #
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// /features/screenshot/
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// /features/google/
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// ...
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This example code uses the &lt;code>filterXPath&lt;/code> method:&lt;/p></description></item><item><title>How to find elements without specific attributes in DOM Crawler?</title><link>https://www.scrapingbee.com/webscraping-questions/dom-crawler/how-to-find-elements-without-specific-attributes-in-dom-crawler/</link><pubDate>Fri, 24 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/dom-crawler/how-to-find-elements-without-specific-attributes-in-dom-crawler/</guid><description>&lt;p>You have two options to find elements without specific attributes in &lt;a href="https://symfony.com/doc/current/components/dom_crawler.html" target="_blank" >DOM Crawler&lt;/a>. The first option uses the &lt;a href="https://symfony.com/doc/current/components/dom_crawler.html#node-filtering" target="_blank" >&lt;code>filterXPath&lt;/code> method&lt;/a> with an XPath selector that includes a negative predicate. The second option uses the &lt;a href="https://symfony.com/doc/current/components/dom_crawler.html#node-filtering" target="_blank" >&lt;code>filter&lt;/code> method&lt;/a> with the &lt;a href="https://developer.mozilla.org/en-US/docs/Web/CSS/:not" target="_blank" >&lt;code>:not&lt;/code> CSS pseudo-class&lt;/a> and an attribute selector.&lt;/p>
&lt;p>Here is some sample code that showcases the &lt;code>filterXPath&lt;/code> option and finds all &lt;code>img&lt;/code> tags that do not have an &lt;code>alt&lt;/code> attribute:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-php" data-lang="php">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">use&lt;/span> &lt;span style="color:#a6e22e">Symfony\Component\DomCrawler\Crawler&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$html &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">&amp;lt;&amp;lt;&amp;lt;&lt;/span>&lt;span style="color:#e6db74">EOD&lt;/span>&lt;span style="color:#e6db74">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">&amp;lt;!DOCTYPE html&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">&amp;lt;html&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">&amp;lt;head&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">	&amp;lt;title&amp;gt;Example Page&amp;lt;/title&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">&amp;lt;/head&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">&amp;lt;body&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">	&amp;lt;h1&amp;gt;Hello, world!&amp;lt;/h1&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">	&amp;lt;p&amp;gt;This is an example page.&amp;lt;/p&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">	&amp;lt;img src=&amp;#34;logo.png&amp;#34; /&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;img src=&amp;#34;header.png&amp;#34; alt=&amp;#34;header&amp;#34;/&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;img src=&amp;#34;yasoob.png&amp;#34; alt=&amp;#34;profile picture&amp;#34;/&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">&amp;lt;/body&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">&amp;lt;/html&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">&lt;/span>&lt;span style="color:#e6db74">EOD&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Load the HTML document
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>$crawler &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">new&lt;/span> &lt;span style="color:#a6e22e">Crawler&lt;/span>($html);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Find all img elements without an alt attribute
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>$imagesWithoutAlt &lt;span style="color:#f92672">=&lt;/span> $crawler&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">filterXPath&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;//img[not(@alt)]&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Loop over the images and print their src attributes
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">foreach&lt;/span> ($imagesWithoutAlt &lt;span style="color:#66d9ef">as&lt;/span> $image) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">echo&lt;/span> $image&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">getAttribute&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;src&amp;#39;&lt;/span>) &lt;span style="color:#f92672">.&lt;/span> &lt;span style="color:#a6e22e">PHP_EOL&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Output:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// logo.png
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Here is some sample code that uses the &lt;code>filter&lt;/code> method with the &lt;code>:not&lt;/code> CSS pseudo-class instead:&lt;/p></description></item><item><title>How to find HTML elements by attribute using DOM Crawler?</title><link>https://www.scrapingbee.com/webscraping-questions/dom-crawler/how-to-find-html-elements-by-attribute-using-dom-crawler/</link><pubDate>Fri, 24 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/dom-crawler/how-to-find-html-elements-by-attribute-using-dom-crawler/</guid><description>&lt;p>You can find HTML elements by attribute using &lt;a href="https://symfony.com/doc/current/components/dom_crawler.html" target="_blank" >DOM Crawler&lt;/a> by utilizing the &lt;a href="https://symfony.com/doc/current/components/dom_crawler.html#node-filtering" target="_blank" >&lt;code>filterXPath&lt;/code> method&lt;/a> with an XPath selector that includes an attribute selector. Here's an example that finds all &lt;code>input&lt;/code> elements on &lt;a href="https://app.scrapingbee.com/account/login" target="_blank" >ScrapingBee's login page&lt;/a> that have a &lt;code>type&lt;/code> attribute equal to &lt;code>&amp;quot;email&amp;quot;&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-php" data-lang="php">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">use&lt;/span> &lt;span style="color:#a6e22e">Symfony\Component\DomCrawler\Crawler&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">use&lt;/span> &lt;span style="color:#a6e22e">GuzzleHttp\Client&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Create a client to make the HTTP request
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>$client &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">new&lt;/span> &lt;span style="color:#a6e22e">\GuzzleHttp\Client&lt;/span>();
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$response &lt;span style="color:#f92672">=&lt;/span> $client&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">get&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;https://app.scrapingbee.com/account/login&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$html &lt;span style="color:#f92672">=&lt;/span> (&lt;span style="color:#a6e22e">string&lt;/span>) $response&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">getBody&lt;/span>();
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Load the HTML document
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>$crawler &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">new&lt;/span> &lt;span style="color:#a6e22e">Crawler&lt;/span>($html);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Find all input elements with a type attribute equal to &amp;#34;email&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>$textInputs &lt;span style="color:#f92672">=&lt;/span> $crawler&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">filterXPath&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;//input[@type=&amp;#34;email&amp;#34;]&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Loop over the inputs and print their values
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">foreach&lt;/span> ($textInputs &lt;span style="color:#66d9ef">as&lt;/span> $input) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">echo&lt;/span> $input&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">getAttribute&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;placeholder&amp;#39;&lt;/span>) &lt;span style="color:#f92672">.&lt;/span> &lt;span style="color:#a6e22e">PHP_EOL&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Output:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Enter your email
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;strong>Note:&lt;/strong> This example uses Guzzle, so you may need to install it.&lt;/p></description></item><item><title>How to find HTML elements by class with DOM Crawler?</title><link>https://www.scrapingbee.com/webscraping-questions/dom-crawler/how-to-find-html-elements-by-class-with-dom-crawler/</link><pubDate>Fri, 24 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/dom-crawler/how-to-find-html-elements-by-class-with-dom-crawler/</guid><description>&lt;p>You can find HTML elements by class with &lt;a href="https://symfony.com/doc/current/components/dom_crawler.html" target="_blank" >DOM Crawler&lt;/a> by making use of the &lt;a href="https://symfony.com/doc/current/components/dom_crawler.html#node-filtering" target="_blank" >&lt;code>filter&lt;/code> method&lt;/a> with a &lt;a href="https://developer.mozilla.org/en-US/docs/Glossary/CSS_Selector" target="_blank" >CSS selector&lt;/a> that includes the class name. Here is some sample code that uses &lt;a href="https://docs.guzzlephp.org/en/stable/overview.html" target="_blank" >Guzzle&lt;/a> to load &lt;a href="https://www.scrapingbee.com/" target="_blank" >ScrapingBee's homepage&lt;/a> and then uses the &lt;code>filter&lt;/code> method to extract the tag with the class of &lt;code>mb-[33px]&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-php" data-lang="php">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">use&lt;/span> &lt;span style="color:#a6e22e">Symfony\Component\DomCrawler\Crawler&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">use&lt;/span> &lt;span style="color:#a6e22e">GuzzleHttp\Client&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Create a client to make the HTTP request
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>$client &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">new&lt;/span> &lt;span style="color:#a6e22e">\GuzzleHttp\Client&lt;/span>();
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$response &lt;span style="color:#f92672">=&lt;/span> $client&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">get&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;https://scrapingbee.com/&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$html &lt;span style="color:#f92672">=&lt;/span> (&lt;span style="color:#a6e22e">string&lt;/span>) $response&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">getBody&lt;/span>();
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Load the HTML document
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>$crawler &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">new&lt;/span> &lt;span style="color:#a6e22e">Crawler&lt;/span>($html);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Find all elements with the class &amp;#34;mb-[33px]&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>$h1Tag &lt;span style="color:#f92672">=&lt;/span> $crawler&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">filter&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;.mb-[33px]&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Loop over the elements and print their text content
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">foreach&lt;/span> ($h1Tag &lt;span style="color:#66d9ef">as&lt;/span> $element) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">echo&lt;/span> $element&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">textContent&lt;/span> &lt;span style="color:#f92672">.&lt;/span> &lt;span style="color:#a6e22e">PHP_EOL&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Output:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Tired of getting blocked while scraping the web?
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Try ScrapingBee for Free
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;strong>Note:&lt;/strong> This example uses Guzzle, so you may need to install it.&lt;/p></description></item><item><title>How to find HTML elements by multiple tags with DOM Crawler?</title><link>https://www.scrapingbee.com/webscraping-questions/dom-crawler/how-to-find-html-elements-by-multiple-tags-with-dom-crawler/</link><pubDate>Fri, 24 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/dom-crawler/how-to-find-html-elements-by-multiple-tags-with-dom-crawler/</guid><description>&lt;p>You can find HTML elements by multiple tags with &lt;a href="https://symfony.com/doc/current/components/dom_crawler.html" target="_blank" >DOM Crawler&lt;/a> by pairing the &lt;a href="https://symfony.com/doc/current/components/dom_crawler.html#node-filtering" target="_blank" >&lt;code>filter&lt;/code> method&lt;/a> with a &lt;a href="https://developer.mozilla.org/en-US/docs/Glossary/CSS_Selector" target="_blank" >CSS selector&lt;/a> that includes multiple tag names separated by commas. Here's an example that loads &lt;a href="https://www.scrapingbee.com/" target="_blank" >ScrapingBee's homepage&lt;/a> using &lt;a href="https://docs.guzzlephp.org/en/stable/overview.html" target="_blank" >Guzzle&lt;/a> and then prints the text of all &lt;code>h1&lt;/code> and &lt;code>h2&lt;/code> tags using DOM Crawler:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-php" data-lang="php">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">use&lt;/span> &lt;span style="color:#a6e22e">Symfony\Component\DomCrawler\Crawler&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">use&lt;/span> &lt;span style="color:#a6e22e">GuzzleHttp\Client&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Create a client to make the HTTP request
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>$client &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">new&lt;/span> &lt;span style="color:#a6e22e">\GuzzleHttp\Client&lt;/span>();
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$response &lt;span style="color:#f92672">=&lt;/span> $client&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">get&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;https://scrapingbee.com/&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$html &lt;span style="color:#f92672">=&lt;/span> (&lt;span style="color:#a6e22e">string&lt;/span>) $response&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">getBody&lt;/span>();
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Load the HTML document
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>$crawler &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">new&lt;/span> &lt;span style="color:#a6e22e">Crawler&lt;/span>($html);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Find all h1 and h2 headings on the page
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>$headings &lt;span style="color:#f92672">=&lt;/span> $crawler&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">filter&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;h1, h2&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Loop over the headings and print their text content
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">foreach&lt;/span> ($headings &lt;span style="color:#66d9ef">as&lt;/span> $element) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">echo&lt;/span> $element&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">textContent&lt;/span> &lt;span style="color:#f92672">.&lt;/span> &lt;span style="color:#a6e22e">PHP_EOL&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Output:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Tired of getting blocked while scraping the web?
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Render your web page as if it were a real browser.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Render JavaScript to scrape any website.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Rotate proxies to bypass rate limiting.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Simple, transparent pricing.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Developers are asking...
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Who are we?
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Contact us
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Ready to get started?
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>How to find sibling HTML nodes using DOM Crawler and PHP?</title><link>https://www.scrapingbee.com/webscraping-questions/dom-crawler/how-to-find-sibling-html-nodes-using-dom-crawler-and-php/</link><pubDate>Fri, 24 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/dom-crawler/how-to-find-sibling-html-nodes-using-dom-crawler-and-php/</guid><description>&lt;p>You can find sibling HTML nodes using &lt;a href="https://symfony.com/doc/current/components/dom_crawler.html" target="_blank" >DOM Crawler&lt;/a> and PHP by utilizing the &lt;a href="https://symfony.com/doc/current/components/dom_crawler.html#node-traversing" target="_blank" >&lt;code>siblings&lt;/code> method&lt;/a> of a &lt;code>Crawler&lt;/code> object. Here is some sample code that extracts the first &lt;code>p&lt;/code> node, then extracts its siblings using the &lt;code>siblings&lt;/code> method, and finally loops over these sibling nodes and prints their text content:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-php" data-lang="php">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">use&lt;/span> &lt;span style="color:#a6e22e">Symfony\Component\DomCrawler\Crawler&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$html &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">&amp;lt;&amp;lt;&amp;lt;&lt;/span>&lt;span style="color:#e6db74">EOD&lt;/span>&lt;span style="color:#e6db74">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;div&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;p&amp;gt;This is the first paragraph.&amp;lt;/p&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;p&amp;gt;This is the second paragraph.&amp;lt;/p&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;p&amp;gt;This is the third paragraph.&amp;lt;/p&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;/div&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">&lt;/span>&lt;span style="color:#e6db74">EOD&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Load the HTML document
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>$crawler &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">new&lt;/span> &lt;span style="color:#a6e22e">Crawler&lt;/span>($html);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Find the first p element
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>$pElement &lt;span style="color:#f92672">=&lt;/span> $crawler&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">filter&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;p&amp;#39;&lt;/span>)&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">first&lt;/span>();
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Find all sibling elements of the p element
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>$siblings &lt;span style="color:#f92672">=&lt;/span> $pElement&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">siblings&lt;/span>();
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Loop over the siblings and print their text content
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">foreach&lt;/span> ($siblings &lt;span style="color:#66d9ef">as&lt;/span> $sibling) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">echo&lt;/span> $sibling&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">textContent&lt;/span> &lt;span style="color:#f92672">.&lt;/span> &lt;span style="color:#a6e22e">PHP_EOL&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Output:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// This is the second paragraph.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// This is the third paragraph.
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>How to ignore SSL certificate error with Guzzle?</title><link>https://www.scrapingbee.com/webscraping-questions/guzzle/how-to-ignore-ssl-certificate-error-with-guzzle/</link><pubDate>Fri, 24 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/guzzle/how-to-ignore-ssl-certificate-error-with-guzzle/</guid><description>&lt;p>You can easily ignore SSL certificate errors with &lt;a href="https://docs.guzzlephp.org/en/stable/index.html" target="_blank" >Guzzle&lt;/a> by setting the &lt;code>verify&lt;/code> option to &lt;code>false&lt;/code> while creating a new Guzzle Client object.&lt;/p>
&lt;p>Here is some sample code that creates a new Guzzle client with &lt;code>verify&lt;/code> set to &lt;code>false&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-php" data-lang="php">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">use&lt;/span> &lt;span style="color:#a6e22e">GuzzleHttp\Client&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$client &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">new&lt;/span> &lt;span style="color:#a6e22e">Client&lt;/span>([&lt;span style="color:#e6db74">&amp;#39;verify&amp;#39;&lt;/span> &lt;span style="color:#f92672">=&amp;gt;&lt;/span> &lt;span style="color:#66d9ef">false&lt;/span>]);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$response &lt;span style="color:#f92672">=&lt;/span> $client&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">get&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;https://example.com/&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>You can read more about the &lt;code>verify&lt;/code> option in the &lt;a href="https://docs.guzzlephp.org/en/5.3/clients.html#verify" target="_blank" >official docs&lt;/a>.&lt;/p>
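&lt;p>Note that the &lt;code>verify&lt;/code> option can also be passed per request instead of client-wide, which limits the insecure behavior to a single call. Here is a minimal sketch (the URL is a placeholder):&lt;/p>

```php
<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;

// The client keeps SSL verification enabled by default
$client = new Client();

// Disable verification for this one request only
$response = $client->get('https://example.com/', ['verify' => false]);

echo $response->getStatusCode() . PHP_EOL;
```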
&lt;p>Do keep in mind that disabling SSL verification compromises security and should be done with caution. It is generally recommended to disable SSL verification only for testing or development purposes and to keep it enabled in production.&lt;/p></description></item><item><title>How to scrape tables with DOM Crawler?</title><link>https://www.scrapingbee.com/webscraping-questions/dom-crawler/how-to-scrape-tables-with-dom-crawler/</link><pubDate>Fri, 24 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/dom-crawler/how-to-scrape-tables-with-dom-crawler/</guid><description>&lt;p>You can scrape tables with &lt;a href="https://symfony.com/doc/current/components/dom_crawler.html" target="_blank" >DOM Crawler&lt;/a> by combining the regular &lt;a href="https://developer.mozilla.org/en-US/docs/Glossary/CSS_Selector" target="_blank" >CSS selectors&lt;/a> with the &lt;a href="https://symfony.com/doc/current/components/dom_crawler.html#node-filtering" target="_blank" >&lt;code>filter&lt;/code>&lt;/a> and &lt;code>each&lt;/code> methods to iterate over the rows and cells of the table.&lt;/p>
&lt;p>Here is some sample code that demonstrates how to scrape a simple HTML table using DOM Crawler:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-php" data-lang="php">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">use&lt;/span> &lt;span style="color:#a6e22e">Symfony\Component\DomCrawler\Crawler&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$html &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">&amp;lt;&amp;lt;&amp;lt;&lt;/span>&lt;span style="color:#e6db74">EOD&lt;/span>&lt;span style="color:#e6db74">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;table&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;tr&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;th&amp;gt;Name&amp;lt;/th&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;th&amp;gt;Age&amp;lt;/th&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;th&amp;gt;Occupation&amp;lt;/th&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;/tr&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;tr&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;td&amp;gt;Yasoob&amp;lt;/td&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;td&amp;gt;35&amp;lt;/td&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;td&amp;gt;Software Engineer&amp;lt;/td&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;/tr&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;tr&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;td&amp;gt;Pierre&amp;lt;/td&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;td&amp;gt;28&amp;lt;/td&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;td&amp;gt;Product Manager&amp;lt;/td&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;/tr&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;/table&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">&lt;/span>&lt;span style="color:#e6db74">EOD&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Load the HTML document
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>$crawler &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">new&lt;/span> &lt;span style="color:#a6e22e">Crawler&lt;/span>($html);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Find the table element
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>$table &lt;span style="color:#f92672">=&lt;/span> $crawler&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">filter&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;table&amp;#39;&lt;/span>)&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">first&lt;/span>();
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Loop over the rows of the table
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>$table&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">filter&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;tr&amp;#39;&lt;/span>)&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">each&lt;/span>(&lt;span style="color:#66d9ef">function&lt;/span> ($row, $i) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e">// Loop over the columns of the row
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span> $row&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">filter&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;td&amp;#39;&lt;/span>)&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">each&lt;/span>(&lt;span style="color:#66d9ef">function&lt;/span> ($column, $j) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e">// Print the text content of the column
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span> &lt;span style="color:#66d9ef">echo&lt;/span> $column&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">text&lt;/span>() &lt;span style="color:#f92672">.&lt;/span> &lt;span style="color:#a6e22e">PHP_EOL&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> });
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>});
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Output:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Yasoob
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// 35
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Software Engineer
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Pierre
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// 28
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Product Manager
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>How to select values between two nodes in DOM Crawler and PHP?</title><link>https://www.scrapingbee.com/webscraping-questions/dom-crawler/how-to-select-values-between-two-nodes-in-dom-crawler-and-php/</link><pubDate>Fri, 24 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/dom-crawler/how-to-select-values-between-two-nodes-in-dom-crawler-and-php/</guid><description>&lt;p>You can select values between two nodes in &lt;a href="https://symfony.com/doc/current/components/dom_crawler.html" target="_blank" >DOM Crawler&lt;/a> by using the &lt;a href="https://symfony.com/doc/current/components/dom_crawler.html#node-filtering" target="_blank" >&lt;code>filterXPath&lt;/code> method&lt;/a> with an XPath expression that selects the nodes between the two nodes you want to use as anchors.&lt;/p>
&lt;p>Here is some sample code that prints the text content of all the nodes between the &lt;code>h1&lt;/code> and &lt;code>h2&lt;/code> nodes:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-php" data-lang="php">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">use&lt;/span> &lt;span style="color:#a6e22e">Symfony\Component\DomCrawler\Crawler&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$html &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">&amp;lt;&amp;lt;&amp;lt;&lt;/span>&lt;span style="color:#e6db74">EOD&lt;/span>&lt;span style="color:#e6db74">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;div&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;h1&amp;gt;Header 1&amp;lt;/h1&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;p&amp;gt;Paragraph 1&amp;lt;/p&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;p&amp;gt;Paragraph 2&amp;lt;/p&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;h2&amp;gt;Header 2&amp;lt;/h2&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;p&amp;gt;Paragraph 3&amp;lt;/p&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;/div&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">&lt;/span>&lt;span style="color:#e6db74">EOD&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Load the HTML document
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>$crawler &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">new&lt;/span> &lt;span style="color:#a6e22e">Crawler&lt;/span>($html);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Find all nodes between the h1 and h2 elements
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>$nodesBetweenHeadings &lt;span style="color:#f92672">=&lt;/span> $crawler&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">filterXPath&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;//h1/
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">following-sibling::h2/
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">	preceding-sibling::*[
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">		preceding-sibling::h1
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">	]&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Loop over the nodes and print their text content
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">foreach&lt;/span> ($nodesBetweenHeadings &lt;span style="color:#66d9ef">as&lt;/span> $node) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">echo&lt;/span> $node&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">textContent&lt;/span> &lt;span style="color:#f92672">.&lt;/span> &lt;span style="color:#a6e22e">PHP_EOL&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The XPath expression used above can be read like this:&lt;/p></description></item><item><title>How to send a POST request in JSON with Guzzle?</title><link>https://www.scrapingbee.com/webscraping-questions/guzzle/how-to-send-a-post-request-in-json-with-guzzle/</link><pubDate>Fri, 24 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/guzzle/how-to-send-a-post-request-in-json-with-guzzle/</guid><description>&lt;p>You can send a POST request with JSON data in Guzzle by passing in the JSON data as an array of key-value pairs via the &lt;code>json&lt;/code> option.&lt;/p>
&lt;p>Here is some sample code that sends a request to &lt;a href="https://httpbin.org/" target="_blank" >HTTP Bin&lt;/a> with some sample JSON data:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-php" data-lang="php">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">use&lt;/span> &lt;span style="color:#a6e22e">GuzzleHttp\Client&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$client &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">new&lt;/span> &lt;span style="color:#a6e22e">Client&lt;/span>();
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$response &lt;span style="color:#f92672">=&lt;/span> $client&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">post&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;https://httpbin.org/post&amp;#39;&lt;/span>, [
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;json&amp;#34;&lt;/span> &lt;span style="color:#f92672">=&amp;gt;&lt;/span> [
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#39;key1&amp;#39;&lt;/span> &lt;span style="color:#f92672">=&amp;gt;&lt;/span> &lt;span style="color:#e6db74">&amp;#39;value1&amp;#39;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#39;key2&amp;#39;&lt;/span> &lt;span style="color:#f92672">=&amp;gt;&lt;/span> &lt;span style="color:#e6db74">&amp;#39;value2&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>]);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">echo&lt;/span> $response&lt;span style="color:#f92672">-&amp;gt;&lt;/span>&lt;span style="color:#a6e22e">getBody&lt;/span>();
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Output:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// {
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// &amp;#34;args&amp;#34;: {},
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// &amp;#34;data&amp;#34;: &amp;#34;{\&amp;#34;key1\&amp;#34;:\&amp;#34;value1\&amp;#34;,\&amp;#34;key2\&amp;#34;:\&amp;#34;value2\&amp;#34;}&amp;#34;,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// &amp;#34;files&amp;#34;: {},
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// &amp;#34;form&amp;#34;: {},
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// &amp;#34;headers&amp;#34;: {
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// &amp;#34;Content-Length&amp;#34;: &amp;#34;33&amp;#34;,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// &amp;#34;Content-Type&amp;#34;: &amp;#34;application/json&amp;#34;,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// &amp;#34;Host&amp;#34;: &amp;#34;httpbin.org&amp;#34;,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// &amp;#34;User-Agent&amp;#34;: &amp;#34;GuzzleHttp/7&amp;#34;,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// &amp;#34;X-Amzn-Trace-Id&amp;#34;: &amp;#34;Root=1-63fa252d-60bf3c1b2258ff5903bdd116&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// },
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// &amp;#34;json&amp;#34;: {
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// &amp;#34;key1&amp;#34;: &amp;#34;value1&amp;#34;,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// &amp;#34;key2&amp;#34;: &amp;#34;value2&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// },
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// &amp;#34;origin&amp;#34;: &amp;#34;119.73.117.169&amp;#34;,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// &amp;#34;url&amp;#34;: &amp;#34;https://httpbin.org/post&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// }
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>You can read more about it in the &lt;a href="http://docs.guzzlephp.org/en/latest/request-options.html#json" target="_blank" >official Guzzle docs&lt;/a>.&lt;/p></description></item><item><title>How to use proxy with authentication with Guzzle?</title><link>https://www.scrapingbee.com/webscraping-questions/guzzle/how-to-use-proxy-with-authentication-with-guzzle/</link><pubDate>Fri, 24 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/guzzle/how-to-use-proxy-with-authentication-with-guzzle/</guid><description>&lt;p>You can use an authenticated proxy with &lt;a href="https://docs.guzzlephp.org/en/stable/index.html" target="_blank" >Guzzle&lt;/a> very easily. You just need to pass in a &lt;code>proxy&lt;/code> option when either creating a new &lt;code>Client&lt;/code> object or when making the actual request. If the proxy uses authentication, just include the authentication options as part of the proxy string.&lt;/p>
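&lt;p>A minimal sketch of passing an authenticated proxy through the &lt;code>proxy&lt;/code> request option (the endpoint, credentials, and port below are placeholders):&lt;/p>

```php
<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;

$client = new Client();

// Authentication is embedded directly in the proxy URL
$response = $client->get('https://example.com/', [
    'proxy' => 'http://username:password@proxyendpoint.com:8080',
]);

echo $response->getStatusCode() . PHP_EOL;
```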
&lt;p>Here is what a proxy string with authentication parameters will look like:&lt;/p>
&lt;pre tabindex="0">&lt;code>http://username:password@proxyendpoint.com:port
&lt;/code>&lt;/pre>&lt;p>Make sure to replace the &lt;code>username&lt;/code>, &lt;code>password&lt;/code>, &lt;code>proxyendpoint.com&lt;/code>, and &lt;code>port&lt;/code> with the required values based on the proxy you are using.&lt;/p></description></item><item><title>Is Guzzle a built-in PHP library?</title><link>https://www.scrapingbee.com/webscraping-questions/guzzle/is-guzzle-a-built-in-php-library/</link><pubDate>Fri, 24 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/guzzle/is-guzzle-a-built-in-php-library/</guid><description>&lt;p>No, &lt;a href="https://github.com/guzzle/guzzle" target="_blank" >Guzzle&lt;/a> is not a built-in PHP library. It is a third-party library that needs to be installed separately.&lt;/p>
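&lt;p>In practice, installation is a single Composer command, which adds the package to &lt;code>composer.json&lt;/code> and downloads it into &lt;code>vendor/&lt;/code>:&lt;/p>

```shell
composer require guzzlehttp/guzzle
```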
&lt;p>To use Guzzle in your PHP application, you need to first install it using a package manager such as &lt;a href="https://getcomposer.org/" target="_blank" >Composer&lt;/a>, which is the recommended way of managing dependencies in PHP projects. Once you have installed Guzzle, you can then include it in your PHP code and use its API to send HTTP requests and handle responses.&lt;/p></description></item><item><title>Is PHP Guzzle deprecated?</title><link>https://www.scrapingbee.com/webscraping-questions/guzzle/is-php-guzzle-deprecated/</link><pubDate>Fri, 24 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/guzzle/is-php-guzzle-deprecated/</guid><description>&lt;p>No, &lt;a href="https://github.com/guzzle/guzzle" target="_blank" >Guzzle&lt;/a> is not deprecated. It is still actively maintained and supported by the developers.&lt;/p>
&lt;p>Although there have been some changes in the PHP ecosystem in recent years, such as the introduction of the PSR-7 HTTP message interfaces, Guzzle has adapted to these changes and continues to provide a modern and flexible API for working with HTTP.&lt;/p>
&lt;p>In case you are not familiar with it, Guzzle is a popular PHP library for sending HTTP requests and handling responses, widely used across PHP projects. The library provides a robust and feature-rich API for interacting with HTTP services, making it an essential tool for PHP developers who need to work with web APIs or other HTTP-based services.&lt;/p></description></item><item><title>What is Guzzle used for in PHP?</title><link>https://www.scrapingbee.com/webscraping-questions/guzzle/what-is-guzzle-used-for-in-php/</link><pubDate>Fri, 24 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/guzzle/what-is-guzzle-used-for-in-php/</guid><description>&lt;p>&lt;a href="https://github.com/guzzle/guzzle" target="_blank" >Guzzle&lt;/a> is a popular PHP library for sending HTTP requests and handling responses. It provides a flexible and feature-rich API for working with HTTP services and APIs, making it a valuable tool for many PHP developers.&lt;/p>
&lt;p>Here are some of the main use cases for Guzzle in PHP:&lt;/p>
&lt;ol>
&lt;li>Sending HTTP requests: Guzzle allows you to easily send HTTP requests using a variety of HTTP methods (GET, POST, PUT, DELETE, etc.) and set headers, query parameters, request bodies, and other options.&lt;/li>
&lt;li>Handling HTTP responses: Guzzle provides a powerful and flexible API for handling HTTP responses, including support for response headers, status codes, response bodies, and error handling.&lt;/li>
&lt;li>Working with web APIs: Guzzle is often used to interact with web APIs, allowing you to easily consume and manipulate data from a remote API in your PHP application.&lt;/li>
&lt;li>Testing HTTP services: Guzzle can also be used for testing HTTP services and APIs, providing a convenient and flexible way to write automated tests for your application's HTTP interactions.&lt;/li>
&lt;/ol>
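&lt;p>As an illustrative sketch of the first two use cases (assuming Guzzle is installed via Composer as &lt;code>guzzlehttp/guzzle&lt;/code>, and using a placeholder URL):&lt;/p>
&lt;pre>&lt;code class="language-php">require &amp;#39;vendor/autoload.php&amp;#39;;

use GuzzleHttp\Client;

$client = new Client();

// Send a GET request with a custom header
$response = $client-&amp;gt;request(&amp;#39;GET&amp;#39;, &amp;#39;https://example.com&amp;#39;, [
    &amp;#39;headers&amp;#39; =&amp;gt; [&amp;#39;Accept&amp;#39; =&amp;gt; &amp;#39;application/json&amp;#39;],
]);

// Handle the response: status code and body
echo $response-&amp;gt;getStatusCode();
echo $response-&amp;gt;getBody();
&lt;/code>&lt;/pre>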
&lt;p>If you are working with any sort of remote HTTP requests in your PHP application, chances are that you will end up using Guzzle.&lt;/p></description></item><item><title>Can I use XPath selectors in Cheerio?</title><link>https://www.scrapingbee.com/webscraping-questions/cheerio/can-i-use-xpath-selectors-in-cheerio/</link><pubDate>Thu, 23 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/cheerio/can-i-use-xpath-selectors-in-cheerio/</guid><description>&lt;p>No, you cannot use XPath selectors in &lt;a href="https://cheerio.js.org/" target="_blank" >Cheerio&lt;/a>. According to &lt;a href="https://github.com/cheeriojs/cheerio/issues/152" target="_blank" >these&lt;/a> &lt;a href="https://github.com/cheeriojs/cheerio/issues/1098" target="_blank" >GitHub issues&lt;/a>, there is no plan to support XPath in Cheerio.&lt;/p>
&lt;p>&lt;img src="https://www.scrapingbee.com/images/questions/github-issue.png" alt="GitHub Issue">&lt;/p>
&lt;p>However, if you simply want to work with XML documents and parse those using Cheerio, it is possible. Here is some sample code for parsing XML using Cheerio.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-javascript" data-lang="javascript">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">cheerio&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">require&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;cheerio&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">xml&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">`
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;bookstore&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;book category=&amp;#34;web&amp;#34;&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;title lang=&amp;#34;en&amp;#34;&amp;gt;Practical Python Projects&amp;lt;/title&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;author&amp;gt;Yasoob Khalid&amp;lt;/author&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;year&amp;gt;2022&amp;lt;/year&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;price&amp;gt;39.95&amp;lt;/price&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;/book&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;book category=&amp;#34;web&amp;#34;&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;title lang=&amp;#34;en&amp;#34;&amp;gt;Intermediate Python&amp;lt;/title&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;author&amp;gt;Yasoob Khalid&amp;lt;/author&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;year&amp;gt;2018&amp;lt;/year&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;price&amp;gt;29.99&amp;lt;/price&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;/book&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;/bookstore&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">`&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Load the XML document as a Cheerio object
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">$&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">cheerio&lt;/span>.&lt;span style="color:#a6e22e">load&lt;/span>(&lt;span style="color:#a6e22e">xml&lt;/span>, { &lt;span style="color:#a6e22e">xml&lt;/span>&lt;span style="color:#f92672">:&lt;/span> &lt;span style="color:#66d9ef">true&lt;/span> });
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Select all book titles 
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">titles&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">$&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;book &amp;gt; title&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Print the text content of each title
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#a6e22e">titles&lt;/span>.&lt;span style="color:#a6e22e">each&lt;/span>((&lt;span style="color:#a6e22e">i&lt;/span>, &lt;span style="color:#a6e22e">title&lt;/span>) =&amp;gt; {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">console&lt;/span>.&lt;span style="color:#a6e22e">log&lt;/span>(&lt;span style="color:#a6e22e">$&lt;/span>(&lt;span style="color:#a6e22e">title&lt;/span>).&lt;span style="color:#a6e22e">text&lt;/span>());
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>});
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Output:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Practical Python Projects
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Intermediate Python
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>How to find elements without specific attributes in Cheerio?</title><link>https://www.scrapingbee.com/webscraping-questions/cheerio/how-to-find-elements-without-specific-attributes-in-cheerio/</link><pubDate>Thu, 23 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/cheerio/how-to-find-elements-without-specific-attributes-in-cheerio/</guid><description>&lt;p>You can find elements without specific attributes in &lt;a href="https://cheerio.js.org/" target="_blank" >Cheerio&lt;/a> by using the &lt;a href="https://developer.mozilla.org/en-US/docs/Web/CSS/:not" target="_blank" >&lt;code>:not&lt;/code> CSS pseudo-class&lt;/a> and the attribute selector.&lt;/p>
&lt;p>Here's an example that demonstrates how to find all &lt;code>div&lt;/code> elements without a &lt;code>class&lt;/code> attribute using Cheerio:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-javascript" data-lang="javascript">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">cheerio&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">require&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;cheerio&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">html&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">`
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;div class=&amp;#34;content&amp;#34;&amp;gt;This div has a class attribute&amp;lt;/div&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;div&amp;gt;This div does not have a class attribute&amp;lt;/div&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;div class=&amp;#34;footer&amp;#34;&amp;gt;This div also has a class attribute&amp;lt;/div&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">`&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Load the HTML content into a Cheerio object
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">$&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">cheerio&lt;/span>.&lt;span style="color:#a6e22e">load&lt;/span>(&lt;span style="color:#a6e22e">html&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Find all div elements without a class attribute using the :not pseudo-class and the attribute selector
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">divsWithoutClass&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">$&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;div:not([class])&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Iterate over each div element without a class attribute and print its text content
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#a6e22e">divsWithoutClass&lt;/span>.&lt;span style="color:#a6e22e">each&lt;/span>((&lt;span style="color:#a6e22e">i&lt;/span>, &lt;span style="color:#a6e22e">div&lt;/span>) =&amp;gt; {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">console&lt;/span>.&lt;span style="color:#a6e22e">log&lt;/span>(&lt;span style="color:#a6e22e">$&lt;/span>(&lt;span style="color:#a6e22e">div&lt;/span>).&lt;span style="color:#a6e22e">text&lt;/span>());
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>});
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Output: 
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// This div does not have a class attribute
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>How to find HTML elements by attribute using Cheerio?</title><link>https://www.scrapingbee.com/webscraping-questions/cheerio/how-to-find-html-elements-by-attribute-using-cheerio/</link><pubDate>Thu, 23 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/cheerio/how-to-find-html-elements-by-attribute-using-cheerio/</guid><description>&lt;p>You can find HTML elements by attribute in &lt;a href="https://cheerio.js.org/" target="_blank" >Cheerio&lt;/a> using the attribute selector.&lt;/p>
&lt;p>Here's some sample code that demonstrates how to find all &lt;code>div&lt;/code> elements with a &lt;code>data-example&lt;/code> attribute set to a specific value using Cheerio:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-javascript" data-lang="javascript">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">cheerio&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">require&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;cheerio&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">html&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">`
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;div data-example=&amp;#34;1&amp;#34;&amp;gt;This div has a data-example attribute with a value of 1&amp;lt;/div&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;div data-example=&amp;#34;2&amp;#34;&amp;gt;This div has a data-example attribute with a value of 2&amp;lt;/div&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;div&amp;gt;This div does not have a data-example attribute&amp;lt;/div&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">`&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Load the HTML content into a Cheerio object
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">$&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">cheerio&lt;/span>.&lt;span style="color:#a6e22e">load&lt;/span>(&lt;span style="color:#a6e22e">html&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Find all div elements with a data-example attribute of &amp;#34;1&amp;#34; using the attribute selector
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">divsWithAttribute&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">$&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;div[data-example=&amp;#34;1&amp;#34;]&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Iterate over each div element with a data-example attribute of &amp;#34;1&amp;#34; and print its text content
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#a6e22e">divsWithAttribute&lt;/span>.&lt;span style="color:#a6e22e">each&lt;/span>((&lt;span style="color:#a6e22e">i&lt;/span>, &lt;span style="color:#a6e22e">div&lt;/span>) =&amp;gt; {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">console&lt;/span>.&lt;span style="color:#a6e22e">log&lt;/span>(&lt;span style="color:#a6e22e">$&lt;/span>(&lt;span style="color:#a6e22e">div&lt;/span>).&lt;span style="color:#a6e22e">text&lt;/span>());
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>});
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Output: 
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// This div has a data-example attribute with a value of 1
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>How to find HTML elements by class with Cheerio?</title><link>https://www.scrapingbee.com/webscraping-questions/cheerio/how-to-find-html-elements-by-class-with-cheerio/</link><pubDate>Thu, 23 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/cheerio/how-to-find-html-elements-by-class-with-cheerio/</guid><description>&lt;p>You can find HTML elements by class in &lt;a href="https://cheerio.js.org/" target="_blank" >Cheerio&lt;/a> by using the class selector.&lt;/p>
&lt;p>Here's some sample code that demonstrates how to find all &lt;code>div&lt;/code> elements with a &lt;code>class&lt;/code> of &lt;code>example&lt;/code> using Cheerio:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-javascript" data-lang="javascript">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">cheerio&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">require&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;cheerio&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">html&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">`
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;div class=&amp;#34;example&amp;#34;&amp;gt;This div has a class of example&amp;lt;/div&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;div class=&amp;#34;example&amp;#34;&amp;gt;This div also has a class of example&amp;lt;/div&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;div&amp;gt;This div does not have a class of example&amp;lt;/div&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">`&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Load the HTML content into a Cheerio object
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">$&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">cheerio&lt;/span>.&lt;span style="color:#a6e22e">load&lt;/span>(&lt;span style="color:#a6e22e">html&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Find all div elements with a class of &amp;#34;example&amp;#34; using the class selector
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">divsWithClass&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">$&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;div.example&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Iterate over each div element with a class of &amp;#34;example&amp;#34; and print its text content
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#a6e22e">divsWithClass&lt;/span>.&lt;span style="color:#a6e22e">each&lt;/span>((&lt;span style="color:#a6e22e">i&lt;/span>, &lt;span style="color:#a6e22e">div&lt;/span>) =&amp;gt; {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">console&lt;/span>.&lt;span style="color:#a6e22e">log&lt;/span>(&lt;span style="color:#a6e22e">$&lt;/span>(&lt;span style="color:#a6e22e">div&lt;/span>).&lt;span style="color:#a6e22e">text&lt;/span>());
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>});
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Output: 
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// This div has a class of example
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// This div also has a class of example
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>How to find HTML elements by multiple tags with Cheerio?</title><link>https://www.scrapingbee.com/webscraping-questions/cheerio/how-to-find-html-elements-by-multiple-tags-with-cheerio/</link><pubDate>Thu, 23 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/cheerio/how-to-find-html-elements-by-multiple-tags-with-cheerio/</guid><description>&lt;p>You can find HTML elements by multiple tags in &lt;a href="https://cheerio.js.org/" target="_blank" >Cheerio&lt;/a> by separating the tag selectors with a &lt;code>,&lt;/code>.&lt;/p>
&lt;p>Here's some sample code that demonstrates how to find all &lt;code>div&lt;/code> and &lt;code>span&lt;/code> elements using Cheerio:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-javascript" data-lang="javascript">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">cheerio&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">require&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;cheerio&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">html&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">`
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;div&amp;gt;This is a div element&amp;lt;/div&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;span&amp;gt;This is a span element&amp;lt;/span&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;div&amp;gt;This is another div element&amp;lt;/div&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">`&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Load the HTML content into a Cheerio object
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">$&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">cheerio&lt;/span>.&lt;span style="color:#a6e22e">load&lt;/span>(&lt;span style="color:#a6e22e">html&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Find all div and span elements
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">divsAndSpans&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">$&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;div, span&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Iterate over each div and span element and print its text content
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#a6e22e">divsAndSpans&lt;/span>.&lt;span style="color:#a6e22e">each&lt;/span>((&lt;span style="color:#a6e22e">i&lt;/span>, &lt;span style="color:#a6e22e">element&lt;/span>) =&amp;gt; {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">console&lt;/span>.&lt;span style="color:#a6e22e">log&lt;/span>(&lt;span style="color:#a6e22e">$&lt;/span>(&lt;span style="color:#a6e22e">element&lt;/span>).&lt;span style="color:#a6e22e">text&lt;/span>());
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>});
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Output:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// This is a div element
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// This is a span element
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// This is another div element
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>How to find sibling HTML nodes using Cheerio and NodeJS?</title><link>https://www.scrapingbee.com/webscraping-questions/cheerio/how-to-find-sibling-html-nodes-using-cheerio-and-nodejs/</link><pubDate>Thu, 23 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/cheerio/how-to-find-sibling-html-nodes-using-cheerio-and-nodejs/</guid><description>&lt;p>You can find sibling HTML nodes using &lt;a href="https://cheerio.js.org/" target="_blank" >Cheerio&lt;/a> and Node.js by utilizing the &lt;code>siblings&lt;/code> method of a Cheerio object.&lt;/p>
&lt;p>Here's some sample code that demonstrates how to find all sibling elements of a given element using Cheerio:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-javascript" data-lang="javascript">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">cheerio&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">require&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;cheerio&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">html&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">`
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;div&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;p&amp;gt;This is the first paragraph.&amp;lt;/p&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;p&amp;gt;This is the second paragraph.&amp;lt;/p&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;p&amp;gt;This is the third paragraph.&amp;lt;/p&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;/div&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">`&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Load the HTML content into a Cheerio object
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">$&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">cheerio&lt;/span>.&lt;span style="color:#a6e22e">load&lt;/span>(&lt;span style="color:#a6e22e">html&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Select the second paragraph element
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">secondParagraph&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">$&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;p:nth-of-type(2)&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Find all sibling elements of the second paragraph using the siblings method
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">siblingElements&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">secondParagraph&lt;/span>.&lt;span style="color:#a6e22e">siblings&lt;/span>();
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Iterate over each sibling element and print its text content
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#a6e22e">siblingElements&lt;/span>.&lt;span style="color:#a6e22e">each&lt;/span>((&lt;span style="color:#a6e22e">i&lt;/span>, &lt;span style="color:#a6e22e">element&lt;/span>) =&amp;gt; {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">console&lt;/span>.&lt;span style="color:#a6e22e">log&lt;/span>(&lt;span style="color:#a6e22e">$&lt;/span>(&lt;span style="color:#a6e22e">element&lt;/span>).&lt;span style="color:#a6e22e">text&lt;/span>());
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>});
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Output:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// This is the first paragraph.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// This is the third paragraph.
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;strong>Note:&lt;/strong> &lt;code>p:nth-of-type(2)&lt;/code> is used to select the second paragraph element. You can replace it with any other appropriate selector.&lt;/p></description></item><item><title>How to scrape tables with Cheerio?</title><link>https://www.scrapingbee.com/webscraping-questions/cheerio/how-to-scrape-tables-with-cheerio/</link><pubDate>Thu, 23 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/cheerio/how-to-scrape-tables-with-cheerio/</guid><description>&lt;p>You can scrape tables with &lt;a href="https://cheerio.js.org/" target="_blank" >Cheerio&lt;/a> by combining the regular &lt;a href="https://developer.mozilla.org/en-US/docs/Glossary/CSS_Selector" target="_blank" >CSS selectors&lt;/a> with the &lt;a href="https://cheerio.js.org/docs/basics/traversing/#find" target="_blank" >&lt;code>find&lt;/code>&lt;/a> and &lt;code>each&lt;/code> methods to iterate over the rows and cells of the table.&lt;/p>
&lt;p>Here's some sample code that demonstrates how to scrape a simple HTML table using Cheerio:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-javascript" data-lang="javascript">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">cheerio&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">require&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;cheerio&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">html&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">`
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;table&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;tr&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;th&amp;gt;Name&amp;lt;/th&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;th&amp;gt;Age&amp;lt;/th&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;th&amp;gt;Occupation&amp;lt;/th&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;/tr&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;tr&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;td&amp;gt;Yasoob&amp;lt;/td&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;td&amp;gt;35&amp;lt;/td&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;td&amp;gt;Software Engineer&amp;lt;/td&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;/tr&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;tr&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;td&amp;gt;Pierre&amp;lt;/td&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;td&amp;gt;28&amp;lt;/td&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;td&amp;gt;Product Manager&amp;lt;/td&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;/tr&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;/table&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">`&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Load the HTML content into a Cheerio object
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">$&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">cheerio&lt;/span>.&lt;span style="color:#a6e22e">load&lt;/span>(&lt;span style="color:#a6e22e">html&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Select the table element
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">table&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">$&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;table&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Initialize an empty array to store the table data
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">tableData&lt;/span> &lt;span style="color:#f92672">=&lt;/span> [];
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Iterate over each row of the table using the find and each methods
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#a6e22e">table&lt;/span>.&lt;span style="color:#a6e22e">find&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;tr&amp;#39;&lt;/span>).&lt;span style="color:#a6e22e">each&lt;/span>((&lt;span style="color:#a6e22e">i&lt;/span>, &lt;span style="color:#a6e22e">row&lt;/span>) =&amp;gt; {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e">// Initialize an empty object to store the row data
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span> &lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">rowData&lt;/span> &lt;span style="color:#f92672">=&lt;/span> {};
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e">// Iterate over each cell of the row using the find and each methods
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span> &lt;span style="color:#a6e22e">$&lt;/span>(&lt;span style="color:#a6e22e">row&lt;/span>).&lt;span style="color:#a6e22e">find&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;td, th&amp;#39;&lt;/span>).&lt;span style="color:#a6e22e">each&lt;/span>((&lt;span style="color:#a6e22e">j&lt;/span>, &lt;span style="color:#a6e22e">cell&lt;/span>) =&amp;gt; {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e">// Add the cell data to the row data object
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span> &lt;span style="color:#a6e22e">rowData&lt;/span>[&lt;span style="color:#a6e22e">$&lt;/span>(&lt;span style="color:#a6e22e">cell&lt;/span>).&lt;span style="color:#a6e22e">text&lt;/span>()] &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">j&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> });
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e">// Add the row data to the table data array
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span> &lt;span style="color:#a6e22e">tableData&lt;/span>.&lt;span style="color:#a6e22e">push&lt;/span>(&lt;span style="color:#a6e22e">rowData&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>});
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Print the table data
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#a6e22e">console&lt;/span>.&lt;span style="color:#a6e22e">log&lt;/span>(&lt;span style="color:#a6e22e">tableData&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Output:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// [
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// { Name: 0, Age: 1, Occupation: 2 },
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// { &amp;#39;35&amp;#39;: 1, Yasoob: 0, &amp;#39;Software Engineer&amp;#39;: 2 },
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// { &amp;#39;28&amp;#39;: 1, Pierre: 0, &amp;#39;Product Manager&amp;#39;: 2 }
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// ]
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>How to select values between two nodes in Cheerio and Node.js?</title><link>https://www.scrapingbee.com/webscraping-questions/cheerio/how-to-select-values-between-two-nodes-in-cheerio-and-nodejs/</link><pubDate>Thu, 23 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/cheerio/how-to-select-values-between-two-nodes-in-cheerio-and-nodejs/</guid><description>&lt;p>You can select values between two nodes in &lt;a href="https://cheerio.js.org/" target="_blank" >Cheerio&lt;/a> and Node.js by making use of a combination of the &lt;a href="https://cheerio.js.org/docs/basics/traversing/#nextuntil-and-prevuntil" target="_blank" >&lt;code>nextUntil&lt;/code>&lt;/a> and &lt;code>map&lt;/code> methods to iterate over the elements between the two nodes and extract the desired values.&lt;/p>
&lt;p>Here's an example that demonstrates how to select values between two nodes in a simple HTML structure using Cheerio:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-javascript" data-lang="javascript">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">cheerio&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">require&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;cheerio&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">html&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">`
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;div&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;h1&amp;gt;Header 1&amp;lt;/h1&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;p&amp;gt;Paragraph 1&amp;lt;/p&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;p&amp;gt;Paragraph 2&amp;lt;/p&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;h2&amp;gt;Header 2&amp;lt;/h2&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;p&amp;gt;Paragraph 3&amp;lt;/p&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;/div&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">`&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Load the HTML content into a Cheerio object
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">$&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">cheerio&lt;/span>.&lt;span style="color:#a6e22e">load&lt;/span>(&lt;span style="color:#a6e22e">html&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Select the first and second nodes using the CSS selector
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">startNode&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">$&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;h1&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">endNode&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">$&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;h2&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Use the nextUntil method to select all elements between the start and end nodes
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">betweenNodes&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">startNode&lt;/span>.&lt;span style="color:#a6e22e">nextUntil&lt;/span>(&lt;span style="color:#a6e22e">endNode&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Use the map method to extract the text content of the elements
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">valuesBetweenNodes&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">betweenNodes&lt;/span>.&lt;span style="color:#a6e22e">map&lt;/span>((&lt;span style="color:#a6e22e">i&lt;/span>, &lt;span style="color:#a6e22e">el&lt;/span>) =&amp;gt; &lt;span style="color:#a6e22e">$&lt;/span>(&lt;span style="color:#a6e22e">el&lt;/span>).&lt;span style="color:#a6e22e">text&lt;/span>()).&lt;span style="color:#a6e22e">get&lt;/span>();
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Print the selected values
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>&lt;span style="color:#a6e22e">console&lt;/span>.&lt;span style="color:#a6e22e">log&lt;/span>(&lt;span style="color:#a6e22e">valuesBetweenNodes&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// Output:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// [ &amp;#39;Paragraph 1&amp;#39;, &amp;#39;Paragraph 2&amp;#39; ]
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>How to load local files in Puppeteer?</title><link>https://www.scrapingbee.com/webscraping-questions/puppeteer/how-to-load-local-files-in-puppeteer/</link><pubDate>Wed, 15 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/puppeteer/how-to-load-local-files-in-puppeteer/</guid><description>&lt;p>You can load local files in Puppeteer by using the same &lt;code>page.goto&lt;/code> method that you use for URLs, but you need to provide it with the file URL using the file protocol (&lt;code>file://&lt;/code>). The file path must be an absolute path.&lt;/p>
&lt;p>Here's some example code that opens a file located at &lt;code>/Users/yasoob/Desktop/ScrapingBee/index.html&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-javascript" data-lang="javascript">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">puppeteer&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">require&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;puppeteer&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">filePath&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">&amp;#39;file://&amp;#39;&lt;/span>&lt;span style="color:#f92672">+&lt;/span>&lt;span style="color:#e6db74">&amp;#39;/Users/yasoob/Desktop/ScrapingBee/index.html&amp;#39;&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">async&lt;/span> &lt;span style="color:#66d9ef">function&lt;/span> &lt;span style="color:#a6e22e">loadLocalFile&lt;/span>() {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">browser&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">await&lt;/span> &lt;span style="color:#a6e22e">puppeteer&lt;/span>.&lt;span style="color:#a6e22e">launch&lt;/span>({
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">headless&lt;/span>&lt;span style="color:#f92672">:&lt;/span> &lt;span style="color:#66d9ef">false&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> });
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">page&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">await&lt;/span> &lt;span style="color:#a6e22e">browser&lt;/span>.&lt;span style="color:#a6e22e">newPage&lt;/span>();
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">await&lt;/span> &lt;span style="color:#a6e22e">page&lt;/span>.&lt;span style="color:#66d9ef">goto&lt;/span>(&lt;span style="color:#a6e22e">filePath&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#a6e22e">loadLocalFile&lt;/span>();
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>It's important to note that loading local files in most browsers is subject to the same-origin policy, which means that the loaded file should come from the same origin as the web page running the JavaScript code. Additionally, make sure that the running script has permission to access the path in question. You can read more about these security implications in &lt;a href="https://stackoverflow.com/questions/29371600/chrome-browser-security-implications-of-allow-file-access-from-files" target="_blank" >this StackOverflow answer&lt;/a>.&lt;/p></description></item><item><title>How to run Puppeteer in Jupyter notebooks?</title><link>https://www.scrapingbee.com/webscraping-questions/puppeteer/how-to-run-puppeteer-in-jupyter-notebooks/</link><pubDate>Wed, 15 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/puppeteer/how-to-run-puppeteer-in-jupyter-notebooks/</guid><description>&lt;p>You can run Puppeteer in Jupyter notebooks by using a JavaScript kernel instead of the default Python one. There is the famous &lt;a href="https://github.com/n-riesco/ijavascript" target="_blank" >IJavaScript&lt;/a> kernel, but it does not work with Puppeteer. The reason is that Puppeteer is async and needs a kernel that supports asynchronous execution. You can instead use &lt;a href="https://www.npmjs.com/package/ijavascript-await" target="_blank" >this patched version&lt;/a> of the IJavaScript kernel that adds async support.&lt;/p>
&lt;p>Assuming that you already have &lt;code>jupyter&lt;/code> installed, you can install the &lt;a href="https://www.npmjs.com/package/ijavascript-await" target="_blank" >patched IJavaScript&lt;/a> kernel using npm:&lt;/p></description></item><item><title>How to wait for page to load in Puppeteer?</title><link>https://www.scrapingbee.com/webscraping-questions/puppeteer/how-to-wait-for-page-to-load-in-puppeteer/</link><pubDate>Wed, 15 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/puppeteer/how-to-wait-for-page-to-load-in-puppeteer/</guid><description>&lt;p>You can wait for the page to load in Puppeteer by using the &lt;code>waitForSelector&lt;/code> method. This will pause execution until a specific element shows up on the page and indicates that the page has fully loaded. This feature is extremely helpful while performing web scraping on dynamic websites.&lt;/p>
&lt;p>Here is some sample code that opens up &lt;a href="https://scrapingbee.com" target="_blank" >ScrapingBee homepage&lt;/a> and waits for the content section to show up:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-javascript" data-lang="javascript">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">puppeteer&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">require&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;puppeteer&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">async&lt;/span> &lt;span style="color:#66d9ef">function&lt;/span> &lt;span style="color:#a6e22e">waitForSelector&lt;/span>() {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">browser&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">await&lt;/span> &lt;span style="color:#a6e22e">puppeteer&lt;/span>.&lt;span style="color:#a6e22e">launch&lt;/span>({
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">headless&lt;/span>&lt;span style="color:#f92672">:&lt;/span> &lt;span style="color:#66d9ef">false&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> });
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">page&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">await&lt;/span> &lt;span style="color:#a6e22e">browser&lt;/span>.&lt;span style="color:#a6e22e">newPage&lt;/span>();
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">await&lt;/span> &lt;span style="color:#a6e22e">page&lt;/span>.&lt;span style="color:#66d9ef">goto&lt;/span>(&lt;span style="color:#e6db74">&amp;#34;https://scrapingbee.com&amp;#34;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">await&lt;/span> &lt;span style="color:#a6e22e">page&lt;/span>.&lt;span style="color:#a6e22e">waitForSelector&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;#content&amp;#39;&lt;/span>, { &lt;span style="color:#a6e22e">timeout&lt;/span>&lt;span style="color:#f92672">:&lt;/span> &lt;span style="color:#ae81ff">5_000&lt;/span> });
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e">// Do whatever you want with the page next
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#a6e22e">waitForSelector&lt;/span>();
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>You can read more about the &lt;code>waitForSelector&lt;/code> API in the &lt;a href="https://pptr.dev/api/puppeteer.page.waitforselector" target="_blank" >official docs&lt;/a>.&lt;/p></description></item><item><title>Who owns Playwright?</title><link>https://www.scrapingbee.com/webscraping-questions/playwright/who-owns-playwright/</link><pubDate>Wed, 15 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/playwright/who-owns-playwright/</guid><description>&lt;p>Playwright is an open-source web automation framework, which is developed and maintained by Microsoft, as well as a community of contributors from all around the world. The development of Playwright takes place on GitHub and the contributors have to sign a &lt;a href="https://cla.opensource.microsoft.com/" target="_blank" >one-time Contributor License Agreement&lt;/a> before making contributions to the project. The project has an Apache 2.0 License which allows users to freely use the framework in their private as well as commercial projects.&lt;/p></description></item><item><title>Why do we need Playwright?</title><link>https://www.scrapingbee.com/webscraping-questions/playwright/why-do-we-need-playwright/</link><pubDate>Wed, 15 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/playwright/why-do-we-need-playwright/</guid><description>&lt;p>Playwright is a web automation framework that allows developers to automate web applications and browsers, much like Selenium and Puppeteer. 
It provides a powerful, flexible, and reliable way to automate end-to-end testing, browser automation, and web scraping in &lt;a href="https://playwright.dev/python/docs/intro" target="_blank" >Python&lt;/a>, &lt;a href="https://playwright.dev/dotnet/docs/intro" target="_blank" >.NET&lt;/a>, &lt;a href="https://playwright.dev/java/docs/intro" target="_blank" >Java&lt;/a>, or &lt;a href="https://github.com/microsoft/playwright" target="_blank" >Node.js&lt;/a>.&lt;/p>
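&lt;p>As a quick illustration (a minimal sketch, not taken from the original post), a Node.js script that launches a browser, opens a page, and prints its title looks like this:&lt;/p>
&lt;pre>&lt;code>const { chromium } = require(&amp;#39;playwright&amp;#39;);

(async () =&amp;gt; {
  // Launch a Chromium instance; swap in firefox or webkit to target other engines
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto(&amp;#39;https://example.com&amp;#39;);
  console.log(await page.title());
  await browser.close();
})();
&lt;/code>&lt;/pre>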
&lt;p>One of the major features of Playwright is its ability to support multiple web browsers, such as Chromium, Firefox, and WebKit (the engine behind Safari), out of the box, which allows developers to test their web apps on different browsers with minimal effort.&lt;/p></description></item><item><title>How do I read a JSON in Python?</title><link>https://www.scrapingbee.com/webscraping-questions/json/how-do-i-read-a-json-in-python/</link><pubDate>Sat, 11 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/json/how-do-i-read-a-json-in-python/</guid><description>&lt;p>To read a JSON file in Python, you can use the built-in &lt;code>json&lt;/code> module. Here is a sample &lt;code>file.json&lt;/code> file:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-json" data-lang="json">&lt;span style="display:flex;">&lt;span>{
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;#34;name&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;John Doe&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;#34;age&amp;#34;&lt;/span>: &lt;span style="color:#ae81ff">32&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;#34;address&amp;#34;&lt;/span>: {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;#34;street&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;123 Main St&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;#34;city&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;Anytown&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;#34;state&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;CA&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>And here is some sample Python code for reading this file:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">import&lt;/span> json
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">with&lt;/span> open(&lt;span style="color:#e6db74">&amp;#39;file.json&amp;#39;&lt;/span>, &lt;span style="color:#e6db74">&amp;#39;r&amp;#39;&lt;/span>) &lt;span style="color:#66d9ef">as&lt;/span> json_file:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> data &lt;span style="color:#f92672">=&lt;/span> json&lt;span style="color:#f92672">.&lt;/span>load(json_file)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>print(data[&lt;span style="color:#e6db74">&amp;#34;name&amp;#34;&lt;/span>])
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Output: John Doe&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The &lt;code>json.load&lt;/code> method reads JSON from a file object and converts it into a Python dictionary; its sibling &lt;code>json.loads&lt;/code> does the same for a JSON string. You can read more about the JSON library in the &lt;a href="https://docs.python.org/3/library/json.html" target="_blank" >official Python docs&lt;/a>.&lt;/p></description></item><item><title>How does JSON parser work?</title><link>https://www.scrapingbee.com/webscraping-questions/json/how-does-json-parser-work/</link><pubDate>Sat, 11 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/json/how-does-json-parser-work/</guid><description>&lt;p>A JSON (JavaScript Object Notation) parser is a program that reads a JSON-formatted text file and converts it into a more easily usable data structure, such as a dictionary or a list in Python or an object in JavaScript.&lt;/p>
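&lt;p>For instance, JavaScript ships with such a parser as the built-in &lt;code>JSON.parse&lt;/code> function (a minimal sketch for illustration):&lt;/p>
&lt;pre>&lt;code>// Parse a JSON string into a plain JavaScript object
const input = &amp;#39;{&amp;#34;name&amp;#34;: &amp;#34;John&amp;#34;, &amp;#34;tags&amp;#34;: [&amp;#34;a&amp;#34;, &amp;#34;b&amp;#34;]}&amp;#39;;
const data = JSON.parse(input);

console.log(data.name);    // John
console.log(data.tags[1]); // b
&lt;/code>&lt;/pre>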
&lt;p>The parser works by tokenizing the input JSON text, breaking it up into individual elements such as keys, values, and punctuation. It then builds a data structure, such as a dictionary, a list, or an object, that corresponds to the structure of the input JSON.&lt;/p></description></item><item><title>What is a JSON parser?</title><link>https://www.scrapingbee.com/webscraping-questions/json/what-is-a-json-parser/</link><pubDate>Sat, 11 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/json/what-is-a-json-parser/</guid><description>&lt;p>A JSON parser is a software component or library that reads a JSON (JavaScript Object Notation) formatted text file and converts it into a more usable data structure, such as a dictionary or a list in Python, or an object in JavaScript.&lt;/p>
&lt;p>JSON is a text-based, human-readable format for representing structured data. It is commonly used for transmitting data between a server and a web application or for storing data in a file or a database. A JSON parser provides a way to read JSON text and convert it into a more usable data structure, making it easier to access and manipulate the data.&lt;/p></description></item><item><title>Are HTTP websites safe?</title><link>https://www.scrapingbee.com/webscraping-questions/http/are-http-websites-safe/</link><pubDate>Mon, 06 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/http/are-http-websites-safe/</guid><description>&lt;p>HTTP websites are not as secure as HTTPS websites. In HTTP, the communication between the client and server is not encrypted, so it's possible for someone to intercept and view sensitive information like passwords and credit card numbers. On the other hand, HTTPS encrypts the communication, providing a secure connection and protecting the privacy of users. It's recommended to use HTTPS for websites that handle sensitive information. Moreover, with the availability of free SSL certificates by &lt;a href="https://letsencrypt.org" target="_blank" >Let's Encrypt&lt;/a>, there is very little reason to still use naked HTTP.&lt;/p></description></item><item><title>Does WebCrawler still exist?</title><link>https://www.scrapingbee.com/webscraping-questions/web-crawling/does-webcrawler-still-exist/</link><pubDate>Mon, 06 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/web-crawling/does-webcrawler-still-exist/</guid><description>&lt;p>WebCrawler still exists and is chugging along. According &lt;a href="https://en.wikipedia.org/wiki/WebCrawler" target="_blank" >to Wikipedia&lt;/a>, the website last changed hands in 2016 and the homepage was redesigned in 2018.&lt;/p>
&lt;p>Since then, it has been operating under the same company: System1.&lt;/p>
&lt;p>It is not as popular as it used to be; however, you can still search for information on the platform and get relevant results.&lt;/p>
&lt;p>According to SimilarWeb, WebCrawler gets only about 240,000 monthly visitors, which does not even place it among the top 100,000 websites in the world.&lt;/p></description></item><item><title>How do I hide my IP address for free?</title><link>https://www.scrapingbee.com/webscraping-questions/proxy/how-do-i-hide-my-ip-address-for-free/</link><pubDate>Mon, 06 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/proxy/how-do-i-hide-my-ip-address-for-free/</guid><description>&lt;p>There are several ways to hide your IP address for free:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Use a free proxy server:&lt;/strong> You can use a free proxy server to hide your IP address and browse the web anonymously. However, it is important to keep in mind that free proxies can be risky to use, as they may be operated by malicious individuals who could use them to snoop on your traffic, steal your personal data, or compromise your security. You can get a free proxy from the &lt;a href="http://free-proxy.cz/en/proxylist/country/all/socks5/ping/all" target="_blank" >Free Proxy&lt;/a> list or similar websites.&lt;/li>
&lt;li>&lt;strong>Use a free VPN (Virtual Private Network):&lt;/strong> Some VPN services offer a free version that allows you to hide your IP address, encrypt your internet traffic, and browse the web securely. However, free VPN services may have data usage or speed limitations and may not be as secure as paid services. You can use &lt;a href="https://protonvpn.com/" target="_blank" >ProtonVPN&lt;/a>. It is provided by a reliable company with a good track record and has a free plan.&lt;/li>
&lt;li>&lt;strong>Use the Tor browser:&lt;/strong> The Tor browser is a free, open-source browser that routes your internet traffic through a series of servers to hide your IP address and provide anonymity. The Tor browser is highly secure but can be slower than a proxy or VPN as it routes the traffic through multiple successive servers (like the layers of an onion :)). You can download Tor &lt;a href="https://www.torproject.org/download/" target="_blank" >from here&lt;/a>.&lt;/li>
&lt;/ol>
&lt;p>There is also an option to use the freemium plans of paid proxy services like &lt;a href="https://scrapingbee.com" target="_blank" >ScrapingBee&lt;/a>. You can only make a limited number of proxied requests on the freemium plans of such services, but if your needs are small then this might suffice.&lt;/p></description></item><item><title>Is Google a web crawler?</title><link>https://www.scrapingbee.com/webscraping-questions/web-crawling/is-google-a-web-crawler/</link><pubDate>Mon, 06 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/web-crawling/is-google-a-web-crawler/</guid><description>&lt;p>Google is most definitely a web crawler. They operate a web crawler named Googlebot, which searches for new websites, crawls them, and saves them in its massive search engine database. This is how Google powers its search engine and keeps it fresh with results from new websites. You can learn more about Googlebot over at Google's &lt;a href="https://developers.google.com/search/docs/crawling-indexing/googlebot" target="_blank" >documentation website&lt;/a>.&lt;/p>
&lt;p>So yes, Google operates a web crawler, but it should not be confused with WebCrawler, which is a separate company that also crawls the web.&lt;/p></description></item><item><title>Is it better to use IPv6 or IPv4?</title><link>https://www.scrapingbee.com/webscraping-questions/proxy/is-it-better-to-use-ipv6-or-ipv4/</link><pubDate>Mon, 06 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/proxy/is-it-better-to-use-ipv6-or-ipv4/</guid><description>&lt;p>It is generally considered better to use IPv6, which is the newer version of the Internet Protocol (IP) and the successor to IPv4. There are several reasons for this:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Larger Address Space:&lt;/strong> IPv6 has a much larger address space than IPv4, which allows for a much larger number of unique IP addresses. This is important as the increasing number of devices connecting to the internet is rapidly depleting the available IPv4 addresses.&lt;/li>
&lt;li>&lt;strong>Improved Security:&lt;/strong> IPv6 includes built-in security features, such as IPsec (Internet Protocol Security) encryption, which helps to protect against attacks and improve the overall security of the internet.&lt;/li>
&lt;li>&lt;strong>Better Support for Mobile Devices:&lt;/strong> IPv6 has better support for mobile devices and enables easier network transitions for mobile users, allowing for smoother and more efficient mobile connectivity. This is possible because IPv6 gets rid of NAT (Network Address Translation) and allows for a &lt;a href="https://www.extremetech.com/mobile/145765-ipv6-makes-mobile-networks-faster" target="_blank" >few different optimizations&lt;/a>.&lt;/li>
&lt;li>&lt;strong>More Efficient Routing:&lt;/strong> IPv6 uses simpler and more efficient routing algorithms, which helps to reduce network congestion and improve network performance.&lt;/li>
&lt;/ol>
&lt;p>That being said, IPv4 is still widely used and many networks continue to use both IPv4 and IPv6, with IPv4 being used as a fallback for devices that do not support IPv6. The transition to IPv6 is ongoing and is expected to take several more years to complete.&lt;/p></description></item><item><title>Is it legal to use proxies?</title><link>https://www.scrapingbee.com/webscraping-questions/proxy/is-it-legal-to-use-proxies/</link><pubDate>Mon, 06 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/proxy/is-it-legal-to-use-proxies/</guid><description>&lt;p>Using a proxy server in and of itself is not illegal. However, the legality of using a proxy depends on how it is being used and in which jurisdiction.&lt;/p>
&lt;p>In some countries, using a proxy to bypass internet censorship or access restricted websites may be illegal. In other countries, the use of a proxy to protect privacy is allowed and protected by law. Some of the countries which completely or partially block proxies and VPNs include:&lt;/p></description></item><item><title>Is SOCKS5 the same as VPN?</title><link>https://www.scrapingbee.com/webscraping-questions/proxy/is-socks5-same-as-vpn/</link><pubDate>Mon, 06 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/proxy/is-socks5-same-as-vpn/</guid><description>&lt;p>No, SOCKS5 and VPN are not the same things.&lt;/p>
&lt;p>SOCKS5 is a proxy protocol that provides routing for network traffic, allowing clients to bypass network restrictions and access the internet anonymously. SOCKS5 does not, however, provide encryption for the data sent through the proxy, meaning that your internet traffic can be intercepted and monitored by third parties. On the upside, the lack of encryption means it can be slightly faster than a VPN.&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Availability:&lt;/strong> IPv6 is not yet widely available and many home internet service providers do not support it. Before you can use IPv6 at home, you need to ensure that your internet service provider supports it and that your home network is configured to use it.&lt;/li>
&lt;li>&lt;strong>Devices:&lt;/strong> Many devices, such as smartphones, laptops, and smart home devices, already support IPv6, but others, such as older devices, may not. You should check to see if all the devices on your home network support IPv6 and if not, whether they can be upgraded to support it.&lt;/li>
&lt;li>&lt;strong>Performance:&lt;/strong> IPv6 can provide faster and more reliable connections, but this will depend on the quality of your network connection and your ISP's support for IPv6. Simply upgrading to IPv6 will not magically solve performance issues that have a separate underlying cause.&lt;/li>
&lt;/ol>
&lt;p>So as you see, IPv6 is the preferred protocol to use but you may have some dependencies that will prevent a complete adoption of this newer IP protocol.&lt;/p></description></item><item><title>What are examples of proxies?</title><link>https://www.scrapingbee.com/webscraping-questions/proxy/what-are-examples-of-proxies/</link><pubDate>Mon, 06 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/proxy/what-are-examples-of-proxies/</guid><description>&lt;p>A proxy is a server that acts as an intermediary between a client and a server, forwarding requests from clients to servers and vice versa. Here are some examples of different types of proxies used in web scraping:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Data Center Proxies:&lt;/strong> These are proxy servers that are owned and operated by data centers. They are used to hide the user's real IP address and provide a different IP address from the data center's pool of IP addresses. They can be sourced from regional data centers or from AWS, Google, and other similar cloud providers. Data center proxies are typically faster than residential proxies but are easily detectable by websites and services that block proxy usage.&lt;/li>
&lt;li>&lt;strong>Residential Proxies:&lt;/strong> These are proxy servers that use residential IP addresses provided by internet service providers (ISPs). They are considered better than data center proxies because they provide a real IP address from a physical location and are less likely to be detected as a proxy. However, they tend to be slower and more expensive than data center proxies.&lt;/li>
&lt;li>&lt;strong>4G Proxies:&lt;/strong> These are proxy servers that use mobile 4G network IP addresses. They are similar to residential proxies, providing a real IP address from a physical location, but they also offer the added benefit of mobility. However, the speed and reliability of 4G proxies can vary depending on the proxy location and network conditions.&lt;/li>
&lt;/ol>
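&lt;p>Whichever type you choose, using a proxy from code looks the same. Here is a minimal sketch with Python's Requests library; the proxy host and credentials below are placeholders you would replace with values from your provider:&lt;/p>

```python
import requests

# Placeholder proxy endpoint; substitute your provider's host and credentials.
proxy_url = "http://user:password@proxy.example.com:8080"

# Requests routes traffic through the proxy configured for each URL scheme,
# so the target site sees the proxy's IP address instead of yours.
proxies = {"http": proxy_url, "https": proxy_url}

def fetch_ip(proxies):
    # httpbin.org/ip echoes back the IP address the request arrived from.
    return requests.get("https://httpbin.org/ip", proxies=proxies, timeout=10).json()

# fetch_ip(proxies)  # run once proxy_url points at a real proxy
```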
&lt;p>If you ever have to use proxies, make sure you get them from a reliable provider like &lt;a href="https://scrapingbee.com" target="_blank" >ScrapingBee&lt;/a> as some providers in the market source these proxies using illegal and shady tactics.&lt;/p></description></item><item><title>What are the 3 types of HTTP cookies?</title><link>https://www.scrapingbee.com/webscraping-questions/http/what-are-the-three-types-of-http-cookies/</link><pubDate>Mon, 06 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/http/what-are-the-three-types-of-http-cookies/</guid><description>&lt;p>The three types of HTTP cookies are:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Session Cookies:&lt;/strong> These are temporary cookies that are stored in the browser's memory only while a user is on a website. Once the user closes the browser, the session cookie is deleted.&lt;/li>
&lt;li>&lt;strong>Persistent Cookies:&lt;/strong> Unlike session cookies, these have an expiration date and are stored on the user's device for a specified period of time, even after the user has closed the browser.&lt;/li>
&lt;li>&lt;strong>Third-party Cookies:&lt;/strong> These are also referred to as tracking cookies. They are set by a domain other than the one the user is visiting. For example, a user visiting a website might see ads served by an ad network that uses third-party cookies to track the user's behavior and show relevant ads.&lt;/li>
&lt;/ol></description></item><item><title>What is a proxy vs VPN?</title><link>https://www.scrapingbee.com/webscraping-questions/proxy/what-is-a-proxy-vs-vpn/</link><pubDate>Mon, 06 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/proxy/what-is-a-proxy-vs-vpn/</guid><description>&lt;p>A proxy and a VPN (Virtual Private Network) both provide a means to hide your IP address and protect your privacy online, but they differ in several key ways:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Purpose:&lt;/strong> A proxy and a VPN are both designed to act as intermediaries between a client and a server and forward network requests between them. However, a VPN works on the operating system level and usually routes all of the network traffic, whereas a proxy works at the application level and routes only a specific application's traffic.&lt;/li>
&lt;li>&lt;strong>Security:&lt;/strong> A proxy typically provides minimal security and encryption. The traffic going through a proxy is usually not encrypted. Whereas a VPN provides a high level of security and encryption and protects your internet traffic from prying eyes. This means that even though your scummy ISP might be able to surveil your proxy traffic, it won't be able to pry on your VPN traffic due to its encrypted nature.&lt;/li>
&lt;li>&lt;strong>Performance:&lt;/strong> A proxy may be faster than a VPN because it does not need to encrypt and decrypt data. However, with the improvements in the speed and performance of systems and networks, this difference is slowly vanishing.&lt;/li>
&lt;li>&lt;strong>Cost:&lt;/strong> Proxies can be free or low-cost, while VPNs can be a bit more expensive. This makes proxies a better option for tasks like web scraping where you might want to source thousands or millions of different IPs for making automated requests.&lt;/li>
&lt;/ol>
&lt;p>As you can see, both a proxy and a VPN can be used to hide the IP address. And while a VPN provides a more secure and private connection, it may be slower and more expensive than a proxy. The best option for you will depend on your specific needs and the level of security required.&lt;/p></description></item><item><title>What is a web crawler used for?</title><link>https://www.scrapingbee.com/webscraping-questions/web-crawling/what-is-a-web-crawler-used-for/</link><pubDate>Mon, 06 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/web-crawling/what-is-a-web-crawler-used-for/</guid><description>&lt;p>A web crawler is a &amp;quot;bot&amp;quot; generally used by search engines to look for new websites, download their data, and index it. They power most of the popular search engines like Google, Yahoo!, and Bing. These bots are called crawlers because the term describes exactly what they do: automatically opening websites and obtaining their data.&lt;/p>
&lt;p>You can learn more about web crawlers from &lt;a href="https://en.wikipedia.org/wiki/Web_crawler" target="_blank" >Wikipedia&lt;/a>.&lt;/p></description></item><item><title>Which is better Scrapy or BeautifulSoup?</title><link>https://www.scrapingbee.com/webscraping-questions/beautifulsoup/which-is-better-scrapy-or-beautifulsoup/</link><pubDate>Mon, 06 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/beautifulsoup/which-is-better-scrapy-or-beautifulsoup/</guid><description>&lt;p>It is hard to say whether Scrapy or BeautifulSoup is better, as the two are complementary and do different things.&lt;/p>
&lt;p>Scrapy is a robust, feature-complete, extensible, and maintained web scraping framework. It contains advanced features like rate-limiting, proxy rotation, automated URL discovery, pause/resume crawling functionality, remote control, and multiple output formats.&lt;/p>
&lt;p>BeautifulSoup on the other hand is simply an HTML parsing library. You can couple BeautifulSoup with Scrapy to parse HTML responses using BeautifulSoup in Scrapy callbacks. You can follow &lt;a href="https://docs.scrapy.org/en/latest/faq.html#faq-scrapy-bs-cmp" target="_blank" >this guide&lt;/a> to learn more about this.&lt;/p></description></item><item><title>Which is faster IPv4 or IPv6?</title><link>https://www.scrapingbee.com/webscraping-questions/proxy/which-is-faster-ipv4-or-ipv6/</link><pubDate>Mon, 06 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/proxy/which-is-faster-ipv4-or-ipv6/</guid><description>&lt;p>In theory, IPv6 is faster than IPv4 as it uses an efficient routing algorithm and gets rid of the necessity of NAT (Network Address Translation). At the same time, it also eliminates the need for IP-level fragmentation, which is required in IPv4 networks, and has a simpler header format that reduces the processing overhead required to handle network packets. However, in practice, these speed improvements may not always be realized due to certain reasons.&lt;/p></description></item><item><title>Why is HTTPS not used for all web traffic?</title><link>https://www.scrapingbee.com/webscraping-questions/http/why-is-https-not-used-for-all-web-traffic/</link><pubDate>Mon, 06 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/http/why-is-https-not-used-for-all-web-traffic/</guid><description>&lt;p>There are several reasons why HTTPS is not used for all web traffic:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Cost:&lt;/strong> Implementing HTTPS requires an SSL or TLS certificate, which can be expensive for some organizations. Smaller websites may not have the budget to purchase and maintain a certificate. However, this is less of a concern now as Let's Encrypt and similar websites offer free SSL certificates.&lt;/li>
&lt;li>&lt;strong>Lack of Awareness:&lt;/strong> Some website owners and developers may not fully understand the importance of using HTTPS, or may not realize that their website is not currently using HTTPS. However, with Google and other search engines penalizing HTTP-only websites in their search results, awareness should improve over time.&lt;/li>
&lt;li>&lt;strong>Legacy Systems:&lt;/strong> Some older websites and systems may not be able to support HTTPS due to technical limitations. On top of that, implementing HTTPS can be technically complex, especially for older websites that were not originally designed with security in mind. This can make the transition to HTTPS difficult and time-consuming.&lt;/li>
&lt;/ol>
&lt;p>In recent years, there has been a push to increase the use of HTTPS across the web, and many browsers now display security warnings for websites that are not using HTTPS. For example, this is how an HTTP-only website shows on Google Chrome:&lt;/p></description></item><item><title>Are Python requests deprecated?</title><link>https://www.scrapingbee.com/webscraping-questions/requests/are-python-requests-deprecated/</link><pubDate>Fri, 03 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/requests/are-python-requests-deprecated/</guid><description>&lt;p>&lt;a href="https://github.com/psf/requests" target="_blank" >Requests&lt;/a> is an HTTP library for Python-based programs. It is under active development and not deprecated at all. While writing this answer, the latest release of Requests was in January 2023. Around 1.8 million+ repositories &lt;a href="https://github.com/psf/requests/network/dependents?package_id=UGFja2FnZS01NzA4OTExNg%3D%3D" target="_blank" >depend on this project&lt;/a> so the chances of Requests being deprecated are very slim. Its maintenance and further development falls under the umbrella of the Python Software Foundation. There are alternatives like the &lt;a href="https://github.com/encode/httpx/" target="_blank" >httpx&lt;/a> project but their existence does not mean that the original Requests project is dead or deprecated.&lt;/p></description></item><item><title>Is requests a built-in Python library?</title><link>https://www.scrapingbee.com/webscraping-questions/requests/is-requests-a-built-in-python-library/</link><pubDate>Fri, 03 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/requests/is-requests-a-built-in-python-library/</guid><description>&lt;p>&lt;a href="https://github.com/psf/requests" target="_blank" >Requests&lt;/a> is not a built-in Python library. It is available on PyPI and can be installed using the typical PIP command like this:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>$ python -m pip install requests
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>It officially supports Python 3.7+, so make sure your project is running Python 3.7 or above before using Requests.&lt;/p>
&lt;p>You can learn more about this library on the &lt;a href="https://requests.readthedocs.io/" target="_blank" >official website&lt;/a> or &lt;a href="https://github.com/psf/requests" target="_blank" >GitHub page&lt;/a>.&lt;/p></description></item><item><title>What is Puppeteer?</title><link>https://www.scrapingbee.com/webscraping-questions/puppeteer/what-is-puppeteer/</link><pubDate>Fri, 03 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/puppeteer/what-is-puppeteer/</guid><description>&lt;p>Puppeteer is a browser automation library developed by the Chrome Dev Tools team.&lt;/p>
&lt;p>Simply put, it is a tool that allows you to control your web browser with NodeJS scripts.&lt;/p>
&lt;p>In more technical terms, it supports automating Chrome/Chromium over the non-standard DevTools Protocol.&lt;/p>
&lt;p>There is experimental Firefox support as well.&lt;/p>
&lt;p>You can do almost anything with Puppeteer that you normally do manually. According to the &lt;a href="https://pptr.dev/" target="_blank" >official website&lt;/a>, this list of possible actions includes:&lt;/p></description></item><item><title>What is requests used for in Python?</title><link>https://www.scrapingbee.com/webscraping-questions/requests/what-is-requests-used-for-in-python/</link><pubDate>Fri, 03 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/requests/what-is-requests-used-for-in-python/</guid><description>&lt;p>&lt;a href="https://github.com/psf/requests" target="_blank" >Requests&lt;/a> is an HTTP library for Python-based programs. It is one of the most downloaded Python packages. It provides a nice API for making HTTP requests.&lt;/p>
&lt;p>Requests is popular because it is very simple to use compared to HTTP libraries like &lt;a href="https://docs.python.org/3/library/urllib.html" target="_blank" >urllib&lt;/a> and &lt;a href="https://docs.python.org/3/library/urllib.request.html" target="_blank" >urllib2&lt;/a>.&lt;/p>
&lt;p>You can use it to make GET, POST, PUT, DELETE, HEAD, OPTIONS, and PATCH requests. It also supports HTTP Basic/Digest Authentication, Cookies, Redirects, and more.&lt;/p>
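&lt;p>As a small sketch of the API, this builds a GET request with query parameters without actually sending it (sending requires network access, so it is left commented out):&lt;/p>

```python
import requests

# Build a GET request; Requests URL-encodes the query parameters for you.
req = requests.Request("GET", "https://httpbin.org/get", params={"q": "web scraping"})
prepared = req.prepare()
print(prepared.url)  # https://httpbin.org/get?q=web+scraping

# To actually send it over the network:
# with requests.Session() as session:
#     response = session.send(prepared)
#     print(response.status_code)
```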
&lt;p>While being the most popular by far, Requests lacks some modern features, such as async and HTTP/2 support, that newer HTTP libraries like httpx offer.&lt;/p></description></item><item><title>Which is better Playwright or Puppeteer?</title><link>https://www.scrapingbee.com/webscraping-questions/puppeteer/which-is-better-playwright-or-puppeteer/</link><pubDate>Fri, 03 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/puppeteer/which-is-better-playwright-or-puppeteer/</guid><description>&lt;p>Playwright and Puppeteer are both browser automation tools and libraries. They are both mature and contain all the necessary features for browser automation. There is no clear answer as to which library you should use. However, there are a few significant differences between the two that might help you decide which one suits you better.&lt;/p>
&lt;h2 id="puppeteer">Puppeteer&lt;/h2>
&lt;ul>
&lt;li>Developed by Chrome Dev Team in 2017&lt;/li>
&lt;li>Puppeteer officially only supports Javascript. There is an unofficial port &lt;a href="https://github.com/pyppeteer/pyppeteer" target="_blank" >in Python&lt;/a> but that's it.&lt;/li>
&lt;li>Fully supports Chromium along with experimental Firefox support&lt;/li>
&lt;/ul>
&lt;h2 id="playwright">Playwright&lt;/h2>
&lt;ul>
&lt;li>Developed by Microsoft and released in 2020&lt;/li>
&lt;li>Supports Golang, Python, Java, JavaScript, and C#&lt;/li>
&lt;li>Supports Chromium, Firefox, and WebKit&lt;/li>
&lt;/ul>
&lt;p>Playwright is more recent so there is a smaller community as compared to Puppeteer. Look at this NPM popularity graph to decide which one is more popular:&lt;/p></description></item><item><title>Who owns Puppeteer?</title><link>https://www.scrapingbee.com/webscraping-questions/puppeteer/who-owns-puppeteer/</link><pubDate>Fri, 03 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/puppeteer/who-owns-puppeteer/</guid><description>&lt;p>Puppeteer is owned by Google and is developed as an open-source project with contributions from developers from all over the world. Puppeteer was initially developed and released by the Chrome DevTools team in 2017 and the current development takes place &lt;a href="https://github.com/puppeteer/puppeteer" target="_blank" >on GitHub&lt;/a>. Most of the individual contributors are not affiliated with Google. However, the project still falls under Google's umbrella and the contributors have to sign a one-time Contributor License Agreement before they can contribute.&lt;/p></description></item><item><title>Why do we need Puppeteer?</title><link>https://www.scrapingbee.com/webscraping-questions/puppeteer/why-do-we-need-puppeteer/</link><pubDate>Fri, 03 Feb 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/puppeteer/why-do-we-need-puppeteer/</guid><description>&lt;p>Puppeteer is a Node.js library used for automating web page interactions. It provides a high-level API to control Chrome or Chromium-based browsers, enabling developers to automate browser tasks, generate screenshots and PDFs, crawl web pages, and perform end-to-end testing. 
This library becomes extremely useful when doing web scraping as it allows you to execute website JavaScript and even hide the fact that you are using a browser automation library via &lt;a href="https://www.npmjs.com/package/puppeteer-extra-plugin-stealth" target="_blank" >puppeteer-extra-plugin-stealth&lt;/a> and similar plugins.&lt;/p></description></item><item><title>What are the 6 characteristics of a REST API?</title><link>https://www.scrapingbee.com/blog/six-characteristics-of-rest-api/</link><pubDate>Mon, 30 Jan 2023 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/six-characteristics-of-rest-api/</guid><description>&lt;p>REST API architecture was put forward by Dr. Roy Fielding in his 2000 doctoral dissertation. This architecture has been around for roughly 23 years and most of the popular APIs follow it. Despite this widespread use, most programmers are still unaware of the six underlying characteristics that define a REST or RESTful API.&lt;/p>
&lt;p>You can ensure you are not among that list of programmers by going through Dr. Fielding's &lt;a href="https://www.ics.uci.edu/~fielding/pubs/dissertation/fielding_dissertation.pdf" target="_blank" >180-page dissertation&lt;/a>. Or better yet, spend the next few minutes reading this very article where we go over this list of characteristics and discuss why each one is important. We can assure you the latter option is more fun and time friendly 😄&lt;/p></description></item><item><title>403 status code - what is it and how to avoid it?</title><link>https://www.scrapingbee.com/webscraping-questions/web-scraping-blocked/403-status-code-what-it-is-and-how-to-avoid-it/</link><pubDate>Tue, 24 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/web-scraping-blocked/403-status-code-what-it-is-and-how-to-avoid-it/</guid><description>&lt;p>A 403 status code refers to the Forbidden response status. It is thrown by the server when it recognizes the request as being valid but is not willing to fulfil it. It might be caused by a lack of proper headers in your request so make sure you are passing all the required CORS/JWT/Authentication headers that the server is expecting.&lt;/p>
&lt;p>However, if the website is normally accessible and sending proper headers is still not making it work, your requests might be getting recognized by the server as being automated. In such a scenario, make sure you are using &lt;a href="https://github.com/ultrafunkamsterdam/undetected-chromedriver" target="_blank" >undetected-chromedriver&lt;/a> or a similar tool and pair it up with proxies from a reliable proxy provider like ScrapingBee. Or better yet, use &lt;a href="https://www.scrapingbee.com" target="_blank" >ScrapingBee's web scraping API&lt;/a> and let us handle the task of not getting blocked. This should help solve the issue.&lt;/p></description></item><item><title>429 status code - what is it and how to avoid it?</title><link>https://www.scrapingbee.com/webscraping-questions/web-scraping-blocked/429-status-code-what-it-is-and-how-to-avoid-it/</link><pubDate>Tue, 24 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/web-scraping-blocked/429-status-code-what-it-is-and-how-to-avoid-it/</guid><description>&lt;p>A 429 status code refers to the &lt;code>Too Many Requests&lt;/code> error. It might be thrown by the server if the user has made excessive requests in a short amount of time and the server is using rate-limiting. The best way to avoid this error is to do either of these two things:&lt;/p>
&lt;ol>
&lt;li>Throttle your requests. Make sure you are making only a few requests in a given timeframe so as not to hit the rate-limit&lt;/li>
&lt;li>Distribute your requests across proxies so that they all go from different IPs and don't trigger the rate-limit&lt;/li>
&lt;/ol>
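&lt;p>For the first option, a simple throttling helper with exponential backoff is sketched below. This is a generic pattern, not a ScrapingBee API; the &lt;code>get&lt;/code> argument stands for any request function, such as &lt;code>requests.get&lt;/code>:&lt;/p>

```python
import random
import time

def fetch_with_backoff(get, url, max_retries=5, base_delay=1.0):
    # `get` is any callable returning an object with a `status_code`
    # attribute, for example requests.get.
    delay = base_delay
    response = None
    for _ in range(max_retries):
        response = get(url)
        if response.status_code != 429:
            return response
        # Wait before retrying, doubling the delay each time and adding
        # jitter so that many clients do not all retry at the same instant.
        time.sleep(delay + random.uniform(0, delay / 2))
        delay *= 2
    return response
```

&lt;p>Each 429 response doubles the wait, so a briefly rate-limited client recovers quickly while a persistent limit backs it off aggressively.&lt;/p>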
&lt;p>For the second option, you can use ScrapingBee's reliable proxies to make sure they aren't part of any blocklist. Or better yet, use &lt;a href="https://www.scrapingbee.com" target="_blank" >ScrapingBee's web scraping API&lt;/a> and let us handle the task of not getting blocked.&lt;/p></description></item><item><title>444 status code - what is it and how to avoid it?</title><link>https://www.scrapingbee.com/webscraping-questions/web-scraping-blocked/444-status-code-what-it-is-and-how-to-avoid-it/</link><pubDate>Tue, 24 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/web-scraping-blocked/444-status-code-what-it-is-and-how-to-avoid-it/</guid><description>&lt;p>A 444 status code is thrown when a website unexpectedly closes the connection without sending any response to the client. It is an unofficial code and specific to NGINX. There are multiple reasons why NGINX might throw this error. It might occur when the server has identified your requests to be automated. The best way to avoid it is to make every effort to conceal your automated requests and make them resemble a regular user's browsing pattern. You can use &lt;a href="https://github.com/ultrafunkamsterdam/undetected-chromedriver" target="_blank" >undetected-chromedriver&lt;/a> and pair it up with proxies from a reliable proxy provider like ScrapingBee. 
Or better yet, use &lt;a href="https://www.scrapingbee.com" target="_blank" >ScrapingBee's web scraping API&lt;/a> and let us handle the task of not getting blocked.&lt;/p></description></item><item><title>499 status code - what is it and how to avoid it?</title><link>https://www.scrapingbee.com/webscraping-questions/web-scraping-blocked/499-status-code-what-it-is-and-how-to-avoid-it/</link><pubDate>Tue, 24 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/web-scraping-blocked/499-status-code-what-it-is-and-how-to-avoid-it/</guid><description>&lt;p>A 499 status code refers to the &amp;quot;client closed request&amp;quot; error. This is a client-side code where the client did not wait long enough for the server to respond. It generally occurs in reverse proxy setups where NGINX is acting as a reverse proxy for a uWSGI or similar upstream server and did not wait long enough for the server to return the response.&lt;/p>
&lt;p>If the website is working fine under normal settings, chances are that your requests are being identified as automated. In such a scenario, make sure you are using &lt;a href="https://github.com/ultrafunkamsterdam/undetected-chromedriver" target="_blank" >undetected-chromedriver&lt;/a> or a similar tool and pairing it up with proxies from a reliable proxy provider like ScrapingBee. Or better yet, use &lt;a href="https://www.scrapingbee.com" target="_blank" >ScrapingBee's web scraping API&lt;/a> and let us handle the task of not getting blocked.&lt;/p></description></item><item><title>503 status code - what is it and how to avoid it?</title><link>https://www.scrapingbee.com/webscraping-questions/web-scraping-blocked/503-status-code-what-it-is-and-how-to-avoid-it/</link><pubDate>Tue, 24 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/web-scraping-blocked/503-status-code-what-it-is-and-how-to-avoid-it/</guid><description>&lt;p>A 503 status code refers to the Service Unavailable error. A web server throws it when it is not ready to serve requests at the moment. It does not necessarily indicate a fault on the server; the server is simply not ready to serve your request yet, typically because of resource exhaustion or downtime for maintenance.&lt;/p>
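Since a 503 is often temporary, retrying with a growing delay is a reasonable first step. Below is a minimal hand-rolled Python sketch; the fetch callable and its (status, body) return shape are placeholders for whatever HTTP client you actually use:

```python
import time

def retry_on_503(fetch, attempts=4, base_delay=1.0):
    """Call fetch() until it returns a non-503 status, backing off between tries."""
    delay = base_delay
    for attempt in range(attempts):
        status, body = fetch()
        if status != 503:
            return status, body
        time.sleep(delay)   # wait before the next attempt
        delay *= 2          # exponential backoff: 1s, 2s, 4s, ...
    return status, body     # still 503 after all attempts

# Example with a stub that recovers on the third call
calls = {"n": 0}
def stub():
    calls["n"] += 1
    return (200, "ok") if calls["n"] >= 3 else (503, "")

print(retry_on_503(stub, base_delay=0.01))  # (200, 'ok')
```

Swap the stub for a real request function, and cap the total number of attempts so a long outage fails fast instead of retrying forever.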
&lt;p>You can solve this error by figuring out whether the server is actually down for maintenance or just not responding specifically to your requests. If it is the former, waiting for a while before trying again might solve the issue. However, if it is the latter, make sure you are using &lt;a href="https://github.com/ultrafunkamsterdam/undetected-chromedriver" target="_blank" >undetected-chromedriver&lt;/a> or a similar tool and pairing it up with proxies from a reliable proxy provider like ScrapingBee. Or better yet, use &lt;a href="https://www.scrapingbee.com" target="_blank" >ScrapingBee's web scraping API&lt;/a> and let us handle the task of getting around the 503 error.&lt;/p></description></item><item><title>520 status code - what is it and how to avoid it?</title><link>https://www.scrapingbee.com/webscraping-questions/web-scraping-blocked/520-status-code-what-it-is-and-how-to-avoid-it/</link><pubDate>Tue, 24 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/web-scraping-blocked/520-status-code-what-it-is-and-how-to-avoid-it/</guid><description>&lt;p>A 520 status code is related to Cloudflare. It is used by Cloudflare as a catch-all response for when the origin server sends something unexpected. It might be caused by technical issues on the website. However, it can also be caused if your requests do not contain the required data that the website is expecting. 
So make sure that you are including all the required headers (CORS, Referrer, Auth) in your requests.&lt;/p></description></item><item><title>Cloudflare Error 1009: what is it and how to avoid it?</title><link>https://www.scrapingbee.com/webscraping-questions/web-scraping-blocked/cloudflare-error-1009-what-it-is-and-how-to-avoid-it/</link><pubDate>Tue, 24 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/web-scraping-blocked/cloudflare-error-1009-what-it-is-and-how-to-avoid-it/</guid><description>&lt;p>Cloudflare Error 1009 refers to the Access Denied: Country or region banned error. It is thrown by Cloudflare when the website owner has banned the country or region where your IP address is originating from.&lt;/p>
&lt;p>&lt;img src="https://www.scrapingbee.com/images/questions/cloudflare-error-1009.png" alt="Cloudflare Error 1009">&lt;/p>
&lt;p>The only way to get around these errors is to use a reliable premium proxy provider like ScrapingBee that also lets you manually select the proxy region. This way you can continue web scraping from a country or region that is not banned by the website. This should help you bypass the 1009 error.&lt;/p></description></item><item><title>Cloudflare Error 1010: what is it and how to avoid it?</title><link>https://www.scrapingbee.com/webscraping-questions/web-scraping-blocked/cloudflare-error-1010-what-it-is-and-how-to-avoid-it/</link><pubDate>Tue, 24 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/web-scraping-blocked/cloudflare-error-1010-what-it-is-and-how-to-avoid-it/</guid><description>&lt;p>Cloudflare Error 1010 means that the owner of the website has banned your access based on your browser's signature. This can happen when you are trying to scrape a website using automated tools like Selenium, Puppeteer, or Playwright. These tools are very easy to fingerprint using JavaScript.&lt;/p>
&lt;p>&lt;img src="https://www.scrapingbee.com/images/questions/cloudflare-error-1010.png" alt="Cloudflare Error 1010">&lt;/p>
&lt;p>You can get around this error in two ways. One is to use tools like &lt;a href="https://github.com/ultrafunkamsterdam/undetected-chromedriver" target="_blank" >undetected-chromedriver&lt;/a>, which cannot easily be fingerprinted. The other is to use web scraping APIs from companies like ScrapingBee. We use anti-fingerprinting browsers for web scraping, which makes sure our scrapers are not easily fingerprinted and banned by websites.&lt;/p></description></item><item><title>Cloudflare Error 1015: what is it and how to avoid it?</title><link>https://www.scrapingbee.com/webscraping-questions/web-scraping-blocked/cloudflare-error-1015-what-it-is-and-how-to-avoid-it/</link><pubDate>Tue, 24 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/web-scraping-blocked/cloudflare-error-1015-what-it-is-and-how-to-avoid-it/</guid><description>&lt;p>Cloudflare Error 1015 refers to the rate limiting error. It is thrown by Cloudflare when the website owner has implemented a rate limit for requests and you are violating it. This can happen when you send a large number of requests in a very short amount of time.&lt;/p>
&lt;p>&lt;img src="https://www.scrapingbee.com/images/questions/cloudflare-error-1015.png" alt="Cloudflare Error 1015">&lt;/p>
&lt;p>You can get around this error in two ways. One is to throttle your requests, making sure you only send a limited number of requests in a given time window. The other is to use a reliable premium proxy provider like ScrapingBee, which rotates proxies so that no single proxy triggers the rate limit. This should help you bypass the Cloudflare 1015 error.&lt;/p></description></item><item><title>Cloudflare Error 1020: what is it and how to avoid it?</title><link>https://www.scrapingbee.com/webscraping-questions/web-scraping-blocked/cloudflare-error-1020-what-it-is-and-how-to-avoid-it/</link><pubDate>Tue, 24 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/web-scraping-blocked/cloudflare-error-1020-what-it-is-and-how-to-avoid-it/</guid><description>&lt;p>Cloudflare Error 1020 refers to the Access Denied error. It is thrown by Cloudflare when you violate a firewall rule set up by the Cloudflare-protected website. This violation can occur for various reasons, including sending too many requests to the website.&lt;/p>
&lt;p>&lt;img src="https://www.scrapingbee.com/images/questions/cloudflare-error-1020.png" alt="Cloudflare Error 1020">&lt;/p>
&lt;p>If the website is working fine without using automated tools then you need to improve your web scraping techniques. You can hide your automated requests by making use of &lt;a href="https://github.com/ultrafunkamsterdam/undetected-chromedriver" target="_blank" >undetected-chromedriver&lt;/a> or a similar tool and pairing it up with premium proxies from a reliable proxy provider like ScrapingBee. Or better yet, use ScrapingBee's APIs and let us handle the task of not getting blocked. This should help you avoid the 1020 error.&lt;/p></description></item><item><title>Cloudflare Errors 1006, 1007, 1008: how to avoid them?</title><link>https://www.scrapingbee.com/webscraping-questions/web-scraping-blocked/cloudflare-errors-1006-1007-1008-how-to-avoid-them/</link><pubDate>Tue, 24 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/web-scraping-blocked/cloudflare-errors-1006-1007-1008-how-to-avoid-them/</guid><description>&lt;p>Cloudflare Errors 1006, 1007, and 1008 refer to Access Denied errors. They vary only slightly from each other. They are thrown by Cloudflare when your IP address has been banned. This generally occurs when a Cloudflare customer (the website you are trying to scrape) bans traffic originating from your IP address. They might do this when they have identified that you are trying to scrape their website.&lt;/p>
&lt;p>&lt;img src="https://www.scrapingbee.com/images/questions/cloudflare-error-1006.png" alt="Cloudflare Error 1006">&lt;/p></description></item><item><title>How to scrape Perimeter X: Please verify you are human?</title><link>https://www.scrapingbee.com/webscraping-questions/web-scraping-blocked/how-to-scrape-perimeterx-verify-you-are-a-human/</link><pubDate>Tue, 24 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/web-scraping-blocked/how-to-scrape-perimeterx-verify-you-are-a-human/</guid><description>&lt;p>While web scraping, you might come across PerimeterX. It is a service that helps protect websites from automated scraping. You can recognize PerimeterX by the &amp;quot;Press &amp;amp; Hold&amp;quot; and &amp;quot;Please verify you are a human&amp;quot; messages similar to the image below:&lt;/p>
&lt;p>&lt;img src="https://www.scrapingbee.com/images/questions/perimeterX-error.png" alt="PerimeterX">&lt;/p>
&lt;p>PerimeterX and similar anti-scraping tools rely on JavaScript fingerprinting and similar techniques which are hard to get around by using regular scraping frameworks.&lt;/p>
&lt;p>The best way to work around PerimeterX is to make sure the server does not recognize automated requests. You can hide your automated requests by making use of &lt;a href="https://github.com/ultrafunkamsterdam/undetected-chromedriver" target="_blank" >undetected-chromedriver&lt;/a> or a similar tool and pairing it up with premium proxies from a reliable proxy provider like ScrapingBee. Or better yet, use &lt;a href="https://www.scrapingbee.com" target="_blank" >ScrapingBee's web scraping API&lt;/a> and let us handle the task of not getting blocked.&lt;/p></description></item><item><title>How to download a file using cURL?</title><link>https://www.scrapingbee.com/webscraping-questions/curl/download-file-curl/</link><pubDate>Thu, 19 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/curl/download-file-curl/</guid><description>&lt;p>To download a file using cURL you simply need to make a GET request (default behavior) and to specify the -o (output) command line option so that the response is written to a file. Here is a sample command that downloads a file from our hosted version of &lt;a href="https://httpbin.scrapingbee.com/" target="_blank" >HTTPBin&lt;/a>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>curl https://httpbin.scrapingbee.com/images/png &lt;span style="color:#ae81ff">\
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#ae81ff">&lt;/span> -o image.png
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Here we ask cURL to fetch a PNG image and write the result to a file named &lt;code>image.png&lt;/code>.&lt;/p></description></item><item><title>How to follow redirect using cURL?</title><link>https://www.scrapingbee.com/webscraping-questions/curl/follow-redirect-curl/</link><pubDate>Thu, 19 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/curl/follow-redirect-curl/</guid><description>&lt;p>To follow redirects using cURL you need to use the -L option. Here is a sample command that sends a &lt;code>GET&lt;/code> request to our hosted version of &lt;a href="https://httpbin.scrapingbee.com/" target="_blank" >HTTPBin&lt;/a> and follows the redirect:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>curl -L https://httpbin.scrapingbee.com/redirect-to?url&lt;span style="color:#f92672">=&lt;/span>https://httpbin.scrapingbee.com/headers?json
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Here we ask cURL to follow the redirection; the URL we hit redirects us to the &lt;code>headers&lt;/code> endpoint. The response will be:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-json" data-lang="json">&lt;span style="display:flex;">&lt;span>{
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;#34;headers&amp;#34;&lt;/span>: {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;#34;Host&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;httpbin.scrapingbee.com&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;#34;User-Agent&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;curl/7.86.0&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;#34;Accept&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;*/*&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Now, if we remove the -L option, cURL no longer follows the redirection, and the response will be:&lt;/p></description></item><item><title>How to get file type of an URL in Python?</title><link>https://www.scrapingbee.com/webscraping-questions/web-crawling/how-to-get-file-type-of-url-in-python/</link><pubDate>Thu, 19 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/web-crawling/how-to-get-file-type-of-url-in-python/</guid><description>&lt;p>You can get the file type of a URL in Python via two different methods.&lt;/p>
&lt;ol>
&lt;li>Use the &lt;code>mimetypes&lt;/code> module&lt;/li>
&lt;/ol>
&lt;p>The &lt;code>mimetypes&lt;/code> module comes with Python by default and can infer the file type from a URL. This relies on the file extension being present in the URL. Here is some sample code:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">import&lt;/span> mimetypes
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>mimetypes&lt;span style="color:#f92672">.&lt;/span>guess_type(&lt;span style="color:#e6db74">&amp;#34;http://example.com/file.pdf&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Output: (&amp;#39;application/pdf&amp;#39;, None)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>mimetypes&lt;span style="color:#f92672">.&lt;/span>guess_type(&lt;span style="color:#e6db74">&amp;#34;http://example.com/file&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Output: (None, None)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;ol start="2">
&lt;li>Perform a HEAD request to the URL and investigate the response headers&lt;/li>
&lt;/ol>
&lt;p>A HEAD request does not download the whole response body but rather makes a short request to a URL to get some metadata. An important piece of information that it provides is the &lt;code>Content-Type&lt;/code> of the response. This can give you a very good idea of the file type of a URL. Here is some sample code for making a HEAD request and figuring out the file type:&lt;/p></description></item><item><title>How to get JSON with cURL ?</title><link>https://www.scrapingbee.com/webscraping-questions/curl/get-json-curl/</link><pubDate>Thu, 19 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/curl/get-json-curl/</guid><description>&lt;p>You can get JSON with cURL by sending a GET request with the -H &amp;quot;Accept: application/json&amp;quot; option. Here is a sample command that sends a GET request to our hosted version of &lt;a href="https://httpbin.scrapingbee.com/anything?json" target="_blank" >HTTPBin&lt;/a> and returns the response in JSON format:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>curl https://httpbin.scrapingbee.com/anything?json &lt;span style="color:#ae81ff">\
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#ae81ff">&lt;/span> -H &lt;span style="color:#e6db74">&amp;#34;Accept: application/json&amp;#34;&lt;/span> 
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>It is quite simple because &lt;code>GET&lt;/code> is the default request method used by cURL.&lt;/p>
&lt;p>Also, in many cases, you won't have to specify the &lt;code>Accept&lt;/code> header because the server will return JSON by default.&lt;/p></description></item><item><title>How to get XML with cURL ?</title><link>https://www.scrapingbee.com/webscraping-questions/curl/get-xml-curl/</link><pubDate>Thu, 19 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/curl/get-xml-curl/</guid><description>&lt;p>You can get XML with cURL by sending a GET request with the -H &amp;quot;Accept: application/xml&amp;quot; option. Here is a sample command that sends a GET request to our hosted version of &lt;a href="https://httpbin.scrapingbee.com/xml" target="_blank" >HTTPBin&lt;/a> and returns the response in XML format:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>curl https://httpbin.scrapingbee.com/xml &lt;span style="color:#ae81ff">\
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#ae81ff">&lt;/span> -H &lt;span style="color:#e6db74">&amp;#34;Accept: application/xml&amp;#34;&lt;/span> 
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>It is quite simple because &lt;code>GET&lt;/code> is the default request method used by cURL.&lt;/p>
&lt;h2 id="what-is-curl">What is cURL?&lt;/h2>
&lt;p>cURL is an open-source command-line tool used to transfer data to and from a server. It is extremely versatile and supports various protocols including HTTP, FTP, SMTP, and many others. It is generally used to test and interact with APIs, download files, and perform various other tasks involving network communication.&lt;/p></description></item><item><title>How to ignore invalid and self-signed certificates using cURL?</title><link>https://www.scrapingbee.com/webscraping-questions/curl/ignore-invalid-certificate-curl/</link><pubDate>Thu, 19 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/curl/ignore-invalid-certificate-curl/</guid><description>&lt;p>To ignore invalid and self-signed certificates using cURL you need to use the -k option. Here is a sample command that sends a &lt;code>GET&lt;/code> request to our hosted version of &lt;a href="https://httpbin.scrapingbee.com/" target="_blank" >HTTPBin&lt;/a> with the -k option:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>curl -k https://httpbin.scrapingbee.com
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Be careful: ignoring invalid and self-signed certificates is a security risk and should only be done for testing purposes. In production, you should always use valid certificates, as accepting invalid ones means that you will be vulnerable to man-in-the-middle attacks.&lt;/p></description></item><item><title>How to ignore non-HTML URLs when web crawling?</title><link>https://www.scrapingbee.com/webscraping-questions/web-crawling/how-to-ignore-non-html-urls-when-web-crawling/</link><pubDate>Thu, 19 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/web-crawling/how-to-ignore-non-html-urls-when-web-crawling/</guid><description>&lt;p>You can ignore non-HTML URLs when web crawling via two methods.&lt;/p>
&lt;ol>
&lt;li>Check the URL suffix for unwanted file extensions&lt;/li>
&lt;/ol>
&lt;p>Here is some sample code that filters out image file URLs based on extension:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">import&lt;/span> os
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>IMAGE_EXTENSIONS &lt;span style="color:#f92672">=&lt;/span> [
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#39;mng&amp;#39;&lt;/span>, &lt;span style="color:#e6db74">&amp;#39;pct&amp;#39;&lt;/span>, &lt;span style="color:#e6db74">&amp;#39;bmp&amp;#39;&lt;/span>, &lt;span style="color:#e6db74">&amp;#39;gif&amp;#39;&lt;/span>, &lt;span style="color:#e6db74">&amp;#39;jpg&amp;#39;&lt;/span>, 
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#39;jpeg&amp;#39;&lt;/span>, &lt;span style="color:#e6db74">&amp;#39;png&amp;#39;&lt;/span>, &lt;span style="color:#e6db74">&amp;#39;pst&amp;#39;&lt;/span>, &lt;span style="color:#e6db74">&amp;#39;psp&amp;#39;&lt;/span>, &lt;span style="color:#e6db74">&amp;#39;tif&amp;#39;&lt;/span>, 
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#39;tiff&amp;#39;&lt;/span>, &lt;span style="color:#e6db74">&amp;#39;ai&amp;#39;&lt;/span>, &lt;span style="color:#e6db74">&amp;#39;drw&amp;#39;&lt;/span>, &lt;span style="color:#e6db74">&amp;#39;dxf&amp;#39;&lt;/span>, &lt;span style="color:#e6db74">&amp;#39;eps&amp;#39;&lt;/span>, 
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#39;ps&amp;#39;&lt;/span>, &lt;span style="color:#e6db74">&amp;#39;svg&amp;#39;&lt;/span>, &lt;span style="color:#e6db74">&amp;#39;cdr&amp;#39;&lt;/span>, &lt;span style="color:#e6db74">&amp;#39;ico&amp;#39;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>url &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">&amp;#34;https://scrapingbee.com/logo.png&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">if&lt;/span> os&lt;span style="color:#f92672">.&lt;/span>path&lt;span style="color:#f92672">.&lt;/span>splitext(url)[&lt;span style="color:#f92672">-&lt;/span>&lt;span style="color:#ae81ff">1&lt;/span>][&lt;span style="color:#ae81ff">1&lt;/span>:] &lt;span style="color:#f92672">in&lt;/span> IMAGE_EXTENSIONS:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#e6db74">&amp;#34;Abort the request&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#e6db74">&amp;#34;Continue the request&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;ol start="2">
&lt;li>Perform a HEAD request to the URL and investigate the response headers&lt;/li>
&lt;/ol>
&lt;p>A HEAD request does not download the whole response body but rather makes a short request to a URL to get some metadata. An important piece of information that it provides is the &lt;code>Content-Type&lt;/code> of the response. This can give you a very good idea of the file type of a URL. If the HEAD request returns a non-HTML &lt;code>Content-Type&lt;/code>, you can skip the complete request. Here is some sample code for making a HEAD request and figuring out the response type:&lt;/p></description></item><item><title>How to parse dynamic CSS classes when web scraping?</title><link>https://www.scrapingbee.com/webscraping-questions/data-parsing/how-to-parse-dynamic-css-class-when-scraping/</link><pubDate>Thu, 19 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/data-parsing/how-to-parse-dynamic-css-class-when-scraping/</guid><description>&lt;p>You can parse dynamic CSS classes using text-based XPath matching. Here is a short example of what HTML with dynamic CSS classes might look like:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-html" data-lang="html">&lt;span style="display:flex;">&lt;span>&amp;lt;&lt;span style="color:#f92672">div&lt;/span> &lt;span style="color:#a6e22e">class&lt;/span>&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#34;dd&amp;#34;&lt;/span>&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;&lt;span style="color:#f92672">h1&lt;/span> &lt;span style="color:#a6e22e">class&lt;/span>&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#34;aa&amp;#34;&lt;/span>&amp;gt;Product Details&amp;lt;/&lt;span style="color:#f92672">h1&lt;/span>&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;&lt;span style="color:#f92672">div&lt;/span> &lt;span style="color:#a6e22e">class&lt;/span>&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#34;ffa&amp;#34;&lt;/span>&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;&lt;span style="color:#f92672">div&lt;/span> &lt;span style="color:#a6e22e">class&lt;/span>&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#34;la&amp;#34;&lt;/span>&amp;gt;Remaining Stock&amp;lt;/&lt;span style="color:#f92672">div&lt;/span>&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;&lt;span style="color:#f92672">div&lt;/span> &lt;span style="color:#a6e22e">class&lt;/span>&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#34;ad&amp;#34;&lt;/span>&amp;gt;5&amp;lt;/&lt;span style="color:#f92672">div&lt;/span>&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;/&lt;span style="color:#f92672">div&lt;/span>&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&amp;lt;/&lt;span style="color:#f92672">div&lt;/span>&amp;gt;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>If you want to extract the value of the remaining stock you can target the HTML &lt;code>div&lt;/code> tag that contains &amp;quot;Remaining Stock&amp;quot; and then select the sibling &lt;code>div&lt;/code> that contains the stock count. You can do so using text-based XPath matching like this:&lt;/p></description></item><item><title>How to POST JSON using cURL?</title><link>https://www.scrapingbee.com/webscraping-questions/curl/post-json-curl/</link><pubDate>Thu, 19 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/curl/post-json-curl/</guid><description>&lt;p>You can send JSON with a &lt;code>POST&lt;/code> request using cURL using the -X option with POST and the -d option (data).&lt;/p>
&lt;p>Here is a sample command that sends a &lt;code>POST&lt;/code> request to our hosted version of &lt;a href="https://httpbin.scrapingbee.com/" target="_blank" >HTTPBin&lt;/a> with JSON data:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>curl -X POST https://httpbin.scrapingbee.com/post &lt;span style="color:#ae81ff">\
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#ae81ff">&lt;/span> -d &lt;span style="color:#e6db74">&amp;#39;{&amp;#34;name&amp;#34;:&amp;#34;John Doe&amp;#34;,&amp;#34;age&amp;#34;:30,&amp;#34;city&amp;#34;:&amp;#34;New York&amp;#34;}&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Note that the JSON data must be enclosed in single quotes so that the shell preserves the double quotes inside it. Also note that cURL sends &lt;code>-d&lt;/code> data with a &lt;code>Content-Type&lt;/code> of &lt;code>application/x-www-form-urlencoded&lt;/code> by default, so add &lt;code>-H &amp;quot;Content-Type: application/json&amp;quot;&lt;/code> if the server expects a JSON content type.&lt;/p>
&lt;h2 id="what-is-curl">What is cURL?&lt;/h2>
&lt;p>cURL is an open-source command-line tool used to transfer data to and from a server. It is extremely versatile and supports various protocols including HTTP, FTP, SMTP, and many others. It is generally used to test and interact with APIs, download files, and perform various other tasks involving network communication.&lt;/p></description></item><item><title>How to select elements by class in XPath?</title><link>https://www.scrapingbee.com/webscraping-questions/xpath/how-to-select-elements-by-class-in-xpath/</link><pubDate>Thu, 19 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/xpath/how-to-select-elements-by-class-in-xpath/</guid><description>&lt;p>You can select elements by class in XPath by using the &lt;code>contains(@class, &amp;quot;class-name&amp;quot;)&lt;/code> or &lt;code>@class=&amp;quot;class-name&amp;quot;&lt;/code> expressions.&lt;/p>
&lt;p>The first expression will match any element whose class attribute contains &lt;code>class-name&lt;/code>; even if the element has additional classes defined, it will still match. The second expression, however, only matches elements whose class attribute is exactly &lt;code>class-name&lt;/code>, with no additional classes.&lt;/p>
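To illustrate the difference, here is a small standard-library sketch (the element names and classes are made up; note that Python's xml.etree only supports the exact-match form, so the contains() behaviour is emulated by hand, whereas lxml or Selenium accept contains(@class, ...) directly):

```python
import xml.etree.ElementTree as ET

# Build a tiny document with two h1 elements
root = ET.Element("div")
ET.SubElement(root, "h1", {"class": "title"}).text = "Exact"
ET.SubElement(root, "h1", {"class": "title hero"}).text = "Partial"

# Exact form: @class='title' matches only the element whose attribute is exactly "title"
exact = [e.text for e in root.findall(".//h1[@class='title']")]
print(exact)    # ['Exact']

# contains(@class, 'title') would match both; emulated here with a substring check
partial = [e.text for e in root.iter("h1") if "title" in e.get("class", "")]
print(partial)  # ['Exact', 'Partial']
```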
&lt;p>Here is some Selenium XPath sample code that extracts the &lt;code>h1&lt;/code> tag from the ScrapingBee website using the class name:&lt;/p></description></item><item><title>How to select elements by text in XPath?</title><link>https://www.scrapingbee.com/webscraping-questions/xpath/how-to-select-elements-by-text-in-xpath/</link><pubDate>Thu, 19 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/xpath/how-to-select-elements-by-text-in-xpath/</guid><description>&lt;p>Do you need to &lt;strong>grab elements by text using XPath&lt;/strong>? Well, today we're going to discuss just that. Our tutorial keeps things simple: exact matches with &lt;code>text() = '...'&lt;/code>, partial matches with &lt;code>contains()&lt;/code>, plus &lt;code>starts-with()&lt;/code> and &lt;code>normalize-space()&lt;/code> to avoid whitespace-related issues. You'll learn about case sensitivity, special characters, and how text matching differs for attributes vs. inner text. Of course, this article also includes copy-pasteable examples for Python/lxml and Selenium.&lt;/p></description></item><item><title>How to send a DELETE request using cURL?</title><link>https://www.scrapingbee.com/webscraping-questions/curl/send-delete-request-curl/</link><pubDate>Thu, 19 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/curl/send-delete-request-curl/</guid><description>&lt;p>You can send a &lt;code>DELETE&lt;/code> request using cURL via the following command:&lt;/p>
&lt;pre tabindex="0">&lt;code>curl -X DELETE &amp;lt;url&amp;gt;
&lt;/code>&lt;/pre>&lt;p>Where:&lt;/p>
&lt;ul>
&lt;li>The &lt;code>-X&lt;/code> flag defines the request method that cURL should use. By default, cURL sends a &lt;code>GET&lt;/code> request.&lt;/li>
&lt;/ul>
&lt;p>Replace &lt;code>&amp;lt;url&amp;gt;&lt;/code> with the URL of the resource you want to delete. Here is a sample command that sends a &lt;code>DELETE&lt;/code> request to our hosted version of &lt;a href="https://httpbin.scrapingbee.com/" target="_blank" >HTTPBin&lt;/a>:&lt;/p>
&lt;pre tabindex="0">&lt;code>$ curl -X DELETE &amp;#34;https://httpbin.scrapingbee.com/delete&amp;#34;
&lt;/code>&lt;/pre>&lt;h2 id="what-is-curl">What is cURL?&lt;/h2>
&lt;p>cURL is an open-source command-line tool used to transfer data to and from a server. It is extremely versatile and supports various protocols including HTTP, FTP, SMTP, and many others. It is generally used to test and interact with APIs, download files, and perform various other tasks involving network communication.&lt;/p></description></item><item><title>How to send a GET request using cURL?</title><link>https://www.scrapingbee.com/webscraping-questions/curl/send-get-request-curl/</link><pubDate>Thu, 19 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/curl/send-get-request-curl/</guid><description>&lt;p>You can send a &lt;code>GET&lt;/code> request using cURL via the following command:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>curl &amp;lt;url&amp;gt;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>It is quite simple because &lt;code>GET&lt;/code> is the default request method used by cURL.&lt;/p>
&lt;p>Replace &lt;code>&amp;lt;url&amp;gt;&lt;/code> with the URL of the resource you want to fetch. Here is a sample command that sends a &lt;code>GET&lt;/code> request to our hosted version of &lt;a href="https://httpbin.scrapingbee.com/" target="_blank" >HTTPBin&lt;/a>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>$ curl &lt;span style="color:#e6db74">&amp;#34;https://httpbin.scrapingbee.com/anything?json&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="what-is-curl">What is cURL?&lt;/h2>
&lt;p>cURL is an open-source command-line tool used to transfer data to and from a server. It is extremely versatile and supports various protocols including HTTP, FTP, SMTP, and many others. It is generally used to test and interact with APIs, download files, and perform various other tasks involving network communication.&lt;/p></description></item><item><title>How to send Basic Auth credentials using cURL?</title><link>https://www.scrapingbee.com/webscraping-questions/curl/basic-auth-curl/</link><pubDate>Thu, 19 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/curl/basic-auth-curl/</guid><description>&lt;p>To send Basic Auth credentials using cURL you need to use the -u option with &amp;quot;login:password&amp;quot; where &amp;quot;login&amp;quot; and &amp;quot;password&amp;quot; are your credentials.&lt;/p>
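&lt;p>As an aside, if you are scripting in Python, the &lt;code>requests&lt;/code> library offers the same behaviour through its &lt;code>auth&lt;/code> parameter. This sketch only prepares the request locally (no network access needed) to show what cURL's &lt;code>-u&lt;/code> option produces under the hood:&lt;/p>

```python
import requests

# Python equivalent of curl's -u "login:password": pass a (user, password) tuple via auth.
# prepare() encodes the credentials without sending anything over the network.
request = requests.Request(
    "GET",
    "https://httpbin.scrapingbee.com/basic-auth/login/password",
    auth=("login", "password"),
).prepare()

# Basic Auth is just a base64-encoded "login:password" in the Authorization header
print(request.headers["Authorization"])  # Basic bG9naW46cGFzc3dvcmQ=
```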
&lt;p>Here is a sample command that sends a &lt;code>GET&lt;/code> request to our hosted version of &lt;a href="https://httpbin.scrapingbee.com/" target="_blank" >HTTPBin&lt;/a> with Basic Auth credentials:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>curl https://httpbin.scrapingbee.com/basic-auth/login/password &lt;span style="color:#ae81ff">\
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#ae81ff">&lt;/span> -u &lt;span style="color:#e6db74">&amp;#34;login:password&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>When used over plain HTTP, this method sends the credentials in plain text, so it is not recommended for production use.&lt;/p></description></item><item><title>How to send HTTP header using cURL?</title><link>https://www.scrapingbee.com/webscraping-questions/curl/http-header-curl/</link><pubDate>Thu, 19 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/curl/http-header-curl/</guid><description>&lt;p>To send an HTTP header using cURL, use the &lt;code>-H&lt;/code> command-line option with the header name and value. Here is a sample command that sends a &lt;code>GET&lt;/code> request to our hosted version of &lt;a href="https://httpbin.scrapingbee.com/" target="_blank" >HTTPBin&lt;/a> with a custom HTTP header:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>curl https://httpbin.scrapingbee.com/headers?json &lt;span style="color:#ae81ff">\
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#ae81ff">&lt;/span> -H &lt;span style="color:#e6db74">&amp;#34;custom-header: custom-value&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>And since this particular URL returns the headers sent to the server in JSON format, the response will be:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-json" data-lang="json">&lt;span style="display:flex;">&lt;span>{
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;#34;headers&amp;#34;&lt;/span>: {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;#34;Custom-Header&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;custom-value&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;#34;Host&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;httpbin.scrapingbee.com&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;#34;User-Agent&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;curl/7.86.0&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>You can also pass several headers by using the -H option multiple times:&lt;/p></description></item><item><title>How to turn HTML to text in Python?</title><link>https://www.scrapingbee.com/webscraping-questions/data-parsing/how-to-turn-html-to-text-in-python/</link><pubDate>Thu, 19 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/data-parsing/how-to-turn-html-to-text-in-python/</guid><description>&lt;p>You can easily extract text from an HTML page using any of the famous HTML parsing libraries in Python. Here is an example of extracting text using BeautifulSoup's &lt;code>get_text()&lt;/code> method:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> bs4 &lt;span style="color:#f92672">import&lt;/span> BeautifulSoup
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>soup &lt;span style="color:#f92672">=&lt;/span> BeautifulSoup(&lt;span style="color:#e6db74">&amp;#34;&amp;#34;&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">&amp;lt;body&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;h1 class=&amp;#34;product&amp;#34;&amp;gt;Product Details&amp;lt;/h1&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;div class=&amp;#34;details&amp;#34;&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;div&amp;gt;Remaining Stock&amp;lt;/div&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;div&amp;gt;5&amp;lt;/div&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> &amp;lt;/div&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">&amp;lt;/body&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">&amp;#34;&amp;#34;&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>body &lt;span style="color:#f92672">=&lt;/span> soup&lt;span style="color:#f92672">.&lt;/span>find(&lt;span style="color:#e6db74">&amp;#39;body&amp;#39;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>body_text &lt;span style="color:#f92672">=&lt;/span> body&lt;span style="color:#f92672">.&lt;/span>get_text()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>print(body_text)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>It will produce the following output:&lt;/p>
&lt;pre tabindex="0">&lt;code>
Product Details

Remaining Stock
5
&lt;/code>&lt;/pre>&lt;p>Selenium also offers something similar. You can use the &lt;code>.text&lt;/code> property of an &lt;code>HTMLElement&lt;/code> to extract text from it.&lt;/p></description></item><item><title>How to use XPath selectors in Python?</title><link>https://www.scrapingbee.com/webscraping-questions/xpath/how-to-use-xpath-selectors-in-python/</link><pubDate>Thu, 19 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/xpath/how-to-use-xpath-selectors-in-python/</guid><description>&lt;p>There are multiple ways for using XPath selectors in Python. One popular option is to use &lt;code>lxml&lt;/code> and &lt;code>BeautifulSoup&lt;/code> and pair it with &lt;code>requests&lt;/code>. And the second option is to use Selenium.&lt;/p>
&lt;p>Here is some sample code for using lxml, BeautifulSoup, and Requests for opening up the ScrapingBee homepage and extracting the text from &lt;code>h1&lt;/code> tag using XPath:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">import&lt;/span> requests
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> lxml &lt;span style="color:#f92672">import&lt;/span> etree
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> bs4 &lt;span style="color:#f92672">import&lt;/span> BeautifulSoup
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>html &lt;span style="color:#f92672">=&lt;/span> requests&lt;span style="color:#f92672">.&lt;/span>get(&lt;span style="color:#e6db74">&amp;#34;https://scrapingbee.com&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>soup &lt;span style="color:#f92672">=&lt;/span> BeautifulSoup(html&lt;span style="color:#f92672">.&lt;/span>text, &lt;span style="color:#e6db74">&amp;#34;html.parser&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>dom &lt;span style="color:#f92672">=&lt;/span> etree&lt;span style="color:#f92672">.&lt;/span>HTML(str(soup))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>first_h1_text &lt;span style="color:#f92672">=&lt;/span> dom&lt;span style="color:#f92672">.&lt;/span>xpath(&lt;span style="color:#e6db74">&amp;#39;//h1&amp;#39;&lt;/span>)[&lt;span style="color:#ae81ff">0&lt;/span>]&lt;span style="color:#f92672">.&lt;/span>text
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>print(first_h1_text)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Output: Tired of getting blocked while scraping the web?&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Here is some sample code for doing the same with Selenium:&lt;/p></description></item><item><title>Scraper doesn't see the data I see in the browser - why?</title><link>https://www.scrapingbee.com/webscraping-questions/data-parsing/scraper-doesnt-see-the-data-i-see/</link><pubDate>Thu, 19 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/data-parsing/scraper-doesnt-see-the-data-i-see/</guid><description>&lt;p>This issue can often show up when you are using an HTML parser like BeautifulSoup or lxml instead of a browser engine via Selenium or Puppeteer. The data you are seeing in the browser might be getting generated via client-side JavaScript after the page load. BeautifulSoup, lxml, and similar HTML parsing libraries do not execute JavaScript.&lt;/p>
&lt;p>There are two options to solve this issue:&lt;/p>
&lt;ol>
&lt;li>Use a browser automation framework like Selenium or Puppeteer and execute the JavaScript before attempting data extraction&lt;/li>
&lt;li>Search for required data in the &lt;code>&amp;lt;script&amp;gt;&lt;/code> tags. Most of the time, the required data is hidden inside &lt;code>&amp;lt;script&amp;gt;&lt;/code> tags as JavaScript variables and then rendered on the page after the page load&lt;/li>
&lt;/ol></description></item><item><title>How to find HTML elements by class?</title><link>https://www.scrapingbee.com/webscraping-questions/css_selectors/how-to-find-html-elements-by-class/</link><pubDate>Wed, 18 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/css_selectors/how-to-find-html-elements-by-class/</guid><description>&lt;p>You can find HTML elements by class in several ways in Python. The method you choose will depend on the library you are using. Some of the most popular libraries that allow selecting HTML elements by class are &lt;code>BeautifulSoup&lt;/code> and &lt;code>Selenium&lt;/code>.&lt;/p>
&lt;p>You can use the &lt;code>find&lt;/code> or &lt;code>find_all&lt;/code> methods of BeautifulSoup and pass in a &lt;code>class_&lt;/code> argument to match elements with a particular class. This is what it will look like:&lt;/p></description></item><item><title>How to fix ConnectTimeout error in Python requests?</title><link>https://www.scrapingbee.com/webscraping-questions/requests/how-to-fix-connecttimeout-error-in-python-requests/</link><pubDate>Wed, 18 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/requests/how-to-fix-connecttimeout-error-in-python-requests/</guid><description>&lt;p>&lt;code>ConnectTimeout&lt;/code> occurs when the website you are trying to connect to doesn't respond to your connect request in time. You can simulate this error for a website by using a custom connect timeout in your &lt;code>requests.get()&lt;/code> call:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">import&lt;/span> requests
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Timeout is in seconds&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>connect_timeout &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#ae81ff">0.1&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>read_timeout &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#ae81ff">10&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>response &lt;span style="color:#f92672">=&lt;/span> requests&lt;span style="color:#f92672">.&lt;/span>get(&lt;span style="color:#e6db74">&amp;#34;https://scrapingbee.com/&amp;#34;&lt;/span>, timeout&lt;span style="color:#f92672">=&lt;/span>(connect_timeout, read_timeout))
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>If you are sure your IP is not being blocked by the website and the website is working fine, then you can fix this error by increasing the connect timeout value:&lt;/p></description></item><item><title>How to fix MissingSchema error in Python requests?</title><link>https://www.scrapingbee.com/webscraping-questions/requests/how-to-fix-missingschema-error-in-python-requests/</link><pubDate>Wed, 18 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/requests/how-to-fix-missingschema-error-in-python-requests/</guid><description>&lt;p>&lt;code>MissingSchema&lt;/code> occurs when you don't provide the complete URL to &lt;code>requests&lt;/code>. This often means you skipped &lt;code>http://&lt;/code> or &lt;code>https://&lt;/code> and/or provided a relative URL.&lt;/p>
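&lt;p>You can reproduce the error quickly without any network access, since &lt;code>requests&lt;/code> raises it before a connection is attempted; the domain below is just a stand-in for illustration:&lt;/p>

```python
import requests

try:
    # Scheme is missing: requests raises MissingSchema before any network call
    requests.get("scrapingbee.com")
except requests.exceptions.MissingSchema as error:
    # The error message explains that no scheme (http:// or https://) was supplied
    print(error)
```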
&lt;p>You can fix this error by making use of the &lt;code>urljoin&lt;/code> function from the &lt;code>urllib.parse&lt;/code> library to join URLs before making a remote request. The solution will look something like this:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> urllib.parse &lt;span style="color:#f92672">import&lt;/span> urljoin
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">import&lt;/span> requests
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>url &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">&amp;#34;https://scrapingbee.com&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>relative_url &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">&amp;#34;/path/to/resource&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>final_url &lt;span style="color:#f92672">=&lt;/span> urljoin(url, relative_url)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>html &lt;span style="color:#f92672">=&lt;/span> requests&lt;span style="color:#f92672">.&lt;/span>get(final_url)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>urljoin&lt;/code> will merge two URLs only if the second argument is a relative path. For example, the following sample code will print &lt;code>https://scrapingbee.com&lt;/code>:&lt;/p></description></item><item><title>How to fix ReadTimeout error in Python requests?</title><link>https://www.scrapingbee.com/webscraping-questions/requests/how-to-fix-readtimeout-error-in-python-requests/</link><pubDate>Wed, 18 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/requests/how-to-fix-readtimeout-error-in-python-requests/</guid><description>&lt;p>&lt;code>ReadTimeout&lt;/code> occurs when the website you are trying to connect to doesn't send back data in time. You can simulate this error for a website by using a custom read timeout in your &lt;code>requests.get()&lt;/code> call:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">import&lt;/span> requests
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Timeout is in seconds&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>connect_timeout &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#ae81ff">5&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>read_timeout &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#ae81ff">0.1&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>response &lt;span style="color:#f92672">=&lt;/span> requests&lt;span style="color:#f92672">.&lt;/span>get(&lt;span style="color:#e6db74">&amp;#34;https://scrapingbee.com/&amp;#34;&lt;/span>, timeout&lt;span style="color:#f92672">=&lt;/span>(connect_timeout, read_timeout))
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>If you are sure your IP is not being blocked by the website and the website just needs more time before returning data, then you can fix this error by increasing the read timeout:&lt;/p></description></item><item><title>How to fix SSLError in Python requests?</title><link>https://www.scrapingbee.com/webscraping-questions/requests/how-to-fix-ssl-error-in-python-requests/</link><pubDate>Wed, 18 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/requests/how-to-fix-ssl-error-in-python-requests/</guid><description>&lt;p>&lt;code>SSLError&lt;/code> occurs when you request a remote URL that does not provide a trusted SSL certificate. The easiest way to fix this issue is to disable SSL verification for that particular web address by passing in &lt;code>verify=False&lt;/code> as an argument to the method calls. Just make sure you are not sending any sensitive data in your request.&lt;/p>
&lt;p>Here is some sample code that disables SSL verification:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">import&lt;/span> requests
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>response &lt;span style="color:#f92672">=&lt;/span> requests&lt;span style="color:#f92672">.&lt;/span>get(&lt;span style="color:#e6db74">&amp;#34;https://example.com/&amp;#34;&lt;/span>, verify&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#66d9ef">False&lt;/span>)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>You can optionally provide a custom certificate for the website to fix this error as well. Here is some sample code for providing a custom &lt;code>.pem&lt;/code> certificate file to &lt;code>requests&lt;/code>:&lt;/p></description></item><item><title>How to fix TooManyRedirects error in Python requests?</title><link>https://www.scrapingbee.com/webscraping-questions/requests/how-to-fix-toomanyredirects-error-in-python-requests/</link><pubDate>Wed, 18 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/requests/how-to-fix-toomanyredirects-error-in-python-requests/</guid><description>&lt;p>&lt;code>TooManyRedirects&lt;/code> error occurs when the request redirects continuously. By default, &lt;code>requests&lt;/code> has a limit of 30 redirects. If it encounters more than 30 redirects in a row then it throws this error.&lt;/p>
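&lt;p>While you investigate, one practical mitigation is to lower the redirect limit so the failure surfaces quickly, and to catch the exception explicitly. Here is a minimal sketch; the endpoint is our hosted HTTPBin, which redirects a configurable number of times:&lt;/p>

```python
import requests

session = requests.Session()
session.max_redirects = 5  # fail fast instead of following the default 30 redirects

try:
    # This HTTPBin endpoint redirects 10 times, exceeding our limit of 5
    session.get("https://httpbin.scrapingbee.com/redirect/10")
except requests.exceptions.TooManyRedirects:
    print("Redirect limit exceeded; the site may be looping")
```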
&lt;p>Firstly, you should make sure that the website is not buggy. There aren't a lot of scenarios where more than 30 redirects make sense. Maybe the website is detecting your requests as automated and intentionally sending you into a redirection loop.&lt;/p></description></item><item><title>How to select HTML elements by text using CSS Selectors?</title><link>https://www.scrapingbee.com/webscraping-questions/css_selectors/how-to-select-html-elements-by-text-using-css-selectors/</link><pubDate>Wed, 18 Jan 2023 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/css_selectors/how-to-select-html-elements-by-text-using-css-selectors/</guid><description>&lt;p>There used to be a way to select HTML elements by text using CSS Selectors by making use of &lt;code>:contains(text)&lt;/code>. However, this has been deprecated for a long time and is no longer supported by the W3C standard. If you want to select an element by text, you should look into other options. Most Python libraries provide a way for you to do so.&lt;/p>
&lt;p>For instance, you can select an element by text using XPath Selectors in Selenium like this:&lt;/p></description></item><item><title>Using Parsel to Extract Text from HTML in Python</title><link>https://www.scrapingbee.com/blog/parsel-python/</link><pubDate>Tue, 11 Oct 2022 08:10:27 +0200</pubDate><guid>https://www.scrapingbee.com/blog/parsel-python/</guid><description>&lt;p>Web scraping describes the ability to extract or “scrape” data from the internet using an automated program. These programs conduct web queries and retrieve HTML data, which is then parsed to obtain the required information.&lt;/p>
&lt;p>Whether you need to collect large amounts of data, data from multiple sources, or data not available through APIs, automating the extraction of this information can save you a lot of time and effort.&lt;/p>
&lt;p>In this tutorial, you’ll learn how to use the &lt;a href="https://parsel.readthedocs.io/en/latest/usage.html" target="_blank" >Parsel&lt;/a> Python library to create your own web scraping scripts. Specifically, you’ll learn how to parse HTML documents using Selectors and how to extract data from HTML markup using CSS and XPath. You’ll also learn about removing the elements using the selector object. By the end of the article, you’ll be able to create your own scraping scripts and complex expressions to retrieve data from a web page using the Parsel library.&lt;/p></description></item><item><title>What is Web Scraping</title><link>https://www.scrapingbee.com/blog/what-is-web-scraping/</link><pubDate>Thu, 25 Aug 2022 09:24:27 +0000</pubDate><guid>https://www.scrapingbee.com/blog/what-is-web-scraping/</guid><description>&lt;h2 id="what-is-web-scraping">What is Web Scraping?&lt;/h2>
&lt;p>&lt;a href="https://www.scrapingbee.com/" target="_blank" >Web scraping&lt;/a> has many names: web crawling, data extraction, web harvesting, and a few more.&lt;/p>
&lt;p>While there are subtle nuances between these terms, the overall idea is the same: &lt;em>to gather data from a website, transform that data into a custom format, and persist it for later use&lt;/em>.&lt;/p>
&lt;p>Search engines are a great example of both web crawling and web scraping. They continuously scout the web to build a &amp;quot;library&amp;quot; of sites and their content, so that when a user searches for a particular query, they can quickly and easily provide a list of all sites on that topic. Just imagine a web without search engines 😨.&lt;/p></description></item><item><title>What is the best framework for web scraping with Python?</title><link>https://www.scrapingbee.com/webscraping-questions/python/what-is-the-best-framework-for-web-scraping-with-python/</link><pubDate>Thu, 07 Jul 2022 09:10:00 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/python/what-is-the-best-framework-for-web-scraping-with-python/</guid><description>&lt;h2 id="scrapy">Scrapy&lt;/h2>
&lt;p>Scrapy is a robust and complete web scraping framework that allows you to:&lt;/p>
&lt;ul>
&lt;li>explore a whole website from a single URL (crawling)&lt;/li>
&lt;li>rate-limit the exploration to avoid getting banned&lt;/li>
&lt;li>export data in CSV, JSON, and XML&lt;/li>
&lt;li>store the data in S3, databases, etc.&lt;/li>
&lt;li>handle cookies and sessions&lt;/li>
&lt;li>use HTTP features like compression, authentication, and caching&lt;/li>
&lt;li>spoof the user agent&lt;/li>
&lt;li>respect robots.txt&lt;/li>
&lt;li>restrict crawl depth&lt;/li>
&lt;li>and more&lt;/li>
&lt;/ul>
&lt;p>However, this framework can be a bit hard to use, especially for beginners. If you want to learn this framework, check out our &lt;a href="https://www.scrapingbee.com/blog/web-scraping-with-scrapy/" >Scrapy tutorial&lt;/a>.&lt;/p></description></item><item><title>Which is better for web scraping Python or JavaScript?</title><link>https://www.scrapingbee.com/webscraping-questions/python/which-is-better-for-web-scraping-python-or-javascript/</link><pubDate>Thu, 07 Jul 2022 09:10:00 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/python/which-is-better-for-web-scraping-python-or-javascript/</guid><description>&lt;h2 id="short-answer-python">Short answer: Python!&lt;/h2>
&lt;p>Long answer: it depends.&lt;/p>
&lt;p>If you're scraping simple websites with simple HTTP requests, Python is your best bet.&lt;/p>
&lt;p>Libraries such as &lt;code>requests&lt;/code> or &lt;code>HTTPX&lt;/code> make it very easy to scrape websites that don't require JavaScript to work correctly. Python offers a lot of simple-to-use &lt;a href="https://www.scrapingbee.com/blog/best-python-http-clients/" >HTTP clients&lt;/a>.&lt;/p>
&lt;p>And once you get the response, it's also very easy to &lt;a href="https://www.scrapingbee.com/blog/python-web-scraping-beautiful-soup/" >parse the HTML with BeautifulSoup&lt;/a>, for example.&lt;/p>
&lt;p>Here is a very quick example of how simple it is to scrape a website and extract its title:&lt;/p></description></item><item><title>Which is better Scrapy or BeautifulSoup?</title><link>https://www.scrapingbee.com/webscraping-questions/python/which-is-better-scrapy-or-beautifulsoup/</link><pubDate>Mon, 04 Jul 2022 09:10:00 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/python/which-is-better-scrapy-or-beautifulsoup/</guid><description>&lt;h2 id="scrapy">Scrapy&lt;/h2>
&lt;p>Scrapy is a more robust, feature-complete, more extensible, and more maintained web scraping tool.&lt;/p>
&lt;p>Scrapy allows you to crawl, extract, and store a full website. BeautifulSoup, on the other hand, only allows you to parse HTML and extract the information you're looking for.&lt;/p>
&lt;p>However, Scrapy is much harder to use, which is why we suggest checking out this tutorial showing you &lt;a href="https://www.scrapingbee.com/blog/web-scraping-with-scrapy/" >how to start with Scrapy&lt;/a> if you want to use it.&lt;/p></description></item><item><title>Introduction to Chrome Headless with Java</title><link>https://www.scrapingbee.com/blog/introduction-to-chrome-headless/</link><pubDate>Tue, 21 Jun 2022 09:45:11 +0100</pubDate><guid>https://www.scrapingbee.com/blog/introduction-to-chrome-headless/</guid><description>&lt;p>In previous articles, we talked about two different approaches to perform basic &lt;a href="https://www.scrapingbee.com/java-webscraping-book/" target="_blank" >web scraping with Java&lt;/a>. &lt;em>HtmlUnit&lt;/em> for &lt;a href="https://www.scrapingbee.com/blog/introduction-to-web-scraping-with-java/" >scraping basic sites&lt;/a> and &lt;em>PhantomJS&lt;/em> for &lt;a href="https://www.scrapingbee.com/blog/web-scraping-handling-ajax-website/" >scraping dynamic sites which make heavy use of JavaScript&lt;/a>.&lt;/p>
&lt;p>Both are tremendous tools and there's a reason why PhantomJS happened to be the leader in that market for a long time. Nonetheless, there occasionally were issues, either with performance or with support of web standards. Then, in 2017, there was a real game changer in this field, when both, Google and Mozilla, started to natively support a feature called &lt;em>headless mode&lt;/em> in their respective browsers.&lt;/p></description></item><item><title>How to put scraped website data into Google Sheets</title><link>https://www.scrapingbee.com/blog/scrape-content-google-sheet/</link><pubDate>Wed, 09 Mar 2022 08:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/blog/scrape-content-google-sheet/</guid><description>&lt;p>The process of scraping at scale can be challenging. You have to handle javascript rendering, &lt;a href="https://www.scrapingbee.com/blog/introduction-to-chrome-headless/" >chrome headless&lt;/a>, captchas, and proxy configuration. Our &lt;a href="https://www.scrapingbee.com/" target="_blank" >scraping tool&lt;/a> offers all the above in one API.&lt;/p>
&lt;p>By pairing it with &lt;a href="https://www.make.com/en" target="_blank" >Make&lt;/a> (formerly known as Integromat), we will build a no-code workflow to perform any number of actions with the scraped data. &lt;a href="https://www.make.com/en" target="_blank" >Make&lt;/a> allows you to design, build, and automate anything—from tasks and workflows to apps and systems—without coding.&lt;/p></description></item><item><title>Pyppeteer: the Puppeteer for Python Developers</title><link>https://www.scrapingbee.com/blog/pyppeteer/</link><pubDate>Thu, 24 Feb 2022 09:10:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/pyppeteer/</guid><description>&lt;p>The web acts like a giant, powerful database, with tons of data being generated every single day. With the rise of trends such as big data and data science, data has become more useful than ever, being used to train machine learning algorithms, generate insights, forecast the future, and serve many other purposes. Extracting this data manually, page by page, can be a very slow and time-consuming process. The process of &lt;em>web scraping&lt;/em> can be a helpful solution, programmatically extracting data from the web. Thanks to browser automation, which emulates human actions such as clicking and scrolling through a web system, users can simply and efficiently gather useful data without being hindered by a manual process.&lt;/p></description></item><item><title>C# HTML parsers</title><link>https://www.scrapingbee.com/blog/csharp-html-parser/</link><pubDate>Wed, 09 Feb 2022 09:02:00 +0000</pubDate><guid>https://www.scrapingbee.com/blog/csharp-html-parser/</guid><description>&lt;p>Web scraping is essential when trying to retrieve massive amounts of data from the internet, and the most crucial part of the process is HTML parsing, or extracting needed data from HTML code.&lt;/p>
&lt;p>If you need an HTML parser, you may be overwhelmed by the many choices. These are the basic criteria to keep in mind:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>It must be open-source and free.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>It must offer reasonable documentation.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The library must be actively maintained.&lt;/p></description></item><item><title>Using wget with a proxy</title><link>https://www.scrapingbee.com/blog/wget-proxy/</link><pubDate>Mon, 27 Sep 2021 11:10:27 +0200</pubDate><guid>https://www.scrapingbee.com/blog/wget-proxy/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>In this article, you will learn how to use wget commands to retrieve or transfer data through a proxy server. Proxy servers are often described as the gateway between you and the world wide web, and they can make accessing data more secure. Feel free to learn more about proxies &lt;a href="https://www.varonis.com/blog/what-is-a-proxy-server/" target="_blank" >here&lt;/a>, but let's get started!&lt;/p>
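As a quick sketch of where this is headed: wget picks up a proxy from the http_proxy and https_proxy environment variables, so you can also drive it from a script. The proxy address below is a placeholder, and wget itself must be installed for the commented-out call to work.

```python
# Sketch: wget honors the http_proxy / https_proxy environment variables.
# The proxy address is a placeholder - substitute a proxy you can actually use.
import os
import subprocess

env = dict(os.environ)
env["http_proxy"] = "http://203.0.113.10:8080"
env["https_proxy"] = "http://203.0.113.10:8080"

# Uncomment to run the download through the proxy (requires wget on PATH):
# subprocess.run(["wget", "https://www.scrapingbee.com"], env=env, check=True)
```

The article below covers the wget-native ways to do the same thing.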
&lt;div class="img" style="background: url(data:image/jpeg;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAALCAIAAADwazoUAAAAxElEQVR4nKyRQUuFQBRGZ&amp;#43;5oBvEerjTCVRD0//9HEVEKFS0KpphGsham43i/dulihJR3lsMczoUvAiC2QpvNgAyIpVvADMai7H76p&amp;#43;vKvH7MH/3gX&amp;#43;5K17nyVhv9ySMHZEB8mfejY&amp;#43;WH0er6Lx/FUZpn1dWjtq21bbgMhu/adO93SYN&amp;#43;KgCoG36zPrl/KPITUhSQScn9aeGGuHOxSnbTJwhrvs8vsuLyrL55npflf6YCQ5IUQjAzEa2Tlzjozqv4DQAA///RvWuw7TP8LAAAAABJRU5ErkJggg==); background-size: cover">
 &lt;svg width="460" height="250" aria-hidden="true" style="background-color:white">&lt;/svg>
 &lt;img
 class="lazyload"
 data-sizes="auto"
 data-srcset=', /blog/wget-proxy/cover.png 460 '
 data-src="https://www.scrapingbee.com/blog/wget-proxy/cover.png"
 width="460" height="250"
 alt='cover image'>
 &lt;noscript>
 &lt;img
 loading="lazy"
 
 srcset=', /blog/wget-proxy/cover.png 460'
 src="https://www.scrapingbee.com/blog/wget-proxy/cover.png"
 width="460" height="250"
 alt='cover image'>
 &lt;/noscript>
&lt;/div>

&lt;br>

&lt;h2 id="prerequisites--installation">Prerequisites &amp;amp; Installation&lt;/h2>
&lt;p>This article is for a wide range of developers, ✨&lt;em>including you juniors&lt;/em>!✨ But to get the most out of the material, it is advised to:&lt;/p></description></item><item><title>Using cURL with a proxy</title><link>https://www.scrapingbee.com/blog/curl-proxy/</link><pubDate>Wed, 14 Jul 2021 11:10:27 +0200</pubDate><guid>https://www.scrapingbee.com/blog/curl-proxy/</guid><description>&lt;p>In this article, you will learn how to use the command-line tool cURL to transfer data using a proxy server. A proxy server acts as a middleman between a client and a destination server. The client forwards each request it wants to execute to the proxy, which then executes it and returns the result to the client.&lt;/p>
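The same client-proxy-server flow can be sketched outside of cURL too, for instance with Python's requests library. The proxy URL below is a placeholder, which is why the actual request stays commented out.

```python
# Sketch: routing an HTTP request through a proxy with the requests library.
# The proxy URL is a placeholder - swap in a real proxy before running.
import requests

proxy_url = "http://203.0.113.10:8080"
proxies = {"http": proxy_url, "https": proxy_url}

# Uncomment once a live proxy is available:
# response = requests.get("https://www.scrapingbee.com", proxies=proxies)
# print(response.status_code)
```

cURL expresses the same idea with its proxy flag, which the article walks through.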
&lt;p>We might want to do this when say data on a target service uses geo localization to restrict the data displayed, or completely blocks access to clients in certain countries. On a variety of global shopping sites this approach is used to display prices in a local currency - e.g. Euros rather than dollars. If we were to visit the site directly, we would end up with data in the wrong currency. By using a proxy, we can fetch the data we require based on the locale of the proxy.&lt;/p></description></item><item><title>How to download a file with Puppeteer?</title><link>https://www.scrapingbee.com/blog/download-file-puppeteer/</link><pubDate>Fri, 23 Apr 2021 11:10:27 +0200</pubDate><guid>https://www.scrapingbee.com/blog/download-file-puppeteer/</guid><description>&lt;p>In this article, we will discuss how to efficiently download files with Puppeteer. Automating file downloads can sometimes be complicated. You perhaps need to explicitly specify a download location, download multiple files at the same time, and so on. Unfortunately, all these use cases are not well documented. That’s why I wrote this article to share some of the tips and tricks that I came up with over the years while working with Puppeteer. We will go through several practical examples and take a deep dive into Puppeteer’s APIs used for file download. Exciting! let’s get started.&lt;/p></description></item><item><title>Scraping the web with Playwright</title><link>https://www.scrapingbee.com/blog/playwright-web-scraping/</link><pubDate>Wed, 07 Apr 2021 11:27:59 +0000</pubDate><guid>https://www.scrapingbee.com/blog/playwright-web-scraping/</guid><description>&lt;p>Playwright is a browser automation library for Node.js (similar to Selenium or Puppeteer) that allows reliable, fast, and efficient browser automation with a few lines of code. Its simplicity and powerful automation capabilities make it an ideal tool for web scraping and data mining. 
It also comes with headless browser support (more on headless browsers later on in the article). The biggest difference compared to Puppeteer is its cross-browser support. In this article, we will discuss:&lt;/p></description></item><item><title>How do I get a title in Cheerio?</title><link>https://www.scrapingbee.com/webscraping-questions/cheerio/how-do-i-get-a-title-in-cheerio/</link><pubDate>Sat, 16 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/cheerio/how-do-i-get-a-title-in-cheerio/</guid><description>&lt;p>You can get a title in Cheerio by using the &lt;code>title&lt;/code> as the selector expression and then executing the &lt;code>text()&lt;/code> method. Here is some sample code that extracts and prints the title from the ScrapingBee homepage:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-javascript" data-lang="javascript">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">cheerio&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">require&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;cheerio&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#a6e22e">fetch&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;https://scrapingbee.com&amp;#39;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> .&lt;span style="color:#a6e22e">then&lt;/span>(&lt;span style="color:#66d9ef">function&lt;/span> (&lt;span style="color:#a6e22e">response&lt;/span>) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">return&lt;/span> &lt;span style="color:#a6e22e">response&lt;/span>.&lt;span style="color:#a6e22e">text&lt;/span>();
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> })
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> .&lt;span style="color:#a6e22e">then&lt;/span>(&lt;span style="color:#66d9ef">function&lt;/span> (&lt;span style="color:#a6e22e">html&lt;/span>) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e">// Load HTML in Cheerio
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span> &lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">$&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">cheerio&lt;/span>.&lt;span style="color:#a6e22e">load&lt;/span>(&lt;span style="color:#a6e22e">html&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> 
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e">// Use `title` as a selector and extract
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span> &lt;span style="color:#75715e">// the text using the `text()` method
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span> &lt;span style="color:#a6e22e">console&lt;/span>.&lt;span style="color:#a6e22e">log&lt;/span>(&lt;span style="color:#a6e22e">$&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;title&amp;#39;&lt;/span>).&lt;span style="color:#a6e22e">text&lt;/span>())
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> })
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> .&lt;span style="color:#66d9ef">catch&lt;/span>(&lt;span style="color:#66d9ef">function&lt;/span> (&lt;span style="color:#a6e22e">err&lt;/span>) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">console&lt;/span>.&lt;span style="color:#a6e22e">log&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;Failed to fetch page: &amp;#39;&lt;/span>, &lt;span style="color:#a6e22e">err&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> });
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>How do I get links in Cheerio?</title><link>https://www.scrapingbee.com/webscraping-questions/cheerio/how-do-i-get-links-in-cheerio/</link><pubDate>Sat, 16 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/cheerio/how-do-i-get-links-in-cheerio/</guid><description>&lt;p>You can get links in Cheerio by using the relevant selector expression and then calling the &lt;code>.attr()&lt;/code> method to extract the &lt;code>href&lt;/code> from the nodes.&lt;/p>
&lt;p>Here is some sample code that extracts all the anchor tags from the ScrapingBee homepage and then prints the text and &lt;code>href&lt;/code> from the tags in the console:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-javascript" data-lang="javascript">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">cheerio&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">require&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;cheerio&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#a6e22e">fetch&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;https://scrapingbee.com&amp;#39;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> .&lt;span style="color:#a6e22e">then&lt;/span>(&lt;span style="color:#66d9ef">function&lt;/span> (&lt;span style="color:#a6e22e">response&lt;/span>) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">return&lt;/span> &lt;span style="color:#a6e22e">response&lt;/span>.&lt;span style="color:#a6e22e">text&lt;/span>();
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> })
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> .&lt;span style="color:#a6e22e">then&lt;/span>(&lt;span style="color:#66d9ef">function&lt;/span> (&lt;span style="color:#a6e22e">html&lt;/span>) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e">// Load the HTML in Cheerio
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span> &lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">$&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">cheerio&lt;/span>.&lt;span style="color:#a6e22e">load&lt;/span>(&lt;span style="color:#a6e22e">html&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> 
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e">// Select all anchor tags from the page
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span> &lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">links&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">$&lt;/span>(&lt;span style="color:#e6db74">&amp;#34;a&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e">// Loop over all the anchor tags
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span> &lt;span style="color:#a6e22e">links&lt;/span>.&lt;span style="color:#a6e22e">each&lt;/span>((&lt;span style="color:#a6e22e">index&lt;/span>, &lt;span style="color:#a6e22e">value&lt;/span>) =&amp;gt; {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e">// Print the text from the tags and the associated href
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span> &lt;span style="color:#a6e22e">console&lt;/span>.&lt;span style="color:#a6e22e">log&lt;/span>(&lt;span style="color:#a6e22e">$&lt;/span>(&lt;span style="color:#a6e22e">value&lt;/span>).&lt;span style="color:#a6e22e">text&lt;/span>(), &lt;span style="color:#e6db74">&amp;#34; =&amp;gt; &amp;#34;&lt;/span>, &lt;span style="color:#a6e22e">$&lt;/span>(&lt;span style="color:#a6e22e">value&lt;/span>).&lt;span style="color:#a6e22e">attr&lt;/span>(&lt;span style="color:#e6db74">&amp;#34;href&amp;#34;&lt;/span>));
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> })
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> })
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> .&lt;span style="color:#66d9ef">catch&lt;/span>(&lt;span style="color:#66d9ef">function&lt;/span> (&lt;span style="color:#a6e22e">err&lt;/span>) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">console&lt;/span>.&lt;span style="color:#a6e22e">log&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;Failed to fetch page: &amp;#39;&lt;/span>, &lt;span style="color:#a6e22e">err&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> });
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>Is Cheerio faster than Puppeteer?</title><link>https://www.scrapingbee.com/webscraping-questions/cheerio/is-cheerio-faster-than-puppeteer/</link><pubDate>Sat, 16 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/cheerio/is-cheerio-faster-than-puppeteer/</guid><description>&lt;p>Cheerio is much faster than Puppeteer. This is because Cheerio is just a DOM parser and helps us traverse raw HTML and XML data. It does not execute any Javascript on the page. On the other hand, Puppeteer runs a full browser and executes all the Javascript, and processes all XHR requests.&lt;/p>
&lt;p>You won't be able to observe the speed difference in small projects but it compounds on large projects and becomes very apparent.&lt;/p></description></item><item><title>What is Cheerio in JavaScript?</title><link>https://www.scrapingbee.com/webscraping-questions/cheerio/what-is-cheerio-in-javascript/</link><pubDate>Sat, 16 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/cheerio/what-is-cheerio-in-javascript/</guid><description>&lt;p>Cheerio is a fast, lean implementation of core jQuery. It helps in traversing the DOM using a friendly and familiar API and works both in the browser and the server. It simply parses the HTML and XML and does not execute any Javascript in the document or load any external resources. This makes Cheerio extremely fast when compared to full browser automation tools like Puppeteer and Selenium. However, if a project requires executing Javascript on the page or executing background XHR requests then Cheerio is not the right tool for the job.&lt;/p></description></item><item><title>Can I use XPath selectors in BeautifulSoup?</title><link>https://www.scrapingbee.com/webscraping-questions/beautifulsoup/can-i-use-xpath-selectors-in-beautifulsoup/</link><pubDate>Fri, 15 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/beautifulsoup/can-i-use-xpath-selectors-in-beautifulsoup/</guid><description>&lt;h2 id="what-is-xpath">What is XPath?&lt;/h2>
&lt;p>XPath is an expression language designed to support the query or transformation of XML documents. It was defined by the W3C and can be used to navigate through elements and attributes in an XML document.&lt;/p>
&lt;h2 id="can-we-use-xpath-with-beautifulsoup">Can we use XPath with BeautifulSoup?&lt;/h2>
&lt;p>Technically, no. But we can use BeautifulSoup4 with the lxml Python library to achieve that.&lt;/p>
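As a rough sketch of that combination: BeautifulSoup parses the page, and lxml runs the XPath query over the re-serialized soup.

```python
# Sketch: BeautifulSoup for parsing, lxml for the XPath query.
import requests
from bs4 import BeautifulSoup
from lxml import etree

response = requests.get("https://www.scrapingbee.com")
soup = BeautifulSoup(response.content, "html.parser")

# Build an XPath-capable lxml tree from the serialized soup
dom = etree.HTML(str(soup))

# Extract the page title with an XPath expression
title = dom.xpath("//title/text()")
print(title)
```

Note that lxml is doing the XPath work here; BeautifulSoup on its own has no XPath engine.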
&lt;p>To install lxml, all you have to do is run this command: &lt;code>pip install lxml&lt;/code>, and that's it!&lt;/p></description></item><item><title>How long does it take to learn web scraping in Python?</title><link>https://www.scrapingbee.com/webscraping-questions/python/how-long-does-it-take-to-learn-web-scraping-in-python/</link><pubDate>Fri, 15 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/python/how-long-does-it-take-to-learn-web-scraping-in-python/</guid><description>
&lt;p>Depending on your Python knowledge, and how much time you're allocating to learn this skill, it could take anywhere from two days to two years.&lt;/p>
&lt;p>- Generally, it takes about one to six months to learn the fundamentals of Python, which means being able to work with variables, objects &amp;amp; data structures, flow control (conditions &amp;amp; loops), file I/O, functions, classes, and basic web scraping tools such as the &lt;code>requests&lt;/code> library.&lt;/p></description></item><item><title>How to capture background requests and responses in Puppeteer?</title><link>https://www.scrapingbee.com/webscraping-questions/puppeteer/how-to-capture-background-requests/</link><pubDate>Fri, 15 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/puppeteer/how-to-capture-background-requests/</guid><description>&lt;p>You can use the &lt;code>page.on()&lt;/code> function to capture the requests and responses that go out in the background when a page is loaded.&lt;/p>
&lt;p>For example, to capture the background requests of ScrapingBee's home page, you can use this code:&lt;/p>
&lt;pre tabindex="0">&lt;code>const puppeteer = require(&amp;#39;puppeteer&amp;#39;)
try {
 (async () =&amp;gt; {
 const browser = await puppeteer.launch();
 const page = await browser.newPage();
 var requests = [];
 var responses = [];

 page.on(&amp;#39;request&amp;#39;, request =&amp;gt; {
 requests.push(request);
 });

 page.on(&amp;#39;response&amp;#39;, response =&amp;gt; {
 responses.push(response);
 });
 await page.goto(&amp;#39;https://scrapingbee.com&amp;#39;);
 await browser.close();
 console.log(requests);
 console.log(responses)
 })()
} catch (err) {
 console.error(err);
}
&lt;/code>&lt;/pre></description></item><item><title>How to extract data from website using selenium python?</title><link>https://www.scrapingbee.com/webscraping-questions/python/how-to-extract-data-from-website-using-selenium-python/</link><pubDate>Fri, 15 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/python/how-to-extract-data-from-website-using-selenium-python/</guid><description>
&lt;p>You can use Selenium to scrape data from specific elements of a web page. Let's take the same example from our previous post: &lt;a href="https://www.scrapingbee.com/webscraping-questions/python/how-to-web-scrape-with-python-selenium/" target="_blank" >How to web scrape with python selenium?&lt;/a>&lt;/p>
&lt;p>We have used this Python code (with Selenium) to wait for the content to load by adding some waiting time:&lt;/p>
&lt;pre tabindex="0">&lt;code>from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time

options = Options()
options.headless = True

driver = webdriver.Chrome(options=options, executable_path=&amp;#34;PATH_TO_CHROMEDRIVER&amp;#34;) # Setting up the Chrome driver
driver.get(&amp;#34;https://demo.scrapingbee.com/content_loads_after_5s.html&amp;#34;)
time.sleep(6) # Sleep for 6 seconds
print(driver.page_source)
driver.quit()
&lt;/code>&lt;/pre>&lt;p>And we got this result:&lt;/p></description></item><item><title>How to find all links using BeautifulSoup and Python?</title><link>https://www.scrapingbee.com/webscraping-questions/beautifulsoup/how-to-find-all-links-using-beautifulsoup-and-python/</link><pubDate>Fri, 15 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/beautifulsoup/how-to-find-all-links-using-beautifulsoup-and-python/</guid><description>&lt;p>You can find all of the links, anchor &lt;code>&amp;lt;a&amp;gt;&lt;/code> elements, on a web page by using the &lt;code>find_all&lt;/code> function of BeautifulSoup4, with the tag &lt;code>&amp;quot;a&amp;quot;&lt;/code> as a parameter for the function.&lt;/p>
&lt;p>Here's some sample code to extract all links from ScrapingBee's blog:&lt;/p>
&lt;pre tabindex="0">&lt;code>import requests
from bs4 import BeautifulSoup

response = requests.get(&amp;#34;https://www.scrapingbee.com/blog/&amp;#34;)
soup = BeautifulSoup(response.content, &amp;#39;html.parser&amp;#39;)

links = soup.find_all(&amp;#34;a&amp;#34;) # Find all elements with the tag &amp;lt;a&amp;gt;
for link in links:
 print(&amp;#34;Link:&amp;#34;, link.get(&amp;#34;href&amp;#34;), &amp;#34;Text:&amp;#34;, link.string)
&lt;/code>&lt;/pre></description></item><item><title>How to find elements by CSS selector in Puppeteer?</title><link>https://www.scrapingbee.com/webscraping-questions/puppeteer/how-to-find-elements-by-css-selector-in-puppeteer/</link><pubDate>Fri, 15 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/puppeteer/how-to-find-elements-by-css-selector-in-puppeteer/</guid><description>&lt;p>You can use Puppeteer to find elements using CSS selectors with the &lt;code>page.$()&lt;/code> or &lt;code>page.$$()&lt;/code> functions.&lt;/p>
&lt;p>&lt;code>page.$()&lt;/code> returns the first occurrence of an element matching the CSS selector being used, while &lt;code>page.$$()&lt;/code> returns all elements of the page that match the selector.&lt;/p>
&lt;pre tabindex="0">&lt;code>const puppeteer = require(&amp;#39;puppeteer&amp;#39;);

(async () =&amp;gt; {
 const browser = await puppeteer.launch();
 const page = await browser.newPage();

 // Open Scrapingbee&amp;#39;s website
 await page.goto(&amp;#39;https://scrapingbee.com&amp;#39;);

 // Get the first h1 element using page.$
 let first_h1 = await page.$(&amp;#34;h1&amp;#34;);

 // Get all p elements using page.$$
 let all_p_elements = await page.$$(&amp;#34;p&amp;#34;);

 // Get the textContent of the h1 element
 let h1_value = await page.evaluate(el =&amp;gt; el.textContent, first_h1)

 // The total number of p elements on the page
 let p_total = all_p_elements.length;

 console.log(h1_value);

 console.log(p_total);

 // Close browser.
 await browser.close();
})();
&lt;/code>&lt;/pre></description></item><item><title>How to find elements by XPath in Puppeteer</title><link>https://www.scrapingbee.com/webscraping-questions/puppeteer/how-to-find-elements-by-xpath-in-puppeteer/</link><pubDate>Fri, 15 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/puppeteer/how-to-find-elements-by-xpath-in-puppeteer/</guid><description>&lt;p>You can also use Puppeteer to find elements with XPath instead of CSS selectors, by using the &lt;code>page.$x()&lt;/code> function:&lt;/p>
&lt;pre tabindex="0">&lt;code>const puppeteer = require(&amp;#39;puppeteer&amp;#39;);

(async () =&amp;gt; {
 const browser = await puppeteer.launch();
 const page = await browser.newPage();

 // Open Scrapingbee&amp;#39;s website
 await page.goto(&amp;#39;https://scrapingbee.com&amp;#39;);

 // Get the first h1 element using page.$x
 let first_h1_element = await page.$x(&amp;#39;//*[@id=&amp;#34;content&amp;#34;]/div/section[1]/div/div/div[1]/div/h1&amp;#39;);

 // Get all p elements using page.$x
 let all_p_elements = await page.$x(&amp;#34;//p&amp;#34;);

 // Get the textContent of the h1 element
 let h1_value = await page.evaluate(el =&amp;gt; el.textContent, first_h1_element[0])

 // The total number of p elements on the page
 let p_total = all_p_elements.length;

 console.log(h1_value);

 console.log(p_total);

 // Close browser.
 await browser.close();
})();
&lt;/code>&lt;/pre></description></item><item><title>How to find elements without specific attributes in BeautifulSoup?</title><link>https://www.scrapingbee.com/webscraping-questions/beautifulsoup/how-to-find-elements-without-specific-attributes-in-beautifulsoup/</link><pubDate>Fri, 15 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/beautifulsoup/how-to-find-elements-without-specific-attributes-in-beautifulsoup/</guid><description>&lt;p>To find elements without a specific attribute using BeautifulSoup, we use the &lt;code>attrs&lt;/code> parameter of the function &lt;code>find&lt;/code>, and we specify the attributes as &lt;code>None&lt;/code>.&lt;/p>
&lt;p>For example, to find the paragraph element without a class name, we set &lt;code>attrs={&amp;quot;class&amp;quot;: None}&lt;/code>:&lt;/p>
&lt;pre tabindex="0">&lt;code>import requests
from bs4 import BeautifulSoup

html_content = &amp;#39;&amp;#39;&amp;#39;
&amp;lt;p class=&amp;#34;clean-text&amp;#34;&amp;gt;A very long clean paragraph&amp;lt;/p&amp;gt;
&amp;lt;p class=&amp;#34;dark-text&amp;#34;&amp;gt;A very long dark paragraph&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;A very long paragraph without attribute&amp;lt;/p&amp;gt;
&amp;lt;p class=&amp;#34;light-text&amp;#34;&amp;gt;A very long light paragraph&amp;lt;/p&amp;gt;
&amp;#39;&amp;#39;&amp;#39;
soup = BeautifulSoup(html_content, &amp;#39;html.parser&amp;#39;)

no_class_attribute = soup.find(&amp;#34;p&amp;#34;, attrs={&amp;#34;class&amp;#34;: None})

print(no_class_attribute)
# Output: &amp;lt;p&amp;gt;A very long paragraph without attribute&amp;lt;/p&amp;gt;
&lt;/code>&lt;/pre></description></item><item><title>How to find HTML element by class with BeautifulSoup?</title><link>https://www.scrapingbee.com/webscraping-questions/beautifulsoup/how-to-find-html-element-by-class-with-beautifulsoup/</link><pubDate>Fri, 15 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/beautifulsoup/how-to-find-html-element-by-class-with-beautifulsoup/</guid><description>&lt;p>To extract HTML elements with a specific class name using BeautifulSoup, we use the &lt;code>attrs&lt;/code> parameter of the functions find or find_all.&lt;/p>
&lt;p>For example, to extract the element that has &lt;code>mb-[21px]&lt;/code> as a class name, we use the function &lt;code>find&lt;/code> with &lt;code>attrs={&amp;quot;class&amp;quot;: &amp;quot;mb-[21px]&amp;quot;}&lt;/code> like this:&lt;/p>
&lt;pre tabindex="0">&lt;code>import requests
from bs4 import BeautifulSoup

response = requests.get(&amp;#34;https://www.scrapingbee.com/blog/&amp;#34;)
soup = BeautifulSoup(response.content, &amp;#39;html.parser&amp;#39;)

h1 = soup.find(attrs={&amp;#34;class&amp;#34;: &amp;#34;mb-[21px]&amp;#34;})
print(h1.string)
# Output: The ScrapingBee Blog
&lt;/code>&lt;/pre></description></item><item><title>How to find HTML elements by attribute using BeautifulSoup?</title><link>https://www.scrapingbee.com/webscraping-questions/beautifulsoup/how-to-find-html-elements-by-attribute-using-beautifulsoup/</link><pubDate>Fri, 15 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/beautifulsoup/how-to-find-html-elements-by-attribute-using-beautifulsoup/</guid><description>&lt;p>BeautifulSoup can also be used to scrape elements with custom attributes using the &lt;code>attrs&lt;/code> parameter for the functions &lt;code>find&lt;/code> and &lt;code>find_all&lt;/code>.&lt;/p>
&lt;p>To extract elements with the attribute &lt;code>data-microtip-size=medium&lt;/code> (the tooltips in the pricing table on ScrapingBee's home page), we can set &lt;code>attrs={&amp;quot;data-microtip-size&amp;quot;: &amp;quot;medium&amp;quot;}&lt;/code> like this:&lt;/p>
&lt;pre tabindex="0">&lt;code>import requests
from bs4 import BeautifulSoup

response = requests.get(&amp;#34;https://www.scrapingbee.com&amp;#34;)
soup = BeautifulSoup(response.content, &amp;#39;html.parser&amp;#39;)

tooltips = soup.find_all(&amp;#34;button&amp;#34;, attrs={&amp;#34;data-microtip-size&amp;#34;: &amp;#34;medium&amp;#34;})
for tooltip in tooltips:
 print(tooltip.get(&amp;#34;aria-label&amp;#34;))
# Output: API credits are valid for one month, leftovers are not rolled-over to the next month... credits and concurrency.
&lt;/code>&lt;/pre></description></item><item><title>How to find HTML elements by multiple tags with BeautifulSoup?</title><link>https://www.scrapingbee.com/webscraping-questions/beautifulsoup/how-to-find-html-elements-by-multiple-tags-with-beautifulsoup/</link><pubDate>Fri, 15 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/beautifulsoup/how-to-find-html-elements-by-multiple-tags-with-beautifulsoup/</guid><description>&lt;p>BeautifulSoup also supports selecting elements by multiple tags. To achieve that, we use the function &lt;code>find_all&lt;/code> and pass it a list of the tags we want to extract.&lt;/p>
&lt;p>For example, to extract &lt;code>&amp;lt;h1&amp;gt;&lt;/code> and &lt;code>&amp;lt;b&amp;gt;&lt;/code> elements, we pass the tags as a list like this:&lt;/p>
&lt;pre tabindex="0">&lt;code>from bs4 import BeautifulSoup

html_content = &amp;#39;&amp;#39;&amp;#39;
&amp;lt;h1&amp;gt;Header&amp;lt;/h1&amp;gt;
&amp;lt;p&amp;gt;Paragraph&amp;lt;/p&amp;gt;
&amp;lt;span&amp;gt;Span&amp;lt;/span&amp;gt;
&amp;lt;b&amp;gt;Bold&amp;lt;/b&amp;gt;
&amp;#39;&amp;#39;&amp;#39;
soup = BeautifulSoup(html_content, &amp;#39;html.parser&amp;#39;)

headers_and_bold_text = soup.find_all([&amp;#34;h1&amp;#34;, &amp;#34;b&amp;#34;])
for element in headers_and_bold_text:
 print(element)
# Output:
# &amp;lt;h1&amp;gt;Header&amp;lt;/h1&amp;gt;
# &amp;lt;b&amp;gt;Bold&amp;lt;/b&amp;gt;
&lt;/code>&lt;/pre></description></item><item><title>How to find sibling HTML nodes using BeautifulSoup and Python?</title><link>https://www.scrapingbee.com/webscraping-questions/beautifulsoup/how-to-find-sibling-html-nodes-using-beautifulsoup-and-python/</link><pubDate>Fri, 15 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/beautifulsoup/how-to-find-sibling-html-nodes-using-beautifulsoup-and-python/</guid><description>&lt;p>BeautifulSoup allows us to find sibling elements using 4 main functions:&lt;/p>
&lt;p>- &lt;code>find_previous_sibling&lt;/code> to find the single previous sibling&lt;br>- &lt;code>find_next_sibling&lt;/code> to find the single next sibling&lt;br>- &lt;code>find_next_siblings&lt;/code> to find all the next siblings&lt;br>- &lt;code>find_previous_siblings&lt;/code> to find all the previous siblings&lt;br>&lt;br>You can use the code below to find the previous sibling, next sibling, all next siblings and all previous siblings of the Main Paragraph element:&lt;/p>
&lt;pre tabindex="0">&lt;code>from bs4 import BeautifulSoup

html_content = &amp;#39;&amp;#39;&amp;#39;
&amp;lt;p&amp;gt;First paragraph&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;Second Paragraph&amp;lt;/p&amp;gt;
&amp;lt;p id=&amp;#34;main&amp;#34;&amp;gt;Main Paragraph&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;Fourth Paragraph&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;Fifth Paragraph&amp;lt;/p&amp;gt;
&amp;#39;&amp;#39;&amp;#39;
soup = BeautifulSoup(html_content, &amp;#39;html.parser&amp;#39;)

main_element = soup.find(&amp;#34;p&amp;#34;, attrs={&amp;#34;id&amp;#34;: &amp;#34;main&amp;#34;})

# Find the previous sibling:
print(main_element.find_previous_sibling())

# Find the next sibling:
print(main_element.find_next_sibling())

# Find all next siblings:
print(main_element.find_next_siblings())

# Find all previous siblings:
print(main_element.find_previous_siblings())
&lt;/code>&lt;/pre></description></item><item><title>How to save and load cookies in Puppeteer?</title><link>https://www.scrapingbee.com/webscraping-questions/puppeteer/how-to-save-and-load-cookies-in-puppeteer/</link><pubDate>Fri, 15 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/puppeteer/how-to-save-and-load-cookies-in-puppeteer/</guid><description>&lt;p>Saving and loading cookies with Puppeteer is very straightforward, we can use the &lt;code>page.cookies()&lt;/code> method to get all the cookies of a webpage, and use the &lt;code>page.setCookie()&lt;/code> method to load cookies into a web page:&lt;/p>
&lt;pre tabindex="0">&lt;code>const puppeteer = require(&amp;#39;puppeteer&amp;#39;);

(async () =&amp;gt; {

 const browser = await puppeteer.launch();
 const page = await browser.newPage();

 // Open ScrapingBee&amp;#39;s URL
 await page.goto(&amp;#39;http://scrapingbee.com&amp;#39;);

 // Get all the page&amp;#39;s cookies and save them to the cookies variable
 const cookies = await page.cookies();

 // Open a second website
 await page.goto(&amp;#39;http://httpbin.org/cookies&amp;#39;);

 // Load the previously saved cookies
 await page.setCookie(...cookies);

 // Get the second page&amp;#39;s cookies
 const cookiesSet = await page.cookies();

 console.log(JSON.stringify(cookiesSet));

 await browser.close();

})();
&lt;/code>&lt;/pre></description></item><item><title>How to scrape tables with BeautifulSoup?</title><link>https://www.scrapingbee.com/webscraping-questions/beautifulsoup/how-to-find-elements-without-a-specific-attribute-in-beautifulsoup/</link><pubDate>Fri, 15 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/beautifulsoup/how-to-find-elements-without-a-specific-attribute-in-beautifulsoup/</guid><description>&lt;p>We can parse a table's content with BeautifulSoup by finding all &lt;code>&amp;lt;tr&amp;gt;&lt;/code> elements, and finding their &lt;code>&amp;lt;td&amp;gt;&lt;/code> or &lt;code>&amp;lt;th&amp;gt;&lt;/code> children.&lt;/p>
&lt;p>Here is an example on how to parse &lt;a target="_blank" rel="noopener" href="https://demo.scrapingbee.com/table_content.html">this demo table&lt;/a> using BeautifulSoup:&lt;/p>
&lt;pre tabindex="0">&lt;code>import requests
from bs4 import BeautifulSoup

response = requests.get(&amp;#34;https://demo.scrapingbee.com/table_content.html&amp;#34;)
soup = BeautifulSoup(response.content, &amp;#39;html.parser&amp;#39;)

data = []
table = soup.find(&amp;#39;table&amp;#39;)

rows = table.find_all(&amp;#39;tr&amp;#39;)
for row in rows:
 cols = row.find_all([&amp;#39;td&amp;#39;, &amp;#39;th&amp;#39;])
 cols = [ele.text.strip() for ele in cols]
 data.append([ele for ele in cols if ele])
print(data)
&lt;/code>&lt;/pre>&lt;p> &lt;/p></description></item><item><title>How to select values between two nodes in BeautifulSoup and Python?</title><link>https://www.scrapingbee.com/webscraping-questions/beautifulsoup/how-to-select-values-between-two-nodes-in-beautifulsoup-and-python/</link><pubDate>Fri, 15 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/beautifulsoup/how-to-select-values-between-two-nodes-in-beautifulsoup-and-python/</guid><description>&lt;p>You can select elements between two nodes in BeautifulSoup by looping through the main nodes, and checking the next siblings to see if a main node was reached: &lt;/p>
&lt;pre tabindex="0">&lt;code>from bs4 import BeautifulSoup

html_content = &amp;#39;&amp;#39;&amp;#39;
&amp;lt;h1&amp;gt;Starting Header&amp;lt;/h1&amp;gt;&amp;lt;p&amp;gt;Element 1&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt;Element 2&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt;Element 3&amp;lt;/p&amp;gt;&amp;lt;h1&amp;gt;Ending Header&amp;lt;/h1&amp;gt;
&amp;#39;&amp;#39;&amp;#39;
soup = BeautifulSoup(html_content, &amp;#39;html.parser&amp;#39;)

elements = []
for tag in soup.find(&amp;#34;h1&amp;#34;).next_siblings:
 if tag.name == &amp;#34;h1&amp;#34;:
  break
 else:
  elements.append(tag)

print(elements)
# Output: [&amp;lt;p&amp;gt;Element 1&amp;lt;/p&amp;gt;, &amp;lt;p&amp;gt;Element 2&amp;lt;/p&amp;gt;, &amp;lt;p&amp;gt;Element 3&amp;lt;/p&amp;gt;]
&lt;/code>&lt;/pre></description></item><item><title>How to take a screenshot with Puppeteer?</title><link>https://www.scrapingbee.com/webscraping-questions/puppeteer/how-to-take-a-screenshot-with-puppeteer/</link><pubDate>Fri, 15 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/puppeteer/how-to-take-a-screenshot-with-puppeteer/</guid><description>&lt;p>Taking screenshots with Puppeteer is very simple, all you have to do is to set the browser's viewport, then use the &lt;code>page.screenshot()&lt;/code> method to capture it.&lt;/p>
&lt;p>Here's an example on how to take a screenshot of ScrapingBee's home page:&lt;/p>
&lt;pre tabindex="0">&lt;code>const puppeteer = require(&amp;#39;puppeteer&amp;#39;);

(async () =&amp;gt; {

 const browser = await puppeteer.launch();
 const page = await browser.newPage();

 // Set the viewport&amp;#39;s width and height
 await page.setViewport({ width: 1920, height: 1080 });

 // Open ScrapingBee&amp;#39;s home page
 await page.goto(&amp;#39;https://scrapingbee.com&amp;#39;);

 try {
  // Capture screenshot and save it in the current folder:
  await page.screenshot({ path: `./scrapingbee_homepage.jpg` });
  console.log(`Screenshot has been captured successfully`);
 } catch (err) {
  console.log(`Error: ${err.message}`);
 } finally {
  await browser.close();
 }
})();
&lt;/code>&lt;/pre></description></item><item><title>How to web scrape with python selenium?</title><link>https://www.scrapingbee.com/webscraping-questions/python/how-to-web-scrape-with-python-selenium/</link><pubDate>Fri, 15 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/python/how-to-web-scrape-with-python-selenium/</guid><description>&lt;p>Using Python with Requests library can help you scrape data from static websites, that means websites that have the content within the server's original HTML response. However, you will not be able to get data from websites that load information dynamically, using JavaScript that gets executed after the server's initial response. For that, we will have to use tools that allows us to mimic a typical user's behavior, like Selenium.&lt;/p></description></item><item><title>HTTP headers with axios</title><link>https://www.scrapingbee.com/blog/axios-headers/</link><pubDate>Fri, 15 Jan 2021 11:10:27 +0200</pubDate><guid>https://www.scrapingbee.com/blog/axios-headers/</guid><description>&lt;p>There has been quite a lot of debate for a long time in the Javascript community as to which HTTP client is the best when it comes to ease of use, among them, Axios would definitely rank among the top 3 for a lot of developers. This article will show you how to use axios to make HTTP requests and pass HTTP headers with your requests. We will also take a close look at how HTTP headers work and why they are important.&lt;/p></description></item><item><title>Is Python good for web scraping?</title><link>https://www.scrapingbee.com/webscraping-questions/python/python-good-web-scraping/</link><pubDate>Fri, 15 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/python/python-good-web-scraping/</guid><description>&lt;h2 id="short-answer-yes">Short answer: Yes!&lt;/h2>
&lt;p>Python is one of the most popular programming languages in the world thanks to its ease of use and learning, its large community and its portability. It also dominates all modern data-related fields, including data analysis, machine learning and web scraping.&lt;/p>
&lt;p>Writing a Hello World program in Python is much easier than in most other programming languages, especially C-like languages; here is how you can do that:&lt;/p></description></item><item><title>Is web scraping good to learn?</title><link>https://www.scrapingbee.com/webscraping-questions/python/is-web-scraping-good-to-learn/</link><pubDate>Fri, 15 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/python/is-web-scraping-good-to-learn/</guid><description>&lt;h2 id="yes">Yes!&lt;/h2>
&lt;p>Web scraping is a very useful skill to have in a world that runs on, and generates, data every second. Data is everywhere, and it is important to be able to extract it easily from online sources.&lt;/p>
&lt;p>&lt;br>Without web scraping knowledge, it would be very difficult to amass large amounts of data that can be used for analysis, visualization and prediction.&lt;br>For example, without tools like &lt;a href="https://www.scrapingbee.com/webscraping-questions/python/python-good-web-scraping/" target="_blank" >Requests&lt;/a> and BeautifulSoup, it would be very difficult to scrape Wikipedia's &lt;a href="https://en.wikipedia.org/wiki/S%26P_500" target="_blank" >S&amp;amp;P500 historical data&lt;/a>. We would have to manually copy and paste each data point from each page, which is very tedious.&lt;br>&lt;br>However, thanks to these tools, we can easily scrape the historical data in milliseconds using this code:&lt;/p></description></item><item><title>What does Beautifulsoup do in Python?</title><link>https://www.scrapingbee.com/webscraping-questions/python/what-does-beautifulsoup-do-in-python/</link><pubDate>Fri, 15 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/python/what-does-beautifulsoup-do-in-python/</guid><description>&lt;p>BeautifulSoup parses the HTML allowing you to extract information from it.&lt;/p>
&lt;p>When doing web scraping, you will usually not be interested in the HTML on the page, but in the underlying data. This is where BeautifulSoup comes into play.&lt;/p>
&lt;p>BeautifulSoup will take that HTML and turn it into the data you're interested in. Here is a quick example on how to extract the title of a webpage:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">import&lt;/span> requests
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> bs4 &lt;span style="color:#f92672">import&lt;/span> BeautifulSoup
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>response &lt;span style="color:#f92672">=&lt;/span> requests&lt;span style="color:#f92672">.&lt;/span>get(&lt;span style="color:#e6db74">&amp;#34;https://news.ycombinator.com/&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>soup &lt;span style="color:#f92672">=&lt;/span> BeautifulSoup(response&lt;span style="color:#f92672">.&lt;/span>content, &lt;span style="color:#e6db74">&amp;#39;html.parser&amp;#39;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># The title tag of the page&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>print(soup&lt;span style="color:#f92672">.&lt;/span>title)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">&amp;gt;&lt;/span> &lt;span style="color:#f92672">&amp;lt;&lt;/span>title&lt;span style="color:#f92672">&amp;gt;&lt;/span>Hacker News&lt;span style="color:#f92672">&amp;lt;/&lt;/span>title&lt;span style="color:#f92672">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># The title of the page as string&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>print(soup&lt;span style="color:#f92672">.&lt;/span>title&lt;span style="color:#f92672">.&lt;/span>string)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">&amp;gt;&lt;/span> Hacker News
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>If you want to learn more about BeautifulSoup and how to extract links, custom attributes, siblings and more, feel free to check our &lt;a href="https://www.scrapingbee.com/blog/python-web-scraping-beautiful-soup/" >BeautifulSoup tutorial&lt;/a>.&lt;/p></description></item><item><title>Which Python library is used for web scraping?</title><link>https://www.scrapingbee.com/webscraping-questions/python/which-python-library-is-used-for-web-scraping/</link><pubDate>Fri, 15 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/python/which-python-library-is-used-for-web-scraping/</guid><description>&lt;p>There are various Python libraries that can be used for web scraping, but the most popular ones are:&lt;/p>
&lt;h2 id="1-requests">1. Requests:&lt;/h2>
&lt;p>Requests is an easy-to-use HTTP library: it abstracts the complexity of making HTTP/1.1 requests behind a simple API so that you can focus on scraping the web page rather than on the request itself. It lets you fetch the HTML/JSON contents of any page.&lt;/p></description></item><item><title>How to block resources in Playwright and Python?</title><link>https://www.scrapingbee.com/webscraping-questions/playwright/how-to-block-resources-in-playwright/</link><pubDate>Thu, 14 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/playwright/how-to-block-resources-in-playwright/</guid><description>&lt;p>You can block resources in Playwright by making use of the &lt;code>route&lt;/code> method of the &lt;code>Page&lt;/code> or &lt;code>Browser&lt;/code> object and registering an interceptor that rejects requests based on certain parameters. For instance, you can block all remote resources of image type. You can also filter the URL and block specific URLs.&lt;/p>
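&lt;p>A minimal sketch of this interception pattern, assuming Playwright's sync API (the blocking rule and function names here are illustrative):&lt;/p>

```python
def should_block(resource_type, url):
    # Illustrative rule: block images and any URL containing "google".
    return resource_type == "image" or "google" in url

def block_resources_demo():
    # Requires playwright to be installed, plus a Chromium build.
    from playwright.sync_api import sync_playwright

    def handle_route(route):
        # Abort matching requests; let everything else through.
        if should_block(route.request.resource_type, route.request.url):
            route.abort()
        else:
            route.continue_()

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.route("**/*", handle_route)  # intercept every request
        page.goto("https://www.scrapingbee.com")
        browser.close()
```

&lt;p>Calling &lt;code>block_resources_demo()&lt;/code> loads the page with all images, and any request whose URL contains &amp;quot;google&amp;quot;, aborted.&lt;/p>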
&lt;p>Here is some sample code that navigates to the ScrapingBee homepage while blocking all images and all URLs containing &amp;quot;google&amp;quot;:&lt;/p></description></item><item><title>How to capture background requests and responses in Playwright?</title><link>https://www.scrapingbee.com/webscraping-questions/playwright/how-to-capture-background-requests-and-responses-playwright/</link><pubDate>Thu, 14 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/playwright/how-to-capture-background-requests-and-responses-playwright/</guid><description>&lt;p>You can capture background requests and responses in Playwright by registering appropriate callback functions for the &lt;code>request&lt;/code> and &lt;code>response&lt;/code> events of the &lt;code>Page&lt;/code> object.&lt;/p>
&lt;p>Here is some sample code that logs all requests and responses in Playwright:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> playwright.sync_api &lt;span style="color:#f92672">import&lt;/span> sync_playwright
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">def&lt;/span> &lt;span style="color:#a6e22e">incercept_request&lt;/span>(request):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#e6db74">&amp;#34;requested URL:&amp;#34;&lt;/span>, request&lt;span style="color:#f92672">.&lt;/span>url)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">def&lt;/span> &lt;span style="color:#a6e22e">incercept_response&lt;/span>(response):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#e6db74">f&lt;/span>&lt;span style="color:#e6db74">&amp;#34;response URL: &lt;/span>&lt;span style="color:#e6db74">{&lt;/span>response&lt;span style="color:#f92672">.&lt;/span>url&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">, Status: &lt;/span>&lt;span style="color:#e6db74">{&lt;/span>response&lt;span style="color:#f92672">.&lt;/span>status&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">with&lt;/span> sync_playwright() &lt;span style="color:#66d9ef">as&lt;/span> p:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> browser &lt;span style="color:#f92672">=&lt;/span> p&lt;span style="color:#f92672">.&lt;/span>chromium&lt;span style="color:#f92672">.&lt;/span>launch(headless &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">False&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> page &lt;span style="color:#f92672">=&lt;/span> browser&lt;span style="color:#f92672">.&lt;/span>new_page()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> 
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e"># Register the middlewares&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> page&lt;span style="color:#f92672">.&lt;/span>on(&lt;span style="color:#e6db74">&amp;#34;request&amp;#34;&lt;/span>, incercept_request)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> page&lt;span style="color:#f92672">.&lt;/span>on(&lt;span style="color:#e6db74">&amp;#34;response&amp;#34;&lt;/span>, incercept_response)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> page&lt;span style="color:#f92672">.&lt;/span>goto(&lt;span style="color:#e6db74">&amp;#34;https://scrapingbee.com&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;strong>Note:&lt;/strong> You can modify requests and responses via these middlewares as well!&lt;/p></description></item><item><title>How to download a file with Playwright and Python?</title><link>https://www.scrapingbee.com/webscraping-questions/playwright/how-to-download-file-with-playwright/</link><pubDate>Thu, 14 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/playwright/how-to-download-file-with-playwright/</guid><description>&lt;p>You can download a file with Playwright by targeting the file download button on the page using any &lt;code>Locator&lt;/code> and clicking it. Alternatively, you can also extract the link from an anchor tag using the &lt;code>get_attribute&lt;/code> method and then download the file using &lt;code>requests&lt;/code>. This is better as sometimes the PDFs and other downloadable files will open natively in the browser instead of triggering a download on button click.&lt;/p>
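&lt;p>The link-extraction approach can be sketched as follows; the helper names are hypothetical and &lt;code>requests&lt;/code> is assumed to be installed:&lt;/p>

```python
import os
from urllib.parse import urlparse

def filename_from_url(url, default="download.bin"):
    # Derive a local filename from the last segment of the URL path.
    name = os.path.basename(urlparse(url).path)
    return name or default

def download_file(url, dest_dir="."):
    # Fetch the URL with requests and write the bytes to disk.
    import requests  # assumed installed
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    path = os.path.join(dest_dir, filename_from_url(url))
    with open(path, "wb") as f:
        f.write(response.content)
    return path
```

&lt;p>In a Playwright script you would first read the link, e.g. &lt;code>url = page.locator("a").get_attribute("href")&lt;/code> (selector hypothetical), then call &lt;code>download_file(url)&lt;/code>.&lt;/p>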
&lt;p>Here is some sample code that downloads a random paper from arXiv using Playwright and &lt;code>requests&lt;/code>:&lt;/p></description></item><item><title>How to find elements by CSS selectors in Playwright?</title><link>https://www.scrapingbee.com/webscraping-questions/playwright/how-to-find-elements-by-css-selectors-in-playwright/</link><pubDate>Thu, 14 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/playwright/how-to-find-elements-by-css-selectors-in-playwright/</guid><description>&lt;p>You can find elements by CSS selectors in Playwright by using the &lt;code>locator&lt;/code> method of the &lt;code>Page&lt;/code> object. Playwright can automatically detect that a CSS selector is being passed in as an argument. Alternatively, you can prepend your CSS selector with &lt;code>css=&lt;/code> to make sure Playwright doesn't make a wrong guess.&lt;/p>
&lt;p>Here is some sample code that prints the title of ScrapingBee website by making use of CSS selectors:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> playwright &lt;span style="color:#f92672">import&lt;/span> sync_playwright
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">with&lt;/span> sync_playwright() &lt;span style="color:#66d9ef">as&lt;/span> p:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> browser &lt;span style="color:#f92672">=&lt;/span> p&lt;span style="color:#f92672">.&lt;/span>chromium&lt;span style="color:#f92672">.&lt;/span>launch(headless&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#66d9ef">False&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> page &lt;span style="color:#f92672">=&lt;/span> browser&lt;span style="color:#f92672">.&lt;/span>new_page()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> page&lt;span style="color:#f92672">.&lt;/span>goto(&lt;span style="color:#e6db74">&amp;#34;https://scrapingbee.com&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> 
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e"># Extract the title using CSS selector and print it&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> title &lt;span style="color:#f92672">=&lt;/span> page&lt;span style="color:#f92672">.&lt;/span>locator(&lt;span style="color:#e6db74">&amp;#39;css=title&amp;#39;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(title&lt;span style="color:#f92672">.&lt;/span>text_content())
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>How to find elements by XPath selectors in Playwright?</title><link>https://www.scrapingbee.com/webscraping-questions/playwright/how-to-find-elements-by-xpath-in-playwright/</link><pubDate>Thu, 14 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/playwright/how-to-find-elements-by-xpath-in-playwright/</guid><description>&lt;p>You can find elements by XPath selectors in Playwright by using the &lt;code>locator&lt;/code> method of the &lt;code>Page&lt;/code> object. Playwright can automatically detect that an XPath is being passed as an argument. Alternatively, you can prepend your XPath with &lt;code>xpath=&lt;/code> to make sure Playwright doesn't make a wrong guess.&lt;/p>
&lt;p>Here is some sample code that prints the title of ScrapingBee website by making use of XPath selectors:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> playwright &lt;span style="color:#f92672">import&lt;/span> sync_playwright
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">with&lt;/span> sync_playwright() &lt;span style="color:#66d9ef">as&lt;/span> p:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> browser &lt;span style="color:#f92672">=&lt;/span> p&lt;span style="color:#f92672">.&lt;/span>chromium&lt;span style="color:#f92672">.&lt;/span>launch(headless&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#66d9ef">False&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> page &lt;span style="color:#f92672">=&lt;/span> browser&lt;span style="color:#f92672">.&lt;/span>new_page()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> page&lt;span style="color:#f92672">.&lt;/span>goto(&lt;span style="color:#e6db74">&amp;#34;https://scrapingbee.com&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> 
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e"># Extract the title using XPath selector and print it&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> title &lt;span style="color:#f92672">=&lt;/span> page&lt;span style="color:#f92672">.&lt;/span>locator(&lt;span style="color:#e6db74">&amp;#39;xpath=//title&amp;#39;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(title&lt;span style="color:#f92672">.&lt;/span>text_content())
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>How to load local files in Playwright?</title><link>https://www.scrapingbee.com/webscraping-questions/playwright/how-to-load-local-files-in-playwright/</link><pubDate>Thu, 14 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/playwright/how-to-load-local-files-in-playwright/</guid><description>&lt;p>You can load local files in Playwright by passing in the absolute path of the file to the &lt;code>goto&lt;/code> method of the &lt;code>Page&lt;/code> object. Just make sure that you prepend &lt;code>file://&lt;/code> to the path as well.&lt;/p>
&lt;p>Here is some sample code for opening a local file:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> playwright &lt;span style="color:#f92672">import&lt;/span> sync_playwright
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">with&lt;/span> sync_playwright() &lt;span style="color:#66d9ef">as&lt;/span> p:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> browser &lt;span style="color:#f92672">=&lt;/span> p&lt;span style="color:#f92672">.&lt;/span>chromium&lt;span style="color:#f92672">.&lt;/span>launch(headless&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#66d9ef">False&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> context &lt;span style="color:#f92672">=&lt;/span> browser&lt;span style="color:#f92672">.&lt;/span>new_context()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> page &lt;span style="color:#f92672">=&lt;/span> context&lt;span style="color:#f92672">.&lt;/span>new_page()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e"># open a local file &lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> page&lt;span style="color:#f92672">.&lt;/span>goto(&lt;span style="color:#e6db74">&amp;#34;file://path/to/file.html&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;strong>Note:&lt;/strong> The path would look like this for Windows: &lt;code>file:///C:/path/to/file.html&lt;/code>&lt;/p></description></item><item><title>How to run Playwright in Jupyter notebooks?</title><link>https://www.scrapingbee.com/webscraping-questions/playwright/how-to-run-playwright-in-jupyter-notebooks/</link><pubDate>Thu, 14 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/playwright/how-to-run-playwright-in-jupyter-notebooks/</guid><description>&lt;p>You can run Playwright in Jupyter notebooks by making use of Playwright's async API. This is required because Jupyter notebooks already run an asyncio event loop, so Playwright has to run asynchronously as well.&lt;/p>
&lt;p>Here is some sample code that navigates to ScrapingBee's homepage while making use of the async API:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> playwright.async_api &lt;span style="color:#f92672">import&lt;/span> async_playwright
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>pw &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">await&lt;/span> async_playwright()&lt;span style="color:#f92672">.&lt;/span>start()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>browser &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">await&lt;/span> pw&lt;span style="color:#f92672">.&lt;/span>chromium&lt;span style="color:#f92672">.&lt;/span>launch(headless &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">False&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>page &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">await&lt;/span> browser&lt;span style="color:#f92672">.&lt;/span>new_page()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">await&lt;/span> page&lt;span style="color:#f92672">.&lt;/span>goto(&lt;span style="color:#e6db74">&amp;#34;https://scrapingbee.com/&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>How to save and load cookies in Playwright?</title><link>https://www.scrapingbee.com/webscraping-questions/playwright/how-to-save-and-load-cookies-in-playwright/</link><pubDate>Thu, 14 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/playwright/how-to-save-and-load-cookies-in-playwright/</guid><description>&lt;p>You can save and load cookies in Playwright by making use of the &lt;code>cookies()&lt;/code> and &lt;code>add_cookies()&lt;/code> methods of the browser context. The former returns the current cookies whereas the latter helps you add new cookies and/or overwrite the old ones.&lt;/p>
&lt;p>Here is some sample code for saving and loading the cookies in Playwright while browsing the ScrapingBee website:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">import&lt;/span> json
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> playwright.sync_api &lt;span style="color:#f92672">import&lt;/span> sync_playwright
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">with&lt;/span> sync_playwright() &lt;span style="color:#66d9ef">as&lt;/span> p:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> browser &lt;span style="color:#f92672">=&lt;/span> p&lt;span style="color:#f92672">.&lt;/span>chromium&lt;span style="color:#f92672">.&lt;/span>launch(headless &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">False&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> context &lt;span style="color:#f92672">=&lt;/span> browser&lt;span style="color:#f92672">.&lt;/span>new_context()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> page &lt;span style="color:#f92672">=&lt;/span> context&lt;span style="color:#f92672">.&lt;/span>new_page()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> page&lt;span style="color:#f92672">.&lt;/span>goto(&lt;span style="color:#e6db74">&amp;#34;https://scrapingbee.com&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> 
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e"># Save the cookies&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">with&lt;/span> open(&lt;span style="color:#e6db74">&amp;#34;cookies.json&amp;#34;&lt;/span>, &lt;span style="color:#e6db74">&amp;#34;w&amp;#34;&lt;/span>) &lt;span style="color:#66d9ef">as&lt;/span> f:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> f&lt;span style="color:#f92672">.&lt;/span>write(json&lt;span style="color:#f92672">.&lt;/span>dumps(context&lt;span style="color:#f92672">.&lt;/span>cookies()))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e"># Load the cookies&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">with&lt;/span> open(&lt;span style="color:#e6db74">&amp;#34;cookies.json&amp;#34;&lt;/span>, &lt;span style="color:#e6db74">&amp;#34;r&amp;#34;&lt;/span>) &lt;span style="color:#66d9ef">as&lt;/span> f:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cookies &lt;span style="color:#f92672">=&lt;/span> json&lt;span style="color:#f92672">.&lt;/span>loads(f&lt;span style="color:#f92672">.&lt;/span>read())
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> context&lt;span style="color:#f92672">.&lt;/span>add_cookies(cookies)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>How to take a screenshot with Playwright?</title><link>https://www.scrapingbee.com/webscraping-questions/playwright/how-to-take-screenshot-with-playwright/</link><pubDate>Thu, 14 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/playwright/how-to-take-screenshot-with-playwright/</guid><description>&lt;p>You can take a screenshot with Playwright via the &lt;code>screenshot&lt;/code> method of the &lt;code>Page&lt;/code> object. You can optionally pass in the &lt;code>full_page&lt;/code> boolean argument to the &lt;code>screenshot&lt;/code> method to save the screenshot of the whole page.&lt;/p>
&lt;p>Here is some sample code that navigates to ScrapingBee's homepage and saves the screenshot in a &lt;code>screenshot.png&lt;/code> file:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> playwright.sync_api &lt;span style="color:#f92672">import&lt;/span> sync_playwright
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">with&lt;/span> sync_playwright() &lt;span style="color:#66d9ef">as&lt;/span> p:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> browser &lt;span style="color:#f92672">=&lt;/span> p&lt;span style="color:#f92672">.&lt;/span>chromium&lt;span style="color:#f92672">.&lt;/span>launch(headless &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">False&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> page &lt;span style="color:#f92672">=&lt;/span> browser&lt;span style="color:#f92672">.&lt;/span>new_page()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> page&lt;span style="color:#f92672">.&lt;/span>goto(&lt;span style="color:#e6db74">&amp;#34;https://scrapingbee.com&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e"># Save the screenshot&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> page&lt;span style="color:#f92672">.&lt;/span>screenshot(path&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#34;screenshot.png&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>How to block image loading in Selenium?</title><link>https://www.scrapingbee.com/webscraping-questions/selenium/how-to-block-image-loading-selenium/</link><pubDate>Tue, 12 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/selenium/how-to-block-image-loading-selenium/</guid><description>&lt;p>You can block image loading in Selenium by passing in the custom &lt;code>ChromeOptions&lt;/code> object and setting the appropriate content settings preferences.&lt;/p>
&lt;p>Here is some sample code that navigates to the ScrapingBee homepage while blocking images:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> selenium &lt;span style="color:#f92672">import&lt;/span> webdriver
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> selenium.webdriver.common.by &lt;span style="color:#f92672">import&lt;/span> By
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>DRIVER_PATH &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">&amp;#39;/path/to/chromedriver&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Block images via ChromeOptions object&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>chrome_options &lt;span style="color:#f92672">=&lt;/span> webdriver&lt;span style="color:#f92672">.&lt;/span>ChromeOptions()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>prefs &lt;span style="color:#f92672">=&lt;/span> {&lt;span style="color:#e6db74">&amp;#34;profile.managed_default_content_settings.images&amp;#34;&lt;/span>: &lt;span style="color:#ae81ff">2&lt;/span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>chrome_options&lt;span style="color:#f92672">.&lt;/span>add_experimental_option(&lt;span style="color:#e6db74">&amp;#34;prefs&amp;#34;&lt;/span>, prefs)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Pass in custom options while creating a Chrome object&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>driver &lt;span style="color:#f92672">=&lt;/span> webdriver&lt;span style="color:#f92672">.&lt;/span>Chrome(options&lt;span style="color:#f92672">=&lt;/span>chrome_options, executable_path&lt;span style="color:#f92672">=&lt;/span>DRIVER_PATH)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Navigate to ScrapingBee while blocking all images&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>driver&lt;span style="color:#f92672">.&lt;/span>get(&lt;span style="color:#e6db74">&amp;#34;http://www.scrapingbee.com&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The code for Firefox looks fairly similar as well:&lt;/p></description></item><item><title>How to get page source in Selenium?</title><link>https://www.scrapingbee.com/webscraping-questions/selenium/how-to-get-page-source-selenium/</link><pubDate>Tue, 12 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/selenium/how-to-get-page-source-selenium/</guid><description>&lt;p>You can easily get the page source in Selenium via the &lt;code>page_source&lt;/code> attribute of the Selenium web driver.&lt;/p>
&lt;p>Here is some sample code for getting the page source of the ScrapingBee website:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> selenium &lt;span style="color:#f92672">import&lt;/span> webdriver
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>DRIVER_PATH &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">&amp;#39;/path/to/chromedriver&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>driver &lt;span style="color:#f92672">=&lt;/span> webdriver&lt;span style="color:#f92672">.&lt;/span>Chrome(executable_path&lt;span style="color:#f92672">=&lt;/span>DRIVER_PATH)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>driver&lt;span style="color:#f92672">.&lt;/span>get(&lt;span style="color:#e6db74">&amp;#34;http://www.scrapingbee.com&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Print page source on screen&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>print(driver&lt;span style="color:#f92672">.&lt;/span>page_source)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>How to scroll to an element in Selenium?</title><link>https://www.scrapingbee.com/webscraping-questions/selenium/how-to-scroll-to-element-selenium/</link><pubDate>Tue, 12 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/selenium/how-to-scroll-to-element-selenium/</guid><description>&lt;p>You can scroll to an element in Selenium by making use of the &lt;code>execute_script&lt;/code> method and passing in a Javascript expression to do the actual scrolling. You can use any supported Selenium selectors to target any &lt;code>WebElement&lt;/code> and then pass that to the &lt;code>execute_script&lt;/code> as an argument.&lt;/p>
&lt;p>Here is some example code that navigates to the ScrapingBee homepage and scrolls to the &lt;code>footer&lt;/code> tag:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> selenium &lt;span style="color:#f92672">import&lt;/span> webdriver
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> selenium.webdriver.common.by &lt;span style="color:#f92672">import&lt;/span> By
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>DRIVER_PATH &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">&amp;#39;/path/to/chromedriver&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>driver &lt;span style="color:#f92672">=&lt;/span> webdriver&lt;span style="color:#f92672">.&lt;/span>Chrome(executable_path&lt;span style="color:#f92672">=&lt;/span>DRIVER_PATH)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>driver&lt;span style="color:#f92672">.&lt;/span>get(&lt;span style="color:#e6db74">&amp;#34;http://www.scrapingbee.com&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Javascript expression to scroll to a particular element&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># arguments[0] refers to the first argument that is later passed&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># in to execute_script method&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>js_code &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">&amp;#34;arguments[0].scrollIntoView();&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># The WebElement you want to scroll to&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>element &lt;span style="color:#f92672">=&lt;/span> driver&lt;span style="color:#f92672">.&lt;/span>find_element(By&lt;span style="color:#f92672">.&lt;/span>TAG_NAME, &lt;span style="color:#e6db74">&amp;#39;footer&amp;#39;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Execute the JS script&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>driver&lt;span style="color:#f92672">.&lt;/span>execute_script(js_code, element)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>How to take a screenshot with Selenium?</title><link>https://www.scrapingbee.com/webscraping-questions/selenium/how-to-take-screenshot-selenium/</link><pubDate>Tue, 12 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/selenium/how-to-take-screenshot-selenium/</guid><description>&lt;p>You can take a screenshot using the selenium web driver via the &lt;code>save_screenshot&lt;/code> method.&lt;/p>
&lt;p>Here is some sample code for navigating to the ScrapingBee website and taking a screenshot of the page:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> selenium &lt;span style="color:#f92672">import&lt;/span> webdriver
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>DRIVER_PATH &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">&amp;#39;/path/to/chromedriver&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>driver &lt;span style="color:#f92672">=&lt;/span> webdriver&lt;span style="color:#f92672">.&lt;/span>Chrome(executable_path&lt;span style="color:#f92672">=&lt;/span>DRIVER_PATH)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>driver&lt;span style="color:#f92672">.&lt;/span>get(&lt;span style="color:#e6db74">&amp;#34;http://www.scrapingbee.com&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>screenshot_path &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">&amp;#34;/path/to/screenshot.png&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Save the screenshot&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>driver&lt;span style="color:#f92672">.&lt;/span>save_screenshot(screenshot_path)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>How to wait for the page to load in Selenium?</title><link>https://www.scrapingbee.com/webscraping-questions/selenium/how-to-wait-for-page-load-selenium/</link><pubDate>Tue, 12 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/selenium/how-to-wait-for-page-load-selenium/</guid><description>&lt;p>You can wait for the page to load in Selenium via multiple strategies:&lt;/p>
&lt;ul>
&lt;li>Explicit wait: Wait until a particular condition is met, e.g. until a particular element becomes visible on the screen&lt;/li>
&lt;li>Implicit wait: Wait for a particular time interval&lt;/li>
&lt;li>Fluent wait: Similar to explicit wait but provides additional control via timeouts and polling frequency&lt;/li>
&lt;/ul>
&lt;p>By default, the web driver waits for the page to load (but not for the AJAX requests initiated with the page load) and you can instruct it to explicitly wait for an element by making use of the &lt;code>WebDriverWait&lt;/code> and the &lt;code>expected_conditions&lt;/code> module.&lt;/p></description></item><item><title>Selenium: chromedriver executable needs to be in PATH?</title><link>https://www.scrapingbee.com/webscraping-questions/selenium/chromedriver-executable-needs-to-be-in-path/</link><pubDate>Tue, 12 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/selenium/chromedriver-executable-needs-to-be-in-path/</guid><description>&lt;p>You need to make sure that the &lt;code>chromedriver&lt;/code> executable is available in your &lt;code>PATH&lt;/code>. Otherwise, Selenium will throw this error:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>selenium&lt;span style="color:#f92672">.&lt;/span>common&lt;span style="color:#f92672">.&lt;/span>exceptions&lt;span style="color:#f92672">.&lt;/span>WebDriverException: Message: &lt;span style="color:#e6db74">&amp;#39;chromedriver&amp;#39;&lt;/span> executable needs to be &lt;span style="color:#f92672">in&lt;/span> PATH&lt;span style="color:#f92672">.&lt;/span> Please see https:&lt;span style="color:#f92672">//&lt;/span>chromedriver&lt;span style="color:#f92672">.&lt;/span>chromium&lt;span style="color:#f92672">.&lt;/span>org&lt;span style="color:#f92672">/&lt;/span>home
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The best way to fix this error is to use the &lt;code>webdriver-manager&lt;/code> package. It will make sure that you have a valid &lt;code>chromedriver&lt;/code> executable in &lt;code>PATH&lt;/code> and if it is not available, it will download it automatically. You can install it using PIP:&lt;/p></description></item><item><title>Selenium: geckodriver executable needs to be in PATH?</title><link>https://www.scrapingbee.com/webscraping-questions/selenium/geckodriver-executable-needs-to-be-in-path/</link><pubDate>Tue, 12 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/selenium/geckodriver-executable-needs-to-be-in-path/</guid><description>&lt;p>You need to make sure that the &lt;code>geckodriver&lt;/code> executable is available in your &lt;code>PATH&lt;/code>. Otherwise, Selenium will throw this error:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>selenium&lt;span style="color:#f92672">.&lt;/span>common&lt;span style="color:#f92672">.&lt;/span>exceptions&lt;span style="color:#f92672">.&lt;/span>WebDriverException: Message: &lt;span style="color:#e6db74">&amp;#39;geckodriver&amp;#39;&lt;/span> executable needs to be &lt;span style="color:#f92672">in&lt;/span> PATH&lt;span style="color:#f92672">.&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The best way to fix this error is to use the &lt;code>webdriver-manager&lt;/code> package. It will make sure that you have a valid &lt;code>geckodriver&lt;/code> executable in &lt;code>PATH&lt;/code> and if it is not available, it will download it automatically. You can install it using PIP:&lt;/p></description></item><item><title>How to find elements by XPath in Selenium?</title><link>https://www.scrapingbee.com/webscraping-questions/selenium/how-to-find-elements-by-xpath-selenium/</link><pubDate>Mon, 11 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/selenium/how-to-find-elements-by-xpath-selenium/</guid><description>&lt;p>You can find elements by XPath selectors in Selenium by utilizing the &lt;code>find_element&lt;/code> and &lt;code>find_elements&lt;/code> methods and the &lt;code>By.XPATH&lt;/code> argument.&lt;/p>
&lt;p>&lt;code>find_element&lt;/code> returns the first occurrence of an element matching the XPath selector, while &lt;code>find_elements&lt;/code> returns all elements on the page that match the selector. &lt;code>By.XPATH&lt;/code> simply tells Selenium to match elements using XPath.&lt;/p>
&lt;p>&lt;strong>Tip:&lt;/strong> &lt;code>//&lt;/code> in XPath matches an element wherever it is on the page whereas &lt;code>/&lt;/code> matches a direct child element.&lt;/p></description></item><item><title>How to save and load cookies in Selenium?</title><link>https://www.scrapingbee.com/webscraping-questions/selenium/how-to-save-cookies-selenium/</link><pubDate>Mon, 11 Jan 2021 09:10:27 +0000</pubDate><guid>https://www.scrapingbee.com/webscraping-questions/selenium/how-to-save-cookies-selenium/</guid><description>&lt;p>You can save and load cookies in Selenium using the &lt;code>get_cookies&lt;/code> method of the web driver object and the &lt;code>pickle&lt;/code> library.&lt;/p>
&lt;p>Here is some sample code to save and load cookies while navigating to the ScrapingBee website.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">import&lt;/span> pickle
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> selenium &lt;span style="color:#f92672">import&lt;/span> webdriver
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>DRIVER_PATH &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">&amp;#39;/path/to/chromedriver&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>driver &lt;span style="color:#f92672">=&lt;/span> webdriver&lt;span style="color:#f92672">.&lt;/span>Chrome(executable_path&lt;span style="color:#f92672">=&lt;/span>DRIVER_PATH)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>driver&lt;span style="color:#f92672">.&lt;/span>get(&lt;span style="color:#e6db74">&amp;#34;http://www.scrapingbee.com&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Save cookies&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>pickle&lt;span style="color:#f92672">.&lt;/span>dump(driver&lt;span style="color:#f92672">.&lt;/span>get_cookies(), open(&lt;span style="color:#e6db74">&amp;#34;cookies.pkl&amp;#34;&lt;/span>, &lt;span style="color:#e6db74">&amp;#34;wb&amp;#34;&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Load cookies&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>cookies &lt;span style="color:#f92672">=&lt;/span> pickle&lt;span style="color:#f92672">.&lt;/span>load(open(&lt;span style="color:#e6db74">&amp;#34;cookies.pkl&amp;#34;&lt;/span>, &lt;span style="color:#e6db74">&amp;#34;rb&amp;#34;&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">for&lt;/span> cookie &lt;span style="color:#f92672">in&lt;/span> cookies:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> driver&lt;span style="color:#f92672">.&lt;/span>add_cookie(cookie)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>Scraping E-Commerce Product Data</title><link>https://www.scrapingbee.com/blog/scraping-e-commerce-product-data/</link><pubDate>Sun, 17 Feb 2019 10:24:37 +0100</pubDate><guid>https://www.scrapingbee.com/blog/scraping-e-commerce-product-data/</guid><description>&lt;p>In this tutorial, we are going to see how to extract product data from any E-commerce websites with Java. There are lots of different use cases for product data extraction, such as:&lt;/p>
&lt;ul>
&lt;li>E-commerce price monitoring&lt;/li>
&lt;li>Price comparator&lt;/li>
&lt;li>Availability monitoring&lt;/li>
&lt;li>Extracting reviews&lt;/li>
&lt;li>Market research&lt;/li>
&lt;li>Detecting MAP (minimum advertised price) violations&lt;/li>
&lt;/ul>
&lt;p>We are going to extract these different fields: Price, Product Name, Image URL, SKU, and currency from this product page:&lt;/p>
&lt;div class="img" style="background: url(data:image/jpeg;base64,/9j/2wCEAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDIBCQkJDAsMGA0NGDIhHCEyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMv/AABEIAA0AFAMBIgACEQEDEQH/xAGiAAABBQEBAQEBAQAAAAAAAAAAAQIDBAUGBwgJCgsQAAIBAwMCBAMFBQQEAAABfQECAwAEEQUSITFBBhNRYQcicRQygZGhCCNCscEVUtHwJDNicoIJChYXGBkaJSYnKCkqNDU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6g4SFhoeIiYqSk5SVlpeYmZqio6Slpqeoqaqys7S1tre4ubrCw8TFxsfIycrS09TV1tfY2drh4uPk5ebn6Onq8fLz9PX29/j5&amp;#43;gEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoLEQACAQIEBAMEBwUEBAABAncAAQIDEQQFITEGEkFRB2FxEyIygQgUQpGhscEJIzNS8BVictEKFiQ04SXxFxgZGiYnKCkqNTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqCg4SFhoeIiYqSk5SVlpeYmZqio6Slpqeoqaqys7S1tre4ubrCw8TFxsfIycrS09TV1tfY2dri4&amp;#43;Tl5ufo6ery8/T19vf4&amp;#43;fr/2gAMAwEAAhEDEQA/AOj&amp;#43;Luv6n4f0OAWNy1tNcXRR3Q5bZ8zcHtnitH4XeILvWvCbXV1JPeTxStDuXHmMoKkE8gZ5q78SPCEPjCxS1kumtpIZBKkqpvwcEEEZHBB9a0fAPhO28H6L/Z1vO87MxlklcAFmOOw6DgUCZ0trMZIdxgnj5xtlxn9Cam3H&amp;#43;41OzRmgaP/Z); background-size: cover">
 &lt;svg width="986" height="622" aria-hidden="true" style="background-color:white">&lt;/svg>
 &lt;img
 class="lazyload"
 data-sizes="auto"
 data-srcset=', /blog/scraping-e-commerce-product-data/Screenshot-2019-04-03-15.56.02_hu2702013591613158523.jpg 825w '
 data-src="https://www.scrapingbee.com/blog/scraping-e-commerce-product-data/Screenshot-2019-04-03-15.56.02_hu2702013591613158523.jpg"
 width="986" height="622"
 alt='The North Face back pack'>
 &lt;noscript>
 &lt;img
 loading="lazy"
 
 srcset=', /blog/scraping-e-commerce-product-data/Screenshot-2019-04-03-15.56.02_hu2702013591613158523.jpg 825w'
 src="https://www.scrapingbee.com/blog/scraping-e-commerce-product-data/Screenshot-2019-04-03-15.56.02.jpg"
 width="986" height="622"
 alt='The North Face back pack'>
 &lt;/noscript>
&lt;/div>

&lt;br>
&lt;div class="img" style="background: url(data:image/jpeg;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAKCAIAAAA7N&amp;#43;mxAAAAuklEQVR4nGL5//8/A7mAiWyd1NZ85dnTiYf3njp/A807Hz99fvHiJZpiFgTzP8PFJ49XPr/45t&amp;#43;HP&amp;#43;9&amp;#43;mTFqQIT//Plz5NjJX79&amp;#43;vXn7TlZW2trCjImJCcPm///PPnzw6tM7xu//fbQMEMazsIiLi967//Djp0/SUpJwnag2MzFGmZj/PPrTUUdTXVIS2XmCAgKaGmqMjIy83NzI4ozERNXfv3&amp;#43;ZmZlBjH//mJFsJkozLkBRVAECAAD//yH0TBersOHxAAAAAElFTkSuQmCC); background-size: cover">
 &lt;svg width="1417" height="707" aria-hidden="true" style="background-color:white">&lt;/svg>
 &lt;img
 class="lazyload"
 data-sizes="auto"
 data-srcset=', /blog/scraping-e-commerce-product-data/cover_hu217303111352206046.png 1200w '
 data-src="https://www.scrapingbee.com/blog/scraping-e-commerce-product-data/cover_hu217303111352206046.png"
 width="1417" height="707"
 alt='cover image'>
 &lt;noscript>
 &lt;img
 loading="lazy"
 
 srcset=', /blog/scraping-e-commerce-product-data/cover_hu217303111352206046.png 1200w'
 src="https://www.scrapingbee.com/blog/scraping-e-commerce-product-data/cover.png"
 width="1417" height="707"
 alt='cover image'>
 &lt;/noscript>
&lt;/div>

&lt;br>

&lt;h2 id="what-you-will-need">What you will need&lt;/h2>
&lt;p>We will use HtmlUnit to perform the HTTP request and parse the DOM. Add this dependency to your pom.xml.&lt;/p></description></item><item><title/><link>https://www.scrapingbee.com/curl-converter/cfml/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/curl-converter/cfml/</guid><description/></item><item><title/><link>https://www.scrapingbee.com/curl-converter/dart/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/curl-converter/dart/</guid><description/></item><item><title/><link>https://www.scrapingbee.com/curl-converter/elixir/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/curl-converter/elixir/</guid><description>&lt;p>If you want to learn &lt;a href="https://www.scrapingbee.com/blog/web-scraping-elixir/" >web scraping with Elixir&lt;/a>, check out our tutorial.&lt;/p></description></item><item><title/><link>https://www.scrapingbee.com/curl-converter/go/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/curl-converter/go/</guid><description>&lt;p>If you want to learn &lt;a href="https://www.scrapingbee.com/blog/web-scraping-go/" >web scraping with Go&lt;/a>, check out our tutorial.&lt;/p>
&lt;p>You will learn how to build your first web scraper with Go, and how to use the Colly library.&lt;/p></description></item><item><title/><link>https://www.scrapingbee.com/curl-converter/java/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/curl-converter/java/</guid><description>&lt;p>If you want to learn web scraping with Java, check out our tutorials:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://www.scrapingbee.com/java-webscraping-book/" >Web scraping with Java&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.scrapingbee.com/blog/introduction-to-chrome-headless/" >Introduction to Chrome Headless with Java&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title/><link>https://www.scrapingbee.com/curl-converter/javascript-fetch/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/curl-converter/javascript-fetch/</guid><description>&lt;p>Thanks to the enormous advancements it has seen and the advent of the NodeJS runtime, JavaScript has emerged as one of the most popular and widely used languages. Whether you're building a web or a mobile application, JavaScript now has the tools you need.&lt;/p>
&lt;p>And of course, web scraping.&lt;/p>
&lt;p>To learn more about JavaScript and web scraping, check out our tutorials:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://www.scrapingbee.com/blog/web-scraping-javascript/" >Web Scraping with Node JS&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.scrapingbee.com/blog/html-parsing-jquery/" >Using jQuery to Parse HTML and Extract Data&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.scrapingbee.com/blog/cheerio-npm/" >Using the Cheerio NPM Package for Web Scraping&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title/><link>https://www.scrapingbee.com/curl-converter/json/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/curl-converter/json/</guid><description/></item><item><title/><link>https://www.scrapingbee.com/curl-converter/matlab/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/curl-converter/matlab/</guid><description/></item><item><title/><link>https://www.scrapingbee.com/curl-converter/node-axios/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/curl-converter/node-axios/</guid><description>&lt;p>JavaScript has become one of the most popular and frequently used languages as a result of the significant developments it has experienced and the introduction of the NodeJS runtime. Whether it's for a web application or a mobile application, JavaScript now has the required capabilities available.&lt;/p>
&lt;p>And of course, web scraping.&lt;/p>
&lt;p>To learn more about JavaScript and web scraping, check out our tutorials:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://www.scrapingbee.com/blog/html-parsing-jquery/" >Using jQuery to Parse HTML and Extract Data&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.scrapingbee.com/blog/cheerio-npm/" >Using the Cheerio NPM Package for Web Scraping&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.scrapingbee.com/blog/web-scraping-javascript/" >Web Scraping with Node JS&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title/><link>https://www.scrapingbee.com/curl-converter/node-fetch/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/curl-converter/node-fetch/</guid><description>&lt;p>JavaScript has evolved into one of the most popular and widely used languages as a result of tremendous developments and the introduction of the NodeJS runtime. JavaScript development tools are now available, whether for a web or mobile application.&lt;/p>
&lt;p>And of course, web scraping.&lt;/p>
&lt;p>To learn more about JavaScript and web scraping, check out our tutorials:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://www.scrapingbee.com/blog/html-parsing-jquery/" >Using jQuery to Parse HTML and Extract Data&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.scrapingbee.com/blog/cheerio-npm/" >Using the Cheerio NPM Package for Web Scraping&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.scrapingbee.com/blog/web-scraping-javascript/" >Web Scraping with Node JS&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title/><link>https://www.scrapingbee.com/curl-converter/node-request/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/curl-converter/node-request/</guid><description>&lt;p>JavaScript is known for both its ease of use and its power. With JavaScript it is very easy to create web applications and web services.&lt;/p>
&lt;p>And of course, web scraping.&lt;/p>
&lt;p>To learn more about JavaScript and web scraping, check out our tutorials:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://www.scrapingbee.com/blog/html-parsing-jquery/" >Using jQuery to Parse HTML and Extract Data&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.scrapingbee.com/blog/cheerio-npm/" >Using the Cheerio NPM Package for Web Scraping&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.scrapingbee.com/blog/web-scraping-javascript/" >Web Scraping with Node JS&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title/><link>https://www.scrapingbee.com/curl-converter/php/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/curl-converter/php/</guid><description>&lt;p>We have written a full &lt;a href="https://www.scrapingbee.com/blog/web-scraping-php/" >PHP web scraping tutorial&lt;/a>; check it out.&lt;/p></description></item><item><title/><link>https://www.scrapingbee.com/curl-converter/python/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/curl-converter/python/</guid><description>&lt;p>Python is a versatile and popular programming language. It is used for web scraping, data analysis, and much more.&lt;/p>
&lt;p>If you want to learn more about web scraping in Python check out our tutorials:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://www.scrapingbee.com/blog/web-scraping-101-with-python/" >Web Scraping with Python&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.scrapingbee.com/blog/python-web-scraping-beautiful-soup/" >Web Scraping with Python and BeautifulSoup&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.scrapingbee.com/blog/web-scraping-with-scrapy/" >Web Scraping with Python and Scrapy&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title/><link>https://www.scrapingbee.com/curl-converter/r/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/curl-converter/r/</guid><description>&lt;p>If you want to start doing web scraping with R, you can read our tutorial: &lt;a href="https://www.scrapingbee.com/blog/web-scraping-r/" >R and Web Scraping&lt;/a>.&lt;/p></description></item><item><title/><link>https://www.scrapingbee.com/curl-converter/rust/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/curl-converter/rust/</guid><description>&lt;p>If you're starting with Rust and web scraping, you can read our tutorial: &lt;a href="https://www.scrapingbee.com/blog/web-scraping-rust/" >Rust and Web Scraping&lt;/a>.&lt;/p></description></item><item><title/><link>https://www.scrapingbee.com/landing/dev/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/landing/dev/</guid><description>&lt;!DOCTYPE html>
&lt;html lang="en">
 &lt;head>
	&lt;meta name="generator" content="Hugo 0.88.1" />
 &lt;meta name="robots" content="noindex, nofollow" />
 &lt;title>ScrapingBee, the best web scraping API.&lt;/title>
 &lt;meta charset="utf-8" />
 &lt;meta name="description" content="ScrapingBee is a Web Scraping API that handles proxies and Headless browser for you, so you can focus on extracting the data you want, and nothing else." />
 &lt;meta name="viewport" content="width=device-width" initial-scale="1" maximum-scale="1" />
 &lt;meta property="og:type" content="article" />
 &lt;meta property="og:title" content="ScrapingBee, the best web scraping API." />
 &lt;meta property="og:description" content="ScrapingBee is a Web Scraping API that handles proxies and Headless browser for you, so you can focus on extracting the data you want, and nothing else." />
 &lt;meta property="og:type" content="website" />
 &lt;meta property="og:image" content="https://www.scrapingbee.com/images/cover.png" />
 &lt;meta property="og:url" content="https://www.scrapingbee.com/" />
 &lt;meta property="twitter:card" content="summary_large_image" />
 &lt;meta property="twitter:creator" content="@scrapingbee" />
 &lt;meta property="twitter:domain" content="https://www.scrapingbee.com" />
 &lt;meta property="twitter:site" content="@scrapingbee" />
 &lt;meta property="twitter:title" content="ScrapingBee, the best web scraping API." />
 &lt;meta property="twitter:description" content="ScrapingBee is a Web Scraping API that handles proxies and Headless browser for you, so you can focus on extracting the data you want, and nothing else." />
 &lt;meta property="twitter:image" content="https://www.scrapingbee.com/images/cover.png" />
 &lt;link rel="alternate" type="application/rss+xml" title="The ScrapingBee Blog" href="https://www.scrapingbee.com/index.xml" />
 &lt;link rel="icon" type="image/png" href="https://www.scrapingbee.com//images/favico.png" />
 &lt;link rel="alternate icon" href="https://www.scrapingbee.com//images/favico.svg">
 &lt;link rel="canonical" href="https://www.scrapingbee.com/" />
&lt;style>
 *,::before,::after{box-sizing:border-box;border-width:0;border-style:solid;border-color:initial}::before,::after{--tw-content:''}html{line-height:1.5;-webkit-text-size-adjust:100%;-moz-tab-size:4;-o-tab-size:4;tab-size:4;font-family:Circular Std,ui-sans-serif,system-ui,-apple-system,BlinkMacSystemFont,segoe ui,Roboto,helvetica neue,Arial,noto sans,sans-serif,apple color emoji,segoe ui emoji,segoe ui symbol,noto color emoji}body{margin:0;line-height:inherit}hr{height:0;color:inherit;border-top-width:1px}abbr[title]{-webkit-text-decoration:underline dotted;text-decoration:underline dotted}h1,h2,h3,h4,h5,h6{font-size:inherit;font-weight:inherit}a{color:inherit;text-decoration:inherit}b,strong{font-weight:bolder}code,kbd,samp,pre{font-family:;font-size:1em}small{font-size:80%}sub,sup{font-size:75%;line-height:0;position:relative;vertical-align:baseline}sub{bottom:-.25em}sup{top:-.5em}table{text-indent:0;border-color:inherit;border-collapse:collapse}button,input,optgroup,select,textarea{font-family:inherit;font-size:100%;line-height:inherit;color:inherit;margin:0;padding:0}button,select{text-transform:none}button,[type=button],[type=reset],[type=submit]{-webkit-appearance:button;background-color:transparent;background-image:none}:-moz-focusring{outline:auto}:-moz-ui-invalid{box-shadow:none}progress{vertical-align:baseline}::-webkit-inner-spin-button,::-webkit-outer-spin-button{height:auto}[type=search]{-webkit-appearance:textfield;outline-offset:-2px}::-webkit-search-decoration{-webkit-appearance:none}::-webkit-file-upload-button{-webkit-appearance:button;font:inherit}summary{display:list-item}blockquote,dl,dd,h1,h2,h3,h4,h5,h6,hr,figure,p,pre{margin:0}fieldset{margin:0;padding:0}legend{padding:0}ol,ul,menu{list-style:none;margin:0;padding:0}textarea{resize:vertical}input::-moz-placeholder,textarea::-moz-placeholder{opacity:1;color:#24292e}input:-ms-input-placeholder,textarea:-ms-input-placeholder{opacity:1;color:#24292e}input::placeholder,textarea::placeholder{opacity:
1;color:#24292e}button,[role=button]{cursor:pointer}:disabled{cursor:default}img,svg,video,canvas,audio,iframe,embed,object{display:block;vertical-align:middle}img,video{max-width:100%;height:auto}[hidden]{display:none}.container{margin-left:auto;margin-right:auto;max-width:1204px;padding-left:20px;padding-right:20px}@media(min-width:1024px){.container{padding-left:30px;padding-right:30px}}.btn{display:inline-flex;height:48px;align-items:center;justify-content:center;border-radius:4px;--tw-bg-opacity:1;background-color:rgb(15 15 14/var(--tw-bg-opacity));padding-left:25px;padding-right:25px;font-size:16px;font-weight:700;--tw-text-opacity:1;color:rgb(255 255 255/var(--tw-text-opacity));transition-property:opacity;transition-timing-function:cubic-bezier(.4,0,.2,1);transition-duration:150ms}.btn:hover{--tw-bg-opacity:0.9}@media(min-width:1024px){.btn{height:56px}.btn{font-size:18px}}.btn-sm{height:45px;font-size:16px}.btn-black-o{border-width:2px;--tw-border-opacity:1;border-color:rgb(15 15 14/var(--tw-border-opacity));background-color:transparent;--tw-text-opacity:1;color:rgb(15 15 14/var(--tw-text-opacity))}.btn-black-o:hover{--tw-bg-opacity:1;background-color:rgb(15 15 14/var(--tw-bg-opacity));--tw-text-opacity:1;color:rgb(255 255 255/var(--tw-text-opacity))}.btn-yellow{--tw-bg-opacity:1;background-color:rgb(255 201 31/var(--tw-bg-opacity));--tw-text-opacity:1;color:rgb(15 15 14/var(--tw-text-opacity))}.link{font-weight:700;-webkit-text-decoration-line:underline;text-decoration-line:underline}.link:hover{-webkit-text-decoration-line:none;text-decoration-line:none}body{font-size:18px;line-height:1.77;--tw-text-opacity:1;color:rgb(15 15 14/var(--tw-text-opacity))}blockquote,q{margin:0;padding:0;border:0;outline:0;font-size:100%;vertical-align:baseline;background:0 0}blockquote q:not(.quote):before,blockquote 
q:not(.quote):after{content:'"'}blockquote>ol{list-style:decimal;margin-left:30px}h1,h2,h3{font-weight:700}h1{font-size:40px;line-height:1.22}@media(min-width:1024px){h1{font-size:48px}h1{font-size:56px}}h2{font-size:36px;line-height:1.26}@media(min-width:1024px){h2{font-size:48px}}h3{font-size:30px;line-height:1.33}@media(min-width:1024px){h3{font-size:36px}}h4{font-size:22px;line-height:1.33}@media(min-width:1024px){h4{font-size:24px}}h5{font-size:20px;line-height:1.2}h6{font-size:16px;line-height:1.25}*,::before,::after{--tw-translate-x:0;--tw-translate-y:0;--tw-rotate:0;--tw-skew-x:0;--tw-skew-y:0;--tw-scale-x:1;--tw-scale-y:1;--tw-transform:translateX(var(--tw-translate-x)) translateY(var(--tw-translate-y)) rotate(var(--tw-rotate)) skewX(var(--tw-skew-x)) skewY(var(--tw-skew-y)) scaleX(var(--tw-scale-x)) scaleY(var(--tw-scale-y));--tw-border-opacity:1;border-color:rgb(90 113 132/var(--tw-border-opacity));--tw-ring-offset-shadow:0 0 #0000;--tw-ring-shadow:0 0 #0000;--tw-shadow:0 0 #0000;--tw-shadow-colored:0 0 #0000;--tw-blur:var(--tw-empty,/*!*/ /*!*/);--tw-brightness:var(--tw-empty,/*!*/ /*!*/);--tw-contrast:var(--tw-empty,/*!*/ /*!*/);--tw-grayscale:var(--tw-empty,/*!*/ /*!*/);--tw-hue-rotate:var(--tw-empty,/*!*/ /*!*/);--tw-invert:var(--tw-empty,/*!*/ /*!*/);--tw-saturate:var(--tw-empty,/*!*/ /*!*/);--tw-sepia:var(--tw-empty,/*!*/ /*!*/);--tw-drop-shadow:var(--tw-empty,/*!*/ /*!*/);--tw-filter:var(--tw-blur) var(--tw-brightness) var(--tw-contrast) var(--tw-grayscale) var(--tw-hue-rotate) var(--tw-invert) var(--tw-saturate) var(--tw-sepia) 
var(--tw-drop-shadow)}.visible{visibility:visible}.static{position:static}.fixed{position:fixed}.absolute{position:absolute}.relative{position:relative}.sticky{position:-webkit-sticky;position:sticky}.left-[0]{left:0}.top-[80px]{top:80px}.top-[0]{top:0}.right-[0]{right:0}.-top-[3px]{top:-3px}.-right-[3px]{right:-3px}.left-[2px]{left:2px}.top-[2px]{top:2px}.top-[3px]{top:3px}.z-[101]{z-index:101}.z-[100]{z-index:100}.z-[9]{z-index:9}.z-1{z-index:1}.order-2{order:2}.order-1{order:1}.m-[0]{margin:0}.-m-[10px]{margin:-10px}.m-auto{margin:auto}.my-[4px]{margin-top:4px;margin-bottom:4px}.-mx-[15px]{margin-left:-15px;margin-right:-15px}.mx-auto{margin-left:auto;margin-right:auto}.my-[2px]{margin-top:2px;margin-bottom:2px}.mx-[2px]{margin-left:2px;margin-right:2px}.my-[10px]{margin-top:10px;margin-bottom:10px}.-mx-[20px]{margin-left:-20px;margin-right:-20px}.-mx-[11px]{margin-left:-11px;margin-right:-11px}.-mx-[21px]{margin-left:-21px;margin-right:-21px}.-mx-[12px]{margin-left:-12px;margin-right:-12px}.-mx-[50px]{margin-left:-50px;margin-right:-50px}.-mx-[9px]{margin-left:-9px;margin-right:-9px}.-mx-[30px]{margin-left:-30px;margin-right:-30px}.-mx-[23px]{margin-left:-23px;margin-right:-23px}.-mx-[10px]{margin-left:-10px;margin-right:-10px}.-my-[2px]{margin-top:-2px;margin-bottom:-2px}.-mx-[16px]{margin-left:-16px;margin-right:-16px}.-my-[20px]{margin-top:-20px;margin-bottom:-20px}.-my-[19px]{margin-top:-19px;margin-bottom:-19px}.mt-[30px]{margin-top:30px}.mb-[12px]{margin-bottom:12px}.mt-[60px]{margin-top:60px}.mb-[20px]{margin-bottom:20px}.mb-[11px]{margin-bottom:11px}.mb-[33px]{margin-bottom:33px}.mb-[17px]{margin-bottom:17px}.mb-[100px]{margin-bottom:100px}.mt-[66px]{margin-top:66px}.mb-[21px]{margin-bottom:21px}.mt-[70px]{margin-top:70px}.mr-[12px]{margin-right:12px}.mt-[20px]{margin-top:20px}.ml-[12px]{margin-left:12px}.mb-[40px]{margin-bottom:40px}.mb-[24px]{margin-bottom:24px}.mb-[8%]{margin-bottom:8%}.mb-[10px]{margin-bottom:10px}.mb-[2px]{margin-bottom:2px}.mb-[18p
x]{margin-bottom:18px}.mb-[6px]{margin-bottom:6px}.mr-[6px]{margin-right:6px}.mb-[19px]{margin-bottom:19px}.mb-[30px]{margin-bottom:30px}.mr-[9px]{margin-right:9px}.mr-[5px]{margin-right:5px}.ml-[3px]{margin-left:3px}.mr-[1px]{margin-right:1px}.ml-[10px]{margin-left:10px}.mb-[48px]{margin-bottom:48px}.mb-[5px]{margin-bottom:5px}.mb-[25px]{margin-bottom:25px}.mt-[19px]{margin-top:19px}.mr-[20px]{margin-right:20px}.mt-[12px]{margin-top:12px}.mt-[32px]{margin-top:32px}.mb-[32px]{margin-bottom:32px}.ml-[8px]{margin-left:8px}.mb-[15px]{margin-bottom:15px}.mb-[60px]{margin-bottom:60px}.mb-[13px]{margin-bottom:13px}.mb-[3px]{margin-bottom:3px}.mt-[100px]{margin-top:100px}.mb-[50px]{margin-bottom:50px}.mb-[70px]{margin-bottom:70px}.mr-[10px]{margin-right:10px}.mb-[45px]{margin-bottom:45px}.-ml-[9px]{margin-left:-9px}.-ml-[20px]{margin-left:-20px}.mb-[80px]{margin-bottom:80px}.mb-[54px]{margin-bottom:54px}.mb-[27px]{margin-bottom:27px}.mb-[35px]{margin-bottom:35px}.mb-[4px]{margin-bottom:4px}.mt-[10px]{margin-top:10px}.mb-[66px]{margin-bottom:66px}.ml-[5px]{margin-left:5px}.mt-[13px]{margin-top:13px}.mb-[8px]{margin-bottom:8px}.mb-[38px]{margin-bottom:38px}.-mr-[4px]{margin-right:-4px}.ml-[4px]{margin-left:4px}.mb-[31px]{margin-bottom:31px}.mb-[14px]{margin-bottom:14px}.-mb-px{margin-bottom:-1px}.mb-[34px]{margin-bottom:34px}.mb-[16px]{margin-bottom:16px}.ml-[20px]{margin-left:20px}.mr-[24px]{margin-right:24px}.ml-[6px]{margin-left:6px}.mb-[9px]{margin-bottom:9px}.mt-[9px]{margin-top:9px}.mb-[120px]{margin-bottom:120px}.-mr-[20px]{margin-right:-20px}.mt-[80px]{margin-top:80px}.mb-[36px]{margin-bottom:36px}.-mb-[4px]{margin-bottom:-4px}.block{display:block}.inline-block{display:inline-block}.inline{display:inline}.flex{display:flex}.inline-flex{display:inline-flex}.table{display:table}.grid{display:grid}.contents{display:contents}.hidden{display:none}.h-[12px]{height:12px}.h-screen{height:100vh}.h-[35px]{height:35px}.h-[204px]{height:204px}.h-full{height:100%}.h-auto{height:a
uto}.h-[100px]{height:100px}.h-[40px]{height:40px}.h-[32px]{height:32px}.h-[4px]{height:4px}.h-[56px]{height:56px}.h-[150px]{height:150px}.h-[58px]{height:58px}.h-[24px]{height:24px}.h-[80px]{height:80px}.h-[86px]{height:86px}.h-[30px]{height:30px}.h-[600px]{height:600px}.max-h-[832px]{max-height:832px}.min-h-[52px]{min-height:52px}.w-full{width:100%}.w-[35px]{width:35px}.w-[40px]{width:40px}.w-[100px]{width:100px}.w-[5px]\/12{width:41.666667%}.w-[2px]\/12{width:16.666667%}.w-[160px]{width:160px}.w-[182px]{width:182px}.w-[123px]{width:123px}.w-[61px]{width:61px}.w-[56px]{width:56px}.w-[195px]{width:195px}.w-[24px]{width:24px}.w-[25%]{width:25%}.w-[86px]{width:86px}.w-[36px]{width:36px}.w-[40%]{width:40%}.w-[30%]{width:30%}.w-[1px]\/2{width:50%}.w-[30px]{width:30px}.w-auto{width:auto}.w-[600px]{width:600px}.w-[20px]{width:20px}.min-w-[120px]{min-width:120px}.min-w-[500px]{min-width:500px}.min-w-[900px]{min-width:900px}.min-w-[222px]{min-width:222px}.min-w-full{min-width:100%}.max-w-screen-lg{max-width:1280px}.max-w-[894px]{max-width:894px}.max-w-[620px]{max-width:620px}.max-w-full{max-width:100%}.max-w-[1276px]{max-width:1276px}.max-w-[970px]{max-width:970px}.max-w-screen-xl{max-width:1440px}.max-w-[508px]{max-width:508px}.max-w-[321px]{max-width:321px}.max-w-none{max-width:none}.max-w-[1024px]{max-width:1024px}.max-w-[728px]{max-width:728px}.max-w-[1292px]{max-width:1292px}.max-w-[404px]{max-width:404px}.max-w-[1277px]{max-width:1277px}.max-w-[542px]{max-width:542px}.max-w-[1308px]{max-width:1308px}.flex-auto{flex:auto}.flex-1{flex:1}.flex-shrink-0{flex-shrink:0}.flex-grow{flex-grow:1}.grow{flex-grow:1}.basis-0{flex-basis:0}.transform{transform:var(--tw-transform)}.cursor-text{cursor:text}.cursor-pointer{cursor:pointer}.resize{resize:both}.flex-row{flex-direction:row}.flex-col{flex-direction:column}.flex-col-reverse{flex-direction:column-reverse}.flex-wrap{flex-wrap:wrap}.items-start{align-items:flex-start}.items-center{align-items:center}.justify-end{justify-conten
t:flex-end}.justify-center{justify-content:center}.justify-between{justify-content:space-between}.justify-around{justify-content:space-around}.gap-[10px]{gap:10px}.gap-[20px]{gap:20px}.divide-y>:not([hidden])~:not([hidden]){--tw-divide-y-reverse:0;border-top-width:calc(1px * calc(1 - var(--tw-divide-y-reverse)));border-bottom-width:calc(1px * var(--tw-divide-y-reverse))}.divide-gray-200>:not([hidden])~:not([hidden]){--tw-divide-opacity:1;border-color:rgb(90 113 132/var(--tw-divide-opacity))}.overflow-auto{overflow:auto}.overflow-hidden{overflow:hidden}.overflow-x-auto{overflow-x:auto}.overflow-y-scroll{overflow-y:scroll}.overscroll-x-auto{overscroll-behavior-x:auto}.text-ellipsis{text-overflow:ellipsis}.whitespace-nowrap{white-space:nowrap}.rounded-sm{border-radius:.125rem}.rounded-md{border-radius:.375rem}.rounded-[8px]{border-radius:8px}.rounded-[100%]{border-radius:100%}.rounded-[4px]{border-radius:4px}.rounded{border-radius:.25rem}.rounded-xl{border-radius:.75rem}.rounded-2xl{border-radius:1rem}.rounded-full{border-radius:9999px}.rounded-[5px]{border-radius:5px}.rounded-t-md{border-top-left-radius:.375rem;border-top-right-radius:.375rem}.rounded-b-md{border-bottom-right-radius:.375rem;border-bottom-left-radius:.375rem}.rounded-t-4{border-top-left-radius:4px;border-top-right-radius:4px}.rounded-l-xl{border-top-left-radius:.75rem;border-bottom-left-radius:.75rem}.rounded-r-xl{border-top-right-radius:.75rem;border-bottom-right-radius:.75rem}.border{border-width:1px}.border-2{border-width:2px}.border-4{border-width:4px}.border-t-2{border-top-width:2px}.border-t{border-top-width:1px}.border-r{border-right-width:1px}.border-b{border-bottom-width:1px}.border-b-0{border-bottom-width:0}.border-l-2{border-left-width:2px}.border-b-2{border-bottom-width:2px}.border-l{border-left-width:1px}.border-r-2{border-right-width:2px}.border-solid{border-style:solid}.border-gray-400{--tw-border-opacity:1;border-color:rgb(36 41 
46/var(--tw-border-opacity))}.border-black-100{--tw-border-opacity:1;border-color:rgb(15 15 14/var(--tw-border-opacity))}.border-gray-700{--tw-border-opacity:1;border-color:rgb(228 231 236/var(--tw-border-opacity))}.border-gray-300{--tw-border-opacity:1;border-color:rgb(179 186 197/var(--tw-border-opacity))}.border-gray-1400{--tw-border-opacity:1;border-color:rgb(217 218 219/var(--tw-border-opacity))}.border-white{--tw-border-opacity:1;border-color:rgb(255 255 255/var(--tw-border-opacity))}.border-blue-200{--tw-border-opacity:1;border-color:rgb(66 84 102/var(--tw-border-opacity))}.border-green-700{--tw-border-opacity:1;border-color:rgb(21 128 61/var(--tw-border-opacity))}.border-yellow-100{--tw-border-opacity:1;border-color:rgb(255 201 31/var(--tw-border-opacity))}.border-yellow-400{--tw-border-opacity:1;border-color:rgb(250 173 19/var(--tw-border-opacity))}.border-transparent{border-color:transparent}.border-gray-600{--tw-border-opacity:1;border-color:rgb(204 204 204/var(--tw-border-opacity))}.border-gray-100{--tw-border-opacity:1;border-color:rgb(230 236 242/var(--tw-border-opacity))}.border-gray-200{--tw-border-opacity:1;border-color:rgb(90 113 132/var(--tw-border-opacity))}.border-green-500{--tw-border-opacity:1;border-color:rgb(34 197 94/var(--tw-border-opacity))}.border-opacity-20{--tw-border-opacity:0.2}.bg-yellow-100{--tw-bg-opacity:1;background-color:rgb(255 201 31/var(--tw-bg-opacity))}.bg-white{--tw-bg-opacity:1;background-color:rgb(255 255 255/var(--tw-bg-opacity))}.bg-gray-100{--tw-bg-opacity:1;background-color:rgb(230 236 242/var(--tw-bg-opacity))}.bg-blue-100{--tw-bg-opacity:1;background-color:rgb(44 58 87/var(--tw-bg-opacity))}.bg-black-100{--tw-bg-opacity:1;background-color:rgb(15 15 14/var(--tw-bg-opacity))}.bg-green-100{--tw-bg-opacity:1;background-color:rgb(220 252 231/var(--tw-bg-opacity))}.bg-yellow-200{--tw-bg-opacity:1;background-color:rgb(255 244 210/var(--tw-bg-opacity))}.bg-gray-900{--tw-bg-opacity:1;background-color:rgb(242 242 
242/var(--tw-bg-opacity))}.bg-gray-1000{--tw-bg-opacity:1;background-color:rgb(197 197 196/var(--tw-bg-opacity))}.bg-blue-400{--tw-bg-opacity:1;background-color:rgb(27 37 56/var(--tw-bg-opacity))}.bg-none{background-image:none}.object-cover{-o-object-fit:cover;object-fit:cover}.p-[10px]{padding:10px}.p-[0]{padding:0}.p-[1px]{padding:1px}.p-[20px]{padding:20px}.p-[60px]{padding:60px}.p-[3px]{padding:3px}.p-[5px]{padding:5px}.p-[38px]{padding:38px}.p-[15px]{padding:15px}.p-[40px]{padding:40px}.px-[37px]{padding-left:37px;padding-right:37px}.px-[4px]{padding-left:4px;padding-right:4px}.px-[15px]{padding-left:15px;padding-right:15px}.py-[6px]{padding-top:6px;padding-bottom:6px}.px-[10px]{padding-left:10px;padding-right:10px}.py-[50px]{padding-top:50px;padding-bottom:50px}.py-[5px]{padding-top:5px;padding-bottom:5px}.py-[8px]{padding-top:8px;padding-bottom:8px}.px-[6px]{padding-left:6px;padding-right:6px}.px-[20px]{padding-left:20px;padding-right:20px}.px-[11px]{padding-left:11px;padding-right:11px}.py-[20px]{padding-top:20px;padding-bottom:20px}.px-[21px]{padding-left:21px;padding-right:21px}.px-[31px]{padding-left:31px;padding-right:31px}.py-[10px]{padding-top:10px;padding-bottom:10px}.px-[12px]{padding-left:12px;padding-right:12px}.px-[55px]{padding-left:55px;padding-right:55px}.px-[50px]{padding-left:50px;padding-right:50px}.px-[9px]{padding-left:9px;padding-right:9px}.px-[39px]{padding-left:39px;padding-right:39px}.px-[30px]{padding-left:30px;padding-right:30px}.py-[70px]{padding-top:70px;padding-bottom:70px}.py-[40px]{padding-top:40px;padding-bottom:40px}.px-[23px]{padding-left:23px;padding-right:23px}.py-[15px]{padding-top:15px;padding-bottom:15px}.py-[100px]{padding-top:100px;padding-bottom:100px}.py-[4px]{padding-top:4px;padding-bottom:4px}.px-[1px]{padding-left:1px;padding-right:1px}.py-[2px]{padding-top:2px;padding-bottom:2px}.px-[24px]{padding-left:24px;padding-right:24px}.py-[3px]{padding-top:3px;padding-bottom:3px}.py-[16px]{padding-top:16px;padding-bottom:
16px}.px-[17px]{padding-left:17px;padding-right:17px}.px-[14px]{padding-left:14px;padding-right:14px}.px-[16px]{padding-left:16px;padding-right:16px}.py-[19px]{padding-top:19px;padding-bottom:19px}.pt-[6px]{padding-top:6px}.pb-[0]{padding-bottom:0}.pt-[60px]{padding-top:60px}.pb-[2px]{padding-bottom:2px}.pb-[16px]{padding-bottom:16px}.pt-[50px]{padding-top:50px}.pb-[80px]{padding-bottom:80px}.pt-[20px]{padding-top:20px}.pt-[19px]{padding-top:19px}.pr-[16px]{padding-right:16px}.pl-[18px]{padding-left:18px}.pb-[30px]{padding-bottom:30px}.pt-[18px]{padding-top:18px}.pt-[15px]{padding-top:15px}.pt-[11px]{padding-top:11px}.pb-[48px]{padding-bottom:48px}.pt-[2px]{padding-top:2px}.pt-[47px]{padding-top:47px}.pt-[3px]{padding-top:3px}.pl-[12px]{padding-left:12px}.pl-[9px]{padding-left:9px}.pr-[2px]{padding-right:2px}.pt-[80px]{padding-top:80px}.pb-[50px]{padding-bottom:50px}.pb-[70px]{padding-bottom:70px}.pr-[20px]{padding-right:20px}.pb-[20px]{padding-bottom:20px}.pt-[31px]{padding-top:31px}.pb-[38px]{padding-bottom:38px}.pl-[8px]{padding-left:8px}.pr-[15px]{padding-right:15px}.pr-[13px]{padding-right:13px}.pt-[100px]{padding-top:100px}.pt-[70px]{padding-top:70px}.pt-[17px]{padding-top:17px}.pt-[66px]{padding-top:66px}.pb-[100px]{padding-bottom:100px}.pt-[35px]{padding-top:35px}.pl-[24px]{padding-left:24px}.pt-[52px]{padding-top:52px}.pt-[7px]{padding-top:7px}.pb-[6px]{padding-bottom:6px}.pr-[10px]{padding-right:10px}.pl-[45px]{padding-left:45px}.pt-[16px]{padding-top:16px}.pb-[19px]{padding-bottom:19px}.pt-[28px]{padding-top:28px}.pt-[30px]{padding-top:30px}.pb-[60px]{padding-bottom:60px}.pl-[35px]{padding-left:35px}.pr-[12px]{padding-right:12px}.pb-[45px]{padding-bottom:45px}.pl-[43px]{padding-left:43px}.text-left{text-align:left}.text-center{text-align:center}.text-right{text-align:right}.align-middle{vertical-align:middle}.font-menlo{font-family:menlo}.font-sans{font-family:Circular Std,ui-sans-serif,system-ui,-apple-system,BlinkMacSystemFont,segoe ui,Roboto,helvetica 
neue,Arial,noto sans,sans-serif,apple color emoji,segoe ui emoji,segoe ui symbol,noto color emoji}.text-[20px]{font-size:20px}.text-[36px]{font-size:36px}.text-[24px]{font-size:24px}.text-[15px]{font-size:15px}.text-[30px]{font-size:30px}.text-[14px]{font-size:14px}.text-[16px]{font-size:16px}.text-[18px]{font-size:18px}.text-[12px]{font-size:12px}.text-[32px]{font-size:32px}.text-[130px]{font-size:130px}.text-[40px]{font-size:40px}.text-[48px]{font-size:48px}.text-[13px]{font-size:13px}.text-[10px]{font-size:10px}.font-normal{font-weight:400}.font-bold{font-weight:700}.font-medium{font-weight:500}.uppercase{text-transform:uppercase}.lowercase{text-transform:lowercase}.capitalize{text-transform:capitalize}.not-italic{font-style:normal}.leading-[1.50]{line-height:1.5}.leading-[1.41]{line-height:1.41}.leading-[1.1429]{line-height:1.1429}.leading-[1.21]{line-height:1.21}.leading-[1.77]{line-height:1.77}.leading-[1.33]{line-height:1.33}.leading-none{line-height:1}.leading-[1.20]{line-height:1.2}.leading-[1.4]{line-height:1.4}.leading-[1.26]{line-height:1.26}.leading-2{line-height:2}.leading-[1.55]{line-height:1.55}.leading-[1.25]{line-height:1.25}.leading-[1.86]{line-height:1.86}.leading-[1.54]{line-height:1.54}.leading-[1.16]{line-height:1.16}.tracking-[0.2px]{letter-spacing:.2px}.tracking-tight{letter-spacing:-.01em}.text-green-1000{--tw-text-opacity:1;color:rgb(54 179 126/var(--tw-text-opacity))}.text-yellow-400{--tw-text-opacity:1;color:rgb(250 173 19/var(--tw-text-opacity))}.text-red-100{--tw-text-opacity:1;color:rgb(249 38 114/var(--tw-text-opacity))}.text-gray-200{--tw-text-opacity:1;color:rgb(90 113 132/var(--tw-text-opacity))}.text-blue-200{--tw-text-opacity:1;color:rgb(66 84 102/var(--tw-text-opacity))}.text-black-100{--tw-text-opacity:1;color:rgb(15 15 14/var(--tw-text-opacity))}.text-black-200{--tw-text-opacity:1;color:rgb(0 0 0/var(--tw-text-opacity))}.text-red-200{--tw-text-opacity:1;color:rgb(255 121 
198/var(--tw-text-opacity))}.text-white{--tw-text-opacity:1;color:rgb(255 255 255/var(--tw-text-opacity))}.text-yellow-100{--tw-text-opacity:1;color:rgb(255 201 31/var(--tw-text-opacity))}.text-gray-400{--tw-text-opacity:1;color:rgb(36 41 46/var(--tw-text-opacity))}.text-gray-300{--tw-text-opacity:1;color:rgb(179 186 197/var(--tw-text-opacity))}.text-gray-500{--tw-text-opacity:1;color:rgb(232 232 232/var(--tw-text-opacity))}.text-red-300{--tw-text-opacity:1;color:rgb(233 84 50/var(--tw-text-opacity))}.text-green-600{--tw-text-opacity:1;color:rgb(22 163 74/var(--tw-text-opacity))}.text-gray-100{--tw-text-opacity:1;color:rgb(230 236 242/var(--tw-text-opacity))}.text-opacity-70{--tw-text-opacity:0.7}.underline{-webkit-text-decoration-line:underline;text-decoration-line:underline}.no-underline{-webkit-text-decoration-line:none;text-decoration-line:none}.shadow-[0px 2px 20px rgba(169, 169, 169, 0.16), 0 32px 46px -27px rgba(117, 117, 117, 0.2)]{--tw-shadow:0px 2px 20px rgba(169, 169, 169, 0.16), 0 32px 46px -27px rgba(117, 117, 117, 0.2);--tw-shadow-colored:0px 2px 20px var(--tw-shadow-color), 0 32px 46px -27px var(--tw-shadow-color);box-shadow:var(--tw-ring-offset-shadow,0 0 #0000),var(--tw-ring-shadow,0 0 #0000),var(--tw-shadow)}.shadow-[0 2px 8px rgba(0, 0, 0, 0.16)]{--tw-shadow:0 2px 8px rgba(0, 0, 0, 0.16);--tw-shadow-colored:0 2px 8px var(--tw-shadow-color);box-shadow:var(--tw-ring-offset-shadow,0 0 #0000),var(--tw-ring-shadow,0 0 #0000),var(--tw-shadow)}.shadow-[0 23.4255px 46.8511px rgba(0, 0, 0, 0.2)]{--tw-shadow:0 23.4255px 46.8511px rgba(0, 0, 0, 0.2);--tw-shadow-colored:0 23.4255px 46.8511px var(--tw-shadow-color);box-shadow:var(--tw-ring-offset-shadow,0 0 #0000),var(--tw-ring-shadow,0 0 #0000),var(--tw-shadow)}.outline{outline-style:solid}.drop-shadow-[0 23.4255px 46.8511px rgba(0, 0, 0, 0.2)]{--tw-drop-shadow:drop-shadow(0 20px 13px rgb(0 0 0 / 0.03)) drop-shadow(0 8px 5px rgb(0 0 0 / 
0.08));filter:var(--tw-filter)}.filter{filter:var(--tw-filter)}.transition-all{transition-property:all;transition-timing-function:cubic-bezier(.4,0,.2,1);transition-duration:150ms}.transition-opacity{transition-property:opacity;transition-timing-function:cubic-bezier(.4,0,.2,1);transition-duration:150ms}@font-face{font-family:menlo;font-weight:400;font-style:normal;src:local("Menlo Regular"),local("menlo"),url(/fonts/menlo/Menlo-Regular.woff?e2vwe8)format('woff');font-display:swap}@font-face{font-family:circular std;font-weight:400;font-style:normal;src:local("Circular Std Book"),local("Circular Std"),url(/fonts/circularStd/CircularStd-Book.woff?e2vwe8)format('woff'),url(/fonts/circularStd/CircularStd-Book.woff2?e2vwe8)format('woff');font-display:swap}@font-face{font-family:circular std;font-weight:500;font-style:normal;src:local("Circular Std Medium"),local("Circular Std"),url(/fonts/circularStd/CircularStd-Medium.woff?e2vwe8)format('woff'),url(/fonts/circularStd/CircularStd-Medium.woff2?e2vwe8)format('woff');font-display:swap}@font-face{font-family:circular std;font-weight:700;font-style:normal;src:local("Circular Std Bold"),local("Circular Std"),url(/fonts/circularStd/CircularStd-Bold.woff?e2vwe8)format('woff'),url(/fonts/circularStd/CircularStd-Bold.woff2?e2vwe8)format('woff');font-display:swap}@font-face{font-family:circular std;font-weight:900;font-style:normal;src:local("Circular Std Black"),local("Circular Std"),url(/fonts/circularStd/CircularStd-Black.woff?e2vwe8)format('woff'),url(/fonts/circularStd/CircularStd-Black.woff2?e2vwe8)format('woff');font-display:swap}input:focus,textarea:focus,select:focus{outline:none!important;outline-offset:0!important;box-shadow:none!important}select{-webkit-appearance:none}.qa 
a{font-weight:700;-webkit-text-decoration-line:underline;text-decoration-line:underline}.bg-skew-yellow-b{background:#ffc91f}.bg-skew-white-b{background:#fff}.bg-skew-black{background:#0f0f0e}.bg-skew-black-t{background:#0f0f0e}.bg-skew-black-alt{background:#0f0f0e}.underline-yellow{background:#ffc91f}@media(min-width:768px){.bg-skew-yellow-b::after{content:'';position:absolute;bottom:0;left:-50%;right:-50%;top:-50%;transform:rotate(-8deg)skew(-8deg);background:#ffc91f;z-index:-1}.bg-skew-white-b:after{content:'';position:absolute;bottom:61px;left:-50%;right:-50%;top:-50%;transform:rotate(-8deg)skew(-8deg);background:#fff;z-index:-1}.bg-skew-black::after{content:'';position:absolute;bottom:0;left:-50%;right:-50%;top:0;transform:rotate(-8deg)skew(-8deg);background:#0f0f0e;z-index:-1}.bg-skew-black-t:after{content:'';position:absolute;bottom:-50%;left:-50%;right:-50%;top:0;transform:rotate(-8deg)skew(-8deg);background:#0f0f0e;z-index:-1}.bg-skew-black-alt::after{content:'';position:absolute;bottom:0;left:-50%;right:-50%;top:0;transform:rotate(8deg)skew(8deg);background:#0f0f0e;z-index:-1}.underline-yellow::after{content:'';background:#ffc91f;height:6px;left:0;right:0;bottom:0;position:absolute}.bg-skew-yellow-b,.bg-skew-white-b,.bg-skew-black,.bg-skew-black-t,.bg-skew-black-alt,.underline-yellow{background:0 0}}.price-plan.recommended{--tw-bg-opacity:1;background-color:rgb(15 15 14/var(--tw-bg-opacity));--tw-text-opacity:1;color:rgb(255 255 255/var(--tw-text-opacity))}.price-plan.recommended .txt-hidden{display:block}.price-plan.recommended .border-gray-600{border-color:transparent}.price-plan.recommended .btn-black-o{--tw-border-opacity:1;border-color:rgb(255 201 31/var(--tw-border-opacity));--tw-bg-opacity:1;background-color:rgb(255 201 31/var(--tw-bg-opacity));transition-property:all;transition-timing-function:cubic-bezier(.4,0,.2,1);transition-duration:150ms}.price-plan.recommended .btn-black-o:hover{--tw-text-opacity:1;color:rgb(15 15 
14/var(--tw-text-opacity));background-color:#ffd127;border-color:#ffd127}.fixed-position{left:0!important;width:100%!important}.fixed-position .fixed-bar{width:100%!important;left:0!important;display:flex;flex-wrap:wrap;justify-content:space-between;--tw-bg-opacity:1;background-color:rgb(255 255 255/var(--tw-bg-opacity));padding-top:12px;padding-bottom:12px;padding-left:12px;padding-right:12px;--tw-shadow:0 2px 8px rgba(0, 0, 0, 0.16);--tw-shadow-colored:0 2px 8px var(--tw-shadow-color);box-shadow:var(--tw-ring-offset-shadow,0 0 #0000),var(--tw-ring-shadow,0 0 #0000),var(--tw-shadow)}.fixed-position .fixed-bar .btn{display:inline-block}.fixed-position h1{margin-top:10px;margin-bottom:10px;margin-right:15px;text-align:left;font-size:16px;font-weight:400}@media(min-width:1024px){.fixed-position h1{margin:0}.fixed-position h1{font-size:24px}}.terms-content{font-size:16px;line-height:1.5}@media(min-width:1024px){.terms-content{max-width:720px}}.terms-content h2{margin-bottom:32px;border-bottom-width:1px;--tw-border-opacity:1;border-color:rgb(15 15 14/var(--tw-border-opacity));padding-top:49px;padding-bottom:31px;font-size:30px;line-height:1.33}@media(min-width:1024px){.terms-content h2{font-size:36px}}.terms-content p{margin-bottom:24px}.terms-content p a:hover{-webkit-text-decoration-line:underline;text-decoration-line:underline}.journey{position:relative;padding-left:60px}.journey:before{content:"";width:3px;position:absolute;top:0;bottom:0;left:20px;background:#0f0f0e}.journey .row+.row{margin-top:40px}.journey .row:after{content:"";width:40px;height:40px;background:#fff;border-radius:50%;border:3px solid #0f0f0e;position:absolute;top:50%;left:-58px;transform:translateY(-50%);z-index:2}.journey time{width:100%;display:flex;flex-wrap:wrap}.box{padding:15px 34px 15px 77px;position:relative;min-height:200px;display:flex;align-items:center;border-radius:8px;overflow:hidden;background:url(/assets/images/bg-triangle.svg)no-repeat 0/cover}.side-nav 
.active{border-left-width:2px;--tw-border-opacity:1;border-color:rgb(250 173 19/var(--tw-border-opacity));padding-left:14px;font-weight:700}.nice-select{position:relative;outline:none}.custom-select .nice-select .current{display:flex;height:56px;width:100%;cursor:pointer;align-items:center;border-radius:4px;border-width:1px;--tw-border-opacity:1;border-color:rgb(179 186 197/var(--tw-border-opacity));padding-top:10px;padding-bottom:10px;padding-left:20px;padding-right:55px;font-size:16px;font-weight:400;line-height:1.5;--tw-text-opacity:1;color:rgb(90 113 132/var(--tw-text-opacity))}.custom-select .nice-select .current::-moz-placeholder{--tw-placeholder-opacity:1;color:rgb(179 186 197/var(--tw-placeholder-opacity))}.custom-select .nice-select .current:-ms-input-placeholder{--tw-placeholder-opacity:1;color:rgb(179 186 197/var(--tw-placeholder-opacity))}.custom-select .nice-select .current::placeholder{--tw-placeholder-opacity:1;color:rgb(179 186 197/var(--tw-placeholder-opacity))}.custom-select .nice-select .current:focus{--tw-border-opacity:1;border-color:rgb(90 113 132/var(--tw-border-opacity))}.custom-select .nice-select .current:after{content:"\e907";font:6px/1 icomoon;position:absolute;right:23px;top:50%;transform:translateY(-50%);color:#5a7184;transition:transform .15s linear}.custom-select .nice-select.open .current:after{transform:translateY(-50%)rotate(180deg)}.custom-select .nice-select.open .list{visibility:visible;opacity:1}.custom-select .nice-select .list{visibility:hidden;position:absolute;top:100%;left:0;right:0;margin-top:-1px;height:212px;overflow:auto;border-radius:4px;border-width:1px;--tw-border-opacity:1;border-color:rgb(179 186 197/var(--tw-border-opacity));--tw-bg-opacity:1;background-color:rgb(255 255 255/var(--tw-bg-opacity));padding-top:10px;padding-bottom:10px;padding-left:20px;padding-right:20px;font-size:16px;line-height:1.5;--tw-text-opacity:1;color:rgb(90 113 
132/var(--tw-text-opacity));opacity:0;transition-property:all;transition-timing-function:cubic-bezier(.4,0,.2,1);transition-duration:150ms}.custom-select .nice-select .list li{cursor:pointer;padding-top:4px;padding-bottom:4px}.nice-select.style-1 .current{position:relative;display:flex;cursor:pointer;align-items:center;padding-right:18px;font-size:16px}.nice-select.style-1 .current:hover{--tw-text-opacity:1;color:rgb(255 201 31/var(--tw-text-opacity))}.nice-select.style-1 .current:after{position:absolute;right:0;margin-left:8px;margin-top:1px;overflow:hidden;font-size:6px;line-height:1;top:50%;transform:translateY(-50%);content:"\e906";font-family:icomoon;transition:transform .15s linear}.nice-select.style-1.open .current:after{transform:translateY(-50%)rotate(180deg)}.nice-select.style-1.open .list{visibility:visible;opacity:1}.nice-select.style-1 .list{visibility:hidden;position:absolute;right:0;top:100%;z-index:50;width:160px;overflow:auto;border-bottom-right-radius:6px;border-bottom-left-radius:6px;--tw-bg-opacity:1;background-color:rgb(27 37 56/var(--tw-bg-opacity));padding:10px;opacity:0}.nice-select.style-1 .list li{cursor:pointer;padding-top:5px;padding-bottom:5px}.nice-select.style-1 .list li:hover{--tw-text-opacity:1;color:rgb(255 201 31/var(--tw-text-opacity))}[data-more].active .icon-chevron03{transform:rotate(180deg)}.tabs li a.active{font-weight:700;--tw-text-opacity:1;color:rgb(15 15 14/var(--tw-text-opacity))}.tabs li a.active:after{content:"";position:absolute;bottom:-1px;left:0;right:0;height:2px;background:#faad13}.tab-content .tab{display:none}.tab-content .tab.active{display:block}.filter a.active{--tw-bg-opacity:1;background-color:rgb(255 255 255/var(--tw-bg-opacity));--tw-text-opacity:1;color:rgb(15 15 14/var(--tw-text-opacity))}.filter label .bg{position:absolute;top:0;left:0;right:0;bottom:0;background:#e8e8e8;z-index:1;transition:background .3s linear}.filter label .txt{position:relative;z-index:2;color:#0f0f0e}.filter 
input[type=checkbox]{position:fixed;left:0;top:0;opacity:0;z-index:-1}.filter input[type=checkbox]:checked~.bg{background:#ffc91f}.paging li.active a{--tw-bg-opacity:1;background-color:rgb(255 201 31/var(--tw-bg-opacity))}.select-opener:after{position:absolute;right:16px;top:50%;font-size:10px;transition-property:all;transition-timing-function:cubic-bezier(.4,0,.2,1);transition-duration:150ms}@media(min-width:1024px){.select-opener:after{display:none}}.select-opener:after{transform:translateY(-50%);content:"\e906";font:8px/1 icomoon}.select-opener.active:after{transform:translateY(-50%)rotate(180deg)}.toc{z-index:3}.testimonial-slider-wrap .slider-nav{position:absolute;right:0;bottom:0;display:flex;height:92px;width:80px;flex-wrap:wrap;align-items:center;justify-content:flex-end;--tw-bg-opacity:1;background-color:rgb(15 15 14/var(--tw-bg-opacity))}.testimonial-slider-wrap .slider-nav .slick-next{margin-left:40px}.testimonial-slider-wrap .slider-nav .slick-arrow{transition-property:all;transition-timing-function:cubic-bezier(.4,0,.2,1);transition-duration:150ms}.testimonial-slider-wrap .slider-nav .slick-arrow:hover{--tw-text-opacity:1;color:rgb(255 201 31/var(--tw-text-opacity))}.doc-nav li.active{font-weight:700;-webkit-text-decoration-line:underline;text-decoration-line:underline}.bg-yellow-full{position:relative}.bg-yellow-full:after{content:"";position:absolute;top:100%;left:0;right:0;bottom:-9999px;z-index:1;background:#ffc91f}.navbar .hasdrop .hasdrop-a.active,.navbar .hasdrop:hover .hasdrop-a{-webkit-text-decoration-line:underline;text-decoration-line:underline}.navbar .hasdrop-a:after{content:"\e908";display:inline-block;vertical-align:middle;margin-left:8px;font:6px/1 icomoon}.nav-drop{padding-left:18px;padding-right:18px;padding-top:18px}@media only screen and (min-width:1024px){.journey{padding-left:0}.journey time{justify-content:flex-end}.journey:before{top:30px;bottom:30px;left:50%;transform:translateX(-50%)}.journey 
.row:after{left:50%;transform:translate(-50%,-50%)}.journey .row+.row{margin-top:101px}.journey .row:nth-child(even){flex-direction:row-reverse}.journey .row:nth-child(even) .box{padding:15px 77px 15px 34px;background-image:url(/assets/images/bg-triangle-r.svg);background-position:100% 0}.journey .row:nth-child(even) time{justify-content:flex-start}.box{min-height:183px}}@media only screen and (max-width:1023px){.navbar-wrap{top:100%;pointer-events:none}.navbar-wrap{position:absolute}.navbar-wrap{left:0}.navbar-wrap{right:0}.nav-active .navbar-wrap{pointer-events:auto}.navbar{transform:translateY(-101%);transition:transform .25s ease-in}.nav-active .navbar{transform:translateY(0)}.nav-active .nav-opener .icon-menu:before{content:"\e90c";margin-right:3px}.testimonial-slider-wrap .slider-nav{position:static}.testimonial-slider-wrap .slider-nav{margin-top:20px}.testimonial-slider-wrap .slider-nav{height:auto}.testimonial-slider-wrap .slider-nav{width:100%}.sticky-aside.is-affixed .inner-wrapper-sticky h3{display:none}}@media only screen and (max-width:767px){.journey .row:after{top:-2px;transform:none}.box{background:#ffc91f;padding:15px}}@media only screen and (min-width:1024px){.sub-navs{display:block!important}.toc{position:static!important}.navbar .hasdrop{margin-left:-21px;margin-right:-21px}.navbar .hasdrop{margin-top:-14px;margin-bottom:-14px}.navbar .hasdrop:hover .nav-drop{visibility:visible}.navbar .hasdrop:hover .nav-drop{opacity:1}.navbar .hasdrop:hover .hasdrop-a{color:rgba(0,0,0,.47)}.navbar .hasdrop:hover .hasdrop-a{-webkit-text-decoration-line:none;text-decoration-line:none}.navbar .hasdrop .hasdrop-a.active{-webkit-text-decoration-line:none;text-decoration-line:none}.navbar .hasdrop-a{position:relative}.navbar .hasdrop-a{padding-top:14px;padding-bottom:14px}.navbar .hasdrop-a{padding-left:21px;padding-right:21px}.navbar .hasdrop-a{transition-property:all;transition-timing-function:cubic-bezier(.4,0,.2,1);transition-duration:150ms}.navbar 
.hasdrop-a:hover{color:rgba(0,0,0,.47)}.navbar .hasdrop-a:hover{-webkit-text-decoration-line:none;text-decoration-line:none}.nav-drop{display:block!important;height:auto!important}.nav-drop{visibility:hidden}.nav-drop{position:absolute}.nav-drop{top:100%}.nav-drop{left:15px}.nav-drop{min-width:214px}.nav-drop{border-radius:.375rem}.nav-drop{--tw-bg-opacity:1;background-color:rgb(15 15 14/var(--tw-bg-opacity))}.nav-drop{padding-top:18px;padding-bottom:18px}.nav-drop{padding-left:21px;padding-right:21px}.nav-drop{opacity:0}.nav-drop{transition-property:all;transition-timing-function:cubic-bezier(.4,0,.2,1);transition-duration:150ms}.nav-drop:after{content:" ";position:absolute;right:calc(50% - 15px);top:-10px;border-top:none;border-right:15px solid transparent;border-left:15px solid transparent;border-bottom:15px solid #000}}@media only screen and (min-width:1280px){.nav-drop{left:21px}}@media only screen and (max-width:1023px){.toc .sub-navs{overflow-y:auto;max-height:calc( 100vh - 48px)}}.range-slider__slider{width:100%;margin:39px 0 30px;position:relative}.range-slider__slider:after{content:"";position:absolute;top:50%;left:0;right:0;transform:translateY(-50%);background:#e6ecf2;height:8px;border-radius:4px}.range-slider__slider .range-slider__slide{position:absolute;left:0;top:4px;z-index:2;height:8px;border-radius:4px;--tw-bg-opacity:1;background-color:rgb(255 201 31/var(--tw-bg-opacity))}.range-slider__slider .input-wrap{position:relative;z-index:2}.range-slider__slider input{-webkit-appearance:none;-moz-appearance:none;appearance:none;width:100%;height:8px;border-radius:4px;background:0 0}.range-slider__slider input::-moz-range-thumb{-webkit-appearance:none;-moz-appearance:none;appearance:none;width:24px;height:24px;background:#0f0f0e;cursor:pointer;border-radius:50%;border:none}.range-slider__slider 
input::-webkit-slider-thumb{-webkit-appearance:none;appearance:none;width:24px;height:24px;background:#0f0f0e;cursor:pointer;border-radius:50%;border:none}.custom-checkbox input{position:fixed;left:0;top:0;opacity:0;z-index:-1}.custom-checkbox{position:relative;display:inline-flex;align-items:center;margin:0;font-size:16px;line-height:1.5;color:#5a7184;padding-left:26px}.custom-checkbox .checkmark{width:16px;height:16px;position:absolute;left:0;top:2px;border:1px solid #5a7184;transition:border .3s linear,background .3s linear}.custom-checkbox .checkmark:after{content:"\e923";font:8px/1 icomoon;position:absolute;color:#5a7184;top:50%;left:50%;transform:translate(-50%,-50%);opacity:0;transition:opacity .3s linear}.custom-checkbox input:checked+.checkmark:after{opacity:1}.blog-post h1,.blog-post h2,.blog-post h3,.blog-post h4,.blog-post h5,.blog-post h6{margin-top:50px;margin-bottom:25px;--tw-text-opacity:1;color:rgb(15 15 14/var(--tw-text-opacity))}.blog-post h2,.blog-post h3,.blog-post h4,.blog-post h5,.blog-post h6{margin-top:-70px;padding-top:121px;outline:2px solid transparent;outline-offset:2px}.blog-post h2:first-of-type{margin-bottom:21px;--tw-text-opacity:1;color:rgb(15 15 14/var(--tw-text-opacity))}.blog-post h2:first-child,.blog-post h3:first-child{margin-top:0;padding-top:0}.blog-post p:not(:last-child),.blog-post .highlight:not(:last-child),.blog-post blockquote:not(:last-child),.blog-post pre:not(:last-child),.blog-post>ul:not(:last-child),.blog-post>ol:not(:last-child),.blog-post>table:not(:last-child){margin-bottom:20px}.blog-post code,.blog-post pre{border-radius:.375rem;font-size:.9em}.blog-post h1>code,.blog-post h2>code,.blog-post h3>code{font-size:inherit!important}.blog-post{--tw-text-opacity:1;color:rgb(52 66 84/var(--tw-text-opacity))}.blog-post blockquote{border-left-width:4px;border-style:solid;--tw-border-opacity:1;border-left-color:rgb(255 201 31/var(--tw-border-opacity));--tw-bg-opacity:1;background-color:rgb(255 236 
199/var(--tw-bg-opacity));padding:20px}.blog-post code:not(.hljs){margin-bottom:20px;--tw-bg-opacity:1;background-color:rgb(230 236 242/var(--tw-bg-opacity));padding-left:4px;padding-right:4px;font-family:Circular Std,ui-sans-serif,system-ui,-apple-system,BlinkMacSystemFont,segoe ui,Roboto,helvetica neue,Arial,noto sans,sans-serif,apple color emoji,segoe ui emoji,segoe ui symbol,noto color emoji;font-size:16px;--tw-text-opacity:1;color:rgb(33 43 69/var(--tw-text-opacity))}.blog-post>ul{list-style-type:disc;padding-left:28px}.blog-post a{font-style:normal;-webkit-text-decoration-line:underline;text-decoration-line:underline}.blog-post a:hover{font-weight:700}.blog-post>ol{list-style-type:decimal;padding-left:28px}.blog-post img{vertical-align:middle;border:0}.blog-post .img svg,.blog-post .img img{margin:0;width:100%;height:auto}.blog-post .img{position:relative}.blog-post .img img{position:absolute;top:0;left:0}.blog-post table{border-width:1px;--tw-border-opacity:1;border-color:rgb(15 15 14/var(--tw-border-opacity))}.blog-post th,.blog-post td{border-width:1px;--tw-border-opacity:1;border-color:rgb(15 15 14/var(--tw-border-opacity));padding:10px}.toc{overflow-y:auto;font-size:16px}.toc>.toc-list{overflow:hidden;position:relative}.toc>.toc-list li{list-style:none;margin-bottom:10px}.toc-list{margin:0;padding-left:10px}a.toc-link{--tw-text-opacity:1;color:rgb(90 113 132/var(--tw-text-opacity));height:100%}.is-collapsible{max-height:1000px;overflow:hidden;transition:all 300ms ease-in-out}.is-collapsed{max-height:0}.is-position-fixed{position:fixed!important;top:0}.is-active-link{--tw-text-opacity:1 !important;color:rgb(15 15 14/var(--tw-text-opacity))!important}.toc-link::before{background-color:#eee;content:' 
';display:inline-block;height:inherit;left:0;margin-top:-1px;position:absolute;width:2px}.toc-link:hover::before{background-color:#ffc91f}.is-active-link::before{background-color:#ffc91f}.nice-tab-content{display:none}.nice-tab-content.active{display:block}.client-library{border-radius:.375rem;padding:10px;--tw-text-opacity:1;color:rgb(15 15 14/var(--tw-text-opacity))}.client-library:hover{--tw-bg-opacity:1;background-color:rgb(255 236 199/var(--tw-bg-opacity))}.client-library.active{--tw-bg-opacity:1;background-color:rgb(255 201 31/var(--tw-bg-opacity))}.doc-content{font-size:16px}.doc-content h2,.doc-content h3,.doc-content h4,.doc-content h5,.doc-content h6{font-weight:700;--tw-text-opacity:1;color:rgb(15 15 14/var(--tw-text-opacity));outline:2px solid transparent;outline-offset:2px}.doc-content>h2,.doc-content>h3{margin-top:40px;margin-bottom:21px}.doc-content h2{font-size:36px}.doc-content p{margin-bottom:10px}.doc-content code:not(.hljs),.doc-content pre:not(.hljs){margin-bottom:10px;border-radius:4px;font-size:16px}.doc-content>pre>code{--tw-bg-opacity:1;background-color:rgb(44 58 87/var(--tw-bg-opacity))}.doc-content{--tw-text-opacity:1;color:rgb(52 66 84/var(--tw-text-opacity))}.doc-content blockquote{margin-bottom:10px;border-left-width:4px;border-style:solid;--tw-border-opacity:1;border-left-color:rgb(255 201 31/var(--tw-border-opacity));--tw-bg-opacity:1;background-color:rgb(255 236 199/var(--tw-bg-opacity));padding:20px}.doc-content p>code:not(.hljs),.doc-content li>code:not(.hljs),.doc-content td>code:not(.hljs){--tw-bg-opacity:1;background-color:rgb(230 236 242/var(--tw-bg-opacity));--tw-text-opacity:1;color:rgb(33 43 69/var(--tw-text-opacity));padding-left:4px;padding-right:4px;font-family:Circular Std,ui-sans-serif,system-ui,-apple-system,BlinkMacSystemFont,segoe ui,Roboto,helvetica neue,Arial,noto sans,sans-serif,apple color emoji,segoe ui emoji,segoe ui symbol,noto color emoji}code.hljs{--tw-bg-opacity:1;background-color:rgb(44 58 
87/var(--tw-bg-opacity));border-radius:4px;font-size:14px}.doc-content>ul{margin-bottom:10px;list-style-type:disc;padding-left:28px}.doc-content a{font-style:normal;-webkit-text-decoration-line:underline;text-decoration-line:underline}.doc-content a:hover{font-weight:700}.doc-content>ol{margin-bottom:10px;list-style-type:decimal;padding-left:28px}.doc-content img{vertical-align:middle;border:0}.doc-content .img svg,.doc-content .img img{margin:0;width:100%;height:auto}.doc-content .img{position:relative}.doc-content .img img{position:absolute;top:0;left:0}.doc-content table{margin-bottom:10px;border-width:1px;--tw-border-opacity:1;border-color:rgb(15 15 14/var(--tw-border-opacity))}.doc-content th,.doc-content td{border-width:1px;--tw-border-opacity:1;border-color:rgb(15 15 14/var(--tw-border-opacity));padding:10px}.doc-row{display:flex;flex-wrap:wrap;border-bottom-width:1px;--tw-border-opacity:1;border-color:rgb(217 218 219/var(--tw-border-opacity));padding-bottom:48px;padding-top:47px;--tw-text-opacity:1;color:rgb(15 15 14/var(--tw-text-opacity))}.doc-left{width:100%;font-size:16px}.doc-left h2,.doc-left h3,.doc-left h4,.doc-left h5,.doc-left h6{font-weight:700;--tw-text-opacity:1;color:rgb(15 15 14/var(--tw-text-opacity));outline:2px solid transparent;outline-offset:2px}.doc-left>h2,.doc-left>h3{margin-top:40px;margin-bottom:21px}.doc-left h2{font-size:36px}.doc-left p{margin-bottom:10px}.doc-left code:not(.hljs),.doc-left pre:not(.hljs){margin-bottom:10px;border-radius:4px;font-size:16px}.doc-left>pre>code{--tw-bg-opacity:1;background-color:rgb(44 58 87/var(--tw-bg-opacity))}.doc-left{--tw-text-opacity:1;color:rgb(52 66 84/var(--tw-text-opacity))}.doc-left blockquote{margin-bottom:10px;border-left-width:4px;border-style:solid;--tw-border-opacity:1;border-left-color:rgb(255 201 31/var(--tw-border-opacity));--tw-bg-opacity:1;background-color:rgb(255 236 199/var(--tw-bg-opacity));padding:20px}.doc-left p>code:not(.hljs),.doc-left li>code:not(.hljs),.doc-left 
td>code:not(.hljs){}.doc-left>ul{margin-bottom:10px;list-style-type:disc;padding-left:28px}.doc-left a{font-style:normal;-webkit-text-decoration-line:underline;text-decoration-line:underline}.doc-left a:hover{font-weight:700}.doc-left>ol{margin-bottom:10px;list-style-type:decimal;padding-left:28px}.doc-left img{vertical-align:middle;border:0}.doc-left .img svg,.doc-left .img img{margin:0;width:100%;height:auto}.doc-left .img{position:relative}.doc-left .img img{position:absolute;top:0;left:0}.doc-left table{margin-bottom:10px;border-width:1px;--tw-border-opacity:1;border-color:rgb(15 15 14/var(--tw-border-opacity))}.doc-left th,.doc-left td{border-width:1px;--tw-border-opacity:1;border-color:rgb(15 15 14/var(--tw-border-opacity));padding:10px}@media(min-width:1024px){.doc-left{width:50%}}.doc-right{width:100%;padding-top:50px;font-size:16px}.doc-right h2,.doc-right h3,.doc-right h4,.doc-right h5,.doc-right h6{font-weight:700;--tw-text-opacity:1;color:rgb(15 15 14/var(--tw-text-opacity));outline:2px solid transparent;outline-offset:2px}.doc-right>h2,.doc-right>h3{margin-top:40px;margin-bottom:21px}.doc-right h2{font-size:36px}.doc-right p{margin-bottom:10px}.doc-right code:not(.hljs),.doc-right pre:not(.hljs){margin-bottom:10px;border-radius:4px;font-size:16px}.doc-right>pre>code{--tw-bg-opacity:1;background-color:rgb(44 58 87/var(--tw-bg-opacity))}.doc-right{--tw-text-opacity:1;color:rgb(52 66 84/var(--tw-text-opacity))}.doc-right blockquote{margin-bottom:10px;border-left-width:4px;border-style:solid;--tw-border-opacity:1;border-left-color:rgb(255 201 31/var(--tw-border-opacity));--tw-bg-opacity:1;background-color:rgb(255 236 199/var(--tw-bg-opacity));padding:20px}.doc-right p>code:not(.hljs),.doc-right li>code:not(.hljs),.doc-right td>code:not(.hljs){}.doc-right>ul{margin-bottom:10px;list-style-type:disc;padding-left:28px}.doc-right a{font-style:normal;-webkit-text-decoration-line:underline;text-decoration-line:underline}.doc-right 
a:hover{font-weight:700}.doc-right>ol{margin-bottom:10px;list-style-type:decimal;padding-left:28px}.doc-right img{vertical-align:middle;border:0}.doc-right .img svg,.doc-right .img img{margin:0;width:100%;height:auto}.doc-right .img{position:relative}.doc-right .img img{position:absolute;top:0;left:0}.doc-right table{margin-bottom:10px;border-width:1px;--tw-border-opacity:1;border-color:rgb(15 15 14/var(--tw-border-opacity))}.doc-right th,.doc-right td{border-width:1px;--tw-border-opacity:1;border-color:rgb(15 15 14/var(--tw-border-opacity));padding:10px}@media(min-width:1024px){.doc-right{width:50%}.doc-right{padding-left:30px}}@media(min-width:1440px){.doc-right{padding-left:32px}}.doc-full{width:100%;font-size:16px}.doc-full h2,.doc-full h3,.doc-full h4,.doc-full h5,.doc-full h6{font-weight:700;--tw-text-opacity:1;color:rgb(15 15 14/var(--tw-text-opacity));outline:2px solid transparent;outline-offset:2px}.doc-full>h2,.doc-full>h3{margin-top:40px;margin-bottom:21px}.doc-full h2{font-size:36px}.doc-full p{margin-bottom:10px}.doc-full code:not(.hljs),.doc-full pre:not(.hljs){margin-bottom:10px;border-radius:4px;font-size:16px}.doc-full>pre>code{--tw-bg-opacity:1;background-color:rgb(44 58 87/var(--tw-bg-opacity))}.doc-full{--tw-text-opacity:1;color:rgb(52 66 84/var(--tw-text-opacity))}.doc-full blockquote{margin-bottom:10px;border-left-width:4px;border-style:solid;--tw-border-opacity:1;border-left-color:rgb(255 201 31/var(--tw-border-opacity));--tw-bg-opacity:1;background-color:rgb(255 236 199/var(--tw-bg-opacity));padding:20px}.doc-full p>code:not(.hljs),.doc-full li>code:not(.hljs),.doc-full td>code:not(.hljs){}.doc-full>ul{margin-bottom:10px;list-style-type:disc;padding-left:28px}.doc-full a{font-style:normal;-webkit-text-decoration-line:underline;text-decoration-line:underline}.doc-full a:hover{font-weight:700}.doc-full>ol{margin-bottom:10px;list-style-type:decimal;padding-left:28px}.doc-full img{vertical-align:middle;border:0}.doc-full .img svg,.doc-full .img 
img{margin:0;width:100%;height:auto}.doc-full .img{position:relative}.doc-full .img img{position:absolute;top:0;left:0}.doc-full table{margin-bottom:10px;border-width:1px;--tw-border-opacity:1;border-color:rgb(15 15 14/var(--tw-border-opacity))}.doc-full th,.doc-full td{border-width:1px;--tw-border-opacity:1;border-color:rgb(15 15 14/var(--tw-border-opacity));padding:10px}.snippet-copy{cursor:pointer}.snippet-copy>span:hover{font-weight:900}.doc-section-splitter{margin-top:50px}.toc-doc{font-size:16px}.toc-doc .toc-list li{margin-bottom:0;padding-top:4px;padding-bottom:4px}.toc-doc ol ol{padding-left:14px}.alternatives a:not([class]){font-weight:700;-webkit-text-decoration-line:underline;text-decoration-line:underline}.alternatives a:not([class]):hover{-webkit-text-decoration-line:none;text-decoration-line:none}.last-of-type\:border-0:last-of-type{border-width:0}.hover\:border-2:hover{border-width:2px}.hover\:border-yellow-100:hover{--tw-border-opacity:1;border-color:rgb(255 201 31/var(--tw-border-opacity))}.hover\:border-gray-300:hover{--tw-border-opacity:1;border-color:rgb(179 186 197/var(--tw-border-opacity))}.hover\:bg-yellow-100:hover{--tw-bg-opacity:1;background-color:rgb(255 201 31/var(--tw-bg-opacity))}.hover\:bg-white:hover{--tw-bg-opacity:1;background-color:rgb(255 255 255/var(--tw-bg-opacity))}.hover\:bg-opacity-90:hover{--tw-bg-opacity:0.9}.hover\:font-bold:hover{font-weight:700}.hover\:text-yellow-100:hover{--tw-text-opacity:1;color:rgb(255 201 31/var(--tw-text-opacity))}.hover\:text-black-100:hover{--tw-text-opacity:1;color:rgb(15 15 14/var(--tw-text-opacity))}.hover\:text-gray-700:hover{--tw-text-opacity:1;color:rgb(228 231 236/var(--tw-text-opacity))}.hover\:underline:hover{-webkit-text-decoration-line:underline;text-decoration-line:underline}.hover\:no-underline:hover{-webkit-text-decoration-line:none;text-decoration-line:none}.hover\:shadow-2xl:hover{--tw-shadow:0 25px 50px -12px rgb(0 0 0 / 0.25);--tw-shadow-colored:0 25px 50px -12px 
var(--tw-shadow-color);box-shadow:var(--tw-ring-offset-shadow,0 0 #0000),var(--tw-ring-shadow,0 0 #0000),var(--tw-shadow)}@media(min-width:375px){.xs\:text-[18px]{font-size:18px}}@media(min-width:768px){.sm\:order-none{order:0}.sm\:-m-[1px]{margin:-1px}.sm\:my-[6px]{margin-top:6px;margin-bottom:6px}.sm\:-mx-[6px]{margin-left:-6px;margin-right:-6px}.sm\:mb-[0]{margin-bottom:0}.sm\:mb-[40px]{margin-bottom:40px}.sm\:mb-[27px]{margin-bottom:27px}.sm\:mb-[22px]{margin-bottom:22px}.sm\:mb-[16px]{margin-bottom:16px}.sm\:mb-[32px]{margin-bottom:32px}.sm\:mb-[33px]{margin-bottom:33px}.sm\:mb-[20px]{margin-bottom:20px}.sm\:mr-[30px]{margin-right:30px}.sm\:mt-[50px]{margin-top:50px}.sm\:block{display:block}.sm\:flex{display:flex}.sm\:h-[40px]{height:40px}.sm\:h-auto{height:auto}.sm\:w-[1px]\/2{width:50%}.sm\:w-[10px]\/12{width:83.333333%}.sm\:w-auto{width:auto}.sm\:w-[40px]{width:40px}.sm\:w-[calc(50% - 28px)]{width:calc(50% - 28px)}.sm\:w-[1px]\/3{width:33.333333%}.sm\:w-[281px]{width:281px}.sm\:w-[100px]{width:100px}.sm\:w-[50px]{width:50px}.sm\:max-w-[490px]{max-width:490px}.sm\:flex-1{flex:1}.sm\:justify-start{justify-content:flex-start}.sm\:p-[32px]{padding:32px}.sm\:py-[60px]{padding-top:60px;padding-bottom:60px}.sm\:px-[24px]{padding-left:24px;padding-right:24px}.sm\:py-[70px]{padding-top:70px;padding-bottom:70px}.sm\:pt-[102px]{padding-top:102px}.sm\:pb-[80px]{padding-bottom:80px}.sm\:pt-[91px]{padding-top:91px}.sm\:pb-[71px]{padding-bottom:71px}.sm\:pb-[60px]{padding-bottom:60px}.sm\:pt-[41px]{padding-top:41px}.sm\:pb-[70px]{padding-bottom:70px}.sm\:pr-[12px]{padding-right:12px}.sm\:pt-[100px]{padding-top:100px}.sm\:text-left{text-align:left}.sm\:text-right{text-align:right}.sm\:text-[40px]{font-size:40px}.sm\:text-[24px]{font-size:24px}.sm\:text-[48px]{font-size:48px}.sm\:text-[20px]{font-size:20px}.sm\:text-[18px]{font-size:18px}.sm\:text-[16px]{font-size:16px}.sm\:text-[60px]{font-size:60px}}@media(min-width:1024px){.md\:order-1{order:1}.md\:order-3{order:3}.md\:or
der-first{order:-9999}.md\:-m-[28px]{margin:-28px}.md\:m-[0]{margin:0}.md\:mx-[0]{margin-left:0;margin-right:0}.md\:-mx-[21px]{margin-left:-21px;margin-right:-21px}.md\:-mx-[60px]{margin-left:-60px;margin-right:-60px}.md\:-mx-[36px]{margin-left:-36px;margin-right:-36px}.md\:mt-[120px]{margin-top:120px}.md\:mb-[80px]{margin-bottom:80px}.md\:mb-[0]{margin-bottom:0}.md\:mr-[60px]{margin-right:60px}.md\:mr-[40px]{margin-right:40px}.md\:mb-[67px]{margin-bottom:67px}.md\:mb-[42px]{margin-bottom:42px}.md\:mb-[66px]{margin-bottom:66px}.md\:mb-[170px]{margin-bottom:170px}.md\:-mr-[60px]{margin-right:-60px}.md\:mb-[196px]{margin-bottom:196px}.md\:mb-[115px]{margin-bottom:115px}.md\:mb-[75px]{margin-bottom:75px}.md\:mb-[74px]{margin-bottom:74px}.md\:mb-[54px]{margin-bottom:54px}.md\:mb-[45px]{margin-bottom:45px}.md\:mt-[0]{margin-top:0}.md\:block{display:block}.md\:inline-block{display:inline-block}.md\:flex{display:flex}.md\:hidden{display:none}.md\:h-[48px]{height:48px}.md\:w-[22px]\.3p{width:22.3%}.md\:w-[22px]{width:22px}.md\:w-[53px]\.3p{width:53.3%}.md\:w-[53px]{width:53px}.md\:w-[3px]\/12{width:25%}.md\:w-[9px]\/12{width:75%}.md\:w-[6px]\/12{width:50%}.md\:w-[1px]\/4{width:25%}.md\:w-[1px]\/2{width:50%}.md\:w-[500px]{width:500px}.md\:w-full{width:100%}.md\:w-auto{width:auto}.md\:min-w-[0]{min-width:0}.md\:max-w-none{max-width:none}.md\:max-w-[380px]{max-width:380px}.md\:max-w-[434px]{max-width:434px}.md\:flex-1{flex:1}.md\:flex-row{flex-direction:row}.md\:items-center{align-items:center}.md\:justify-start{justify-content:flex-start}.md\:justify-between{justify-content:space-between}.md\:overflow-visible{overflow:visible}.md\:border-transparent{border-color:transparent}.md\:border-black-100{--tw-border-opacity:1;border-color:rgb(15 15 
14/var(--tw-border-opacity))}.md\:bg-transparent{background-color:transparent}.md\:p-[28px]{padding:28px}.md\:p-[0]{padding:0}.md\:px-[41px]{padding-left:41px;padding-right:41px}.md\:py-[38px]{padding-top:38px;padding-bottom:38px}.md\:px-[15px]{padding-left:15px;padding-right:15px}.md\:py-[22px]{padding-top:22px;padding-bottom:22px}.md\:px-[8px]{padding-left:8px;padding-right:8px}.md\:px-[60px]{padding-left:60px;padding-right:60px}.md\:py-[100px]{padding-top:100px;padding-bottom:100px}.md\:px-[48px]{padding-left:48px;padding-right:48px}.md\:py-[80px]{padding-top:80px;padding-bottom:80px}.md\:px-[0]{padding-left:0;padding-right:0}.md\:py-[91px]{padding-top:91px;padding-bottom:91px}.md\:px-[36px]{padding-left:36px;padding-right:36px}.md\:pt-[111px]{padding-top:111px}.md\:pb-[3px]{padding-bottom:3px}.md\:pb-[30px]{padding-bottom:30px}.md\:pl-[30px]{padding-left:30px}.md\:pb-[0]{padding-bottom:0}.md\:pt-[180px]{padding-top:180px}.md\:pb-[113px]{padding-bottom:113px}.md\:pt-[210px]{padding-top:210px}.md\:pt-[156px]{padding-top:156px}.md\:pb-[127px]{padding-bottom:127px}.md\:pt-[220px]{padding-top:220px}.md\:pb-[220px]{padding-bottom:220px}.md\:pt-[162px]{padding-top:162px}.md\:pb-[200px]{padding-bottom:200px}.md\:pl-[64px]{padding-left:64px}.md\:pb-[20px]{padding-bottom:20px}.md\:pb-[82px]{padding-bottom:82px}.md\:pb-[100px]{padding-bottom:100px}.md\:text-left{text-align:left}.md\:text-center{text-align:center}.md\:text-right{text-align:right}.md\:text-[18px]{font-size:18px}.md\:text-[42px]{font-size:42px}.md\:text-[56px]{font-size:56px}.md\:text-[62px]{font-size:62px}.md\:text-[14px]{font-size:14px}.md\:text-[24px]{font-size:24px}.md\:text-[36px]{font-size:36px}.md\:text-black-100{--tw-text-opacity:1;color:rgb(15 15 14/var(--tw-text-opacity))}.md\:hover\:bg-black-100:hover{--tw-bg-opacity:1;background-color:rgb(15 15 14/var(--tw-bg-opacity))}.md\:hover\:text-white:hover{--tw-text-opacity:1;color:rgb(255 255 
255/var(--tw-text-opacity))}}@media(min-width:1280px){.lg\:-mx-[23px]{margin-left:-23px;margin-right:-23px}.lg\:-mx-[8px]{margin-left:-8px;margin-right:-8px}.lg\:mr-[86px]{margin-right:86px}.lg\:mr-[90px]{margin-right:90px}.lg\:mr-[83px]{margin-right:83px}.lg\:w-[8px]\/12{width:66.666667%}.lg\:w-[608px]{width:608px}.lg\:w-[52%]{width:52%}.lg\:w-[26%]{width:26%}.lg\:w-[22%]{width:22%}.lg\:max-w-[434px]{max-width:434px}.lg\:flex-nowrap{flex-wrap:nowrap}.lg\:justify-between{justify-content:space-between}.lg\:px-[23px]{padding-left:23px;padding-right:23px}.lg\:px-[36px]{padding-left:36px;padding-right:36px}.lg\:px-[21px]{padding-left:21px;padding-right:21px}.lg\:px-[12px]{padding-left:12px;padding-right:12px}.lg\:px-[8px]{padding-left:8px;padding-right:8px}.lg\:pt-[80px]{padding-top:80px}.lg\:text-[56px]{font-size:56px}.lg\:text-[16px]{font-size:16px}.lg\:text-[24px]{font-size:24px}}@media(min-width:1440px){.xl\:w-[2px]\/12{width:16.666667%}.xl\:w-[10px]\/12{width:83.333333%}.xl\:pl-[29px]{padding-left:29px}.xl\:pr-[105px]{padding-right:105px}.xl\:pl-[32px]{padding-left:32px}.xl\:pr-[34px]{padding-right:34px}.xl\:text-[16px]{font-size:16px}}
 &lt;/style>
&lt;link rel="preload" href="https://www.scrapingbee.com/main.min.a09f1f7d5c32eba3a323bc3c39fca98dc62a83bad52faf6e0c62e7c5285cab6a.css" as="style" onload="this.onload=null;this.rel='stylesheet'">
&lt;noscript>&lt;link rel="stylesheet" href="https://www.scrapingbee.com/main.min.a09f1f7d5c32eba3a323bc3c39fca98dc62a83bad52faf6e0c62e7c5285cab6a.css">&lt;/noscript>
&lt;script type="application/ld+json">
 {
 "@context": "http://schema.org",
 "@type": "WebSite",
 "name": "ScrapingBee, the best web scraping API.",
 "url": "https://www.scrapingbee.com/",
 "description": "ScrapingBee is a Web Scraping API that handles proxies and Headless browser for you, so you can focus on extracting the data you want, and nothing else.",
 "thumbnailUrl": "https://www.scrapingbee.com/favico.png"
 }
&lt;/script>
&lt;script type="application/ld+json">
 {
 "@context": "https://schema.org",
 "@type": "Organization",
 "description": "The easiest web scraping API on the web. We handle headless browsers and rotate proxies for you.",
 "address": {
 "@type": "PostalAddress",
 "addressLocality": "Paris, France",
 "postalCode": "F-75008",
 "streetAddress": "66 Avenue des Champs Elysées, OCB Business Center 4"
 },
 "email": "hello(at)scrapingbee.com",
 "member": [
 {
 "@type": "Organization"
 },
 {
 "@type": "Organization"
 }
 ],
 "alumni": [
 {
 "@type": "Person",
 "name": "Pierre de Wulf"
 },
 {
 "@type": "Person",
 "name": "Kevin Sahin"
 }
 ],
 "name": "ScrapingBee"
 }
 &lt;/script>
&lt;/head>
 &lt;body>
 &lt;!-- Google Tag Manager (noscript) -->
 &lt;noscript>&lt;iframe src="https://www.googletagmanager.com/ns.html?id=GTM-P4H32H5J"
 height="0" width="0" style="display:none;visibility:hidden">&lt;/iframe>&lt;/noscript>
 &lt;!-- End Google Tag Manager (noscript) -->
 &lt;div id="wrapper">
 &lt;header class="absolute top-[0] right-[0] left-[0] py-[20px] bg-yellow-100 md:py-[38px] z-[9]">
 &lt;div class="container">
 &lt;div class="flex items-center">
 &lt;div class="w-[160px] md:mr-[60px] lg:mr-[90px]">
 &lt;a href="https://www.scrapingbee.com/">
 &lt;img src="https://www.scrapingbee.com/images/logo.svg" alt="ScrapingBee logo" height="26" width="160">
 &lt;/a>
 &lt;/div>
 &lt;span class="absolute top-[0] right-[0] mr-[20px] cursor-pointer nav-opener md:hidden mt-[19px]">&lt;i class="icon-menu">&lt;/i>&lt;/span>
 &lt;div class="overflow-hidden navbar-wrap md:overflow-visible md:flex-1">
 &lt;nav class="px-[20px] py-[20px] text-white navbar md:p-[0] md:flex md:items-center md:justify-between text-[16px] leading-[1.20] bg-black-100 md:bg-transparent md:text-black-100">
 &lt;ul class="flex justify-between items-center pb-[20px] border-b border-blue-200 -mx-[21px] md:justify-start md:border-transparent md:pb-[0] mb-[30px] md:mb-[0]">
 &lt;li class="px-[15px] lg:px-[21px]">&lt;a href="https://app.scrapingbee.com/account/login" class="block hover:underline">Login&lt;/a>&lt;/li>
 &lt;li class="px-[15px] lg:px-[21px]">&lt;a href="https://app.scrapingbee.com/account/register" class="h-[40px] text-white border-white transition-all btn btn-black-o text-[16px] px-[21px] md:h-[48px] md:border-black-100 md:text-black-100 hover:bg-white md:hover:bg-black-100 hover:text-black-100 md:hover:text-white">Sign Up&lt;/a>&lt;/li>
 &lt;/ul>
 &lt;ul class="md:flex md:order-first md:items-center md:-mx-[21px]">
 &lt;li class="relative mb-[20px] md:px-[15px] lg:px-[21px] md:mb-[0]">&lt;a href="https://www.scrapingbee.com/#pricing" class="block hover:underline">Pricing&lt;/a>&lt;/li>
 &lt;li class="relative mb-[20px] md:px-[15px] lg:px-[21px] md:mb-[0]">&lt;a href="https://www.scrapingbee.com/#faq" class="block hover:underline">FAQ&lt;/a>&lt;/li>
 &lt;li class="relative mb-[20px] md:px-[15px] lg:px-[21px] md:mb-[0]">&lt;a href="https://www.scrapingbee.com/blog/" class="block hover:underline">Blog&lt;/a>&lt;/li>
 &lt;li class="relative mb-[20px] md:px-[15px] lg:px-[21px] md:mb-[0]">
 &lt;a href="#" class="block hover:underline">Other Features&lt;/a>
 &lt;ul class="nav-drop">
 &lt;li>&lt;a href="https://www.scrapingbee.com/features/screenshot/" class="text-white hover:underline">Screenshots&lt;/a>&lt;/li>
 &lt;li class="mt-[12px]">&lt;a href="https://www.scrapingbee.com/features/google/" class="text-white hover:underline">Google search API&lt;/a>&lt;/li>
 &lt;li class="mt-[12px]">&lt;a href="https://www.scrapingbee.com/features/chatgpt/" class="text-white hover:underline">ChatGPT API&lt;/a>&lt;/li>
 &lt;li class="mt-[12px]">&lt;a href="https://www.scrapingbee.com/features/amazon/" class="text-white hover:underline">Amazon API&lt;/a>&lt;/li>
 &lt;li class="mt-[12px]">&lt;a href="https://www.scrapingbee.com/features/youtube/" class="text-white hover:underline">YouTube API&lt;/a>&lt;/li>
 &lt;li class="mt-[12px]">&lt;a href="https://www.scrapingbee.com/features/walmart/" class="text-white hover:underline">Walmart API&lt;/a>&lt;/li>
 &lt;li class="mt-[12px]">&lt;a href="https://www.scrapingbee.com/features/data-extraction/" class="text-white hover:underline">Data extraction&lt;/a>&lt;/li>
 &lt;li class="mt-[12px]">&lt;a href="https://www.scrapingbee.com/features/javascript-scenario/" class="text-white hover:underline">JavaScript scenario&lt;/a>&lt;/li>
 &lt;li class="mt-[12px]">&lt;a href="https://www.scrapingbee.com/features/make/" class="text-white hover:underline">No code web scraping&lt;/a>&lt;/li>
 &lt;/ul>
 &lt;/li>
 &lt;li class="relative mb-[20px] md:px-[15px] lg:px-[21px] md:mb-[0]">
 &lt;a href="#" class="block hover:underline">Developers&lt;/a>
 &lt;ul class="nav-drop">
 &lt;li>&lt;a href="https://www.scrapingbee.com/tutorials" class="text-white hover:underline">Tutorials&lt;/a>&lt;/li>
 &lt;li class="mt-[12px]">&lt;a href="https://www.scrapingbee.com/documentation" class="text-white hover:underline">Documentation&lt;/a>&lt;/li>
 &lt;li class="mt-[12px]">&lt;a href="https://help.scrapingbee.com/en/" target="_blank" class="text-white hover:underline">Knowledge Base&lt;/a>&lt;/li>
 &lt;/ul>
 &lt;/li>
 &lt;/ul>
 &lt;/nav>
 &lt;/div>
 &lt;/div>
 &lt;/div>
&lt;/header>
 &lt;div id="content">
&lt;div class="overflow-hidden">
&lt;section class="relative bg-skew-yellow-b pt-[66px] md:pt-[156px] pb-[100px] md:pb-[220px] z-1">
 &lt;div class="container">
 &lt;div class="flex flex-wrap items-center -mx-[15px]">
 &lt;div class="w-full sm:w-1/2 px-[15px] mb-[50px] sm:mb-[0]">
 &lt;div class="max-w-[508px] text-[20px] md:text-[24px] leading-[1.50] pt-[35px]">
 &lt;h1 class="mb-[33px]">The Web Scraping API for Busy Developers&lt;/h1>
 &lt;p class="mb-[45px]">Our Web Scraping API handles headless browsers and rotates proxies for you.&lt;/p></description></item><item><title>99Acres Scraper API - Easy Use &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/99acres-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/99acres-scraper-api/</guid><description/></item><item><title>Acceptable Use Policy</title><link>https://www.scrapingbee.com/acceptable-use-policy/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/acceptable-use-policy/</guid><description>&lt;p>The present Acceptable Use Policy (the “AUP”) covers the Services provided under legitimate and legal purposes only and any ongoing Agreement. Capitalized terms in this AUP have the same meaning as in the General Conditions in which they are defined.&lt;/p>
&lt;p>The AUP intends to protect Provider, Users, and more generally internet users from illegal, fraudulent, or abusive activities. As such, any access or use of the Services for illegal, fraudulent, or abusive activities is strictly prohibited. Any such suspected access or use will be investigated.&lt;/p></description></item><item><title>Adidas Scraper API - Easy Sign Up + Free Credits</title><link>https://www.scrapingbee.com/scrapers/adidas-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/adidas-api/</guid><description/></item><item><title>Affiliate Program</title><link>https://www.scrapingbee.com/affiliates/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/affiliates/</guid><description>&lt;p>Earn commissions by promoting ScrapingBee&lt;/p>
&lt;p>Welcome to the ScrapingBee affiliate program. ScrapingBee is a web scraping API. We help developers and tech companies scrape the web without having to deal with rotating proxies and headless browsers.&lt;/p>
&lt;p>We are a Software as a Service company, meaning our customers pay us a monthly fee to access the service. The price depends on the volume, and we have three tiers: $29 / $99 / $249 per month.&lt;/p></description></item><item><title>AI Web Scraping API</title><link>https://www.scrapingbee.com/features/ai-web-scraping-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/features/ai-web-scraping-api/</guid><description>&lt;p>&lt;script type="application/ld+json">
 {
 "@context": "https://schema.org",
 "@type": "Product",
 "name": "ScrapingBee",
 "brand": {
 "@type": "Brand",
 "name": "ScrapingBee"
 },
 "description": "Effortlessly extract data with our AI scraper API. Simplify data extraction, get clean JSON outputs and adapt to page changes. Try it free today!",
 "aggregateRating": {
 "@type": "AggregateRating",
 "ratingValue": "4.9",
 "reviewCount": "38",
 "bestRating": 5
 }
 }
&lt;/script>
&lt;section class="bg-skew-yellow-b pt-[100px] sm:pt-[100px] md:pt-[156px] mb-[120px] relative z-1 pb-[50px] sm:pb-[100px] md:mb-[170px]">
 &lt;div class="container">
 &lt;div class="flex flex-wrap items-center -mx-[15px]">
 &lt;div class="w-full sm:w-1/2 px-[15px]">
 &lt;div class="max-w-[542px] leading-[1.77]">
 
 
 
 &lt;h1 class="mb-[14px]">AI Web Scraping API&lt;/h1>
 &lt;p class="mb-[36px] text-[20px]">Effortlessly extract data with our AI scraper API. Simplify data extraction, get clean JSON outputs and adapt to page changes. Try it free today!&lt;/p></description></item><item><title>Airbnb Scraper API - Quick Signup + Free Credits</title><link>https://www.scrapingbee.com/scrapers/airbnb-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/airbnb-api/</guid><description/></item><item><title>Alibaba Scraper API Tool - Free Credits &amp; Easy Setup</title><link>https://www.scrapingbee.com/scrapers/alibaba-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/alibaba-api/</guid><description/></item><item><title>AliExpress Scraper API with Credits - Easy &amp; Simple Tool</title><link>https://www.scrapingbee.com/scrapers/aliexpress-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/aliexpress-api/</guid><description/></item><item><title>Amazon API</title><link>https://www.scrapingbee.com/documentation/amazon/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/documentation/amazon/</guid><description>&lt;p>Our Amazon API allows you to scrape Amazon search results and product details in realtime.&lt;/p>
&lt;p>We provide two endpoints:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Search endpoint&lt;/strong> (&lt;code>/api/v1/amazon/search&lt;/code>) - Fetch Amazon search results&lt;/li>
&lt;li>&lt;strong>Product endpoint&lt;/strong> (&lt;code>/api/v1/amazon/product&lt;/code>) - Fetch structured Amazon product details&lt;/li>
&lt;/ul>
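As an illustration of the two endpoints above, the sketch below prepares (without sending) a GET request to the search endpoint so the final URL can be inspected. This is a hypothetical example: it assumes the search endpoint accepts the same `api_key` and `query` parameters as the product endpoint documented below.

```python
# Hypothetical sketch: assumes the search endpoint takes the same
# api_key and query parameters as the product endpoint.
import requests


def build_search_request(api_key: str, keywords: str) -> requests.PreparedRequest:
    # Prepare the request without sending it, so the URL can be inspected first
    req = requests.Request(
        "GET",
        "https://app.scrapingbee.com/api/v1/amazon/search",
        params={"api_key": api_key, "query": keywords},
    )
    return req.prepare()


prepared = build_search_request("YOUR-API-KEY", "wireless mouse")
print(prepared.url)
```

Sending it is then just `requests.Session().send(prepared)` once a real API key is in place.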
&lt;div class="doc-row">
&lt;div class="doc-full">
&lt;h2 id="amazon-product-api">Amazon Product API&lt;/h2>
&lt;h3 id="quick-start">Quick start&lt;/h3>
&lt;p>To scrape Amazon product details, you only need two things:&lt;/p>
&lt;ul>
&lt;li>your API key, available &lt;a href="https://app.scrapingbee.com/account/manage/api_key" >here&lt;/a>&lt;/li>
&lt;li>a product ASIN (&lt;a href="#query" >learn more about ASIN&lt;/a>)&lt;/li>
&lt;/ul>
&lt;p>Then, simply do this.&lt;/p>





 



 

 
 
 

&lt;div class="p-1 rounded mb-6 bg-[#F4F0F0] border border-[#1A1414]/10 text-[16px] leading-[1.50]" data-tabs-id="8f5ccef445fa8cddcb48c1ace64cbdd2">

 &lt;div class="md:pl-[30px] xl:pl-[32px] flex items-center justify-end gap-3 py-[10px] px-[17px]" x-data="{ 
 open: false, 
 selectedLibrary: 'python-8f5ccef445fa8cddcb48c1ace64cbdd2',
 libraries: [
 { name: 'Python', value: 'python-8f5ccef445fa8cddcb48c1ace64cbdd2', icon: '/images/icons/icon-python.svg', width: 32, height: 32 },
 { name: 'cURL', value: 'curl-8f5ccef445fa8cddcb48c1ace64cbdd2', icon: '/images/icons/icon-curl.svg', width: 48, height: 32 },
 { name: 'NodeJS', value: 'node-8f5ccef445fa8cddcb48c1ace64cbdd2', icon: '/images/icons/icon-node.svg', width: 26, height: 26 },
 { name: 'Java', value: 'java-8f5ccef445fa8cddcb48c1ace64cbdd2', icon: '/images/icons/icon-java.svg', width: 32, height: 32 },
 { name: 'Ruby', value: 'ruby-8f5ccef445fa8cddcb48c1ace64cbdd2', icon: '/images/icons/icon-ruby.svg', width: 32, height: 32 },
 { name: 'PHP', value: 'php-8f5ccef445fa8cddcb48c1ace64cbdd2', icon: '/images/icons/icon-php.svg', width: 32, height: 32 },
 { name: 'Go', value: 'go-8f5ccef445fa8cddcb48c1ace64cbdd2', icon: '/images/icons/icon-go.svg', width: 32, height: 32 }
 ],
 selectLibrary(value, isGlobal = false) {
 this.selectedLibrary = value;
 this.open = false;
 // Trigger tab switching for this specific instance
 // Use Alpine's $el to find the container
 const container = $el.closest('[data-tabs-id]');
 if (container) {
 container.querySelectorAll('.nice-tab-content').forEach(tab => {
 tab.classList.remove('active');
 });
 const selectedTab = container.querySelector('#' + value);
 if (selectedTab) {
 selectedTab.classList.add('active');
 }
 }
 // Individual snippet selectors should NOT trigger global changes
 // Only the global selector at the top should change all snippets
 },
 getSelectedLibrary() {
 return this.libraries.find(lib => lib.value === this.selectedLibrary) || this.libraries[0];
 },
 init() {
 // Listen for global language changes
 window.addEventListener('languageChanged', (e) => {
 const globalLang = e.detail.language;
 const matchingLib = this.libraries.find(lib => lib.value.startsWith(globalLang + '-'));
 if (matchingLib) {
 this.selectLibrary(matchingLib.value, true);
 }
 });
 // Initialize from global state if available
 const globalLang = window.globalSelectedLanguage || 'python';
 const matchingLib = this.libraries.find(lib => lib.value.startsWith(globalLang + '-'));
 if (matchingLib &amp;&amp; matchingLib.value !== this.selectedLibrary) {
 this.selectLibrary(matchingLib.value, true);
 }
 }
 }" x-on:click.away="open = false" x-init="init()">
 &lt;div class="relative">
 
 &lt;button 
 @click="open = !open"
 type="button"
 class="flex justify-between items-center px-2 py-1.5 bg-white rounded-md border border-[#1A1414]/10 transition-colors hover:bg-gray-50 focus:outline-none min-w-[180px] shadow-sm"
 >
 &lt;div class="flex gap-2 items-center">
 &lt;img 
 :src="getSelectedLibrary().icon" 
 :alt="getSelectedLibrary().name"
 :width="20"
 :height="20"
 class="flex-shrink-0 w-5 h-5"
 />
 &lt;span class="text-black-100 font-medium text-[14px]">
 &lt;span x-text="getSelectedLibrary().name">&lt;/span>
 &lt;/span>
 &lt;/div>
 &lt;svg 
 class="w-3.5 h-3.5 text-gray-400 transition-transform duration-200" 
 :class="{ 'rotate-180': open }"
 fill="none" 
 stroke="currentColor" 
 viewBox="0 0 24 24"
 >
 &lt;path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M19 9l-7 7-7-7">&lt;/path>
 &lt;/svg>
 &lt;/button>
 
 
 &lt;div 
 x-show="open"
 x-transition:enter="transition ease-out duration-200"
 x-transition:enter-start="opacity-0 translate-y-1"
 x-transition:enter-end="opacity-100 translate-y-0"
 x-transition:leave="transition ease-in duration-150"
 x-transition:leave-start="opacity-100 translate-y-0"
 x-transition:leave-end="opacity-0 translate-y-1"
 class="overflow-auto absolute left-0 top-full z-50 mt-1 w-full max-h-[300px] bg-white rounded-md border border-[#1A1414]/10 shadow-lg focus:outline-none"
 style="display: none;"
 >
 &lt;ul class="py-1">
 &lt;template x-for="library in libraries" :key="library.value">
 &lt;li>
 &lt;button
 @click="selectLibrary(library.value)"
 type="button"
 class="flex gap-2 items-center px-2 py-1.5 w-full transition-colors hover:bg-gray-50"
 :class="{ 'bg-yellow-50': selectedLibrary === library.value }"
 >
 &lt;img 
 :src="library.icon" 
 :alt="library.name"
 :width="20"
 :height="20"
 class="flex-shrink-0 w-5 h-5"
 />
 &lt;span class="text-black-100 text-[14px]" x-text="library.name">&lt;/span>
 &lt;span x-show="selectedLibrary === library.value" class="ml-auto text-yellow-400">
 &lt;svg class="w-3.5 h-3.5" fill="currentColor" viewBox="0 0 20 20">
 &lt;path fill-rule="evenodd" d="M16.707 5.293a1 1 0 010 1.414l-8 8a1 1 0 01-1.414 0l-4-4a1 1 0 011.414-1.414L8 12.586l7.293-7.293a1 1 0 011.414 0z" clip-rule="evenodd">&lt;/path>
 &lt;/svg>
 &lt;/span>
 &lt;/button>
 &lt;/li>
 &lt;/template>
 &lt;/ul>
 &lt;/div>
 &lt;/div>
 &lt;div class="flex items-center">
 &lt;span data-seed="8f5ccef445fa8cddcb48c1ace64cbdd2" class="snippet-copy cursor-pointer flex items-center gap-1.5 px-2.5 py-1.5 text-sm text-black-100 rounded-md border border-[#1A1414]/10 bg-white hover:bg-gray-50 transition-colors" title="Copy to clipboard!">
 &lt;span class="icon-copy02 leading-none text-[14px]">&lt;/span>
 &lt;span class="text-[14px]">Copy&lt;/span>
 &lt;/span>
 &lt;/div>
 &lt;/div>

 &lt;div class="bg-[#30302F] rounded-md font-light !font-ibmplex">
 &lt;div id="curl-8f5ccef445fa8cddcb48c1ace64cbdd2"class="text-gray-100 text-[12px] leading-[1.54] nice-tab-content">
 &lt;pre>&lt;code class="language-bash">curl "https://app.scrapingbee.com/api/v1/amazon/product?api_key=YOUR-API-KEY&amp;query=B0DPDRNSXV"&lt;/code>&lt;/pre>
 &lt;/div>
 &lt;div id="python-8f5ccef445fa8cddcb48c1ace64cbdd2" class="text-gray-100 text-[12px] leading-[1.54] nice-tab-content active">
 &lt;pre>&lt;code class="language-python">&lt;pre>&lt;code class="language-python"># Install the Python Requests library:
# pip install requests
import requests

def send_request():
 response = requests.get(
 url='https://app.scrapingbee.com/api/v1/amazon/product',
 params={
 'api_key': 'YOUR-API-KEY',
 'query': 'B0DPDRNSXV',
 },

 )
 print('Response HTTP Status Code: ', response.status_code)
 print('Response HTTP Response Body: ', response.content)
send_request()
&lt;/code>&lt;/pre>
&lt;/code>&lt;/pre>
 &lt;/div>
 &lt;div id="node-8f5ccef445fa8cddcb48c1ace64cbdd2" class="text-gray-100 text-[12px] leading-[1.54] nice-tab-content">
 &lt;pre>&lt;code class="language-javascript">&lt;pre>&lt;code class="language-javascript">// Install the Node Axios package
// npm install axios
const axios = require('axios');

axios.get('https://app.scrapingbee.com/api/v1/amazon/product', {
 params: {
 'api_key': 'YOUR-API-KEY',
 'url': 'YOUR-URL',
 'query': B0DPDRNSXV,
 }
}).then(function (response) {
 // handle success
 console.log(response);
})
&lt;/code>&lt;/pre>
&lt;/code>&lt;/pre>
 &lt;/div>
 &lt;div id="java-8f5ccef445fa8cddcb48c1ace64cbdd2" class="text-gray-100 text-[12px] leading-[1.54] nice-tab-content">
 &lt;pre>&lt;code class="language-java">import java.io.IOException;
import org.apache.http.client.fluent.*;

public class SendRequest
{
 public static void main(String[] args) {
 sendRequest();
 }

 private static void sendRequest() {

 // Classic (GET )
 try {

 // Create request
 
 Content content = Request.Get("https://app.scrapingbee.com/api/v1/amazon/product?api_key=YOUR-API-KEY&amp;query=B0DPDRNSXV")

 // Fetch request and return content
 .execute().returnContent();

 // Print content
 System.out.println(content);
 }
 catch (IOException e) { System.out.println(e); }
 }
}
&lt;/code>&lt;/pre>
 &lt;/div>
 &lt;div id="ruby-8f5ccef445fa8cddcb48c1ace64cbdd2" class="text-gray-100 text-[12px] leading-[1.54] nice-tab-content">
 &lt;pre>&lt;code class="language-ruby">require 'net/http'
require 'net/https'

# Classic (GET )
def send_request 
 uri = URI('https://app.scrapingbee.com/api/v1/amazon/product?api_key=YOUR-API-KEY&amp;query=B0DPDRNSXV')

 # Create client
 http = Net::HTTP.new(uri.host, uri.port)
 http.use_ssl = true
 http.verify_mode = OpenSSL::SSL::VERIFY_PEER

 # Create Request
 req = Net::HTTP::Get.new(uri)

 # Fetch Request
 res = http.request(req)
 puts "Response HTTP Status Code: #{ res.code }"
 puts "Response HTTP Response Body: #{ res.body }"
rescue StandardError => e
 puts "HTTP Request failed (#{ e.message })"
end

send_request()&lt;/code>&lt;/pre>
 &lt;/div>
 &lt;div id="php-8f5ccef445fa8cddcb48c1ace64cbdd2" class="text-gray-100 text-[12px] leading-[1.54] nice-tab-content">
 &lt;pre>&lt;code class="language-php">&amp;lt;?php

// get cURL resource
$ch = curl_init();

// set url 
curl_setopt($ch, CURLOPT_URL, 'https://app.scrapingbee.com/api/v1/amazon/product?api_key=YOUR-API-KEY&amp;query=B0DPDRNSXV');

// set method
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'GET');

// return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);



// send the request and save response to $response
$response = curl_exec($ch);

// stop if fails
if (!$response) {
 die('Error: "' . curl_error($ch) . '" - Code: ' . curl_errno($ch));
}

echo 'HTTP Status Code: ' . curl_getinfo($ch, CURLINFO_HTTP_CODE) . PHP_EOL;
echo 'Response Body: ' . $response . PHP_EOL;

// close curl resource to free up system resources
curl_close($ch);
?&amp;gt;&lt;/code>&lt;/pre>
 &lt;/div>
 &lt;div id="go-8f5ccef445fa8cddcb48c1ace64cbdd2" class="text-gray-100 text-[12px] leading-[1.54] nice-tab-content">
 &lt;pre>&lt;code class="language-go">package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
)

func sendClassic() {
	// Create client
	client := &amp;http.Client{}

	// Create request 
	req, err := http.NewRequest("GET", "https://app.scrapingbee.com/api/v1/amazon/product?api_key=YOUR-API-KEY&amp;query=B0DPDRNSXV", nil)
	if err != nil {
		fmt.Println("Failure : ", err)
		return
	}

	// Fetch Request
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("Failure : ", err)
		return
	}
	defer resp.Body.Close()

	// Read Response Body
	respBody, _ := ioutil.ReadAll(resp.Body)

	// Display Results
	fmt.Println("response Status : ", resp.Status)
	fmt.Println("response Headers : ", resp.Header)
	fmt.Println("response Body : ", string(respBody))
}

func main() {
 sendClassic()
}&lt;/code>&lt;/pre>
 &lt;/div>
 &lt;/div>
&lt;/div>

&lt;p>Here is a breakdown of all the parameters you can use with the Amazon Product API:&lt;/p></description></item><item><title>Amazon API</title><link>https://www.scrapingbee.com/features/amazon/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/features/amazon/</guid><description>&lt;p>&lt;script type="application/ld+json">
 {
 "@context": "https://schema.org",
 "@type": "Product",
 "name": "ScrapingBee",
 "brand": {
 "@type": "Brand",
 "name": "ScrapingBee"
 },
 "description": "Get Structured JSON for Amazon products, reviews, pricing and more in a single API call.",
 "aggregateRating": {
 "@type": "AggregateRating",
 "ratingValue": "4.9",
 "reviewCount": "154",
 "bestRating": 5
 }
 }
&lt;/script>
&lt;section class="bg-skew-yellow-b pt-[100px] sm:pt-[100px] md:pt-[156px] mb-[120px] relative z-1 ">
 &lt;div class="container">
 &lt;div class="flex flex-wrap items-center -mx-[15px]">
 &lt;div class="w-full sm:w-1/2 px-[15px]">
 &lt;div class="max-w-[542px] leading-[1.77]">
 &lt;h1 class="mb-[14px]">Amazon API&lt;/h1>
 &lt;p class="mb-[36px] text-[20px]">Get Structured JSON for Amazon products, reviews, pricing and more in a single API call.&lt;/p></description></item><item><title>Amazon ASIN Scraper API Tool - Free Credits, Simple Signup</title><link>https://www.scrapingbee.com/scrapers/amazon-asin-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/amazon-asin-api/</guid><description/></item><item><title>Amazon Keyword Scraper API - Free Credits &amp; Easy Use</title><link>https://www.scrapingbee.com/scrapers/amazon-keyword-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/amazon-keyword-scraper-api/</guid><description/></item><item><title>Amazon Review Scraper with Free Credits - Easy to Use Tool</title><link>https://www.scrapingbee.com/scrapers/amazon-review-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/amazon-review-api/</guid><description/></item><item><title>Amazon Scraper API</title><link>https://www.scrapingbee.com/scrapers/amazon-scraping-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/amazon-scraping-api/</guid><description>&lt;p>&lt;script type="application/ld+json">
 {
 "@context": "https://schema.org",
 "@type": "Product",
 "name": "ScrapingBee",
 "brand": {
 "@type": "Brand",
 "name": "ScrapingBee"
 },
 "description": "Scrape Amazon product data worldwide with our powerful web scraping API. Get prices, reviews, and rankings from any Amazon domain - all with a single API call.",
 "aggregateRating": {
 "@type": "AggregateRating",
 "ratingValue": "4.9",
 "reviewCount": "38",
 "bestRating": 5
 }
 }
&lt;/script>
&lt;section class="bg-skew-yellow-b pt-[100px] sm:pt-[100px] md:pt-[156px] mb-[120px] relative z-1 pb-[50px] sm:pb-[100px] md:mb-[170px]">
 &lt;div class="container">
 &lt;div class="flex flex-wrap items-center -mx-[15px]">
 &lt;div class="w-full sm:w-1/2 px-[15px]">
 &lt;div class="max-w-[542px] leading-[1.77]">
 
 
 
&lt;nav aria-label="Breadcrumb" class="text-[14px] text-black mb-[20px] flex items-center">
 &lt;ol class="flex items-center" itemscope itemtype="https://schema.org/BreadcrumbList">
 &lt;li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
 &lt;a href="https://www.scrapingbee.com/" class="text-black no-underline" itemprop="item">
 &lt;span itemprop="name">Home&lt;/span>
 &lt;/a>
 &lt;meta itemprop="position" content="1" />
 &lt;/li>
 &lt;svg width="14" height="14" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" fill="none" class="mx-[10px] flex-shrink-0">
 &lt;path d="M9 6L15 12L9 18" stroke="black" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"/>
 &lt;/svg>
 &lt;li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
 &lt;a href="https://www.scrapingbee.com/scrapers/" class="text-black no-underline" itemprop="item">
 &lt;span itemprop="name">Scrapers&lt;/span>
 &lt;/a>
 &lt;meta itemprop="position" content="2" />
 &lt;/li>
 &lt;svg width="14" height="14" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" fill="none" class="mx-[10px] flex-shrink-0">
 &lt;path d="M9 6L15 12L9 18" stroke="black" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"/>
 &lt;/svg>
 &lt;li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
 &lt;span class="text-blue-600 font-medium" itemprop="name">
 Amazon Scraper API
 &lt;/span>
 &lt;meta itemprop="position" content="3" />
 &lt;/li>
 &lt;/ol>
&lt;/nav>

 
 
 &lt;h1 class="mb-[14px]">Amazon Scraper API&lt;/h1>
 &lt;p class="mb-[36px] text-[20px]">Scrape Amazon product data worldwide with our powerful web scraping API. Get prices, reviews, and rankings from any Amazon domain - all with a single API call.&lt;/p></description></item><item><title>Apartments.com Scraper API Tool - Free Credits on Sign Up</title><link>https://www.scrapingbee.com/scrapers/apartments-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/apartments-api/</guid><description/></item><item><title>Apify alternative for web scraping?</title><link>https://www.scrapingbee.com/apify-alternative/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/apify-alternative/</guid><description>&lt;p>&lt;section class="bg-yellow-100 py-[100px] md:pt-[220px] md:pb-[20px] mb-[80px] relative z-1">
 &lt;div class="container">
 &lt;div class="max-w-[1024px] mx-auto text-center">
 &lt;h1 class="mb-[14px]">Apify alternative for web scraping?&lt;/h1>
 &lt;p class="mb-[32px]">ScrapingBee is a better alternative to Apify. Looking for more flexibility, better pricing, and developer-friendly features?&lt;/p>
 &lt;div class="flex flex-wrap items-center justify-center mb-[33px]">
 &lt;div class="flex flex-col items-start gap-[20px]">
 &lt;a href="https://app.scrapingbee.com/account/register" class="btn px-[39px] mr-[20px]">Try ScrapingBee for Free&lt;/a>
 &lt;div class="flex items-center">
 &lt;a href="https://www.capterra.com/p/195060/ScrapingBee/" target="_blank" class="inline-block"> &lt;img border="0" class="h-[40px] mr-[10px] -ml-[9px]" src="https://brand-assets.capterra.com/badge/8898153e-408a-4cdb-9477-bda37032c670.svg" /> &lt;/a>
 &lt;span class="text-[18px] mr-[10px]">based on 100+ reviews.&lt;/span>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
&lt;/section>

&lt;section class="pt-[50px] sm:pt-101 pb-[50px] sm:pb-[70px] md:pb-[82px]">
 &lt;div class="container max-w-[894px] w-full flex flex-wrap">

 &lt;blockquote class="p-[38px] bg-gray-900 rounded-2xl m-[0] text-black-100 leading-[1.55]">
 &lt;q class="block mb-[35px] text-[24px]">ScrapingBee &lt;strong>clear documentation, easy-to-use API, and great success rate&lt;/strong> made it a no-brainer.&lt;/q>
 &lt;cite class="avatar flex items-center not-italic">
 
 &lt;span class="w-[56px] h-[56px] rounded-full overflow-hidden bg-gray-1000 mr-[24px]">
 &lt;img height="56" width="56" src="https://www.scrapingbee.com/images/testimonials/dominic.jpeg" alt="Dominic Phillips">
 &lt;/span>
 
 &lt;span>
 &lt;strong class="text-[18px] font-bold block mb-[4px]">Dominic Phillips
 
 &lt;/strong>
 
 &lt;span class="text-[15px] block">Co-Founder @ &lt;a href="https://codesubmit.io" class="font-bold underline hover:no-underline" target="_blank">CodeSubmit&lt;/a>&lt;/span>
 
 &lt;/span>
 &lt;/cite>
 &lt;/blockquote>
 &lt;/div>
&lt;/section>

&lt;section class="py-[50px] sm:py-[60px] md:py-[80px] text-gray-200 text-[16px] leading-[1.50]">
 &lt;div class="container max-w-[1292px]">
 &lt;div class="flex flex-wrap -mx-[23px]">
 &lt;div class="w-full md:w-[500px] lg:w-[608px] px-[23px]">
 &lt;div class="pr-[20px]">
 &lt;div class="mb-[48px]">
 &lt;h3 class="text-black-100 mb-[8px]">No marketplace. No "actors." Just clean, efficient scraping.&lt;/h3>
 &lt;p>Apify's approach adds layers of abstraction and complexity. We believe in giving developers direct access to scraping functionality through a clear, &lt;a href="https://www.scrapingbee.com/blog/six-characteristics-of-rest-api/">RESTful API&lt;/a>.&lt;/p></description></item><item><title>Apple App Store Scraper API - Sign Up for Free Credits</title><link>https://www.scrapingbee.com/scrapers/apple-app-store/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/apple-app-store/</guid><description/></item><item><title>ASOS Scraper API - Free Credits on Signup</title><link>https://www.scrapingbee.com/scrapers/asos-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/asos-api/</guid><description/></item><item><title>Autotrader Scraper API - Free Sign Up + Credits</title><link>https://www.scrapingbee.com/scrapers/autotrader-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/autotrader-api/</guid><description/></item><item><title>AWS Scraper API - Free Signup &amp; Get Credits</title><link>https://www.scrapingbee.com/scrapers/aws-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/aws-api/</guid><description/></item><item><title>Baidu Search Scraper API - Get Free Credits on Sign Up</title><link>https://www.scrapingbee.com/scrapers/baidu-search-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/baidu-search-api/</guid><description/></item><item><title>BBB Scraper API with Free Credits - Reliable Data Extraction</title><link>https://www.scrapingbee.com/scrapers/better-business-bureau-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/better-business-bureau-api/</guid><description/></item><item><title>Best Buy Web Scraper API - Simple Use &amp; Free 
Credits</title><link>https://www.scrapingbee.com/scrapers/best-buy-web-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/best-buy-web-scraper-api/</guid><description/></item><item><title>Bing Ads Scraper API - Easy Signup, Free Credits</title><link>https://www.scrapingbee.com/scrapers/bing-ads-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/bing-ads-api/</guid><description/></item><item><title>Bing Images Scraper API - Free Credits &amp; Hassle-Free Setup</title><link>https://www.scrapingbee.com/scrapers/bing-images-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/bing-images-api/</guid><description/></item><item><title>Bing Maps Scraper API - Signup for Free Credits</title><link>https://www.scrapingbee.com/scrapers/bing-maps-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/bing-maps-api/</guid><description/></item><item><title>Bing News Scraper API - Simple Signup, Free Credits</title><link>https://www.scrapingbee.com/scrapers/bing-news-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/bing-news-api/</guid><description/></item><item><title>Bing Recipes Scraper API - Simple &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/bing-recipes-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/bing-recipes-api/</guid><description/></item><item><title>Bing Related Searches Scraper API - Sign Up for Free Credits</title><link>https://www.scrapingbee.com/scrapers/bing-related-searches-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/bing-related-searches-api/</guid><description/></item><item><title>Bing Search Scraper API - Sign Up for Free 
Credits</title><link>https://www.scrapingbee.com/scrapers/bing-search-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/bing-search-api/</guid><description/></item><item><title>Bing Spell Check Scraper API - Free Credits Available</title><link>https://www.scrapingbee.com/scrapers/bing-spell-check-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/bing-spell-check-api/</guid><description/></item><item><title>Bing Videos Scraper API - Free Credits on Sign Up</title><link>https://www.scrapingbee.com/scrapers/bing-videos-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/bing-videos-api/</guid><description/></item><item><title>Bloomberg Scraper API - Sign Up for Free Credits</title><link>https://www.scrapingbee.com/scrapers/bloomberg-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/bloomberg-api/</guid><description/></item><item><title>Booking.com Scraper API - Free Credits on Sign Up</title><link>https://www.scrapingbee.com/scrapers/booking-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/booking-api/</guid><description/></item><item><title>Bright Data alternative for web scraping?</title><link>https://www.scrapingbee.com/bright-data-alternative/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/bright-data-alternative/</guid><description>&lt;p>&lt;section class="bg-yellow-100 py-[100px] md:pt-[220px] md:pb-[20px] mb-[80px] relative z-1">
 &lt;div class="container">
 &lt;div class="max-w-[1024px] mx-auto text-center">
 &lt;h1 class="mb-[14px]">Bright Data alternative for web scraping?&lt;/h1>
 &lt;p class="mb-[32px]">ScrapingBee is a better alternative to Bright Data. Getting structured data from the web should be fast, reliable, and scalable.&lt;/p>
 &lt;div class="flex flex-wrap items-center justify-center mb-[33px]">
 &lt;div class="flex flex-col items-start gap-[20px]">
 &lt;a href="https://app.scrapingbee.com/account/register" class="btn px-[39px] mr-[20px]">Try ScrapingBee for Free&lt;/a>
 &lt;div class="flex items-center">
 &lt;a href="https://www.capterra.com/p/195060/ScrapingBee/" target="_blank" class="inline-block"> &lt;img border="0" class="h-[40px] mr-[10px] -ml-[9px]" src="https://brand-assets.capterra.com/badge/8898153e-408a-4cdb-9477-bda37032c670.svg" /> &lt;/a>
 &lt;span class="text-[18px] mr-[10px]">based on 100+ reviews.&lt;/span>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
&lt;/section>

&lt;section class="pt-[50px] sm:pt-101 pb-[50px] sm:pb-[70px] md:pb-[82px]">
 &lt;div class="container max-w-[894px] w-full flex flex-wrap">

 &lt;blockquote class="p-[38px] bg-gray-900 rounded-2xl m-[0] text-black-100 leading-[1.55]">
 &lt;q class="block mb-[35px] text-[24px]">ScrapingBee &lt;strong>clear documentation, easy-to-use API, and great success rate&lt;/strong> made it a no-brainer.&lt;/q>
 &lt;cite class="avatar flex items-center not-italic">
 
 &lt;span class="w-[56px] h-[56px] rounded-full overflow-hidden bg-gray-1000 mr-[24px]">
 &lt;img height="56" width="56" src="https://www.scrapingbee.com/images/testimonials/dominic.jpeg" alt="Dominic Phillips">
 &lt;/span>
 
 &lt;span>
 &lt;strong class="text-[18px] font-bold block mb-[4px]">Dominic Phillips
 
 &lt;/strong>
 
 &lt;span class="text-[15px] block">Co-Founder @ &lt;a href="https://codesubmit.io" class="font-bold underline hover:no-underline" target="_blank">CodeSubmit&lt;/a>&lt;/span>
 
 &lt;/span>
 &lt;/cite>
 &lt;/blockquote>
 &lt;/div>
&lt;/section>

&lt;section class="py-[50px] sm:py-[60px] md:py-[80px] text-gray-200 text-[16px] leading-[1.50]">
 &lt;div class="container max-w-[1292px]">
 &lt;div class="flex flex-wrap -mx-[23px]">
 &lt;div class="w-full md:w-[500px] lg:w-[608px] px-[23px]">
 &lt;div class="pr-[20px]">
 &lt;div class="mb-[48px]">
 &lt;h3 class="text-black-100 mb-[8px]">Enterprise-grade features. Without enterprise-grade headaches.&lt;/h3>
 &lt;p>Bright Data is powerful—but complex, expensive, and overkill for most. ScrapingBee delivers what you need without the overhead.&lt;/p></description></item><item><title>Browse AI alternative for web scraping?</title><link>https://www.scrapingbee.com/browse-ai-alternative/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/browse-ai-alternative/</guid><description>&lt;p>&lt;section class="bg-yellow-100 py-[100px] md:pt-[220px] md:pb-[20px] mb-[80px] relative z-1">
 &lt;div class="container">
 &lt;div class="max-w-[1024px] mx-auto text-center">
 &lt;h1 class="mb-[14px]">Browse AI alternative for web scraping?&lt;/h1>
 &lt;p class="mb-[32px]">ScrapingBee is a better alternative to Browse AI. Powerful scraping doesn&amp;#39;t have to come with hidden fees or steep learning curves.&lt;/p>
 &lt;div class="flex flex-wrap items-center justify-center mb-[33px]">
 &lt;div class="flex flex-col items-start gap-[20px]">
 &lt;a href="https://app.scrapingbee.com/account/register" class="btn px-[39px] mr-[20px]">Try ScrapingBee for Free&lt;/a>
 &lt;div class="flex items-center">
 &lt;a href="https://www.capterra.com/p/195060/ScrapingBee/" target="_blank" class="inline-block"> &lt;img border="0" class="h-[40px] mr-[10px] -ml-[9px]" src="https://brand-assets.capterra.com/badge/8898153e-408a-4cdb-9477-bda37032c670.svg" /> &lt;/a>
 &lt;span class="text-[18px] mr-[10px]">based on 100+ reviews.&lt;/span>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
&lt;/section>

&lt;section class="pt-[50px] sm:pt-101 pb-[50px] sm:pb-[70px] md:pb-[82px]">
 &lt;div class="container max-w-[894px] w-full flex flex-wrap">

 &lt;blockquote class="p-[38px] bg-gray-900 rounded-2xl m-[0] text-black-100 leading-[1.55]">
 &lt;q class="block mb-[35px] text-[24px]">ScrapingBee &lt;strong>clear documentation, easy-to-use API, and great success rate&lt;/strong> made it a no-brainer.&lt;/q>
 &lt;cite class="avatar flex items-center not-italic">
 
 &lt;span class="w-[56px] h-[56px] rounded-full overflow-hidden bg-gray-1000 mr-[24px]">
 &lt;img height="56" width="56" src="https://www.scrapingbee.com/images/testimonials/dominic.jpeg" alt="Dominic Phillips">
 &lt;/span>
 
 &lt;span>
 &lt;strong class="text-[18px] font-bold block mb-[4px]">Dominic Phillips
 
 &lt;/strong>
 
 &lt;span class="text-[15px] block">Co-Founder @ &lt;a href="https://codesubmit.io" class="font-bold underline hover:no-underline" target="_blank">CodeSubmit&lt;/a>&lt;/span>
 
 &lt;/span>
 &lt;/cite>
 &lt;/blockquote>
 &lt;/div>
&lt;/section>

&lt;section class="py-[50px] sm:py-[60px] md:py-[80px] text-gray-200 text-[16px] leading-[1.50]">
 &lt;div class="container max-w-[1292px]">
 &lt;div class="flex flex-wrap -mx-[23px]">
 &lt;div class="w-full md:w-[500px] lg:w-[608px] px-[23px]">
 &lt;div class="pr-[20px]">
 &lt;div class="mb-[48px]">
 &lt;h3 class="text-black-100 mb-[8px]">No robots. No waiting. Just raw scraping speed.&lt;/h3>
 &lt;p>Browse AI works well for beginners, but if you're running real-time scraping at scale, you need something more. That's where an API-first approach wins.&lt;/p></description></item><item><title>Car Rental Data Scraper API - Free Signup and Credits</title><link>https://www.scrapingbee.com/scrapers/car-rental-data-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/car-rental-data-api/</guid><description/></item><item><title>ChatGPT API</title><link>https://www.scrapingbee.com/features/chatgpt/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/features/chatgpt/</guid><description>&lt;p>&lt;script type="application/ld+json">
 {
 "@context": "https://schema.org",
 "@type": "Product",
 "name": "ScrapingBee",
 "brand": {
 "@type": "Brand",
 "name": "ScrapingBee"
 },
 "description": "Generate AI-powered text responses with GPT-4o in a single API call, with optional web search capabilities.",
 "aggregateRating": {
 "@type": "AggregateRating",
 "ratingValue": "4.9",
 "reviewCount": "154",
 "bestRating": 5
 }
 }
&lt;/script>
&lt;section class="bg-skew-yellow-b pt-[100px] sm:pt-[100px] md:pt-[156px] mb-[120px] relative z-1 ">
 &lt;div class="container">
 &lt;div class="flex flex-wrap items-center -mx-[15px]">
 &lt;div class="w-full sm:w-1/2 px-[15px]">
 &lt;div class="max-w-[542px] leading-[1.77]">
 &lt;h1 class="mb-[14px]">ChatGPT API&lt;/h1>
 &lt;p class="mb-[36px] text-[20px]">Generate AI-powered text responses with GPT-4o in a single API call, with optional web search capabilities.&lt;/p></description></item><item><title>ChatGPT Scraper API</title><link>https://www.scrapingbee.com/scrapers/chatgpt-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/chatgpt-scraper-api/</guid><description>&lt;p>&lt;script type="application/ld+json">
 {
 "@context": "https://schema.org",
 "@type": "Product",
 "name": "ScrapingBee",
 "brand": {
 "@type": "Brand",
 "name": "ScrapingBee"
 },
 "description": "Scrape ChatGPT responses automatically with our powerful ChatGPT scraping API. Scrape ChatGPT at scale and receive structured JSON output, allowing you to extract text for training your AI models.",
 "aggregateRating": {
 "@type": "AggregateRating",
 "ratingValue": "4.9",
 "reviewCount": "38",
 "bestRating": 5
 }
 }
&lt;/script>
&lt;section class="bg-skew-yellow-b pt-[100px] sm:pt-[100px] md:pt-[156px] mb-[120px] relative z-1 pb-[50px] sm:pb-[100px] md:mb-[170px]">
 &lt;div class="container">
 &lt;div class="flex flex-wrap items-center -mx-[15px]">
 &lt;div class="w-full sm:w-1/2 px-[15px]">
 &lt;div class="max-w-[542px] leading-[1.77]">
 
 
 
&lt;nav aria-label="Breadcrumb" class="text-[14px] text-black mb-[20px] flex items-center">
 &lt;ol class="flex items-center" itemscope itemtype="https://schema.org/BreadcrumbList">
 &lt;li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
 &lt;a href="https://www.scrapingbee.com/" class="text-black no-underline" itemprop="item">
 &lt;span itemprop="name">Home&lt;/span>
 &lt;/a>
 &lt;meta itemprop="position" content="1" />
 &lt;/li>
 &lt;svg width="14" height="14" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" fill="none" class="mx-[10px] flex-shrink-0">
 &lt;path d="M9 6L15 12L9 18" stroke="black" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"/>
 &lt;/svg>
 &lt;li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
 &lt;a href="https://www.scrapingbee.com/scrapers/" class="text-black no-underline" itemprop="item">
 &lt;span itemprop="name">Scrapers&lt;/span>
 &lt;/a>
 &lt;meta itemprop="position" content="2" />
 &lt;/li>
 &lt;svg width="14" height="14" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" fill="none" class="mx-[10px] flex-shrink-0">
 &lt;path d="M9 6L15 12L9 18" stroke="black" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"/>
 &lt;/svg>
 &lt;li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
 &lt;span class="text-blue-600 font-medium" itemprop="name">
 ChatGPT Scraper API
 &lt;/span>
 &lt;meta itemprop="position" content="3" />
 &lt;/li>
 &lt;/ol>
&lt;/nav>

 
 
 &lt;h1 class="mb-[14px]">ChatGPT Scraper API&lt;/h1>
 &lt;p class="mb-[36px] text-[20px]">Scrape ChatGPT responses automatically with our powerful ChatGPTscraping API. Scrape ChatGPT at scale and recieve structured JSON output, allowing you to extract text for training your AI models.&lt;/p></description></item><item><title>Chewy Scraper API - Free Credits on Sign Up</title><link>https://www.scrapingbee.com/scrapers/chewy-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/chewy-api/</guid><description/></item><item><title>Cloudflare Scraper API - Simple Use &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/cloudflare-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/cloudflare-scraper-api/</guid><description/></item><item><title>Contact Scraper API - Free Credits, Simple &amp; Reliable Tool</title><link>https://www.scrapingbee.com/scrapers/contact-info-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/contact-info-api/</guid><description/></item><item><title>Cookie Policy</title><link>https://www.scrapingbee.com/cookie-policy/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/cookie-policy/</guid><description>&lt;h2 id="1-information-and-transparency">1. Information and transparency&lt;/h2>
&lt;p>VostokInc respects the privacy of its Users. This Cookies Policy applies to the Cookies used on the Website. It describes the information We collect automatically through the use of automated information gathering tools such as cookies and web beacons.&lt;/p>
&lt;p>Terms not otherwise defined herein shall have the meaning as set forth in the Privacy Policy.&lt;/p>
&lt;h2 id="2-what-is-a-cookie">2. What is a cookie?&lt;/h2>
&lt;p>&lt;strong>“Cookies”&lt;/strong> or &lt;strong>“Tracers”&lt;/strong> means tracers that can be deposited or read, for example, when consulting a website or a mobile application, or when setting up or using software. A cookie may include:&lt;/p></description></item><item><title>Costco Scraping API</title><link>https://www.scrapingbee.com/scrapers/costco-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/costco-scraper-api/</guid><description>&lt;p>&lt;script type="application/ld+json">
 {
 "@context": "https://schema.org",
 "@type": "Product",
 "name": "ScrapingBee",
 "brand": {
 "@type": "Brand",
 "name": "ScrapingBee"
 },
 "description": "Scrape Costco product details and wholesale product data with our specialized scraping API. Get prices, specifications, and product features with unmatched reliability.",
 "aggregateRating": {
 "@type": "AggregateRating",
 "ratingValue": "4.9",
 "reviewCount": "38",
 "bestRating": 5
 }
 }
&lt;/script>
&lt;section class="bg-skew-yellow-b pt-[100px] sm:pt-[100px] md:pt-[156px] mb-[120px] relative z-1 pb-[50px] sm:pb-[100px] md:mb-[170px]">
 &lt;div class="container">
 &lt;div class="flex flex-wrap items-center -mx-[15px]">
 &lt;div class="w-full sm:w-1/2 px-[15px]">
 &lt;div class="max-w-[542px] leading-[1.77]">
 
 
 
&lt;nav aria-label="Breadcrumb" class="text-[14px] text-black mb-[20px] flex items-center">
 &lt;ol class="flex items-center" itemscope itemtype="https://schema.org/BreadcrumbList">
 &lt;li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
 &lt;a href="https://www.scrapingbee.com/" class="text-black no-underline" itemprop="item">
 &lt;span itemprop="name">Home&lt;/span>
 &lt;/a>
 &lt;meta itemprop="position" content="1" />
 &lt;/li>
 &lt;svg width="14" height="14" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" fill="none" class="mx-[10px] flex-shrink-0">
 &lt;path d="M9 6L15 12L9 18" stroke="black" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"/>
 &lt;/svg>
 &lt;li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
 &lt;a href="https://www.scrapingbee.com/scrapers/" class="text-black no-underline" itemprop="item">
 &lt;span itemprop="name">Scrapers&lt;/span>
 &lt;/a>
 &lt;meta itemprop="position" content="2" />
 &lt;/li>
 &lt;svg width="14" height="14" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" fill="none" class="mx-[10px] flex-shrink-0">
 &lt;path d="M9 6L15 12L9 18" stroke="black" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"/>
 &lt;/svg>
 &lt;li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
 &lt;span class="text-blue-600 font-medium" itemprop="name">
 Costco Scraping API
 &lt;/span>
 &lt;meta itemprop="position" content="3" />
 &lt;/li>
 &lt;/ol>
&lt;/nav>

 
 
 &lt;h1 class="mb-[14px]">Costco Scraping API&lt;/h1>
 &lt;p class="mb-[36px] text-[20px]">Scrape Costco product details and wholesale product data with our specialized scraping API. Get prices, specifications, and product features with perfect unmatched reliability.&lt;/p></description></item><item><title>Craigslist Scraper API - Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/craigslist-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/craigslist-api/</guid><description/></item><item><title>Crawlbase alternative for web scraping?</title><link>https://www.scrapingbee.com/crawlbase-alternative/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/crawlbase-alternative/</guid><description>&lt;p>&lt;section class="bg-yellow-100 py-[100px] md:pt-[220px] md:pb-[20px] mb-[80px] relative z-1">
 &lt;div class="container">
 &lt;div class="max-w-[1024px] mx-auto text-center">
 &lt;h1 class="mb-[14px]">Crawlbase alternative for web scraping?&lt;/h1>
 &lt;p class="mb-[32px]">ScrapingBee is a better alternative to Crawlbase. When it comes to scalable and robust data scraping, there are more efficient alternatives that won’t break the bank.&lt;/p>
 &lt;div class="flex flex-wrap items-center justify-center mb-[33px]">
 &lt;div class="flex flex-col items-start gap-[20px]">
 &lt;a href="https://app.scrapingbee.com/account/register" class="btn px-[39px] mr-[20px]">Try ScrapingBee for Free&lt;/a>
 &lt;div class="flex items-center">
 &lt;a href="https://www.capterra.com/p/195060/ScrapingBee/" target="_blank" class="inline-block"> &lt;img border="0" class="h-[40px] mr-[10px] -ml-[9px]" src="https://brand-assets.capterra.com/badge/8898153e-408a-4cdb-9477-bda37032c670.svg" /> &lt;/a>
 &lt;span class="text-[18px] mr-[10px]">based on 100+ reviews.&lt;/span>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
&lt;/section>

&lt;section class="pt-[50px] sm:pt-101 pb-[50px] sm:pb-[70px] md:pb-[82px]">
 &lt;div class="container max-w-[894px] w-full flex flex-wrap">

 &lt;blockquote class="p-[38px] bg-gray-900 rounded-2xl m-[0] text-black-100 leading-[1.55]">
 &lt;q class="block mb-[35px] text-[24px]">ScrapingBee &lt;strong>clear documentation, easy-to-use API, and great success rate&lt;/strong> made it a no-brainer.&lt;/q>
 &lt;cite class="avatar flex items-center not-italic">
 
 &lt;span class="w-[56px] h-[56px] rounded-full overflow-hidden bg-gray-1000 mr-[24px]">
 &lt;img height="56" width="56" src="https://www.scrapingbee.com/images/testimonials/dominic.jpeg" alt="Dominic Phillips">
 &lt;/span>
 
 &lt;span>
 &lt;strong class="text-[18px] font-bold block mb-[4px]">Dominic Phillips
 
 &lt;/strong>
 
 &lt;span class="text-[15px] block">Co-Founder @ &lt;a href="https://codesubmit.io" class="font-bold underline hover:no-underline" target="_blank">CodeSubmit&lt;/a>&lt;/span>
 
 &lt;/span>
 &lt;/cite>
 &lt;/blockquote>
 &lt;/div>
&lt;/section>

&lt;section class="py-[50px] sm:py-[60px] md:py-[80px] text-gray-200 text-[16px] leading-[1.50]">
 &lt;div class="container max-w-[1292px]">
 &lt;div class="flex flex-wrap -mx-[23px]">
 &lt;div class="w-full md:w-[500px] lg:w-[608px] px-[23px]">
 &lt;div class="pr-[20px]">
 &lt;div class="mb-[48px]">
 &lt;h3 class="text-black-100 mb-[8px]">Scraping shouldn't be tied to complicated setups.&lt;/h3>
 &lt;p>Crawlbase offers advanced features but at a high cost. Get access to all the scraping features you need, without the complexity.&lt;/p></description></item><item><title>Crawlera alternative for web scraping?</title><link>https://www.scrapingbee.com/crawlera-alternative/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/crawlera-alternative/</guid><description>&lt;p>&lt;section class="bg-yellow-100 py-[100px] md:pt-[220px] md:pb-[20px] mb-[80px] relative z-1">
 &lt;div class="container">
 &lt;div class="max-w-[1024px] mx-auto text-center">
 &lt;h1 class="mb-[14px]">Crawlera alternative for web scraping?&lt;/h1>
 &lt;p class="mb-[32px]">ScrapingBee is a better alternative to Crawlera. Avoid paying exorbitant rates for your web scraping.&lt;/p>
 &lt;div class="flex flex-wrap items-center justify-center mb-[33px]">
 &lt;div class="flex flex-col items-start gap-[20px]">
 &lt;a href="https://app.scrapingbee.com/account/register" class="btn px-[39px] mr-[20px]">Try ScrapingBee for Free&lt;/a>
 &lt;div class="flex items-center">
 &lt;a href="https://www.capterra.com/p/195060/ScrapingBee/" target="_blank" class="inline-block"> &lt;img border="0" class="h-[40px] mr-[10px] -ml-[9px]" src="https://brand-assets.capterra.com/badge/8898153e-408a-4cdb-9477-bda37032c670.svg" /> &lt;/a>
 &lt;span class="text-[18px] mr-[10px]">based on 100+ reviews.&lt;/span>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
&lt;/section>

&lt;section class="pt-[50px] sm:pt-101 pb-[50px] sm:pb-[70px] md:pb-[82px]">
 &lt;div class="container max-w-[894px] w-full flex flex-wrap">

 &lt;blockquote class="p-[38px] bg-gray-900 rounded-2xl m-[0] text-black-100 leading-[1.55]">
 &lt;q class="block mb-[35px] text-[24px]">ScrapingBee &lt;strong>clear documentation, easy-to-use API, and great success rate&lt;/strong> made it a no-brainer.&lt;/q>
 &lt;cite class="avatar flex items-center not-italic">
 
 &lt;span class="w-[56px] h-[56px] rounded-full overflow-hidden bg-gray-1000 mr-[24px]">
 &lt;img height="56" width="56" src="https://www.scrapingbee.com/images/testimonials/dominic.jpeg" alt="Dominic Phillips">
 &lt;/span>
 
 &lt;span>
 &lt;strong class="text-[18px] font-bold block mb-[4px]">Dominic Phillips
 
 &lt;/strong>
 
 &lt;span class="text-[15px] block">Co-Founder @ &lt;a href="https://codesubmit.io" class="font-bold underline hover:no-underline" target="_blank">CodeSubmit&lt;/a>&lt;/span>
 
 &lt;/span>
 &lt;/cite>
 &lt;/blockquote>
 &lt;/div>
&lt;/section>

&lt;section class="py-[50px] sm:py-[60px] md:py-[80px] text-gray-200 text-[16px] leading-[1.50]">
 &lt;div class="container max-w-[1292px]">
 &lt;div class="flex flex-wrap -mx-[23px]">
 &lt;div class="w-full md:w-[500px] lg:w-[608px] px-[23px]">
 &lt;div class="pr-[20px]">
 &lt;div class="mb-[48px]">
 &lt;h3 class="text-black-100 mb-[8px]">Simple API, powerful features!&lt;/h3>
&lt;p>Compared to Crawlera's complex usage, ScrapingBee's easy-to-use API lets you get up and running quickly!&lt;/p></description></item><item><title>Crexi Scraper API - Simple Access &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/crexi-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/crexi-scraper-api/</guid><description/></item><item><title>Crunchbase Scraper API Tool - Free Credits &amp; Easy Setup</title><link>https://www.scrapingbee.com/scrapers/crunchbase-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/crunchbase-api/</guid><description/></item><item><title>Data Analysis Immobiliare API - Simple Use &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/data-analysis-immobiliare/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/data-analysis-immobiliare/</guid><description/></item><item><title>Data Extraction</title><link>https://www.scrapingbee.com/documentation/data-extraction/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/documentation/data-extraction/</guid><description>&lt;blockquote>
&lt;p>💡 &lt;strong>Important&lt;/strong>:&lt;br>This page explains how to use a specific feature of our main &lt;a href="https://www.scrapingbee.com/" >web scraping API&lt;/a>!&lt;br>If you are not yet familiar with ScrapingBee web scraping API, you can read the documentation &lt;a href="https://www.scrapingbee.com/documentation" >here&lt;/a>.&lt;/p>
&lt;/blockquote>
&lt;h2 id="basic-usage">Basic usage&lt;/h2>
&lt;p>If you want to extract data from pages and don't want to parse the HTML on your side, you can add extraction rules to your API call.&lt;/p>
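&lt;p>As a rough illustration, a rules object maps each output field name to a CSS selector and travels as a JSON string alongside the usual &lt;code>api_key&lt;/code> and &lt;code>url&lt;/code> parameters. A minimal Python sketch (the endpoint follows the main API documentation; &lt;code>YOUR_API_KEY&lt;/code> and the target URL are placeholders):&lt;/p>

```python
import json

# Map each output field name to a CSS selector.
extract_rules = {"title": "h1"}

# Query parameters for the ScrapingBee endpoint; extract_rules
# must be serialized to a JSON string. YOUR_API_KEY is a placeholder.
params = {
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com",
    "extract_rules": json.dumps(extract_rules),
}

# With the third-party requests library installed, you could then run:
# response = requests.get("https://app.scrapingbee.com/api/v1/", params=params)
# and response.json() would contain a "title" key with the extracted text.
print(params["extract_rules"])  # prints {"title": "h1"}
```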
&lt;p>The simplest way to use extraction rules is to use the following format:&lt;/p></description></item><item><title>Data extraction in Go</title><link>https://www.scrapingbee.com/tutorials/data-extraction-in-go/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/data-extraction-in-go/</guid><description>&lt;p>One of the most important features of ScrapingBee is the ability to extract exact data without the need to post-process the request’s content using external libraries.&lt;/p>
&lt;p>We can use this feature by specifying an additional parameter named &lt;code>extract_rules&lt;/code>. We specify the labels of the elements we want to extract and their CSS selectors, and ScrapingBee will do the rest!&lt;/p>
&lt;p>Let’s say that we want to extract the title &amp;amp; the subtitle of the &lt;a href="https://www.scrapingbee.com/documentation/data-extraction/" >data extraction documentation page&lt;/a>. Their CSS selectors are &lt;code>h1&lt;/code> and &lt;code>span.text-[20px]&lt;/code> respectively. To make sure that they’re the correct ones, you can run the JavaScript function &lt;code>document.querySelector(&amp;quot;CSS_SELECTOR&amp;quot;)&lt;/code> in that page’s developer tools console.&lt;/p></description></item><item><title>Data extraction in NodeJS</title><link>https://www.scrapingbee.com/tutorials/data-extraction-in-nodejs/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/data-extraction-in-nodejs/</guid><description>&lt;p>One of the most important features of ScrapingBee is the ability to extract exact data without the need to post-process the request’s content using external libraries.&lt;/p>
&lt;p>We can use this feature by specifying an additional parameter named &lt;code>extract_rules&lt;/code>. We specify the labels of the elements we want to extract and their CSS selectors, and ScrapingBee will do the rest!&lt;/p>
&lt;p>Let’s say that we want to extract the title &amp;amp; the subtitle of the &lt;a href="https://www.scrapingbee.com/documentation/data-extraction/" >data extraction documentation page&lt;/a>. Their CSS selectors are &lt;code>h1&lt;/code> and &lt;code>span.text-[20px]&lt;/code> respectively. To make sure that they’re the correct ones, you can run the JavaScript function &lt;code>document.querySelector(&amp;quot;CSS_SELECTOR&amp;quot;)&lt;/code> in that page’s developer tools console.&lt;/p></description></item><item><title>Data extraction in PHP</title><link>https://www.scrapingbee.com/tutorials/data-extraction-in-php/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/data-extraction-in-php/</guid><description>&lt;p>One of the most important features of ScrapingBee is the ability to extract exact data without the need to post-process the request’s content using external libraries.&lt;/p>
&lt;p>We can use this feature by specifying an additional parameter named &lt;code>extract_rules&lt;/code>. We specify the labels of the elements we want to extract and their CSS selectors, and ScrapingBee will do the rest!&lt;/p>
&lt;p>Let’s say that we want to extract the title &amp;amp; the subtitle of the &lt;a href="https://www.scrapingbee.com/documentation/data-extraction/" >data extraction documentation page&lt;/a>. Their CSS selectors are &lt;code>h1&lt;/code> and &lt;code>span.text-[20px]&lt;/code> respectively. To make sure that they’re the correct ones, you can run the JavaScript function &lt;code>document.querySelector(&amp;quot;CSS_SELECTOR&amp;quot;)&lt;/code> in that page’s developer tools console.&lt;/p></description></item><item><title>Data extraction in Python</title><link>https://www.scrapingbee.com/tutorials/data-extraction-in-python/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/data-extraction-in-python/</guid><description>&lt;p>One of the most important features of ScrapingBee is the ability to extract exact data without the need to post-process the request’s content using external libraries.&lt;/p>
&lt;p>We can use this feature by specifying an additional parameter named &lt;code>extract_rules&lt;/code>. We specify the labels of the elements we want to extract and their CSS selectors, and ScrapingBee will do the rest!&lt;/p>
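&lt;p>Concretely, the request can be assembled by URL-encoding the parameters. A minimal sketch in Python (the endpoint follows the ScrapingBee documentation, while the label, the selector, and &lt;code>YOUR_API_KEY&lt;/code> are illustrative placeholders):&lt;/p>

```python
import json
from urllib.parse import urlencode

# Illustrative extraction rule: output label -> CSS selector.
extract_rules = {"page_title": "h1"}

# extract_rules is sent as a JSON string alongside api_key and url;
# YOUR_API_KEY is a placeholder for a real API key.
query = urlencode({
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com",
    "extract_rules": json.dumps(extract_rules),
})

request_url = "https://app.scrapingbee.com/api/v1/?" + query
print(request_url)
```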
&lt;p>Let’s say that we want to extract the title &amp;amp; the subtitle of the &lt;a href="https://www.scrapingbee.com/documentation/data-extraction/" >data extraction documentation page&lt;/a>. Their CSS selectors are &lt;code>h1&lt;/code> and &lt;code>span.text-[20px]&lt;/code> respectively. To make sure that they’re the correct ones, you can run the JavaScript function &lt;code>document.querySelector(&amp;quot;CSS_SELECTOR&amp;quot;)&lt;/code> in that page’s developer tools console.&lt;/p></description></item><item><title>Data extraction in Ruby</title><link>https://www.scrapingbee.com/tutorials/data-extraction-in-ruby/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/data-extraction-in-ruby/</guid><description>&lt;p>One of the most important features of ScrapingBee is the ability to extract exact data without the need to post-process the request’s content using external libraries.&lt;/p>
&lt;p>We can use this feature by specifying an additional parameter named &lt;code>extract_rules&lt;/code>. We specify the labels of the elements we want to extract and their CSS selectors, and ScrapingBee will do the rest!&lt;/p>
&lt;p>Let’s say that we want to extract the title &amp;amp; the subtitle of the &lt;a href="https://www.scrapingbee.com/documentation/data-extraction/" >data extraction documentation page&lt;/a>. Their CSS selectors are &lt;code>h1&lt;/code> and &lt;code>span.text-[20px]&lt;/code> respectively. To make sure that they’re the correct ones, you can run the JavaScript function &lt;code>document.querySelector(&amp;quot;CSS_SELECTOR&amp;quot;)&lt;/code> in that page’s developer tools console.&lt;/p></description></item><item><title>Data Protection / Data processing agreement</title><link>https://www.scrapingbee.com/data-processing-agreement/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/data-processing-agreement/</guid><description>&lt;h2 id="data-processing-agreement">Data Processing Agreement&lt;/h2>
&lt;p>The present Data Processing Agreement (&lt;strong>“DPA”&lt;/strong>) reflects the Parties’ agreement with respect to the terms governing the Processing of Personal Data under the Agreement.&lt;/p>
&lt;h2 id="1-definitions">1. Definitions&lt;/h2>
&lt;p>The term of this DPA shall follow the term of the Agreement. Terms not otherwise defined herein shall have the meaning as set forth in the Agreement. The DPA is part of the Agreement.&lt;/p>
&lt;h2 id="2-purpose-of-the-dpa">2. Purpose of the DPA&lt;/h2>
&lt;p>The purpose of this Agreement is to set out the relevant legislation and to describe the steps the Provider is taking to ensure its compliance with the Data Privacy Regulation.&lt;/p></description></item><item><title>Data Protection / GDPR Notice</title><link>https://www.scrapingbee.com/gdpr/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/gdpr/</guid><description>&lt;p>The General Data Protection Regulation (GDPR) is European Union legislation to strengthen and unify data protection laws for all individuals within the European Union. The regulation came into effect from May 25th, 2018.&lt;/p>
&lt;p>As a French business, founded and run by French citizens, but also as people who value privacy, we are fully committed to being compliant with GDPR and all data protection best practices.&lt;/p>
&lt;p>This page lays out our commitment to data protection and makes transparent what data we store about our users.&lt;/p></description></item><item><title>Data Scraping API</title><link>https://www.scrapingbee.com/features/data-extraction/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/features/data-extraction/</guid><description>&lt;p>&lt;script type="application/ld+json">
 {
 "@context": "https://schema.org",
 "@type": "Product",
 "name": "ScrapingBee",
 "brand": {
 "@type": "Brand",
 "name": "ScrapingBee"
 },
 "description": "Extracting data has never been simpler with CSS or XPATH selectors and ScrapingBee.",
 "aggregateRating": {
 "@type": "AggregateRating",
 "ratingValue": "4.9",
 "reviewCount": "38",
 "bestRating": 5
 }
 }
&lt;/script>
&lt;section class="bg-skew-yellow-b pt-[100px] sm:pt-[100px] md:pt-[156px] mb-[120px] relative z-1 ">
 &lt;div class="container">
 &lt;div class="flex flex-wrap items-center -mx-[15px]">
 &lt;div class="w-full sm:w-1/2 px-[15px]">
 &lt;div class="max-w-[542px] leading-[1.77]">
 
 
 
 &lt;h1 class="mb-[14px]">Data Scraping API&lt;/h1>
 &lt;p class="mb-[36px] text-[20px]">Extracting data has never been more simple with CSS or XPATH selectors and ScrapingBee.&lt;/p></description></item><item><title>Decodo alternative for web scraping?</title><link>https://www.scrapingbee.com/decodo-alternative/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/decodo-alternative/</guid><description>&lt;p>&lt;section class="bg-yellow-100 py-[100px] md:pt-[220px] md:pb-[20px] mb-[80px] relative z-1">
 &lt;div class="container">
 &lt;div class="max-w-[1024px] mx-auto text-center">
 &lt;h1 class="mb-[14px]">Decodo alternative for web scraping?&lt;/h1>
 &lt;p class="mb-[32px]">ScrapingBee is a better alternative to Decodo. Looking for a better balance of pricing, speed, and features? It’s time to explore the alternatives to Decodo.&lt;/p>
 &lt;div class="flex flex-wrap items-center justify-center mb-[33px]">
 &lt;div class="flex flex-col items-start gap-[20px]">
 &lt;a href="https://app.scrapingbee.com/account/register" class="btn px-[39px] mr-[20px]">Try ScrapingBee for Free&lt;/a>
 &lt;div class="flex items-center">
 &lt;a href="https://www.capterra.com/p/195060/ScrapingBee/" target="_blank" class="inline-block"> &lt;img border="0" class="h-[40px] mr-[10px] -ml-[9px]" src="https://brand-assets.capterra.com/badge/8898153e-408a-4cdb-9477-bda37032c670.svg" /> &lt;/a>
 &lt;span class="text-[18px] mr-[10px]">based on 100+ reviews.&lt;/span>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
&lt;/section>

&lt;section class="pt-[50px] sm:pt-101 pb-[50px] sm:pb-[70px] md:pb-[82px]">
 &lt;div class="container max-w-[894px] w-full flex flex-wrap">

 &lt;blockquote class="p-[38px] bg-gray-900 rounded-2xl m-[0] text-black-100 leading-[1.55]">
 &lt;q class="block mb-[35px] text-[24px]">ScrapingBee &lt;strong>clear documentation, easy-to-use API, and great success rate&lt;/strong> made it a no-brainer.&lt;/q>
 &lt;cite class="avatar flex items-center not-italic">
 
 &lt;span class="w-[56px] h-[56px] rounded-full overflow-hidden bg-gray-1000 mr-[24px]">
 &lt;img height="56" width="56" src="https://www.scrapingbee.com/images/testimonials/dominic.jpeg" alt="Dominic Phillips">
 &lt;/span>
 
 &lt;span>
 &lt;strong class="text-[18px] font-bold block mb-[4px]">Dominic Phillips
 
 &lt;/strong>
 
 &lt;span class="text-[15px] block">Co-Founder @ &lt;a href="https://codesubmit.io" class="font-bold underline hover:no-underline" target="_blank">CodeSubmit&lt;/a>&lt;/span>
 
 &lt;/span>
 &lt;/cite>
 &lt;/blockquote>
 &lt;/div>
&lt;/section>

&lt;section class="py-[50px] sm:py-[60px] md:py-[80px] text-gray-200 text-[16px] leading-[1.50]">
 &lt;div class="container max-w-[1292px]">
 &lt;div class="flex flex-wrap -mx-[23px]">
 &lt;div class="w-full md:w-[500px] lg:w-[608px] px-[23px]">
 &lt;div class="pr-[20px]">
 &lt;div class="mb-[48px]">
 &lt;h3 class="text-black-100 mb-[8px]">No unnecessary steps. Just clean scraping.&lt;/h3>
 &lt;p>Decodo offers scraping services but complicates things with additional features. We make it easy—&lt;a href="https://www.scrapingbee.com/blog/what-is-web-scraping-and-how-to-scrape-any-website-tutorial/">scrape the web&lt;/a> without all the extra fluff.&lt;/p></description></item><item><title>Depop Scraper API - Easy Use &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/depop-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/depop-scraper-api/</guid><description/></item><item><title>Diffbot alternative for web scraping?</title><link>https://www.scrapingbee.com/diffbot-alternative/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/diffbot-alternative/</guid><description>&lt;p>&lt;section class="bg-yellow-100 py-[100px] md:pt-[220px] md:pb-[20px] mb-[80px] relative z-1">
 &lt;div class="container">
 &lt;div class="max-w-[1024px] mx-auto text-center">
 &lt;h1 class="mb-[14px]">Diffbot alternative for web scraping?&lt;/h1>
 &lt;p class="mb-[32px]">ScrapingBee is a better alternative to Diffbot. Efficient and accurate data extraction doesn’t have to come with a hefty price tag or a steep learning curve.&lt;/p>
 &lt;div class="flex flex-wrap items-center justify-center mb-[33px]">
 &lt;div class="flex flex-col items-start gap-[20px]">
 &lt;a href="https://app.scrapingbee.com/account/register" class="btn px-[39px] mr-[20px]">Try ScrapingBee for Free&lt;/a>
 &lt;div class="flex items-center">
 &lt;a href="https://www.capterra.com/p/195060/ScrapingBee/" target="_blank" class="inline-block"> &lt;img border="0" class="h-[40px] mr-[10px] -ml-[9px]" src="https://brand-assets.capterra.com/badge/8898153e-408a-4cdb-9477-bda37032c670.svg" /> &lt;/a>
 &lt;span class="text-[18px] mr-[10px]">based on 100+ reviews.&lt;/span>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
&lt;/section>

&lt;section class="pt-[50px] sm:pt-101 pb-[50px] sm:pb-[70px] md:pb-[82px]">
 &lt;div class="container max-w-[894px] w-full flex flex-wrap">

 &lt;blockquote class="p-[38px] bg-gray-900 rounded-2xl m-[0] text-black-100 leading-[1.55]">
 &lt;q class="block mb-[35px] text-[24px]">ScrapingBee &lt;strong>clear documentation, easy-to-use API, and great success rate&lt;/strong> made it a no-brainer.&lt;/q>
 &lt;cite class="avatar flex items-center not-italic">
 
 &lt;span class="w-[56px] h-[56px] rounded-full overflow-hidden bg-gray-1000 mr-[24px]">
 &lt;img height="56" width="56" src="https://www.scrapingbee.com/images/testimonials/dominic.jpeg" alt="Dominic Phillips">
 &lt;/span>
 
 &lt;span>
 &lt;strong class="text-[18px] font-bold block mb-[4px]">Dominic Phillips
 
 &lt;/strong>
 
 &lt;span class="text-[15px] block">Co-Founder @ &lt;a href="https://codesubmit.io" class="font-bold underline hover:no-underline" target="_blank">CodeSubmit&lt;/a>&lt;/span>
 
 &lt;/span>
 &lt;/cite>
 &lt;/blockquote>
 &lt;/div>
&lt;/section>

&lt;section class="py-[50px] sm:py-[60px] md:py-[80px] text-gray-200 text-[16px] leading-[1.50]">
 &lt;div class="container max-w-[1292px]">
 &lt;div class="flex flex-wrap -mx-[23px]">
 &lt;div class="w-full md:w-[500px] lg:w-[608px] px-[23px]">
 &lt;div class="pr-[20px]">
 &lt;div class="mb-[48px]">
 &lt;h3 class="text-black-100 mb-[8px]">Not just structured data. Just straightforward scraping.&lt;/h3>
 &lt;p>Diffbot is great for extracting structured data but can get expensive quickly. Why pay more for specific use cases when you can scrape the entire web with better pricing?&lt;/p></description></item><item><title>Direct Answer Box Scraper API - Free Signup &amp; Simplicity</title><link>https://www.scrapingbee.com/scrapers/direct-answer-box-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/direct-answer-box-api/</guid><description/></item><item><title>DuckDuckGo Maps Scraper API - Simplicity &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/duckduckgo-maps-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/duckduckgo-maps-api/</guid><description/></item><item><title>DuckDuckGo News Scraper API - Effortless Signup, Free Credits</title><link>https://www.scrapingbee.com/scrapers/duckduckgo-news-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/duckduckgo-news-api/</guid><description/></item><item><title>DuckDuckGo Related Searches Scraper API With Free Credits</title><link>https://www.scrapingbee.com/scrapers/duckduckgo-related-searches-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/duckduckgo-related-searches-api/</guid><description/></item><item><title>DuckDuckGo Search Scraper API - Simple Signup Free Credits</title><link>https://www.scrapingbee.com/scrapers/duckduckgo-search-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/duckduckgo-search-api/</guid><description/></item><item><title>eBay Related Searches Scraper API - Get Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/ebay-related-searches-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://www.scrapingbee.com/scrapers/ebay-related-searches-api/</guid><description/></item><item><title>eBay Scraper Tool - Free Credits on Signup</title><link>https://www.scrapingbee.com/scrapers/ebay-scraper/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/ebay-scraper/</guid><description/></item><item><title>Ecommerce Scraping Tool - Free Credits &amp; Easy API Setup</title><link>https://www.scrapingbee.com/scrapers/ecommerce-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/ecommerce-api/</guid><description/></item><item><title>Etsy Scraper API - Sign Up for Free Credits</title><link>https://www.scrapingbee.com/scrapers/etsy-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/etsy-api/</guid><description/></item><item><title>Expedia Scraper API</title><link>https://www.scrapingbee.com/scrapers/expedia-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/expedia-scraper-api/</guid><description>&lt;p>&lt;script type="application/ld+json">
 {
 "@context": "https://schema.org",
 "@type": "Product",
 "name": "ScrapingBee",
 "brand": {
 "@type": "Brand",
 "name": "ScrapingBee"
 },
 "description": "Scrape global hotel data, pricing and details with our scraping API. Get rates, reviews, and property information from any destination with perfect accuracy.",
 "aggregateRating": {
 "@type": "AggregateRating",
 "ratingValue": "4.9",
 "reviewCount": "38",
 "bestRating": 5
 }
 }
&lt;/script>
&lt;section class="bg-skew-yellow-b pt-[100px] sm:pt-[100px] md:pt-[156px] mb-[120px] relative z-1 pb-[50px] sm:pb-[100px] md:mb-[170px]">
 &lt;div class="container">
 &lt;div class="flex flex-wrap items-center -mx-[15px]">
 &lt;div class="w-full sm:w-1/2 px-[15px]">
 &lt;div class="max-w-[542px] leading-[1.77]">
 
 
 
&lt;nav aria-label="Breadcrumb" class="text-[14px] text-black mb-[20px] flex items-center">
 &lt;ol class="flex items-center" itemscope itemtype="https://schema.org/BreadcrumbList">
 &lt;li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
 &lt;a href="https://www.scrapingbee.com/" class="text-black no-underline" itemprop="item">
 &lt;span itemprop="name">Home&lt;/span>
 &lt;/a>
 &lt;meta itemprop="position" content="1" />
 &lt;/li>
 &lt;svg width="14" height="14" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" fill="none" class="mx-[10px] flex-shrink-0">
 &lt;path d="M9 6L15 12L9 18" stroke="black" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"/>
 &lt;/svg>
 &lt;li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
 &lt;a href="https://www.scrapingbee.com/scrapers/" class="text-black no-underline" itemprop="item">
 &lt;span itemprop="name">Scrapers&lt;/span>
 &lt;/a>
 &lt;meta itemprop="position" content="2" />
 &lt;/li>
 &lt;svg width="14" height="14" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" fill="none" class="mx-[10px] flex-shrink-0">
 &lt;path d="M9 6L15 12L9 18" stroke="black" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"/>
 &lt;/svg>
 &lt;li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
 &lt;span class="text-blue-600 font-medium" itemprop="name">
 Expedia Scraper API
 &lt;/span>
 &lt;meta itemprop="position" content="3" />
 &lt;/li>
 &lt;/ol>
&lt;/nav>

 
 
 &lt;h1 class="mb-[14px]">Expedia Scraper API&lt;/h1>
 &lt;p class="mb-[36px] text-[20px]">Scrape global hotel data, pricing and details with our scraping API. Get rates, reviews, and property information from any destination with perfect accuracy.&lt;/p></description></item><item><title>Expireddomains Scraper API - Easy Access &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/expireddomains-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/expireddomains-scraper-api/</guid><description/></item><item><title>Fiverr Scraper API - Easy Access &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/fiverr-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/fiverr-scraper-api/</guid><description/></item><item><title>Flight Scraper API Tool - Easy Setup with Free Credits</title><link>https://www.scrapingbee.com/scrapers/flight-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/flight-api/</guid><description/></item><item><title>Flipkart Scraper API - Free Signup Credits Offer</title><link>https://www.scrapingbee.com/scrapers/flipkart-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/flipkart-api/</guid><description/></item><item><title>Food Data Scraper API - Free Signup and Credits</title><link>https://www.scrapingbee.com/scrapers/food-data-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/food-data-api/</guid><description/></item><item><title>Football News API - Simple Use &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/football-news-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/football-news-api/</guid><description/></item><item><title>Forbes Scraper API - Simple Use &amp; Free Signup 
Credits</title><link>https://www.scrapingbee.com/scrapers/forbes-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/forbes-scraper-api/</guid><description/></item><item><title>Fox News Scraper API - Simple Use &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/fox-news-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/fox-news-scraper-api/</guid><description/></item><item><title>Free Indeed Scraper API with Credits - Easy Data Extraction</title><link>https://www.scrapingbee.com/scrapers/indeed-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/indeed-api/</guid><description/></item><item><title>Freelancer Scraper API - Simple Use &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/freelancer-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/freelancer-scraper-api/</guid><description/></item><item><title>Frequently Asked Questions - ScrapingBee</title><link>https://www.scrapingbee.com/faq/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/faq/</guid><description/></item><item><title>Funda Scraper API - Easy Access &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/funda-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/funda-scraper-api/</guid><description/></item><item><title>G2 Scraper API Tool - Starting is Simple with Free Credits</title><link>https://www.scrapingbee.com/scrapers/g2-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/g2-api/</guid><description/></item><item><title>Gamestop Scraper API - Simple Signup Credits 
Free</title><link>https://www.scrapingbee.com/scrapers/gamestop-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/gamestop-api/</guid><description/></item><item><title>Gasbuddy Scraper API - Easy Access &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/gasbuddy-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/gasbuddy-scraper-api/</guid><description/></item><item><title>GENERAL TERMS AND CONDITIONS OF SERVICE</title><link>https://www.scrapingbee.com/terms-and-conditions/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/terms-and-conditions/</guid><description>&lt;h2 id="1-preamble">1. Preamble&lt;/h2>
&lt;p>&lt;strong>VostokInc&lt;/strong>, a joint-stock company (“société par actions simplifiée”) with registered address located at 66 Avenue des Champs Élysées – 75008 Paris and registered before the Company House of Paris under number 843 352 683 (&lt;strong>&amp;quot;VostokInc&amp;quot;&lt;/strong> or the &lt;strong>&amp;quot;Provider&amp;quot;&lt;/strong>) has developed an online solution available at &lt;a href="https://app.scrapingbee.com" >https://app.scrapingbee.com&lt;/a> and/or at any other address, application, or location designated by VostokInc (the &lt;strong>&amp;quot;API&amp;quot;&lt;/strong> or &lt;strong>“ScrapingBee”&lt;/strong>) providing web scraping services (the &amp;quot;Services&amp;quot;).&lt;/p>
&lt;p>The present terms and conditions of service (the &lt;strong>&amp;quot;General Conditions&amp;quot;&lt;/strong>) govern the contractual relationship between VostokInc and any natural person aged at least 18 years old with full and complete legal capacity acting in the scope of their professional activity or being the legal representative of a legal entity empowered to enter into legally binding commitments which access the Services only for their professional activities whatever the conditions from whichever terminal, nature, and extent of the subscription to the Services (hereinafter the &lt;strong>“User”&lt;/strong>). User acknowledges and accepts that Services are dedicated to professional activities and as such consumer law is not intended to be applicable. The General Conditions, the Data Processing Agreement, the AUP, and their exhibits form altogether the &lt;strong>“Agreement”&lt;/strong>.&lt;/p></description></item><item><title>Getty Images Scraper API - Easy Start &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/getty-images-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/getty-images-scraper-api/</guid><description/></item><item><title>Getyourguide Scraper API - Simple Use &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/getyourguide-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/getyourguide-scraper-api/</guid><description/></item><item><title>GitHub Scraper API - Easy Setup &amp; Free Credits on Sign Up</title><link>https://www.scrapingbee.com/scrapers/github-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/github-api/</guid><description/></item><item><title>Glassdoor Jobs Scraper API - Get Free Credits Upon Signup</title><link>https://www.scrapingbee.com/scrapers/glassdoor-jobs-api/</link><pubDate>Mon, 01 Jan 0001 
00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/glassdoor-jobs-api/</guid><description/></item><item><title>Goodreads Scraper API - Get Free Credits Now</title><link>https://www.scrapingbee.com/scrapers/goodreads-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/goodreads-api/</guid><description/></item><item><title>Google Ads Scraper API - Signup for Credits Free</title><link>https://www.scrapingbee.com/scrapers/google-ads-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-ads-api/</guid><description/></item><item><title>Google AI Overview Scraper API - Free Signup Credits Offer</title><link>https://www.scrapingbee.com/scrapers/google-ai-overview-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-ai-overview-api/</guid><description/></item><item><title>Google API</title><link>https://www.scrapingbee.com/documentation/google/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/documentation/google/</guid><description>&lt;div class="w-full param_table">
 &lt;div>
 &lt;div class="overscroll-x-auto pb-[30px] md:pb-[0] max-w-full overflow-x-auto">
 &lt;div class="border rounded-md min-w-[500px] md:min-w-[0] overflow-hidden border-[#C8C4C4] shadow-sm bg-white">
 &lt;div class="flex border-b border-[#C8C4C4] bg-[#F4F0F0]">
 &lt;div class="text-[12px] w-5/12 border-r py-[8px] px-[20px] flex flex-wrap items-center border-[#C8C4C4]">
 &lt;span class="bg-[#D9D6CC] text-[#0F0F0E] rounded-[4px] inline-block px-2 py-0.5 mr-[9px] font-menlo">name&lt;/span>
 &lt;span class="text-black-200 mr-[5px] ">[&lt;code class="bg-[#DAFBD7] rounded-[4px] text-[#188310] inline-block px-2 py-0.5 ml-[3px] mr-[1px] font-menlo text-[12px]">type&lt;/code>]&lt;/span>
 &lt;span class="text-black-200">(&lt;code class="bg-[#FFE3F3] rounded-[4px] text-[#DB3797] inline-block px-2 py-0.5 mx-[2px] text-[12px]">default&lt;/code>)&lt;/span>
 &lt;/div>
 &lt;div class="w-5/12 px-[16px] py-[8px] relative" style="border-color: #C8C4C4;">
 &lt;span class="font-bold text-black-200">Description&lt;/span>
 &lt;/div>
 &lt;/div>
 
 
 
 
 &lt;div class="flex border-b" style="border-color: #C8C4C4;">
 &lt;div class="text-[12px] w-5/12 border-r py-[8px] px-[20px] flex flex-wrap items-center border-[#C8C4C4]">
 &lt;span class="bg-[#D9D6CC] text-[#0F0F0E] rounded-[4px] inline-block px-2 py-0.5 mr-[9px] font-menlo">api_key&lt;/span>
 &lt;span class="text-black-200 mr-[5px] ">[&lt;code class="bg-[#DAFBD7] rounded-[4px] text-[#188310] inline-block px-2 py-0.5 ml-[3px] mr-[1px] font-menlo text-[12px]">string&lt;/code>]&lt;/span>
 &lt;span class="text-black-200">&lt;code class="bg-[#EAEEF6] rounded-[4px] text-[#393C40] inline-block px-2 py-0.5 font-menlo mx-[2px] text-[12px]">required&lt;/code>&lt;/span>
 &lt;/div>
 &lt;div class="w-5/12 py-[8px] px-[16px] flex items-center flex-wrap relative" style="border-color: #C8C4C4;">
 &lt;span class="text-black-200 leading-[1.50]">Your api key&lt;/span>
 &lt;/div>
 &lt;div class="w-2/12 py-[8px] pl-[16px] pr-[16px] flex items-center justify-end flex-wrap border-[#C8C4C4]">
 
 &lt;span class="text-black-200">&lt;a href="#api_key" class="bg-transparent border border-black-100 py-[5px] px-[10px] rounded-md !no-underline text-[13px] inline-flex items-center gap-[6px]">Learn more&lt;svg width="12" height="12" viewBox="0 0 12 12" fill="none" xmlns="http://www.w3.org/2000/svg">&lt;path d="M4.5 9L7.5 6L4.5 3" stroke="currentColor" stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round"/>&lt;/svg>&lt;/a>&lt;/span>
 
 &lt;/div>
 &lt;/div>
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 &lt;div class="flex border-b" style="border-color: #C8C4C4;">
 &lt;div class="text-[12px] w-5/12 border-r py-[8px] px-[20px] flex flex-wrap items-center border-[#C8C4C4]">
 &lt;span class="bg-[#D9D6CC] text-[#0F0F0E] rounded-[4px] inline-block px-2 py-0.5 mr-[9px] font-menlo">search&lt;/span>
 &lt;span class="text-black-200 mr-[5px] ">[&lt;code class="bg-[#DAFBD7] rounded-[4px] text-[#188310] inline-block px-2 py-0.5 ml-[3px] mr-[1px] font-menlo text-[12px]">string&lt;/code>]&lt;/span>
 &lt;span class="text-black-200">&lt;code class="bg-[#EAEEF6] rounded-[4px] text-[#393C40] inline-block px-2 py-0.5 font-menlo mx-[2px] text-[12px]">required&lt;/code>&lt;/span>
 &lt;/div>
 &lt;div class="w-5/12 py-[8px] px-[16px] flex items-center flex-wrap relative" style="border-color: #C8C4C4;">
 &lt;span class="text-black-200 leading-[1.50]">The text you would put in the Google search bar&lt;/span>
 &lt;/div>
 &lt;div class="w-2/12 py-[8px] pl-[16px] pr-[16px] flex items-center justify-end flex-wrap border-[#C8C4C4]">
 
 &lt;span class="text-black-200">&lt;a href="#search" class="bg-transparent border border-black-100 py-[5px] px-[10px] rounded-md !no-underline text-[13px] inline-flex items-center gap-[6px]">Learn more&lt;svg width="12" height="12" viewBox="0 0 12 12" fill="none" xmlns="http://www.w3.org/2000/svg">&lt;path d="M4.5 9L7.5 6L4.5 3" stroke="currentColor" stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round"/>&lt;/svg>&lt;/a>&lt;/span>
 
 &lt;/div>
 &lt;/div>
 
 
 
 
 
 
 &lt;div class="flex border-b last-of-type:border-0" style="border-color: #C8C4C4;">
 &lt;div class="text-[12px] w-5/12 border-r py-[8px] px-[20px] flex flex-wrap items-center border-[#C8C4C4]">
 &lt;span class="bg-[#D9D6CC] text-[#0F0F0E] rounded-[4px] inline-block px-2 py-0.5 mr-[9px] font-menlo">add_html&lt;/span>
 &lt;span class="text-black-200 mr-[5px] ">[&lt;code class="bg-[#DAFBD7] rounded-[4px] text-[#188310] inline-block px-2 py-0.5 ml-[3px] mr-[1px] font-menlo">boolean&lt;/code>]&lt;/span>
 &lt;span class="text-black-200">(&lt;code class="bg-[#FFE3F3] rounded-[4px] text-[#DB3797] inline-block px-2 py-0.5 font-menlo mx-[2px]">false&lt;/code>)&lt;/span>
 &lt;/div>
 &lt;div class="w-5/12 py-[8px] px-[16px] flex items-center flex-wrap relative" style="border-color: #C8C4C4;">
 &lt;span class="text-black-200 leading-[1.50]">Adding the full html of the page in the results&lt;/span>
 &lt;/div>
 &lt;div class="w-2/12 py-[8px] pl-[16px] pr-[16px] flex items-center justify-end flex-wrap">
 
 &lt;span class="text-black-200">&lt;a href="#add_html" class="bg-transparent border border-black-100 py-[5px] px-[10px] rounded-md !no-underline text-[13px] inline-flex items-center gap-[6px]">Learn more&lt;svg width="12" height="12" viewBox="0 0 12 12" fill="none" xmlns="http://www.w3.org/2000/svg">&lt;path d="M4.5 9L7.5 6L4.5 3" stroke="currentColor" stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round"/>&lt;/svg>&lt;/a>&lt;/span>
 
 &lt;/div>
 &lt;/div>
 
 
 
 
 
 &lt;div class="flex border-b last-of-type:border-0" style="border-color: #C8C4C4;">
 &lt;div class="text-[12px] w-5/12 border-r py-[8px] px-[20px] flex flex-wrap items-center border-[#C8C4C4]">
 &lt;span class="bg-[#D9D6CC] text-[#0F0F0E] rounded-[4px] inline-block px-2 py-0.5 mr-[9px] font-menlo">country_code&lt;/span>
 &lt;span class="text-black-200 mr-[5px] ">[&lt;code class="bg-[#DAFBD7] rounded-[4px] text-[#188310] inline-block px-2 py-0.5 ml-[3px] mr-[1px] font-menlo">string&lt;/code>]&lt;/span>
 &lt;span class="text-black-200">(&lt;code class="bg-[#FFE3F3] rounded-[4px] text-[#DB3797] inline-block px-2 py-0.5 font-menlo mx-[2px]">&amp;#34;us&amp;#34;&lt;/code>)&lt;/span>
 &lt;/div>
 &lt;div class="w-5/12 py-[8px] px-[16px] flex items-center flex-wrap relative" style="border-color: #C8C4C4;">
 &lt;span class="text-black-200 leading-[1.50]">Country code from which you would like the request to come from&lt;/span>
 &lt;/div>
 &lt;div class="w-2/12 py-[8px] pl-[16px] pr-[16px] flex items-center justify-end flex-wrap">
 
 &lt;span class="text-black-200">&lt;a href="#country_code" class="bg-transparent border border-black-100 py-[5px] px-[10px] rounded-md !no-underline text-[13px] inline-flex items-center gap-[6px]">Learn more&lt;svg width="12" height="12" viewBox="0 0 12 12" fill="none" xmlns="http://www.w3.org/2000/svg">&lt;path d="M4.5 9L7.5 6L4.5 3" stroke="currentColor" stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round"/>&lt;/svg>&lt;/a>&lt;/span>
 
 &lt;/div>
 &lt;/div>
 
 
 
 &lt;div class="flex border-b last-of-type:border-0" style="border-color: #C8C4C4;">
 &lt;div class="text-[12px] w-5/12 border-r py-[8px] px-[20px] flex flex-wrap items-center border-[#C8C4C4]">
 &lt;span class="bg-[#D9D6CC] text-[#0F0F0E] rounded-[4px] inline-block px-2 py-0.5 mr-[9px] font-menlo">device&lt;/span>
 &lt;span class="text-black-200 mr-[5px] ">[&lt;code class="bg-[#DAFBD7] rounded-[4px] text-[#188310] inline-block px-2 py-0.5 ml-[3px] mr-[1px] font-menlo">&amp;#34;desktop&amp;#34; | &amp;#34;mobile&amp;#34;&lt;/code>]&lt;/span>
 &lt;span class="text-black-200">(&lt;code class="bg-[#FFE3F3] rounded-[4px] text-[#DB3797] inline-block px-2 py-0.5 font-menlo mx-[2px]">&amp;#34;desktop&amp;#34;&lt;/code>)&lt;/span>
 &lt;/div>
 &lt;div class="w-5/12 py-[8px] px-[16px] flex items-center flex-wrap relative" style="border-color: #C8C4C4;">
 &lt;span class="text-black-200 leading-[1.50]">Control the device the request will be sent from&lt;/span>
 &lt;/div>
 &lt;div class="w-2/12 py-[8px] pl-[16px] pr-[16px] flex items-center justify-end flex-wrap">
 
 &lt;span class="text-black-200">&lt;a href="#device" class="bg-transparent border border-black-100 py-[5px] px-[10px] rounded-md !no-underline text-[13px] inline-flex items-center gap-[6px]">Learn more&lt;svg width="12" height="12" viewBox="0 0 12 12" fill="none" xmlns="http://www.w3.org/2000/svg">&lt;path d="M4.5 9L7.5 6L4.5 3" stroke="currentColor" stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round"/>&lt;/svg>&lt;/a>&lt;/span>
 
 &lt;/div>
 &lt;/div>
 
 
 
 &lt;div class="flex border-b last-of-type:border-0" style="border-color: #C8C4C4;">
 &lt;div class="text-[12px] w-5/12 border-r py-[8px] px-[20px] flex flex-wrap items-center border-[#C8C4C4]">
 &lt;span class="bg-[#D9D6CC] text-[#0F0F0E] rounded-[4px] inline-block px-2 py-0.5 mr-[9px] font-menlo">extra_params&lt;/span>
 &lt;span class="text-black-200 mr-[5px] ">[&lt;code class="bg-[#DAFBD7] rounded-[4px] text-[#188310] inline-block px-2 py-0.5 ml-[3px] mr-[1px] font-menlo">string&lt;/code>]&lt;/span>
 &lt;span class="text-black-200">(&lt;code class="bg-[#FFE3F3] rounded-[4px] text-[#DB3797] inline-block px-2 py-0.5 font-menlo mx-[2px]">&amp;#34;&amp;#34;&lt;/code>)&lt;/span>
 &lt;/div>
 &lt;div class="w-5/12 py-[8px] px-[16px] flex items-center flex-wrap relative" style="border-color: #C8C4C4;">
 &lt;span class="text-black-200 leading-[1.50]">Extra Google URL parameters&lt;/span>
 &lt;/div>
 &lt;div class="w-2/12 py-[8px] pl-[16px] pr-[16px] flex items-center justify-end flex-wrap">
 
 &lt;span class="text-black-200">&lt;a href="#extra_params" class="bg-transparent border border-black-100 py-[5px] px-[10px] rounded-md !no-underline text-[13px] inline-flex items-center gap-[6px]">Learn more&lt;svg width="12" height="12" viewBox="0 0 12 12" fill="none" xmlns="http://www.w3.org/2000/svg">&lt;path d="M4.5 9L7.5 6L4.5 3" stroke="currentColor" stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round"/>&lt;/svg>&lt;/a>&lt;/span>
 
 &lt;/div>
 &lt;/div>
 
 
 
 &lt;div class="flex border-b last-of-type:border-0" style="border-color: #C8C4C4;">
 &lt;div class="text-[12px] w-5/12 border-r py-[8px] px-[20px] flex flex-wrap items-center border-[#C8C4C4]">
 &lt;span class="bg-[#D9D6CC] text-[#0F0F0E] rounded-[4px] inline-block px-2 py-0.5 mr-[9px] font-menlo">language&lt;/span>
 &lt;span class="text-black-200 mr-[5px] ">[&lt;code class="bg-[#DAFBD7] rounded-[4px] text-[#188310] inline-block px-2 py-0.5 ml-[3px] mr-[1px] font-menlo">string&lt;/code>]&lt;/span>
 &lt;span class="text-black-200">(&lt;code class="bg-[#FFE3F3] rounded-[4px] text-[#DB3797] inline-block px-2 py-0.5 font-menlo mx-[2px]">&amp;#34;en&amp;#34;&lt;/code>)&lt;/span>
 &lt;/div>
 &lt;div class="w-5/12 py-[8px] px-[16px] flex items-center flex-wrap relative" style="border-color: #C8C4C4;">
 &lt;span class="text-black-200 leading-[1.50]">Language the search results will be displayed in&lt;/span>
 &lt;/div>
 &lt;div class="w-2/12 py-[8px] pl-[16px] pr-[16px] flex items-center justify-end flex-wrap">
 
 &lt;span class="text-black-200">&lt;a href="#language" class="bg-transparent border border-black-100 py-[5px] px-[10px] rounded-md !no-underline text-[13px] inline-flex items-center gap-[6px]">Learn more&lt;svg width="12" height="12" viewBox="0 0 12 12" fill="none" xmlns="http://www.w3.org/2000/svg">&lt;path d="M4.5 9L7.5 6L4.5 3" stroke="currentColor" stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round"/>&lt;/svg>&lt;/a>&lt;/span>
 
 &lt;/div>
 &lt;/div>
 
 
 
 &lt;div class="flex border-b last-of-type:border-0" style="border-color: #C8C4C4;">
 &lt;div class="text-[12px] w-5/12 border-r py-[8px] px-[20px] flex flex-wrap items-center border-[#C8C4C4]">
 &lt;span class="bg-[#D9D6CC] text-[#0F0F0E] rounded-[4px] inline-block px-2 py-0.5 mr-[9px] font-menlo">light_request&lt;/span>
 &lt;span class="text-black-200 mr-[5px] ">[&lt;code class="bg-[#DAFBD7] rounded-[4px] text-[#188310] inline-block px-2 py-0.5 ml-[3px] mr-[1px] font-menlo">boolean&lt;/code>]&lt;/span>
 &lt;span class="text-black-200">(&lt;code class="bg-[#FFE3F3] rounded-[4px] text-[#DB3797] inline-block px-2 py-0.5 font-menlo mx-[2px]">true&lt;/code>)&lt;/span>
 &lt;/div>
 &lt;div class="w-5/12 py-[8px] px-[16px] flex items-center flex-wrap relative" style="border-color: #C8C4C4;">
 &lt;span class="text-black-200 leading-[1.50]">Light requests are faster and cheaper (10 credits instead of 15), but some content may be missing.&lt;/span>
 &lt;/div>
 &lt;div class="w-2/12 py-[8px] pl-[16px] pr-[16px] flex items-center justify-end flex-wrap">
 
 &lt;span class="text-black-200">&lt;a href="#light_request" class="bg-transparent border border-black-100 py-[5px] px-[10px] rounded-md !no-underline text-[13px] inline-flex items-center gap-[6px]">Learn more&lt;svg width="12" height="12" viewBox="0 0 12 12" fill="none" xmlns="http://www.w3.org/2000/svg">&lt;path d="M4.5 9L7.5 6L4.5 3" stroke="currentColor" stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round"/>&lt;/svg>&lt;/a>&lt;/span>
 
 &lt;/div>
 &lt;/div>
 
 
 
 &lt;div class="flex border-b last-of-type:border-0" style="border-color: #C8C4C4;">
 &lt;div class="text-[12px] w-5/12 border-r py-[8px] px-[20px] flex flex-wrap items-center border-[#C8C4C4]">
 &lt;span class="bg-[#D9D6CC] text-[#0F0F0E] rounded-[4px] inline-block px-2 py-0.5 mr-[9px] font-menlo">nfpr&lt;/span>
 &lt;span class="text-black-200 mr-[5px] ">[&lt;code class="bg-[#DAFBD7] rounded-[4px] text-[#188310] inline-block px-2 py-0.5 ml-[3px] mr-[1px] font-menlo">boolean&lt;/code>]&lt;/span>
 &lt;span class="text-black-200">(&lt;code class="bg-[#FFE3F3] rounded-[4px] text-[#DB3797] inline-block px-2 py-0.5 font-menlo mx-[2px]">false&lt;/code>)&lt;/span>
 &lt;/div>
 &lt;div class="w-5/12 py-[8px] px-[16px] flex items-center flex-wrap relative" style="border-color: #C8C4C4;">
 &lt;span class="text-black-200 leading-[1.50]">Exclude results from auto-corrected queries that were spelt wrong.&lt;/span>
 &lt;/div>
 &lt;div class="w-2/12 py-[8px] pl-[16px] pr-[16px] flex items-center justify-end flex-wrap">
 
 &lt;span class="text-black-200">&lt;a href="#nfpr" class="bg-transparent border border-black-100 py-[5px] px-[10px] rounded-md !no-underline text-[13px] inline-flex items-center gap-[6px]">Learn more&lt;svg width="12" height="12" viewBox="0 0 12 12" fill="none" xmlns="http://www.w3.org/2000/svg">&lt;path d="M4.5 9L7.5 6L4.5 3" stroke="currentColor" stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round"/>&lt;/svg>&lt;/a>&lt;/span>
 
 &lt;/div>
 &lt;/div>
 
 
 
 &lt;div class="flex border-b last-of-type:border-0" style="border-color: #C8C4C4;">
 &lt;div class="text-[12px] w-5/12 border-r py-[8px] px-[20px] flex flex-wrap items-center border-[#C8C4C4]">
 &lt;span class="bg-[#D9D6CC] text-[#0F0F0E] rounded-[4px] inline-block px-2 py-0.5 mr-[9px] font-menlo">page&lt;/span>
 &lt;span class="text-black-200 mr-[5px] ">[&lt;code class="bg-[#DAFBD7] rounded-[4px] text-[#188310] inline-block px-2 py-0.5 ml-[3px] mr-[1px] font-menlo">integer&lt;/code>]&lt;/span>
 &lt;span class="text-black-200">(&lt;code class="bg-[#FFE3F3] rounded-[4px] text-[#DB3797] inline-block px-2 py-0.5 font-menlo mx-[2px]">1&lt;/code>)&lt;/span>
 &lt;/div>
 &lt;div class="w-5/12 py-[8px] px-[16px] flex items-center flex-wrap relative" style="border-color: #C8C4C4;">
 &lt;span class="text-black-200 leading-[1.50]">The page number you want to extract results from&lt;/span>
 &lt;/div>
 &lt;div class="w-2/12 py-[8px] pl-[16px] pr-[16px] flex items-center justify-end flex-wrap">
 
 &lt;span class="text-black-200">&lt;a href="#page" class="bg-transparent border border-black-100 py-[5px] px-[10px] rounded-md !no-underline text-[13px] inline-flex items-center gap-[6px]">Learn more&lt;svg width="12" height="12" viewBox="0 0 12 12" fill="none" xmlns="http://www.w3.org/2000/svg">&lt;path d="M4.5 9L7.5 6L4.5 3" stroke="currentColor" stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round"/>&lt;/svg>&lt;/a>&lt;/span>
 
 &lt;/div>
 &lt;/div>
 
 
 
 
 
 &lt;div class="flex border-b last-of-type:border-0" style="border-color: #C8C4C4;">
 &lt;div class="text-[12px] w-5/12 border-r py-[8px] px-[20px] flex flex-wrap items-center border-[#C8C4C4]">
 &lt;span class="bg-[#D9D6CC] text-[#0F0F0E] rounded-[4px] inline-block px-2 py-0.5 mr-[9px] font-menlo">search_type&lt;/span>
 &lt;span class="text-black-200 mr-[5px] ">[&lt;code class="bg-[#DAFBD7] rounded-[4px] text-[#188310] inline-block px-2 py-0.5 ml-[3px] mr-[1px] font-menlo">&amp;#34;classic&amp;#34; | &amp;#34;news&amp;#34; | &amp;#34;maps&amp;#34; | &amp;#34;images&amp;#34; | &amp;#34;lens&amp;#34; | &amp;#34;shopping&amp;#34; | &amp;#34;ai_mode&amp;#34;&lt;/code>]&lt;/span>
 &lt;span class="text-black-200">(&lt;code class="bg-[#FFE3F3] rounded-[4px] text-[#DB3797] inline-block px-2 py-0.5 font-menlo mx-[2px]">&amp;#34;classic&amp;#34;&lt;/code>)&lt;/span>
 &lt;/div>
 &lt;div class="w-5/12 py-[8px] px-[16px] flex items-center flex-wrap relative" style="border-color: #C8C4C4;">
 &lt;span class="text-black-200 leading-[1.50]">The type of search you want to perform&lt;/span>
 &lt;/div>
 &lt;div class="w-2/12 py-[8px] pl-[16px] pr-[16px] flex items-center justify-end flex-wrap">
 
 &lt;span class="text-black-200">&lt;a href="#search_type" class="bg-transparent border border-black-100 py-[5px] px-[10px] rounded-md !no-underline text-[13px] inline-flex items-center gap-[6px]">Learn more&lt;svg width="12" height="12" viewBox="0 0 12 12" fill="none" xmlns="http://www.w3.org/2000/svg">&lt;path d="M4.5 9L7.5 6L4.5 3" stroke="currentColor" stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round"/>&lt;/svg>&lt;/a>&lt;/span>
 
 &lt;/div>
 &lt;/div>
 
 
 &lt;/div>
 &lt;/div>
 &lt;/div>
&lt;/div>
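The parameters in the table above can be combined into a single request. Below is a minimal sketch in Python; the endpoint path and the `search` query-parameter name are assumptions based on ScrapingBee's conventions rather than taken from this table, and `YOUR-API-KEY` is a placeholder:

```python
from urllib.parse import urlencode

# Assumed base endpoint for ScrapingBee's Google Search API.
BASE_URL = "https://app.scrapingbee.com/api/v1/store/google"

def build_search_url(api_key, search, device="desktop", language="en",
                     page=1, search_type="classic", light_request=True,
                     nfpr=False):
    """Assemble a request URL from the parameters documented above."""
    params = {
        "api_key": api_key,
        "search": search,                             # the query to run
        "device": device,                             # "desktop" | "mobile"
        "language": language,                         # display language, e.g. "en"
        "page": page,                                 # result page to extract
        "search_type": search_type,                   # "classic" | "news" | "maps" | ...
        "light_request": str(light_request).lower(),  # 10 credits instead of 15
        "nfpr": str(nfpr).lower(),                    # drop auto-corrected-query results
    }
    return BASE_URL + "?" + urlencode(params)

url = build_search_url("YOUR-API-KEY", "web scraping", page=2, device="mobile")
print(url)
```

The resulting URL can be fetched with any HTTP client (e.g. `requests.get(url)`); defaults mirror the defaults shown in the table.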
&lt;div class="doc-row">
&lt;div class="doc-full">
&lt;h2 id="getting-started">Getting Started&lt;/h2>
&lt;p>Our Google Search API allows you to scrape search results pages in realtime.&lt;/p></description></item><item><title>Google Autocomplete Scraper API - Free Signup &amp; Credits</title><link>https://www.scrapingbee.com/scrapers/google-autocomplete-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-autocomplete-api/</guid><description/></item><item><title>Google Books Scraper API - Simple Start &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/google-books-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-books-scraper-api/</guid><description/></item><item><title>Google Events Scraper API - Free Signup &amp; Simplified Process</title><link>https://www.scrapingbee.com/scrapers/google-events-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-events-api/</guid><description/></item><item><title>Google Finance Scraper - Free Signup Credits, Simple Use</title><link>https://www.scrapingbee.com/scrapers/google-finance-scraper/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-finance-scraper/</guid><description/></item><item><title>Google Flights Scraper - Free Signup Credits, Simple Use</title><link>https://www.scrapingbee.com/scrapers/google-flights-scraper/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-flights-scraper/</guid><description/></item><item><title>Google Hotels Scraper - Simple Tool, Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/google-hotel-scraper/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-hotel-scraper/</guid><description/></item><item><title>Google Image Scraper - Free Credits, Simple 
Signup</title><link>https://www.scrapingbee.com/scrapers/google-image-scraper/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-image-scraper/</guid><description/></item><item><title>Google Jobs Scraper API</title><link>https://www.scrapingbee.com/scrapers/google-jobs-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-jobs-scraper-api/</guid><description>&lt;p>&lt;script type="application/ld+json">
 {
 "@context": "https://schema.org",
 "@type": "Product",
 "name": "ScrapingBee",
 "brand": {
 "@type": "Brand",
 "name": "ScrapingBee"
 },
 "description": "Scrape Google Jobs listings from any location with our powerful and real-time API. Get detailed job listings data with a near 100% success rate. Start with 1000 free API credits.",
 "aggregateRating": {
 "@type": "AggregateRating",
 "ratingValue": "4.9",
 "reviewCount": "38",
 "bestRating": 5
 }
 }
&lt;/script>
&lt;section class="bg-skew-yellow-b pt-[100px] sm:pt-[100px] md:pt-[156px] mb-[120px] relative z-1 pb-[50px] sm:pb-[100px] md:mb-[170px]">
 &lt;div class="container">
 &lt;div class="flex flex-wrap items-center -mx-[15px]">
 &lt;div class="w-full sm:w-1/2 px-[15px]">
 &lt;div class="max-w-[542px] leading-[1.77]">
 
 
 
&lt;nav aria-label="Breadcrumb" class="text-[14px] text-black mb-[20px] flex items-center">
 &lt;ol class="flex items-center" itemscope itemtype="https://schema.org/BreadcrumbList">
 &lt;li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
 &lt;a href="https://www.scrapingbee.com/" class="text-black no-underline" itemprop="item">
 &lt;span itemprop="name">Home&lt;/span>
 &lt;/a>
 &lt;meta itemprop="position" content="1" />
 &lt;/li>
 &lt;svg width="14" height="14" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" fill="none" class="mx-[10px] flex-shrink-0">
 &lt;path d="M9 6L15 12L9 18" stroke="black" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"/>
 &lt;/svg>
 &lt;li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
 &lt;a href="https://www.scrapingbee.com/scrapers/" class="text-black no-underline" itemprop="item">
 &lt;span itemprop="name">Scrapers&lt;/span>
 &lt;/a>
 &lt;meta itemprop="position" content="2" />
 &lt;/li>
 &lt;svg width="14" height="14" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" fill="none" class="mx-[10px] flex-shrink-0">
 &lt;path d="M9 6L15 12L9 18" stroke="black" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"/>
 &lt;/svg>
 &lt;li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
 &lt;span class="text-blue-600 font-medium" itemprop="name">
 Google Jobs Scraper API
 &lt;/span>
 &lt;meta itemprop="position" content="3" />
 &lt;/li>
 &lt;/ol>
&lt;/nav>

 
 
 &lt;h1 class="mb-[14px]">Google Jobs Scraper API&lt;/h1>
 &lt;p class="mb-[36px] text-[20px]">Scrape Google Jobs listings from any location with our powerful and real-time API. Get detailed job listings data with a near 100% success rate. Start with 1000 free API credits.&lt;/p></description></item><item><title>Google Knowledge Graph Scraper API - Start With Free Credits</title><link>https://www.scrapingbee.com/scrapers/google-knowledge-graph-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-knowledge-graph-api/</guid><description/></item><item><title>Google Lens Scraper API - Streamlined Access Free Credits</title><link>https://www.scrapingbee.com/scrapers/google-lens-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-lens-api/</guid><description/></item><item><title>Google Maps Autocomplete API - Simple &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/google-maps-autocomplete-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-maps-autocomplete-scraper-api/</guid><description/></item><item><title>Google Maps Business Scraper API - Easy Use &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/google-maps-business-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-maps-business-scraper-api/</guid><description/></item><item><title>Google Maps Data Scraper API - Easy Start &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/google-maps-data-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-maps-data-scraper-api/</guid><description/></item><item><title>Google Maps Directions API - Easy Start &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/google-maps-directions-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-maps-directions-scraper-api/</guid><description/></item><item><title>Google Maps Distance Matrix API - Simple Use &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/google-maps-distance-matrix-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-maps-distance-matrix-scraper-api/</guid><description/></item><item><title>Google Maps Email Scraper API - Simple Start &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/google-maps-email-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-maps-email-scraper-api/</guid><description/></item><item><title>Google Maps Geolocation API - Simple Start &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/google-maps-geolocation-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-maps-geolocation-scraper-api/</guid><description/></item><item><title>Google Maps Lead Scraper API - Easy Use &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/google-maps-lead-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-maps-lead-scraper-api/</guid><description/></item><item><title>Google Maps Places Scraper API - Easy Start &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/google-maps-places-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-maps-places-scraper-api/</guid><description/></item><item><title>Google Maps Review Scraper API - Free Credits &amp; Easy Signup</title><link>https://www.scrapingbee.com/scrapers/google-maps-reviews-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-maps-reviews-api/</guid><description/></item><item><title>Google Maps Scraper - Easy Use &amp; Free Credits Signup</title><link>https://www.scrapingbee.com/scrapers/google-maps-scraper/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-maps-scraper/</guid><description/></item><item><title>Google My Business Scraper API - Easy Start &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/google-my-business-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-my-business-scraper-api/</guid><description/></item><item><title>Google News Scraper API</title><link>https://www.scrapingbee.com/scrapers/google-news-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-news-scraper-api/</guid><description>&lt;p>&lt;script type="application/ld+json">
 {
 "@context": "https://schema.org",
 "@type": "Product",
 "name": "ScrapingBee",
 "brand": {
 "@type": "Brand",
 "name": "ScrapingBee"
 },
 "description": "Get to the latest headlines effortlessly with our powerful and reliable Google News Scraper API. Monitor stories, sources, and authors from any country with unmatched precision and reliability.",
 "aggregateRating": {
 "@type": "AggregateRating",
 "ratingValue": "4.9",
 "reviewCount": "38",
 "bestRating": 5
 }
 }
&lt;/script>
&lt;section class="bg-skew-yellow-b pt-[100px] sm:pt-[100px] md:pt-[156px] mb-[120px] relative z-1 pb-[50px] sm:pb-[100px] md:mb-[170px]">
 &lt;div class="container">
 &lt;div class="flex flex-wrap items-center -mx-[15px]">
 &lt;div class="w-full sm:w-1/2 px-[15px]">
 &lt;div class="max-w-[542px] leading-[1.77]">
 
 
 
&lt;nav aria-label="Breadcrumb" class="text-[14px] text-black mb-[20px] flex items-center">
 &lt;ol class="flex items-center" itemscope itemtype="https://schema.org/BreadcrumbList">
 &lt;li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
 &lt;a href="https://www.scrapingbee.com/" class="text-black no-underline" itemprop="item">
 &lt;span itemprop="name">Home&lt;/span>
 &lt;/a>
 &lt;meta itemprop="position" content="1" />
 &lt;/li>
 &lt;svg width="14" height="14" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" fill="none" class="mx-[10px] flex-shrink-0">
 &lt;path d="M9 6L15 12L9 18" stroke="black" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"/>
 &lt;/svg>
 &lt;li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
 &lt;a href="https://www.scrapingbee.com/scrapers/" class="text-black no-underline" itemprop="item">
 &lt;span itemprop="name">Scrapers&lt;/span>
 &lt;/a>
 &lt;meta itemprop="position" content="2" />
 &lt;/li>
 &lt;svg width="14" height="14" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" fill="none" class="mx-[10px] flex-shrink-0">
 &lt;path d="M9 6L15 12L9 18" stroke="black" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"/>
 &lt;/svg>
 &lt;li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
 &lt;span class="text-blue-600 font-medium" itemprop="name">
 Google News Scraper API
 &lt;/span>
 &lt;meta itemprop="position" content="3" />
 &lt;/li>
 &lt;/ol>
&lt;/nav>

 
 
 &lt;h1 class="mb-[14px]">Google News Scraper API&lt;/h1>
 &lt;p class="mb-[36px] text-[20px]">Get to the latest headlines effortlessly with our powerful and reliable Google News Scraper API. Monitor stories, sources, and authors from any country with unmatched precision and reliability.&lt;/p></description></item><item><title>Google Patents Scraper API - Free Signup &amp; Simplified Access</title><link>https://www.scrapingbee.com/scrapers/google-patents-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-patents-api/</guid><description/></item><item><title>Google Play Movies Scraper API - Simplified Access, Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/google-play-store-movies-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-play-store-movies-api/</guid><description/></item><item><title>Google Play Scraper API</title><link>https://www.scrapingbee.com/scrapers/google-play-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-play-scraper-api/</guid><description>&lt;p>&lt;script type="application/ld+json">
 {
 "@context": "https://schema.org",
 "@type": "Product",
 "name": "ScrapingBee",
 "brand": {
 "@type": "Brand",
 "name": "ScrapingBee"
 },
 "description": "Scrape Google Play Store app data at scale with our reliable web scraping API. Get ratings, reviews, and download stats with a single API call.",
 "aggregateRating": {
 "@type": "AggregateRating",
 "ratingValue": "4.9",
 "reviewCount": "38",
 "bestRating": 5
 }
 }
&lt;/script>
&lt;section class="bg-skew-yellow-b pt-[100px] sm:pt-[100px] md:pt-[156px] mb-[120px] relative z-1 pb-[50px] sm:pb-[100px] md:mb-[170px]">
 &lt;div class="container">
 &lt;div class="flex flex-wrap items-center -mx-[15px]">
 &lt;div class="w-full sm:w-1/2 px-[15px]">
 &lt;div class="max-w-[542px] leading-[1.77]">
 
 
 
&lt;nav aria-label="Breadcrumb" class="text-[14px] text-black mb-[20px] flex items-center">
 &lt;ol class="flex items-center" itemscope itemtype="https://schema.org/BreadcrumbList">
 &lt;li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
 &lt;a href="https://www.scrapingbee.com/" class="text-black no-underline" itemprop="item">
 &lt;span itemprop="name">Home&lt;/span>
 &lt;/a>
 &lt;meta itemprop="position" content="1" />
 &lt;/li>
 &lt;svg width="14" height="14" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" fill="none" class="mx-[10px] flex-shrink-0">
 &lt;path d="M9 6L15 12L9 18" stroke="black" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"/>
 &lt;/svg>
 &lt;li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
 &lt;a href="https://www.scrapingbee.com/scrapers/" class="text-black no-underline" itemprop="item">
 &lt;span itemprop="name">Scrapers&lt;/span>
 &lt;/a>
 &lt;meta itemprop="position" content="2" />
 &lt;/li>
 &lt;svg width="14" height="14" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" fill="none" class="mx-[10px] flex-shrink-0">
 &lt;path d="M9 6L15 12L9 18" stroke="black" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"/>
 &lt;/svg>
 &lt;li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
 &lt;span class="text-blue-600 font-medium" itemprop="name">
 Google Play Scraper API
 &lt;/span>
 &lt;meta itemprop="position" content="3" />
 &lt;/li>
 &lt;/ol>
&lt;/nav>

 
 
 &lt;h1 class="mb-[14px]">Google Play Scraper API&lt;/h1>
 &lt;p class="mb-[36px] text-[20px]">Scrape Google Play Store app data at scale with our reliable web scraping API. Get ratings, reviews, and download stats with a single API call.&lt;/p></description></item><item><title>Google Popular Times API - Simple Use &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/google-popular-times-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-popular-times-scraper-api/</guid><description/></item><item><title>Google Product Scraper API - Free Access &amp; Setup</title><link>https://www.scrapingbee.com/scrapers/google-product-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-product-api/</guid><description/></item><item><title>Google Rank Tracking API - Simple Use &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/google-rank-tracking-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-rank-tracking-scraper-api/</guid><description/></item><item><title>Google Related Questions Scraper API - Free Credits on SignUp</title><link>https://www.scrapingbee.com/scrapers/google-related-questions-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-related-questions-api/</guid><description/></item><item><title>Google Related Searches Scraper API - Easy to Use</title><link>https://www.scrapingbee.com/scrapers/google-related-searches-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-related-searches-api/</guid><description/></item><item><title>Google Reverse Image Scraper API - Free Credits on Signup</title><link>https://www.scrapingbee.com/scrapers/google-reverse-image-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-reverse-image-api/</guid><description/></item><item><title>Google Reviews Results Scraper API - Free Signup &amp; Credits</title><link>https://www.scrapingbee.com/scrapers/google-reviews-results-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-reviews-results-api/</guid><description/></item><item><title>Google Scholar Scraper - Free Signup Credits &amp; Easy Use</title><link>https://www.scrapingbee.com/scrapers/google-scholar-scraper/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-scholar-scraper/</guid><description/></item><item><title>Google Search Results API</title><link>https://www.scrapingbee.com/features/google/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/features/google/</guid><description>&lt;p>&lt;script type="application/ld+json">
 {
 "@context": "https://schema.org",
 "@type": "Product",
 "name": "ScrapingBee",
 "brand": {
 "@type": "Brand",
 "name": "ScrapingBee"
 },
 "description": "Get Structured JSON for search, news, maps, ads and more in a single API call.",
 "aggregateRating": {
 "@type": "AggregateRating",
 "ratingValue": "4.9",
 "reviewCount": "154",
 "bestRating": 5
 }
 }
&lt;/script>
&lt;section class="bg-skew-yellow-b pt-[100px] sm:pt-[100px] md:pt-[156px] mb-[120px] relative z-1 ">
 &lt;div class="container">
 &lt;div class="flex flex-wrap items-center -mx-[15px]">
 &lt;div class="w-full sm:w-1/2 px-[15px]">
 &lt;div class="max-w-[542px] leading-[1.77]">
 &lt;h1 class="mb-[14px]">Google Search Results API&lt;/h1>
 &lt;p class="mb-[36px] text-[20px]">Get Structured JSON for search, news, maps, ads and more in a single API call.&lt;/p></description></item><item><title>Google Shopping Scraper API</title><link>https://www.scrapingbee.com/scrapers/google-shopping-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-shopping-api/</guid><description>&lt;p>&lt;script type="application/ld+json">
 {
 "@context": "https://schema.org",
 "@type": "Product",
 "name": "ScrapingBee",
 "brand": {
 "@type": "Brand",
 "name": "ScrapingBee"
 },
 "description": "Scrape Google Shopping results, allowing you to transform it into a competitive advantage. Real-time pricing data, product tracking, global market coverage — all with a single API call.",
 "aggregateRating": {
 "@type": "AggregateRating",
 "ratingValue": "4.9",
 "reviewCount": "38",
 "bestRating": 5
 }
 }
&lt;/script>
&lt;section class="bg-skew-yellow-b pt-[100px] sm:pt-[100px] md:pt-[156px] mb-[120px] relative z-1 pb-[50px] sm:pb-[100px] md:mb-[170px]">
 &lt;div class="container">
 &lt;div class="flex flex-wrap items-center -mx-[15px]">
 &lt;div class="w-full sm:w-1/2 px-[15px]">
 &lt;div class="max-w-[542px] leading-[1.77]">
 
 
 
&lt;nav aria-label="Breadcrumb" class="text-[14px] text-black mb-[20px] flex items-center">
 &lt;ol class="flex items-center" itemscope itemtype="https://schema.org/BreadcrumbList">
 &lt;li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
 &lt;a href="https://www.scrapingbee.com/" class="text-black no-underline" itemprop="item">
 &lt;span itemprop="name">Home&lt;/span>
 &lt;/a>
 &lt;meta itemprop="position" content="1" />
 &lt;/li>
 &lt;svg width="14" height="14" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" fill="none" class="mx-[10px] flex-shrink-0">
 &lt;path d="M9 6L15 12L9 18" stroke="black" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"/>
 &lt;/svg>
 &lt;li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
 &lt;a href="https://www.scrapingbee.com/scrapers/" class="text-black no-underline" itemprop="item">
 &lt;span itemprop="name">Scrapers&lt;/span>
 &lt;/a>
 &lt;meta itemprop="position" content="2" />
 &lt;/li>
 &lt;svg width="14" height="14" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" fill="none" class="mx-[10px] flex-shrink-0">
 &lt;path d="M9 6L15 12L9 18" stroke="black" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"/>
 &lt;/svg>
 &lt;li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
 &lt;span class="text-blue-600 font-medium" itemprop="name">
 Google Shopping Scraper API
 &lt;/span>
 &lt;meta itemprop="position" content="3" />
 &lt;/li>
 &lt;/ol>
&lt;/nav>

 
 
 &lt;h1 class="mb-[14px]">Google Shopping Scraper API&lt;/h1>
 &lt;p class="mb-[36px] text-[20px]">Scrape Google Shopping results, allowing you to transform it into a competitive advantage. Real-time pricing data, product tracking, global market coverage — all with a single API call.&lt;/p></description></item><item><title>Google Showtimes Result Scraper API - Get Started Today</title><link>https://www.scrapingbee.com/scrapers/google-showtimes-results-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-showtimes-results-api/</guid><description/></item><item><title>Google Spell Check Scraper API - Free Credits on Sign Up</title><link>https://www.scrapingbee.com/scrapers/google-spell-check-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-spell-check-api/</guid><description/></item><item><title>Google Sports Results Scraper API - Free Credits on SignUp</title><link>https://www.scrapingbee.com/scrapers/google-sports-results-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-sports-results-api/</guid><description/></item><item><title>Google Trends - Trending Now Scraper - Free Credits on Signup</title><link>https://www.scrapingbee.com/scrapers/google-trends-trending-now-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-trends-trending-now-api/</guid><description/></item><item><title>Google Trends Scraper - Free Credits, Simple Signup</title><link>https://www.scrapingbee.com/scrapers/google-trends-scraper/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-trends-scraper/</guid><description/></item><item><title>Google Weather Scraper API - Simple Start &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/google-weather-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-weather-scraper-api/</guid><description/></item><item><title>GPT API</title><link>https://www.scrapingbee.com/documentation/chatgpt/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/documentation/chatgpt/</guid><description>&lt;p>Our Chat GPT API allows you to send prompts to a GPT model and receive AI-generated responses in realtime.&lt;/p>
&lt;p>We provide one endpoint:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>GPT endpoint&lt;/strong> (&lt;code>/api/v1/chatgpt&lt;/code>) - Send prompts to GPT and receive AI-generated responses&lt;/li>
&lt;/ul>
&lt;div class="doc-row">
&lt;div class="doc-full">
&lt;h2 id="quick-start">Quick start&lt;/h2>
&lt;p>To use the GPT API, you only need two things:&lt;/p>
&lt;ul>
&lt;li>your API key, available &lt;a href="https://app.scrapingbee.com/account/manage/api_key" >here&lt;/a>&lt;/li>
&lt;li>a prompt to send to the GPT model (&lt;a href="#prompt" >learn more about prompts&lt;/a>)&lt;/li>
&lt;/ul>
&lt;p>Then, simply send a request like this:&lt;/p>
&lt;div class="p-1 rounded mb-6 bg-[#F4F0F0] border border-[#1A1414]/10 text-[16px] leading-[1.50]" data-tabs-id="8cd4eb8d4ccf4fc8a6bdccec21a54f5d">

 &lt;div class="md:pl-[30px] xl:pl-[32px] flex items-center justify-end gap-3 py-[10px] px-[17px]" x-data="{ 
 open: false, 
 selectedLibrary: 'python-8cd4eb8d4ccf4fc8a6bdccec21a54f5d',
 libraries: [
 { name: 'Python', value: 'python-8cd4eb8d4ccf4fc8a6bdccec21a54f5d', icon: '/images/icons/icon-python.svg', width: 32, height: 32 },
 { name: 'cURL', value: 'curl-8cd4eb8d4ccf4fc8a6bdccec21a54f5d', icon: '/images/icons/icon-curl.svg', width: 48, height: 32 },
 { name: 'NodeJS', value: 'node-8cd4eb8d4ccf4fc8a6bdccec21a54f5d', icon: '/images/icons/icon-node.svg', width: 26, height: 26 },
 { name: 'Java', value: 'java-8cd4eb8d4ccf4fc8a6bdccec21a54f5d', icon: '/images/icons/icon-java.svg', width: 32, height: 32 },
 { name: 'Ruby', value: 'ruby-8cd4eb8d4ccf4fc8a6bdccec21a54f5d', icon: '/images/icons/icon-ruby.svg', width: 32, height: 32 },
 { name: 'PHP', value: 'php-8cd4eb8d4ccf4fc8a6bdccec21a54f5d', icon: '/images/icons/icon-php.svg', width: 32, height: 32 },
 { name: 'Go', value: 'go-8cd4eb8d4ccf4fc8a6bdccec21a54f5d', icon: '/images/icons/icon-go.svg', width: 32, height: 32 }
 ],
 selectLibrary(value, isGlobal = false) {
 this.selectedLibrary = value;
 this.open = false;
 // Trigger tab switching for this specific instance
 // Use Alpine's $el to find the container
 const container = $el.closest('[data-tabs-id]');
 if (container) {
 container.querySelectorAll('.nice-tab-content').forEach(tab => {
 tab.classList.remove('active');
 });
 const selectedTab = container.querySelector('#' + value);
 if (selectedTab) {
 selectedTab.classList.add('active');
 }
 }
 // Individual snippet selectors should NOT trigger global changes
 // Only the global selector at the top should change all snippets
 },
 getSelectedLibrary() {
 return this.libraries.find(lib => lib.value === this.selectedLibrary) || this.libraries[0];
 },
 init() {
 // Listen for global language changes
 window.addEventListener('languageChanged', (e) => {
 const globalLang = e.detail.language;
 const matchingLib = this.libraries.find(lib => lib.value.startsWith(globalLang + '-'));
 if (matchingLib) {
 this.selectLibrary(matchingLib.value, true);
 }
 });
 // Initialize from global state if available
 const globalLang = window.globalSelectedLanguage || 'python';
 const matchingLib = this.libraries.find(lib => lib.value.startsWith(globalLang + '-'));
 if (matchingLib &amp;&amp; matchingLib.value !== this.selectedLibrary) {
 this.selectLibrary(matchingLib.value, true);
 }
 }
 }" x-on:click.away="open = false" x-init="init()">
 &lt;div class="relative">
 
 &lt;button 
 @click="open = !open"
 type="button"
 class="flex justify-between items-center px-2 py-1.5 bg-white rounded-md border border-[#1A1414]/10 transition-colors hover:bg-gray-50 focus:outline-none min-w-[180px] shadow-sm"
 >
 &lt;div class="flex gap-2 items-center">
 &lt;img 
 :src="getSelectedLibrary().icon" 
 :alt="getSelectedLibrary().name"
 :width="20"
 :height="20"
 class="flex-shrink-0 w-5 h-5"
 />
 &lt;span class="text-black-100 font-medium text-[14px]">
 &lt;span x-text="getSelectedLibrary().name">&lt;/span>
 &lt;/span>
 &lt;/div>
 &lt;svg 
 class="w-3.5 h-3.5 text-gray-400 transition-transform duration-200" 
 :class="{ 'rotate-180': open }"
 fill="none" 
 stroke="currentColor" 
 viewBox="0 0 24 24"
 >
 &lt;path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M19 9l-7 7-7-7">&lt;/path>
 &lt;/svg>
 &lt;/button>
 
 
 &lt;div 
 x-show="open"
 x-transition:enter="transition ease-out duration-200"
 x-transition:enter-start="opacity-0 translate-y-1"
 x-transition:enter-end="opacity-100 translate-y-0"
 x-transition:leave="transition ease-in duration-150"
 x-transition:leave-start="opacity-100 translate-y-0"
 x-transition:leave-end="opacity-0 translate-y-1"
 class="overflow-auto absolute left-0 top-full z-50 mt-1 w-full max-h-[300px] bg-white rounded-md border border-[#1A1414]/10 shadow-lg focus:outline-none"
 style="display: none;"
 >
 &lt;ul class="py-1">
 &lt;template x-for="library in libraries" :key="library.value">
 &lt;li>
 &lt;button
 @click="selectLibrary(library.value)"
 type="button"
 class="flex gap-2 items-center px-2 py-1.5 w-full transition-colors hover:bg-gray-50"
 :class="{ 'bg-yellow-50': selectedLibrary === library.value }"
 >
 &lt;img 
 :src="library.icon" 
 :alt="library.name"
 :width="20"
 :height="20"
 class="flex-shrink-0 w-5 h-5"
 />
 &lt;span class="text-black-100 text-[14px]" x-text="library.name">&lt;/span>
 &lt;span x-show="selectedLibrary === library.value" class="ml-auto text-yellow-400">
 &lt;svg class="w-3.5 h-3.5" fill="currentColor" viewBox="0 0 20 20">
 &lt;path fill-rule="evenodd" d="M16.707 5.293a1 1 0 010 1.414l-8 8a1 1 0 01-1.414 0l-4-4a1 1 0 011.414-1.414L8 12.586l7.293-7.293a1 1 0 011.414 0z" clip-rule="evenodd">&lt;/path>
 &lt;/svg>
 &lt;/span>
 &lt;/button>
 &lt;/li>
 &lt;/template>
 &lt;/ul>
 &lt;/div>
 &lt;/div>
 &lt;div class="flex items-center">
 &lt;span data-seed="8cd4eb8d4ccf4fc8a6bdccec21a54f5d" class="snippet-copy cursor-pointer flex items-center gap-1.5 px-2.5 py-1.5 text-sm text-black-100 rounded-md border border-[#1A1414]/10 bg-white hover:bg-gray-50 transition-colors" title="Copy to clipboard!">
 &lt;span class="icon-copy02 leading-none text-[14px]">&lt;/span>
 &lt;span class="text-[14px]">Copy&lt;/span>
 &lt;/span>
 &lt;/div>
 &lt;/div>

 &lt;div class="bg-[#30302F] rounded-md font-light !font-ibmplex">
 &lt;div id="curl-8cd4eb8d4ccf4fc8a6bdccec21a54f5d"class="text-gray-100 text-[12px] leading-[1.54] nice-tab-content">
 &lt;pre>&lt;code class="language-bash">curl "https://app.scrapingbee.com/api/v1/chatgpt?api_key=YOUR-API-KEY&amp;prompt=Explain&amp;#43;the&amp;#43;benefits&amp;#43;of&amp;#43;renewable&amp;#43;energy&amp;#43;in&amp;#43;100&amp;#43;words"&lt;/code>&lt;/pre>
 &lt;/div>
 &lt;div id="python-8cd4eb8d4ccf4fc8a6bdccec21a54f5d" class="text-gray-100 text-[12px] leading-[1.54] nice-tab-content active">
 &lt;pre>&lt;code class="language-python">&lt;pre>&lt;code class="language-python"># Install the Python Requests library:
# pip install requests
import requests

def send_request():
 response = requests.get(
 url='https://app.scrapingbee.com/api/v1/chatgpt',
 params={
 'api_key': 'YOUR-API-KEY',
 'prompt': 'Explain the benefits of renewable energy in 100 words',
 },

 )
 print('Response HTTP Status Code: ', response.status_code)
 print('Response HTTP Response Body: ', response.content)
send_request()
&lt;/code>&lt;/pre>
&lt;/code>&lt;/pre>
 &lt;/div>
 &lt;div id="node-8cd4eb8d4ccf4fc8a6bdccec21a54f5d" class="text-gray-100 text-[12px] leading-[1.54] nice-tab-content">
 &lt;pre>&lt;code class="language-javascript">&lt;pre>&lt;code class="language-javascript">// Install the Node Axios package
// npm install axios
const axios = require('axios');

axios.get('https://app.scrapingbee.com/api/v1/chatgpt', {
 params: {
 'api_key': 'YOUR-API-KEY',
 'url': 'YOUR-URL',
 'prompt': Explain the benefits of renewable energy in 100 words,
 }
}).then(function (response) {
 // handle success
 console.log(response);
})
&lt;/code>&lt;/pre>
&lt;/code>&lt;/pre>
 &lt;/div>
 &lt;div id="java-8cd4eb8d4ccf4fc8a6bdccec21a54f5d" class="text-gray-100 text-[12px] leading-[1.54] nice-tab-content">
 &lt;pre>&lt;code class="language-java">import java.io.IOException;
import org.apache.http.client.fluent.*;

public class SendRequest
{
 public static void main(String[] args) {
 sendRequest();
 }

 private static void sendRequest() {

 // Classic (GET )
 try {

 // Create request
 
 Content content = Request.Get("https://app.scrapingbee.com/api/v1/chatgpt?api_key=YOUR-API-KEY&amp;prompt=Explain&amp;#43;the&amp;#43;benefits&amp;#43;of&amp;#43;renewable&amp;#43;energy&amp;#43;in&amp;#43;100&amp;#43;words")

 // Fetch request and return content
 .execute().returnContent();

 // Print content
 System.out.println(content);
 }
 catch (IOException e) { System.out.println(e); }
 }
}
&lt;/code>&lt;/pre>
 &lt;/div>
 &lt;div id="ruby-8cd4eb8d4ccf4fc8a6bdccec21a54f5d" class="text-gray-100 text-[12px] leading-[1.54] nice-tab-content">
 &lt;pre>&lt;code class="language-ruby">require 'net/http'
require 'net/https'

# Classic (GET )
def send_request 
 uri = URI('https://app.scrapingbee.com/api/v1/chatgpt?api_key=YOUR-API-KEY&amp;prompt=Explain&amp;#43;the&amp;#43;benefits&amp;#43;of&amp;#43;renewable&amp;#43;energy&amp;#43;in&amp;#43;100&amp;#43;words')

 # Create client
 http = Net::HTTP.new(uri.host, uri.port)
 http.use_ssl = true
 http.verify_mode = OpenSSL::SSL::VERIFY_PEER

 # Create Request
 req = Net::HTTP::Get.new(uri)

 # Fetch Request
 res = http.request(req)
 puts "Response HTTP Status Code: #{ res.code }"
 puts "Response HTTP Response Body: #{ res.body }"
rescue StandardError => e
 puts "HTTP Request failed (#{ e.message })"
end

send_request()&lt;/code>&lt;/pre>
 &lt;/div>
 &lt;div id="php-8cd4eb8d4ccf4fc8a6bdccec21a54f5d" class="text-gray-100 text-[12px] leading-[1.54] nice-tab-content">
 &lt;pre>&lt;code class="language-php">&amp;lt;?php

// get cURL resource
$ch = curl_init();

// set url 
curl_setopt($ch, CURLOPT_URL, 'https://app.scrapingbee.com/api/v1/chatgpt?api_key=YOUR-API-KEY&amp;prompt=Explain&amp;#43;the&amp;#43;benefits&amp;#43;of&amp;#43;renewable&amp;#43;energy&amp;#43;in&amp;#43;100&amp;#43;words');

// set method
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'GET');

// return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);



// send the request and save response to $response
$response = curl_exec($ch);

// stop if fails
if (!$response) {
 die('Error: "' . curl_error($ch) . '" - Code: ' . curl_errno($ch));
}

echo 'HTTP Status Code: ' . curl_getinfo($ch, CURLINFO_HTTP_CODE) . PHP_EOL;
echo 'Response Body: ' . $response . PHP_EOL;

// close curl resource to free up system resources
curl_close($ch);
?&amp;gt;&lt;/code>&lt;/pre>
 &lt;/div>
 &lt;div id="go-8cd4eb8d4ccf4fc8a6bdccec21a54f5d" class="text-gray-100 text-[12px] leading-[1.54] nice-tab-content">
 &lt;pre>&lt;code class="language-go">package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
)

func sendClassic() {
	// Create client
	client := &amp;http.Client{}

	// Create request 
	req, err := http.NewRequest("GET", "https://app.scrapingbee.com/api/v1/chatgpt?api_key=YOUR-API-KEY&amp;url=YOUR-URL&amp;prompt=Explain&amp;#43;the&amp;#43;benefits&amp;#43;of&amp;#43;renewable&amp;#43;energy&amp;#43;in&amp;#43;100&amp;#43;words", nil)


	parseFormErr := req.ParseForm()
	if parseFormErr != nil {
		fmt.Println(parseFormErr)
	}

	// Fetch Request
	resp, err := client.Do(req)

	if err != nil {
		fmt.Println("Failure : ", err)
	}

	// Read Response Body
	respBody, _ := ioutil.ReadAll(resp.Body)

	// Display Results
	fmt.Println("response Status : ", resp.Status)
	fmt.Println("response Headers : ", resp.Header)
	fmt.Println("response Body : ", string(respBody))
}

func main() {
 sendClassic()
}&lt;/code>&lt;/pre>
 &lt;/div>
 &lt;/div>
&lt;/div>

&lt;p>Here is a breakdown of all the parameters you can use with the GPT API:&lt;/p></description></item><item><title>Grocery Data Scraper API - Free Signup and Credits</title><link>https://www.scrapingbee.com/scrapers/grocery-data-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/grocery-data-api/</guid><description/></item><item><title>Guardian Scraper API - Simple Start &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/guardian-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/guardian-scraper-api/</guid><description/></item><item><title>Gumroad Scraper API - Easy Start &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/gumroad-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/gumroad-scraper-api/</guid><description/></item><item><title>Gumtree Scraper API - Sign Up for Free Credits</title><link>https://www.scrapingbee.com/scrapers/gumtree-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/gumtree-api/</guid><description/></item><item><title>H&amp;M Scraper API - Easy Access &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/hm-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/hm-scraper-api/</guid><description/></item><item><title>Home Depot Scraper API</title><link>https://www.scrapingbee.com/scrapers/homedepot-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/homedepot-scraper-api/</guid><description>&lt;p>&lt;script type="application/ld+json">
 {
 "@context": "https://schema.org",
 "@type": "Product",
 "name": "ScrapingBee",
 "brand": {
 "@type": "Brand",
 "name": "ScrapingBee"
 },
 "description": "Access Home Depot\u0027s vast product catalog with our powerful web scraping API. Get prices, specifications, and availability across product categories with unmatched reliability.",
 "aggregateRating": {
 "@type": "AggregateRating",
 "ratingValue": "4.9",
 "reviewCount": "38",
 "bestRating": 5
 }
 }
&lt;/script>
&lt;section class="bg-skew-yellow-b pt-[100px] sm:pt-[100px] md:pt-[156px] mb-[120px] relative z-1 pb-[50px] sm:pb-[100px] md:mb-[170px]">
 &lt;div class="container">
 &lt;div class="flex flex-wrap items-center -mx-[15px]">
 &lt;div class="w-full sm:w-1/2 px-[15px]">
 &lt;div class="max-w-[542px] leading-[1.77]">
 
 
 
&lt;nav aria-label="Breadcrumb" class="text-[14px] text-black mb-[20px] flex items-center">
 &lt;ol class="flex items-center" itemscope itemtype="https://schema.org/BreadcrumbList">
 &lt;li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
 &lt;a href="https://www.scrapingbee.com/" class="text-black no-underline" itemprop="item">
 &lt;span itemprop="name">Home&lt;/span>
 &lt;/a>
 &lt;meta itemprop="position" content="1" />
 &lt;/li>
 &lt;svg width="14" height="14" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" fill="none" class="mx-[10px] flex-shrink-0">
 &lt;path d="M9 6L15 12L9 18" stroke="black" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"/>
 &lt;/svg>
 &lt;li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
 &lt;a href="https://www.scrapingbee.com/scrapers/" class="text-black no-underline" itemprop="item">
 &lt;span itemprop="name">Scrapers&lt;/span>
 &lt;/a>
 &lt;meta itemprop="position" content="2" />
 &lt;/li>
 &lt;svg width="14" height="14" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" fill="none" class="mx-[10px] flex-shrink-0">
 &lt;path d="M9 6L15 12L9 18" stroke="black" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"/>
 &lt;/svg>
 &lt;li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
 &lt;span class="text-blue-600 font-medium" itemprop="name">
 Home Depot Scraper API
 &lt;/span>
 &lt;meta itemprop="position" content="3" />
 &lt;/li>
 &lt;/ol>
&lt;/nav>

 
 
 &lt;h1 class="mb-[14px]">Home Depot Scraper API&lt;/h1>
 &lt;p class="mb-[36px] text-[20px]">Access Home Depot&amp;#39;s vast product catalog with our powerful web scraping API. Get prices, specifications, and availability across product categories with unmatched reliability.&lt;/p></description></item><item><title>How to extract a table's content in NodeJS</title><link>https://www.scrapingbee.com/tutorials/how-to-extract-a-tables-content-in-nodejs/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/how-to-extract-a-tables-content-in-nodejs/</guid><description>&lt;p>Data can be found online in various formats, but the most popular one is table format, especially that it displays information in a very structured and well organized layout. So it is very important to be able to extract data from tables with ease.&lt;/p>
&lt;p>And this is one of the most important features of ScrapingBee's data extraction tool: you can scrape data from tables without having to do any post-processing of the HTML response. We can use this feature by specifying a table's CSS selector within a set of &lt;code>extract_rules&lt;/code>, and let ScrapingBee do the rest!&lt;/p></description></item><item><title>How to extract a table's content in Python</title><link>https://www.scrapingbee.com/tutorials/how-to-extract-a-tables-content-in-python/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/how-to-extract-a-tables-content-in-python/</guid><description>&lt;p>Data can be found online in various formats, but the most popular one is the table format, especially as it displays information in a very structured and well-organized layout. So it is very important to be able to extract data from tables with ease.&lt;/p>
&lt;p>And this is one of the most important features of ScrapingBee's data extraction tool: you can scrape data from tables without having to do any post-processing of the HTML response. We can use this feature by specifying a table's CSS selector within a set of &lt;code>extract_rules&lt;/code>, and let ScrapingBee do the rest!&lt;/p></description></item><item><title>How to extract a table's content in Ruby</title><link>https://www.scrapingbee.com/tutorials/how-to-extract-a-tables-content-in-ruby/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/how-to-extract-a-tables-content-in-ruby/</guid><description>&lt;p>Data can be found online in various formats, but the most popular one is the table format, especially as it displays information in a very structured and well-organized layout. So it is very important to be able to extract data from tables with ease.&lt;/p>
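&lt;p>A minimal Python sketch of such a request (the rule name, the target URL, and the &lt;code>table_json&lt;/code> output mode below are illustrative assumptions, not values from this tutorial):&lt;/p>

```python
import json
import requests

# Hypothetical extract_rules: ask ScrapingBee to convert the first table
# element on the page into structured JSON ("table_json" output mode).
extract_rules = {
    "table_data": {
        "selector": "table",
        "output": "table_json",
    }
}

params = {
    "api_key": "YOUR-API-KEY",
    "url": "https://example.com/page-with-a-table",  # hypothetical target page
    "extract_rules": json.dumps(extract_rules),
}

def scrape_table():
    # Sends the request; the response body is already structured JSON,
    # so no post-processing of the HTML is needed.
    response = requests.get("https://app.scrapingbee.com/api/v1/", params=params)
    return response.json()
```

&lt;p>Calling &lt;code>scrape_table()&lt;/code> with a valid API key returns the extracted table rows as JSON.&lt;/p>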
&lt;p>And this is one of the most important features of ScrapingBee's data extraction tool: you can scrape data from tables without having to do any post-processing of the HTML response. We can use this feature by specifying a table's CSS selector within a set of &lt;code>extract_rules&lt;/code>, and let ScrapingBee do the rest!&lt;/p></description></item><item><title>How to extract CSS selectors using Chrome</title><link>https://www.scrapingbee.com/tutorials/how-to-extract-css-selectors-using-chrome/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/how-to-extract-css-selectors-using-chrome/</guid><description>&lt;p>Finding the CSS selector of an element you want to scrape can be tricky at times. This is why we can use the &lt;strong>Inspect Element&lt;/strong> feature in most modern browsers to extract the selector with ease.&lt;/p>
&lt;p>The process is very simple: first, we find the element and right-click on it, then click on Inspect Element. The developer tools window will show up with the element highlighted. We then right-click on the selected HTML code, go to Copy, and click on Copy selector.&lt;/p></description></item><item><title>How to handle infinite scroll pages in Go</title><link>https://www.scrapingbee.com/tutorials/how-to-handle-infinite-scroll-pages-in-go/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/how-to-handle-infinite-scroll-pages-in-go/</guid><description>&lt;p>Nowadays, most websites use different methods and techniques to decrease the load and data served to their clients’ devices. One of these techniques is the infinite scroll.&lt;/p>
&lt;p>In this tutorial, we will see how we can scrape infinite scroll web pages using a &lt;a href="https://www.scrapingbee.com/documentation/js-scenario/" >js_scenario&lt;/a>, specifically the &lt;code>scroll_y&lt;/code> and &lt;code>scroll_x&lt;/code> features. And we will use &lt;a href="https://demo.scrapingbee.com/infinite_scroll.html" >this page&lt;/a> as a demo. Only 9 boxes are loaded when we first open the page, but as soon as we scroll to the end of it, we will load 9 more, and that will keep happening each time we scroll to the bottom of the page.&lt;/p></description></item><item><title>How to handle infinite scroll pages in NodeJS</title><link>https://www.scrapingbee.com/tutorials/how-to-handle-infinite-scroll-pages-in-nodejs/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/how-to-handle-infinite-scroll-pages-in-nodejs/</guid><description>&lt;p>Nowadays, most websites use different methods and techniques to decrease the load and data served to their clients’ devices. One of these techniques is the infinite scroll.&lt;/p>
&lt;p>In this tutorial, we will see how we can scrape infinite scroll web pages using a &lt;a href="https://www.scrapingbee.com/documentation/js-scenario/" >js_scenario&lt;/a>, specifically the &lt;code>scroll_y&lt;/code> and &lt;code>scroll_x&lt;/code> features. And we will use &lt;a href="https://demo.scrapingbee.com/infinite_scroll.html" >this page&lt;/a> as a demo. Only 9 boxes are loaded when we first open the page, but as soon as we scroll to the end of it, we will load 9 more, and that will keep happening each time we scroll to the bottom of the page.&lt;/p></description></item><item><title>How to handle infinite scroll pages in PHP</title><link>https://www.scrapingbee.com/tutorials/how-to-handle-infinite-scroll-pages-in-php/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/how-to-handle-infinite-scroll-pages-in-php/</guid><description>&lt;p>Nowadays, most websites use different methods and techniques to decrease the load and data served to their clients’ devices. One of these techniques is the infinite scroll.&lt;/p>
&lt;p>In this tutorial, we will see how we can scrape infinite scroll web pages using a &lt;a href="https://www.scrapingbee.com/documentation/js-scenario/" >js_scenario&lt;/a>, specifically the &lt;code>scroll_y&lt;/code> and &lt;code>scroll_x&lt;/code> features. And we will use &lt;a href="https://demo.scrapingbee.com/infinite_scroll.html" >this page&lt;/a> as a demo. Only 9 boxes are loaded when we first open the page, but as soon as we scroll to the end of it, we will load 9 more, and that will keep happening each time we scroll to the bottom of the page.&lt;/p></description></item><item><title>How to handle infinite scroll pages in Python</title><link>https://www.scrapingbee.com/tutorials/how-to-handle-infinite-scroll-pages/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/how-to-handle-infinite-scroll-pages/</guid><description>&lt;p>Nowadays, most websites use different methods and techniques to decrease the load and data served to their clients’ devices. One of these techniques is the infinite scroll.&lt;/p>
&lt;p>In this tutorial, we will see how we can scrape infinite scroll web pages using a &lt;a href="https://www.scrapingbee.com/documentation/js-scenario/" >js_scenario&lt;/a>, specifically the &lt;code>scroll_y&lt;/code> and &lt;code>scroll_x&lt;/code> features. And we will use &lt;a href="https://demo.scrapingbee.com/infinite_scroll.html" >this page&lt;/a> as a demo. Only 9 boxes are loaded when we first open the page, but as soon as we scroll to the end of it, we will load 9 more, and that will keep happening each time we scroll to the bottom of the page.&lt;/p></description></item><item><title>How to handle infinite scroll pages in Ruby</title><link>https://www.scrapingbee.com/tutorials/how-to-handle-infinite-scroll-pages-in-ruby/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/how-to-handle-infinite-scroll-pages-in-ruby/</guid><description>&lt;p>Nowadays, most websites use different methods and techniques to decrease the load and data served to their clients’ devices. One of these techniques is the infinite scroll.&lt;/p>
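&lt;p>A minimal Python sketch of the &lt;code>scroll_y&lt;/code> approach (the scroll distances and wait times below are illustrative assumptions, not values from this tutorial):&lt;/p>

```python
import json
import requests

# A js_scenario that scrolls down twice, pausing so the next batch of
# 9 boxes has time to load (pixel distances and waits are illustrative).
scenario = {
    "instructions": [
        {"scroll_y": 1080},  # scroll down roughly one viewport
        {"wait": 1000},      # give the page 1s to load more boxes
        {"scroll_y": 1080},
        {"wait": 1000},
    ]
}

params = {
    "api_key": "YOUR-API-KEY",
    "url": "https://demo.scrapingbee.com/infinite_scroll.html",
    "render_js": "true",
    "js_scenario": json.dumps(scenario),
}

def scrape_infinite_scroll():
    # Sends the request; requires a valid API key.
    response = requests.get("https://app.scrapingbee.com/api/v1/", params=params)
    return response.text
```

&lt;p>Calling &lt;code>scrape_infinite_scroll()&lt;/code> returns the rendered HTML after the scrolling has happened, so the extra boxes are included in the response.&lt;/p>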
&lt;p>In this tutorial, we will see how we can scrape infinite scroll web pages using a &lt;a href="https://www.scrapingbee.com/documentation/js-scenario/" >js_scenario&lt;/a>, specifically the &lt;code>scroll_y&lt;/code> and &lt;code>scroll_x&lt;/code> features. And we will use &lt;a href="https://demo.scrapingbee.com/infinite_scroll.html" >this page&lt;/a> as a demo. Only 9 boxes are loaded when we first open the page, but as soon as we scroll to the end of it, we will load 9 more, and that will keep happening each time we scroll to the bottom of the page.&lt;/p></description></item><item><title>How to log in to a website using ScrapingBee with NodeJS</title><link>https://www.scrapingbee.com/tutorials/how-to-log-in-to-a-website-using-scrapingbee-with-nodejs/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/how-to-log-in-to-a-website-using-scrapingbee-with-nodejs/</guid><description>&lt;p>In this tutorial, we will see how you can log in to any website, using three different methods:&lt;/p>
&lt;ul>
&lt;li>A JavaScript scenario.&lt;/li>
&lt;li>A POST request.&lt;/li>
&lt;li>Cookies&lt;/li>
&lt;/ul>
&lt;p>As an example, we’re going to log into this &lt;a href="http://automationpractice.com/index.php?controller=authentication" >demo website&lt;/a> and take a screenshot of the account page. So make sure to create an account there before you start!&lt;/p>
&lt;h3 id="1-login-using-anbspjs_scenario">1. Login using a &lt;code>js_scenario&lt;/code>:&lt;/h3>
&lt;p>This is the easiest solution among the three, as it mimics the behavior of a normal user. We first visit the login page, input our login credentials, and click on the login button.&lt;/p></description></item><item><title>How to log in to a website using ScrapingBee with Python</title><link>https://www.scrapingbee.com/tutorials/how-to-log-in-to-a-website-using-scrapingbee-with-python/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/how-to-log-in-to-a-website-using-scrapingbee-with-python/</guid><description>&lt;p>In this tutorial, we will see how you can log in to any website, using three different methods:&lt;/p>
&lt;ul>
&lt;li>A JavaScript scenario.&lt;/li>
&lt;li>A POST request.&lt;/li>
&lt;li>Cookies&lt;/li>
&lt;/ul>
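&lt;p>A minimal Python sketch of the first method (the form-field selectors and credentials below are placeholder assumptions for the demo site, not values from this tutorial):&lt;/p>

```python
import json
import requests

# A js_scenario that fills the login form and clicks the submit button.
# The selectors (#email, #passwd, #SubmitLogin) are assumptions.
scenario = {
    "instructions": [
        {"fill": ["#email", "user@example.com"]},
        {"fill": ["#passwd", "your-password"]},
        {"click": "#SubmitLogin"},
        {"wait": 2000},  # let the account page load
    ]
}

params = {
    "api_key": "YOUR-API-KEY",
    "url": "http://automationpractice.com/index.php?controller=authentication",
    "render_js": "true",
    "js_scenario": json.dumps(scenario),
    "screenshot": "true",  # capture the account page once logged in
}

def log_in_and_screenshot():
    # Sends the request; with screenshot=true the body is the PNG image.
    response = requests.get("https://app.scrapingbee.com/api/v1/", params=params)
    return response.content
```

&lt;p>Calling &lt;code>log_in_and_screenshot()&lt;/code> with a valid API key and real credentials returns the screenshot bytes of the account page.&lt;/p>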
&lt;p>As an example, we’re going to log into this &lt;a href="http://automationpractice.com/index.php?controller=authentication" >demo website&lt;/a> and take a screenshot of the account page. So make sure to create an account there before you start!&lt;/p>
&lt;h3 id="1-login-using-anbspjs_scenario">1. Login using a &lt;code>js_scenario&lt;/code>:&lt;/h3>
&lt;p>This is the easiest solution among the three, as it mimics the behavior of a normal user. We first visit the login page, input our login credentials, and click on the login button.&lt;/p></description></item><item><title>How to make screenshots in C#</title><link>https://www.scrapingbee.com/tutorials/how-to-make-screenshots-in-c/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/how-to-make-screenshots-in-c/</guid><description>&lt;p>Taking a screenshot of your website is very straightforward using ScrapingBee. You can either take a screenshot of the visible portion of the page, the whole page, or an element of the page.&lt;/p>
&lt;p>That can be done by specifying one of these parameters with your request:&lt;/p>
&lt;ul>
&lt;li>&lt;code>screenshot&lt;/code> to &lt;strong>true&lt;/strong> or &lt;strong>false&lt;/strong>.&lt;/li>
&lt;li>&lt;code>screenshot_full_page&lt;/code> to &lt;strong>true&lt;/strong> or &lt;strong>false&lt;/strong>.&lt;/li>
&lt;li>&lt;code>screenshot_selector&lt;/code> to the CSS selector of the element.&lt;/li>
&lt;/ul>
&lt;p>In this tutorial, we will see how to take a screenshot of ScrapingBee’s &lt;a href="https://www.scrapingbee.com/blog/" >blog&lt;/a> using the three methods. &lt;/p></description></item><item><title>How to make screenshots in Go</title><link>https://www.scrapingbee.com/tutorials/how-to-make-screenshots-in-go/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/how-to-make-screenshots-in-go/</guid><description>&lt;p>Taking a screenshot of your website is very straightforward using ScrapingBee. You can either take a screenshot of the visible portion of the page, the whole page, or an element of the page.&lt;/p>
&lt;p>That can be done by specifying one of these parameters with your request:&lt;/p>
&lt;ul>
&lt;li>&lt;code>screenshot&lt;/code> to &lt;strong>true&lt;/strong> or &lt;strong>false&lt;/strong>.&lt;/li>
&lt;li>&lt;code>screenshot_full_page&lt;/code> to &lt;strong>true&lt;/strong> or &lt;strong>false&lt;/strong>.&lt;/li>
&lt;li>&lt;code>screenshot_selector&lt;/code> to the CSS selector of the element.&lt;/li>
&lt;/ul>
&lt;p>In this tutorial, we will see how to take a screenshot of ScrapingBee’s &lt;a href="https://www.scrapingbee.com/blog/" >blog&lt;/a> using the three methods.&lt;/p></description></item><item><title>How to make screenshots in NodeJS</title><link>https://www.scrapingbee.com/tutorials/how-to-make-screenshots-in-nodejs/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/how-to-make-screenshots-in-nodejs/</guid><description>&lt;p>Taking a screenshot of your website is very straightforward using ScrapingBee. You can either take a screenshot of the visible portion of the page, the whole page, or an element of the page.&lt;/p>
&lt;p>That can be done by specifying one of these parameters with your request:&lt;/p>
&lt;ul>
&lt;li>&lt;code>screenshot&lt;/code> to &lt;strong>true&lt;/strong> or &lt;strong>false&lt;/strong>.&lt;/li>
&lt;li>&lt;code>screenshot_full_page&lt;/code> to &lt;strong>true&lt;/strong> or &lt;strong>false&lt;/strong>.&lt;/li>
&lt;li>&lt;code>screenshot_selector&lt;/code> to the CSS selector of the element.&lt;/li>
&lt;/ul>
&lt;p>In this tutorial, we will see how to take a screenshot of ScrapingBee’s &lt;a href="https://www.scrapingbee.com/blog/" >blog&lt;/a> using the three methods.&lt;/p></description></item><item><title>How to make screenshots in PHP</title><link>https://www.scrapingbee.com/tutorials/how-to-make-screenshots-in-php/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/how-to-make-screenshots-in-php/</guid><description>&lt;p>Taking a screenshot of your website is very straightforward using ScrapingBee. You can either take a screenshot of the visible portion of the page, the whole page, or an element of the page.&lt;/p>
&lt;p>That can be done by specifying one of these parameters with your request:&lt;/p>
&lt;ul>
&lt;li>&lt;code>screenshot&lt;/code> to &lt;strong>true&lt;/strong> or &lt;strong>false&lt;/strong>.&lt;/li>
&lt;li>&lt;code>screenshot_full_page&lt;/code> to &lt;strong>true&lt;/strong> or &lt;strong>false&lt;/strong>.&lt;/li>
&lt;li>&lt;code>screenshot_selector&lt;/code> to the CSS selector of the element.&lt;/li>
&lt;/ul>
&lt;p>In this tutorial, we will see how to take a screenshot of ScrapingBee’s &lt;a href="https://www.scrapingbee.com/blog/" >blog&lt;/a> using the three methods. &lt;/p></description></item><item><title>How to make screenshots in Python</title><link>https://www.scrapingbee.com/tutorials/how-to-make-screenshots-in-python/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/how-to-make-screenshots-in-python/</guid><description>&lt;p>Taking a screenshot of your website is very straightforward using ScrapingBee. You can either take a screenshot of the visible portion of the page, the whole page, or an element of the page.&lt;/p>
&lt;p>That can be done by specifying one of these parameters with your request:&lt;/p>
&lt;ul>
&lt;li>&lt;code>screenshot&lt;/code> to &lt;strong>true&lt;/strong> or &lt;strong>false&lt;/strong>.&lt;/li>
&lt;li>&lt;code>screenshot_full_page&lt;/code> to &lt;strong>true&lt;/strong> or &lt;strong>false&lt;/strong>.&lt;/li>
&lt;li>&lt;code>screenshot_selector&lt;/code> to the CSS selector of the element.&lt;/li>
&lt;/ul>
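The parameters above can be combined into a single API request. A minimal Python sketch (the endpoint and parameter names are taken from the list above; "YOUR-API-KEY" is a placeholder):

```python
from urllib.parse import urlencode

# Build a ScrapingBee request URL that captures a screenshot of the target page.
def screenshot_url(api_key, target, full_page=False, selector=None):
    params = {"api_key": api_key, "url": target, "screenshot": "true"}
    if full_page:
        params["screenshot_full_page"] = "true"  # capture the whole page
    if selector:
        params["screenshot_selector"] = selector  # capture one element only
    return "https://app.scrapingbee.com/api/v1/?" + urlencode(params)

url = screenshot_url("YOUR-API-KEY", "https://www.scrapingbee.com/blog/", full_page=True)
```

To save the image, send a GET request to this URL (for example with the requests library) and write the binary response body to a .png file.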
&lt;p>In this tutorial, we will see how to take a screenshot of ScrapingBee’s &lt;a href="https://www.scrapingbee.com/blog/" >blog&lt;/a> using the three methods.&lt;/p></description></item><item><title>How to make screenshots in Ruby</title><link>https://www.scrapingbee.com/tutorials/how-to-make-screenshots-in-ruby/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/tutorials/how-to-make-screenshots-in-ruby/</guid><description>&lt;p>Taking a screenshot of your website is very straightforward using ScrapingBee. You can either take a screenshot of the visible portion of the page, the whole page, or an element of the page.&lt;/p>
&lt;p>That can be done by specifying one of these parameters with your request:&lt;/p>
&lt;ul>
&lt;li>&lt;code>screenshot&lt;/code> to &lt;strong>true&lt;/strong> or &lt;strong>false&lt;/strong>.&lt;/li>
&lt;li>&lt;code>screenshot_full_page&lt;/code> to &lt;strong>true&lt;/strong> or &lt;strong>false&lt;/strong>.&lt;/li>
&lt;li>&lt;code>screenshot_selector&lt;/code> to the CSS selector of the element.&lt;/li>
&lt;/ul>
&lt;p>In this tutorial, we will see how to take a screenshot of ScrapingBee’s &lt;a href="https://www.scrapingbee.com/blog/" >blog&lt;/a> using the three methods.&lt;br> &lt;/p></description></item><item><title>Idealista Scraper API Tool - Free Credits &amp; Easy Setup</title><link>https://www.scrapingbee.com/scrapers/idealista-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/idealista-api/</guid><description/></item><item><title>Images Results Scraper API - Simplified &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/images-results-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/images-results-api/</guid><description/></item><item><title>Imdb Scraper API - Easy Signup &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/imdb-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/imdb-scraper-api/</guid><description/></item><item><title>Investopedia Scraper API - Simple Signup &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/investopedia-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/investopedia-scraper-api/</guid><description/></item><item><title>JavaScript Scenario</title><link>https://www.scrapingbee.com/documentation/js-scenario/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/documentation/js-scenario/</guid><description>&lt;blockquote>
&lt;p>💡 &lt;strong>Important&lt;/strong>:&lt;br>This page explains how to use a specific feature of our main &lt;a href="https://www.scrapingbee.com/" >web scraping API&lt;/a>!&lt;br>If you are not yet familiar with ScrapingBee web scraping API, you can read the documentation &lt;a href="https://www.scrapingbee.com/documentation" >here&lt;/a>.&lt;/p>
&lt;/blockquote>
&lt;h2 id="basic-usage">Basic usage&lt;/h2>
&lt;p>If you want to interact with the pages you scrape before we return the HTML, you can add a JavaScript scenario to your API call.&lt;/p>
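For instance, a minimal Python sketch of such a call (the scenario format follows this documentation; "#buttonId" and "YOUR-API-KEY" are placeholders):

```python
import json
from urllib.parse import urlencode

# A scenario is a JSON object with an "instructions" list, executed in order.
scenario = {"instructions": [{"click": "#buttonId"}]}  # click the target button

params = {
    "api_key": "YOUR-API-KEY",            # placeholder credentials
    "url": "https://example.com",         # page to scrape
    "js_scenario": json.dumps(scenario),  # scenario is sent as serialized JSON
}
request_url = "https://app.scrapingbee.com/api/v1/?" + urlencode(params)
```

Sending a GET request to this URL returns the HTML after the scenario has run.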
&lt;p>For example, if you wish to click on a button, you will need to use this scenario.&lt;/p></description></item><item><title>JavaScript Web Scraping API</title><link>https://www.scrapingbee.com/features/javascript-scenario/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/features/javascript-scenario/</guid><description>&lt;p>&lt;script type="application/ld+json">
 {
 "@context": "https://schema.org",
 "@type": "Product",
 "name": "ScrapingBee",
 "brand": {
 "@type": "Brand",
 "name": "ScrapingBee"
 },
 "description": "Web scraping using JavaScript has never been more simple. Need to scroll, click, fill inputs or else? - We\u0027ve got you covered.",
 "aggregateRating": {
 "@type": "AggregateRating",
 "ratingValue": "4.9",
 "reviewCount": "38",
 "bestRating": 5
 }
 }
&lt;/script>
&lt;section class="bg-skew-yellow-b pt-[100px] sm:pt-[100px] md:pt-[156px] mb-[120px] relative z-1 ">
 &lt;div class="container">
 &lt;div class="flex flex-wrap items-center -mx-[15px]">
 &lt;div class="w-full sm:w-1/2 px-[15px]">
 &lt;div class="max-w-[542px] leading-[1.77]">
 
 
 
 &lt;h1 class="mb-[14px]">JavaScript Web Scraping API&lt;/h1>
 &lt;p class="mb-[36px] text-[20px]">Web scraping using JavaScript has never been more simple. Need to scroll, click, fill inputs or else? - We&amp;#39;ve got you covered.&lt;/p></description></item><item><title>Kayak Scraper API - Free Credits with Simple Signup</title><link>https://www.scrapingbee.com/scrapers/kayak-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/kayak-api/</guid><description/></item><item><title>Kickstarter Scraper API - Simple Use &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/kickstarter-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/kickstarter-scraper-api/</guid><description/></item><item><title>Kiwi.Com Scraper API - Easy Start &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/kiwi.com-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/kiwi.com-scraper-api/</guid><description/></item><item><title>Lazada Data Scraper API Tool - Get Free Credits Upon Sign Up</title><link>https://www.scrapingbee.com/scrapers/lazada-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/lazada-api/</guid><description/></item><item><title>Legal Notices</title><link>https://www.scrapingbee.com/legal-notices/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/legal-notices/</guid><description>&lt;h2 id="1-website-publisher">1. Website Publisher&lt;/h2>
&lt;p>VostokInc, a simplified joint-stock company (“société par actions simplifiée”) with its registered office at 66 Avenue des Champs Élysées – 75008 Paris, registered with the Trade and Companies Register of Paris under number 843 352 683, is the publisher of the website &lt;a href="https://www.scrapingbee.com/" >https://www.scrapingbee.com/&lt;/a> (the “Website”).&lt;/p>
&lt;p>Email: &lt;a href="mailto:contact@scrapingbee.com" >contact@scrapingbee.com&lt;/a>&lt;/p>
&lt;p>The Publishing Director is Kevin SAHIN as legal representative of VostokInc.&lt;/p>
&lt;h2 id="2-hosting-provider">2. Hosting provider&lt;/h2>
&lt;p>The website is hosted with NETLIFY:&lt;br>
&lt;strong>Address:&lt;/strong>&lt;br>
Netlify Inc.&lt;br>
512 2nd Street Fl 2&lt;br>
San Francisco CA 94107&lt;br>
USA&lt;br>
&lt;strong>Contact information:&lt;/strong> &lt;a href="mailto:fraud@netlify.com" >fraud@netlify.com&lt;/a>&lt;/p></description></item><item><title>Local Results Scraper API - Simplified Access &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/local-results-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/local-results-api/</guid><description/></item><item><title>LoopNet Scraper API - Simple Signup, Free Credits</title><link>https://www.scrapingbee.com/scrapers/loopnet-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/loopnet-api/</guid><description/></item><item><title>Lowes Scraper API - Simplified Access, Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/lowes-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/lowes-api/</guid><description/></item><item><title>Luminati alternative for web scraping?</title><link>https://www.scrapingbee.com/luminati-alternative/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/luminati-alternative/</guid><description>&lt;p>&lt;section class="bg-yellow-100 py-[100px] md:pt-[220px] md:pb-[20px] mb-[80px] relative z-1">
 &lt;div class="container">
 &lt;div class="max-w-[1024px] mx-auto text-center">
 &lt;h1 class="mb-[14px]">Luminati alternative for web scraping?&lt;/h1>
 &lt;p class="mb-[32px]">Stop paying exorbitant fees for web scraping. Get all the data you need at a drastically better price.&lt;/p>
 &lt;div class="flex flex-wrap items-center justify-center mb-[33px]">
 &lt;div class="flex flex-col items-start gap-[20px]">
 &lt;a href="https://app.scrapingbee.com/account/register" class="btn px-[39px] mr-[20px]">Try ScrapingBee for Free&lt;/a>
 &lt;div class="flex items-center">
 &lt;a href="https://www.capterra.com/p/195060/ScrapingBee/" target="_blank" class="inline-block"> &lt;img border="0" class="h-[40px] mr-[10px] -ml-[9px]" src="https://brand-assets.capterra.com/badge/8898153e-408a-4cdb-9477-bda37032c670.svg" /> &lt;/a>
 &lt;span class="text-[18px] mr-[10px]">based on 100+ reviews.&lt;/span>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
&lt;/section>
&lt;section class="py-[50px] sm:py-[60px] md:py-[80px] text-gray-200 text-[16px] leading-[1.50]">
 &lt;div class="container max-w-[1292px]">
 &lt;div class="flex flex-wrap -mx-[23px]">
 &lt;div class="w-full md:w-[500px] lg:w-[608px] px-[23px]">
 &lt;div class="pr-[20px]">
 &lt;div class="mb-[48px]">
 &lt;h3 class="text-black-100 mb-[8px]">Powerful proxies, without the ridiculous price tag.&lt;/h3>
&lt;p>ScrapingBee starts at only $29/mo compared to Luminati&amp;#39;s outrageous prices. See for yourself! And you always know what you’re going to pay. No surprises!&lt;/p></description></item><item><title>Marketplace Scraper with Free Credits - Easy Setup and Use</title><link>https://www.scrapingbee.com/scrapers/marketplace-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/marketplace-api/</guid><description/></item><item><title>Mediamarkt Scraper API - Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/home-depot-search-spell-check-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/home-depot-search-spell-check-api/</guid><description/></item><item><title>Mediamarkt Scraper API - Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/mediamarkt-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/mediamarkt-api/</guid><description/></item><item><title>Medium Scraper API - Effortless Signup, Free Credits</title><link>https://www.scrapingbee.com/scrapers/medium-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/medium-api/</guid><description/></item><item><title>Meesho Scraper API - Easy Access &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/meesho-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/meesho-scraper-api/</guid><description/></item><item><title>Mercadolibre Scraper API - Free Credits on Signup</title><link>https://www.scrapingbee.com/scrapers/mercadolibre-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/mercadolibre-api/</guid><description/></item><item><title>Mercari Scraper API - Simple Signup, Free 
Credits</title><link>https://www.scrapingbee.com/scrapers/mercari-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/mercari-api/</guid><description/></item><item><title>MLS Scraper with Free Credits - Easy-to-Use Data Extraction</title><link>https://www.scrapingbee.com/scrapers/multiple-listing-service-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/multiple-listing-service-api/</guid><description/></item><item><title>Monster Scraper API - Free Signup, Credits Included</title><link>https://www.scrapingbee.com/scrapers/monster-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/monster-api/</guid><description/></item><item><title>Naver Images Scraper API - Get Free Credits Now</title><link>https://www.scrapingbee.com/scrapers/naver-images-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/naver-images-api/</guid><description/></item><item><title>Naver Search Results Scraper API - Simple Signup, Free Credits</title><link>https://www.scrapingbee.com/scrapers/naver-search-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/naver-search-api/</guid><description/></item><item><title>Netflix Scraper API - Free Starting Credits Upon Sign Up</title><link>https://www.scrapingbee.com/scrapers/netflix-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/netflix-api/</guid><description/></item><item><title>Netnut alternative for web scraping?</title><link>https://www.scrapingbee.com/netnut-alternative/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/netnut-alternative/</guid><description>&lt;p>&lt;section class="bg-yellow-100 py-[100px] md:pt-[220px] md:pb-[20px] mb-[80px] relative z-1">
 &lt;div class="container">
 &lt;div class="max-w-[1024px] mx-auto text-center">
 &lt;h1 class="mb-[14px]">Netnut alternative for web scraping?&lt;/h1>
 &lt;p class="mb-[32px]">ScrapingBee is a better alternative to Netnut. Avoid paying exorbitant rates for your web scraping.&lt;/p>
 &lt;div class="flex flex-wrap items-center justify-center mb-[33px]">
 &lt;div class="flex flex-col items-start gap-[20px]">
 &lt;a href="https://app.scrapingbee.com/account/register" class="btn px-[39px] mr-[20px]">Try ScrapingBee for Free&lt;/a>
 &lt;div class="flex items-center">
 &lt;a href="https://www.capterra.com/p/195060/ScrapingBee/" target="_blank" class="inline-block"> &lt;img border="0" class="h-[40px] mr-[10px] -ml-[9px]" src="https://brand-assets.capterra.com/badge/8898153e-408a-4cdb-9477-bda37032c670.svg" /> &lt;/a>
 &lt;span class="text-[18px] mr-[10px]">based on 100+ reviews.&lt;/span>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
&lt;/section>

&lt;section class="pt-[50px] sm:pt-101 pb-[50px] sm:pb-[70px] md:pb-[82px]">
 &lt;div class="container max-w-[894px] w-full flex flex-wrap">

 &lt;blockquote class="p-[38px] bg-gray-900 rounded-2xl m-[0] text-black-100 leading-[1.55]">
 &lt;q class="block mb-[35px] text-[24px]">ScrapingBee simplified our &lt;strong>day-to-day marketing and engineering operations a lot&lt;/strong>. We no longer have to worry about managing our own fleet of headless browsers, and we no longer have to spend days sourcing the right proxy provider&lt;/q>
 &lt;cite class="avatar flex items-center not-italic">
 
 &lt;span class="w-[56px] h-[56px] rounded-full overflow-hidden bg-gray-1000 mr-[24px]">
 &lt;img height="56" width="56" src="https://www.scrapingbee.com/images/testimonials/mike.png" alt="Mike Ritchie">
 &lt;/span>
 
 &lt;span>
 &lt;strong class="text-[18px] font-bold block mb-[4px]">Mike Ritchie
 
 &lt;/strong>
 
 &lt;span class="text-[15px] block">CEO @ &lt;a href="https://seekwell.io" class="font-bold underline hover:no-underline" target="_blank">SeekWell&lt;/a>&lt;/span>
 
 &lt;/span>
 &lt;/cite>
 &lt;/blockquote>
 &lt;/div>
&lt;/section>

&lt;section class="py-[50px] sm:py-[60px] md:py-[80px] text-gray-200 text-[16px] leading-[1.50]">
 &lt;div class="container max-w-[1292px]">
 &lt;div class="flex flex-wrap -mx-[23px]">
 &lt;div class="w-full md:w-[500px] lg:w-[608px] px-[23px]">
 &lt;div class="pr-[20px]">
 &lt;div class="mb-[48px]">
 &lt;h3 class="text-black-100 mb-[8px]">Strong proxies, without the ludicrous price tag.&lt;/h3>
&lt;p>Compared to Netnut&amp;#39;s outrageous rates, ScrapingBee begins at only $29/mo. See for yourself! And you will always know what you&amp;#39;ll pay. No surprises whatsoever!&lt;/p></description></item><item><title>Newegg Scraper API - Free Credits on Signup</title><link>https://www.scrapingbee.com/scrapers/newegg-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/newegg-api/</guid><description/></item><item><title>News Results Scraper API - Simplicity &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/news-results-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/news-results-api/</guid><description/></item><item><title>Nextdoor Scraper API - Easy Signup &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/nextdoor-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/nextdoor-scraper-api/</guid><description/></item><item><title>Nike Scraper API - Simple Access &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/nike-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/nike-scraper-api/</guid><description/></item><item><title>Nimble alternative for web scraping?</title><link>https://www.scrapingbee.com/nimble-alternative/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/nimble-alternative/</guid><description>&lt;p>&lt;section class="bg-yellow-100 py-[100px] md:pt-[220px] md:pb-[20px] mb-[80px] relative z-1">
 &lt;div class="container">
 &lt;div class="max-w-[1024px] mx-auto text-center">
 &lt;h1 class="mb-[14px]">Nimble alternative for web scraping?&lt;/h1>
 &lt;p class="mb-[32px]">ScrapingBee is a better alternative to Nimble. When simplicity, performance, and cost matter—some tools just don’t stack up.&lt;/p>
 &lt;div class="flex flex-wrap items-center justify-center mb-[33px]">
 &lt;div class="flex flex-col items-start gap-[20px]">
 &lt;a href="https://app.scrapingbee.com/account/register" class="btn px-[39px] mr-[20px]">Try ScrapingBee for Free&lt;/a>
 &lt;div class="flex items-center">
 &lt;a href="https://www.capterra.com/p/195060/ScrapingBee/" target="_blank" class="inline-block"> &lt;img border="0" class="h-[40px] mr-[10px] -ml-[9px]" src="https://brand-assets.capterra.com/badge/8898153e-408a-4cdb-9477-bda37032c670.svg" /> &lt;/a>
 &lt;span class="text-[18px] mr-[10px]">based on 100+ reviews.&lt;/span>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
&lt;/section>

&lt;section class="pt-[50px] sm:pt-101 pb-[50px] sm:pb-[70px] md:pb-[82px]">
 &lt;div class="container max-w-[894px] w-full flex flex-wrap">

 &lt;blockquote class="p-[38px] bg-gray-900 rounded-2xl m-[0] text-black-100 leading-[1.55]">
 &lt;q class="block mb-[35px] text-[24px]">ScrapingBee &lt;strong>clear documentation, easy-to-use API, and great success rate&lt;/strong> made it a no-brainer.&lt;/q>
 &lt;cite class="avatar flex items-center not-italic">
 
 &lt;span class="w-[56px] h-[56px] rounded-full overflow-hidden bg-gray-1000 mr-[24px]">
 &lt;img height="56" width="56" src="https://www.scrapingbee.com/images/testimonials/dominic.jpeg" alt="Dominic Phillips">
 &lt;/span>
 
 &lt;span>
 &lt;strong class="text-[18px] font-bold block mb-[4px]">Dominic Phillips
 
 &lt;/strong>
 
 &lt;span class="text-[15px] block">Co-Founder @ &lt;a href="https://codesubmit.io" class="font-bold underline hover:no-underline" target="_blank">CodeSubmit&lt;/a>&lt;/span>
 
 &lt;/span>
 &lt;/cite>
 &lt;/blockquote>
 &lt;/div>
&lt;/section>

&lt;section class="py-[50px] sm:py-[60px] md:py-[80px] text-gray-200 text-[16px] leading-[1.50]">
 &lt;div class="container max-w-[1292px]">
 &lt;div class="flex flex-wrap -mx-[23px]">
 &lt;div class="w-full md:w-[500px] lg:w-[608px] px-[23px]">
 &lt;div class="pr-[20px]">
 &lt;div class="mb-[48px]">
 &lt;h3 class="text-black-100 mb-[8px]">More than contacts. Scrape anything.&lt;/h3>
 &lt;p>Nimble is great if you only want lead data. But if you need broader scraping—products, listings, news—Nimble won’t cut it.&lt;/p></description></item><item><title>No Code Web Scraper - Make Integration</title><link>https://www.scrapingbee.com/features/make/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/features/make/</guid><description>&lt;p>&lt;script type="application/ld+json">
 {
 "@context": "https://schema.org",
 "@type": "Product",
 "name": "ScrapingBee",
 "brand": {
 "@type": "Brand",
 "name": "ScrapingBee"
 },
 "description": "Enjoy no code web scraping with ScrapingBee. Integrate with most of your mainstream tools.",
 "aggregateRating": {
 "@type": "AggregateRating",
 "ratingValue": "4.9",
 "reviewCount": "38",
 "bestRating": 5
 }
 }
&lt;/script>
&lt;section class="bg-skew-yellow-b pt-[100px] sm:pt-[100px] md:pt-[156px] mb-[120px] relative z-1 ">
 &lt;div class="container">
 &lt;div class="flex flex-wrap items-center -mx-[15px]">
 &lt;div class="w-full sm:w-1/2 px-[15px]">
 &lt;div class="max-w-[542px] leading-[1.77]">
 
 
 
 &lt;h1 class="mb-[14px]">No Code Web Scraper - Make Integration&lt;/h1>
 &lt;p class="mb-[36px] text-[20px]">Enjoy no code web scraping with ScrapingBee. Integrate with most of your mainstream tools.&lt;/p></description></item><item><title>No Code Web Scraper - n8n Integration</title><link>https://www.scrapingbee.com/features/n8n/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/features/n8n/</guid><description>&lt;p>&lt;script type="application/ld+json">
 {
 "@context": "https://schema.org",
 "@type": "Product",
 "name": "ScrapingBee",
 "brand": {
 "@type": "Brand",
 "name": "ScrapingBee"
 },
 "description": "Enjoy no code web scraping with ScrapingBee. Integrate with n8n to automate your workflows.",
 "aggregateRating": {
 "@type": "AggregateRating",
 "ratingValue": "4.9",
 "reviewCount": "38",
 "bestRating": 5
 }
 }
&lt;/script>
&lt;section class="bg-skew-yellow-b pt-[100px] sm:pt-[100px] md:pt-[156px] mb-[120px] relative z-1 ">
 &lt;div class="container">
 &lt;div class="flex flex-wrap items-center -mx-[15px]">
 &lt;div class="w-full sm:w-1/2 px-[15px]">
 &lt;div class="max-w-[542px] leading-[1.77]">
 
 
 
 &lt;h1 class="mb-[14px]">No Code Web Scraper - n8n Integration&lt;/h1>
 &lt;p class="mb-[36px] text-[20px]">Enjoy no code web scraping with ScrapingBee. Integrate with n8n to automate your workflows.&lt;/p></description></item><item><title>Octoparse alternative for web scraping?</title><link>https://www.scrapingbee.com/octoparse-alternative/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/octoparse-alternative/</guid><description>&lt;p>&lt;section class="bg-yellow-100 py-[100px] md:pt-[220px] md:pb-[20px] mb-[80px] relative z-1">
 &lt;div class="container">
 &lt;div class="max-w-[1024px] mx-auto text-center">
 &lt;h1 class="mb-[14px]">Octoparse alternative for web scraping?&lt;/h1>
 &lt;p class="mb-[32px]">ScrapingBee is a better alternative to Octoparse. If your current scraping solution feels limited or overpriced, it might be time for a change.&lt;/p>
 &lt;div class="flex flex-wrap items-center justify-center mb-[33px]">
 &lt;div class="flex flex-col items-start gap-[20px]">
 &lt;a href="https://app.scrapingbee.com/account/register" class="btn px-[39px] mr-[20px]">Try ScrapingBee for Free&lt;/a>
 &lt;div class="flex items-center">
 &lt;a href="https://www.capterra.com/p/195060/ScrapingBee/" target="_blank" class="inline-block"> &lt;img border="0" class="h-[40px] mr-[10px] -ml-[9px]" src="https://brand-assets.capterra.com/badge/8898153e-408a-4cdb-9477-bda37032c670.svg" /> &lt;/a>
 &lt;span class="text-[18px] mr-[10px]">based on 100+ reviews.&lt;/span>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
&lt;/section>

&lt;section class="pt-[50px] sm:pt-101 pb-[50px] sm:pb-[70px] md:pb-[82px]">
 &lt;div class="container max-w-[894px] w-full flex flex-wrap">

 &lt;blockquote class="p-[38px] bg-gray-900 rounded-2xl m-[0] text-black-100 leading-[1.55]">
 &lt;q class="block mb-[35px] text-[24px]">ScrapingBee &lt;strong>clear documentation, easy-to-use API, and great success rate&lt;/strong> made it a no-brainer.&lt;/q>
 &lt;cite class="avatar flex items-center not-italic">
 
 &lt;span class="w-[56px] h-[56px] rounded-full overflow-hidden bg-gray-1000 mr-[24px]">
 &lt;img height="56" width="56" src="https://www.scrapingbee.com/images/testimonials/dominic.jpeg" alt="Dominic Phillips">
 &lt;/span>
 
 &lt;span>
 &lt;strong class="text-[18px] font-bold block mb-[4px]">Dominic Phillips
 
 &lt;/strong>
 
 &lt;span class="text-[15px] block">Co-Founder @ &lt;a href="https://codesubmit.io" class="font-bold underline hover:no-underline" target="_blank">CodeSubmit&lt;/a>&lt;/span>
 
 &lt;/span>
 &lt;/cite>
 &lt;/blockquote>
 &lt;/div>
&lt;/section>

&lt;section class="py-[50px] sm:py-[60px] md:py-[80px] text-gray-200 text-[16px] leading-[1.50]">
 &lt;div class="container max-w-[1292px]">
 &lt;div class="flex flex-wrap -mx-[23px]">
 &lt;div class="w-full md:w-[500px] lg:w-[608px] px-[23px]">
 &lt;div class="pr-[20px]">
 &lt;div class="mb-[48px]">
 &lt;h3 class="text-black-100 mb-[8px]">Outgrown point-and-click tools? You're not alone.&lt;/h3>
 &lt;p>Visual scrapers like Octoparse are great for beginners—but quickly become painful to scale. If you're tired of GUIs, ScrapingBee gives you API-first power and flexibility.&lt;/p></description></item><item><title>OLX Scraper API - Easy Signup, Free Credits</title><link>https://www.scrapingbee.com/scrapers/olx-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/olx-api/</guid><description/></item><item><title>Onthemarket Scraper API - Simple Start &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/onthemarket-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/onthemarket-scraper-api/</guid><description/></item><item><title>OpenAI Scraper API - Signup for Free Credits</title><link>https://www.scrapingbee.com/scrapers/openai-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/openai-api/</guid><description/></item><item><title>Organic Search Results Scraper API - Sign Up for Free Credits</title><link>https://www.scrapingbee.com/scrapers/organic-results-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/organic-results-api/</guid><description/></item><item><title>Otodom Scraper API - Simple Use &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/otodom-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/otodom-scraper-api/</guid><description/></item><item><title>Oxylabs alternative for web scraping?</title><link>https://www.scrapingbee.com/oxylabs-alternative/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/oxylabs-alternative/</guid><description>&lt;p>&lt;section class="bg-yellow-100 py-[100px] md:pt-[220px] md:pb-[20px] mb-[80px] relative z-1">
 &lt;div class="container">
 &lt;div class="max-w-[1024px] mx-auto text-center">
 &lt;h1 class="mb-[14px]">Oxylabs alternative for web scraping?&lt;/h1>
 &lt;p class="mb-[32px]">ScrapingBee is a better alternative to Oxylabs. Looking for more reliable, affordable, and scalable scraping solutions without the complexity?&lt;/p>
 &lt;div class="flex flex-wrap items-center justify-center mb-[33px]">
 &lt;div class="flex flex-col items-start gap-[20px]">
 &lt;a href="https://app.scrapingbee.com/account/register" class="btn px-[39px] mr-[20px]">Try ScrapingBee for Free&lt;/a>
 &lt;div class="flex items-center">
 &lt;a href="https://www.capterra.com/p/195060/ScrapingBee/" target="_blank" class="inline-block"> &lt;img border="0" class="h-[40px] mr-[10px] -ml-[9px]" src="https://brand-assets.capterra.com/badge/8898153e-408a-4cdb-9477-bda37032c670.svg" /> &lt;/a>
 &lt;span class="text-[18px] mr-[10px]">based on 100+ reviews.&lt;/span>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
&lt;/section>

&lt;section class="pt-[50px] sm:pt-101 pb-[50px] sm:pb-[70px] md:pb-[82px]">
 &lt;div class="container max-w-[894px] w-full flex flex-wrap">

 &lt;blockquote class="p-[38px] bg-gray-900 rounded-2xl m-[0] text-black-100 leading-[1.55]">
 &lt;q class="block mb-[35px] text-[24px]">ScrapingBee &lt;strong>clear documentation, easy-to-use API, and great success rate&lt;/strong> made it a no-brainer.&lt;/q>
 &lt;cite class="avatar flex items-center not-italic">
 
 &lt;span class="w-[56px] h-[56px] rounded-full overflow-hidden bg-gray-1000 mr-[24px]">
 &lt;img height="56" width="56" src="https://www.scrapingbee.com/images/testimonials/dominic.jpeg" alt="Dominic Phillips">
 &lt;/span>
 
 &lt;span>
 &lt;strong class="text-[18px] font-bold block mb-[4px]">Dominic Phillips
 
 &lt;/strong>
 
 &lt;span class="text-[15px] block">Co-Founder @ &lt;a href="https://codesubmit.io" class="font-bold underline hover:no-underline" target="_blank">CodeSubmit&lt;/a>&lt;/span>
 
 &lt;/span>
 &lt;/cite>
 &lt;/blockquote>
 &lt;/div>
&lt;/section>

&lt;section class="py-[50px] sm:py-[60px] md:py-[80px] text-gray-200 text-[16px] leading-[1.50]">
 &lt;div class="container max-w-[1292px]">
 &lt;div class="flex flex-wrap -mx-[23px]">
 &lt;div class="w-full md:w-[500px] lg:w-[608px] px-[23px]">
 &lt;div class="pr-[20px]">
 &lt;div class="mb-[48px]">
 &lt;h3 class="text-black-100 mb-[8px]">Not just proxies. Not just high costs.&lt;/h3>
 &lt;p>Oxylabs offers great proxy services—but at a premium. Why pay more for limited proxy pools when you can access the full web for less?&lt;/p></description></item><item><title>ParseHub alternative for web scraping?</title><link>https://www.scrapingbee.com/parsehub-alternative/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/parsehub-alternative/</guid><description>&lt;p>&lt;section class="bg-yellow-100 py-[100px] md:pt-[220px] md:pb-[20px] mb-[80px] relative z-1">
 &lt;div class="container">
 &lt;div class="max-w-[1024px] mx-auto text-center">
 &lt;h1 class="mb-[14px]">ParseHub alternative for web scraping?&lt;/h1>
 &lt;p class="mb-[32px]">ScrapingBee is a better alternative to ParseHub. If you&amp;#39;re seeking a more user-friendly interface, better pricing, and increased functionality, it may be time to explore other options.&lt;/p>
 &lt;div class="flex flex-wrap items-center justify-center mb-[33px]">
 &lt;div class="flex flex-col items-start gap-[20px]">
 &lt;a href="https://app.scrapingbee.com/account/register" class="btn px-[39px] mr-[20px]">Try ScrapingBee for Free&lt;/a>
 &lt;div class="flex items-center">
 &lt;a href="https://www.capterra.com/p/195060/ScrapingBee/" target="_blank" class="inline-block"> &lt;img border="0" class="h-[40px] mr-[10px] -ml-[9px]" src="https://brand-assets.capterra.com/badge/8898153e-408a-4cdb-9477-bda37032c670.svg" /> &lt;/a>
 &lt;span class="text-[18px] mr-[10px]">based on 100+ reviews.&lt;/span>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
&lt;/section>

&lt;section class="pt-[50px] sm:pt-101 pb-[50px] sm:pb-[70px] md:pb-[82px]">
 &lt;div class="container max-w-[894px] w-full flex flex-wrap">

 &lt;blockquote class="p-[38px] bg-gray-900 rounded-2xl m-[0] text-black-100 leading-[1.55]">
 &lt;q class="block mb-[35px] text-[24px]">ScrapingBee's &lt;strong>clear documentation, easy-to-use API, and great success rate&lt;/strong> made it a no-brainer.&lt;/q>
 &lt;cite class="avatar flex items-center not-italic">
 
 &lt;span class="w-[56px] h-[56px] rounded-full overflow-hidden bg-gray-1000 mr-[24px]">
 &lt;img height="56" width="56" src="https://www.scrapingbee.com/images/testimonials/dominic.jpeg" alt="Dominic Phillips">
 &lt;/span>
 
 &lt;span>
 &lt;strong class="text-[18px] font-bold block mb-[4px]">Dominic Phillips
 
 &lt;/strong>
 
 &lt;span class="text-[15px] block">Co-Founder @ &lt;a href="https://codesubmit.io" class="font-bold underline hover:no-underline" target="_blank">CodeSubmit&lt;/a>&lt;/span>
 
 &lt;/span>
 &lt;/cite>
 &lt;/blockquote>
 &lt;/div>
&lt;/section>

&lt;section class="py-[50px] sm:py-[60px] md:py-[80px] text-gray-200 text-[16px] leading-[1.50]">
 &lt;div class="container max-w-[1292px]">
 &lt;div class="flex flex-wrap -mx-[23px]">
 &lt;div class="w-full md:w-[500px] lg:w-[608px] px-[23px]">
 &lt;div class="pr-[20px]">
 &lt;div class="mb-[48px]">
 &lt;h3 class="text-black-100 mb-[8px]">No GUI. No complexity. Just powerful APIs.&lt;/h3>
 &lt;p>ParseHub is perfect for beginners, but if you need something scalable and flexible, you need an API-first approach with powerful customization.&lt;/p></description></item><item><title>Patreon Scraper API Tool - Get Free Credits on Sign Up</title><link>https://www.scrapingbee.com/scrapers/patreon-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/patreon-api/</guid><description/></item><item><title>Pitchbook Scraper API - Easy Start &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/pitchbook-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/pitchbook-scraper-api/</guid><description/></item><item><title>Pricing - ScrapingBee Web Scraping API</title><link>https://www.scrapingbee.com/pricing/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/pricing/</guid><description/></item><item><title>Privacy Policy</title><link>https://www.scrapingbee.com/privacy-policy/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/privacy-policy/</guid><description>&lt;h2 id="1-preamble">1. Preamble&lt;/h2>
&lt;p>The purpose of this privacy policy (the &lt;strong>“Privacy Policy”&lt;/strong>) is to inform future prospects, customers, consultants, partners, service providers, and suppliers (including their employees), and more generally anyone browsing VostokInc's website at the following address &lt;a href="https://www.scrapingbee.com/" >https://www.scrapingbee.com/&lt;/a> (the &lt;strong>“Website”&lt;/strong>) and using VostokInc’s services, about how VostokInc, a joint-stock company (“société par actions simplifiée”) with registered address located at 66 Avenue des Champs Élysées – 75008 Paris – France and registered before the Company House of Paris under number 843 352 683 (&lt;strong>“VostokInc”&lt;/strong> or &lt;strong>“We”&lt;/strong>), processes Personal Data in its capacity as data controller, and about their rights in this respect.&lt;/p></description></item><item><title>Product Hunt Scraper API - Easy Use &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/product-hunt-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/product-hunt-scraper-api/</guid><description/></item><item><title>Properati Scraper API - Easy Signup &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/properati-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/properati-scraper-api/</guid><description/></item><item><title>Proxy Mode</title><link>https://www.scrapingbee.com/documentation/proxy-mode/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/documentation/proxy-mode/</guid><description>&lt;h2 id="what-is-the-proxy-mode">What is the proxy mode?&lt;/h2>
&lt;p>ScrapingBee also offers a proxy front-end to the API, which can make integration with third-party tools easier. Proxy mode only changes the way you access ScrapingBee; the API then handles each request just like a standard request.&lt;/p>
&lt;p>Request cost, return codes, and default parameters are the same as for a standard, non-proxy request.&lt;/p>
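The proxy-mode description above can be sketched in Python. This is a minimal, hypothetical sketch: the endpoint `proxy.scrapingbee.com:8886`, and the convention of sending the API key as the proxy username and request parameters (such as disabling JavaScript rendering) as the proxy password, are assumptions not stated in this feed; confirm them against the proxy-mode documentation before use.

```python
# Hypothetical sketch of ScrapingBee's proxy mode.
# ASSUMPTIONS (not confirmed by this page): the proxy host/port
# "proxy.scrapingbee.com:8886", and the convention that the API key is
# sent as the proxy username and request parameters as the password.

def build_proxies(api_key: str, render_js: bool = False) -> dict:
    """Build a requests-style proxies dict for proxy-mode access.

    JavaScript rendering is disabled by default here, matching the
    recommendation in the surrounding text.
    """
    params = f"render_js={render_js}"
    proxy = f"http://{api_key}:{params}@proxy.scrapingbee.com:8886"
    return {"http": proxy, "https": proxy}

proxies = build_proxies("YOUR_API_KEY")  # placeholder key
print(proxies["http"])
# A real request would then be routed through the proxy, e.g.:
#   import requests
#   requests.get("https://example.com", proxies=proxies, verify=False)
```

Once built, the same `proxies` dict can be passed to any HTTP client that understands proxy authentication, which is what makes the proxy front-end convenient for third-party tools.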
&lt;p>We recommend disabling &lt;a href="#javascript-rendering" >JavaScript rendering&lt;/a>, which is enabled by default, when using proxy mode. The following credentials and configurations are used to access the proxy mode:&lt;/p></description></item><item><title>PubMed Scraper API - Signup for Credits Free</title><link>https://www.scrapingbee.com/scrapers/pubmed-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/pubmed-api/</guid><description/></item><item><title>Quora Scraper API - Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/quora-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/quora-api/</guid><description/></item><item><title>Rakuten Scraper API - Simple Start &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/rakuten-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/rakuten-scraper-api/</guid><description/></item><item><title>Rebranding</title><link>https://www.scrapingbee.com/rebranding/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/rebranding/</guid><description>&lt;p>It's not a big change, so you might wonder: what's wrong with the Ninja?&lt;/p>
&lt;p>Why did we fall in love with the Bee 🐝?&lt;/p>
&lt;p>First, our company is based in France.&lt;/p>
&lt;p>We have strong legislation regarding trademark and domain name usage.&lt;/p>
&lt;p>Before launching ScrapingNinja we brainstormed a lot of different names, looked at the available domain names, and checked databases like &lt;a href="https://www.inpi.fr/fr" >https://www.inpi.fr/fr&lt;/a> and other European brand databases to make sure our domain/brand was unique.&lt;/p></description></item><item><title>Redfin Scraper API Tool - Simple Setup &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/redfin-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/redfin-api/</guid><description/></item><item><title>Redirecting...</title><link>https://www.scrapingbee.com/features/google-shopping-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/features/google-shopping-api/</guid><description/></item><item><title>Review Scraper API Tool - Simple Setup &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/review-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/review-api/</guid><description/></item><item><title>Rightmove Scraper API - Free Credits Signup</title><link>https://www.scrapingbee.com/scrapers/rightmove-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/rightmove-api/</guid><description/></item><item><title>Roblox Scraper API - Free Signup and Credits</title><link>https://www.scrapingbee.com/scrapers/roblox-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/roblox-api/</guid><description/></item><item><title>Rotten Tomatoes Scraper API - Simple Start &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/rotten-tomatoes-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/rotten-tomatoes-scraper-api/</guid><description/></item><item><title>RSS Scraper API Tool - Easy 
Setup &amp; Free Starting Credits</title><link>https://www.scrapingbee.com/scrapers/rss-feed-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/rss-feed-api/</guid><description/></item><item><title>Rumble Scraper API - Simple Use &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/rumble-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/rumble-scraper-api/</guid><description/></item><item><title>Scrape Google Recipes Scraper API - Get Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/google-recipes-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-recipes-api/</guid><description/></item><item><title>Scrape Google Short Videos - Sign Up for Free Credits</title><link>https://www.scrapingbee.com/scrapers/google-short-videos-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/google-short-videos-api/</guid><description/></item><item><title>Scrape Nasdaq with API - Free Signup and Credits</title><link>https://www.scrapingbee.com/scrapers/nasdaq-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/nasdaq-api/</guid><description/></item><item><title>Scrape Tripadvisor with API - Free Signup and Credits</title><link>https://www.scrapingbee.com/scrapers/tripadvisor-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/tripadvisor-api/</guid><description/></item><item><title>Scrape.do alternative for web scraping?</title><link>https://www.scrapingbee.com/scrape-do-alternative/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrape-do-alternative/</guid><description>&lt;p>&lt;section class="bg-yellow-100 py-[100px] md:pt-[220px] md:pb-[20px] mb-[80px] relative z-1">
 &lt;div class="container">
 &lt;div class="max-w-[1024px] mx-auto text-center">
 &lt;h1 class="mb-[14px]">Scrape.do alternative for web scraping?&lt;/h1>
 &lt;p class="mb-[32px]">ScrapingBee is a better alternative to Scrape.do. Simplifying web scraping shouldn’t mean sacrificing speed or reliability—check out alternatives that give you more for less.&lt;/p>
 &lt;div class="flex flex-wrap items-center justify-center mb-[33px]">
 &lt;div class="flex flex-col items-start gap-[20px]">
 &lt;a href="https://app.scrapingbee.com/account/register" class="btn px-[39px] mr-[20px]">Try ScrapingBee for Free&lt;/a>
 &lt;div class="flex items-center">
 &lt;a href="https://www.capterra.com/p/195060/ScrapingBee/" target="_blank" class="inline-block"> &lt;img border="0" class="h-[40px] mr-[10px] -ml-[9px]" src="https://brand-assets.capterra.com/badge/8898153e-408a-4cdb-9477-bda37032c670.svg" /> &lt;/a>
 &lt;span class="text-[18px] mr-[10px]">based on 100+ reviews.&lt;/span>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
&lt;/section>

&lt;section class="pt-[50px] sm:pt-101 pb-[50px] sm:pb-[70px] md:pb-[82px]">
 &lt;div class="container max-w-[894px] w-full flex flex-wrap">

 &lt;blockquote class="p-[38px] bg-gray-900 rounded-2xl m-[0] text-black-100 leading-[1.55]">
 &lt;q class="block mb-[35px] text-[24px]">ScrapingBee's &lt;strong>clear documentation, easy-to-use API, and great success rate&lt;/strong> made it a no-brainer.&lt;/q>
 &lt;cite class="avatar flex items-center not-italic">
 
 &lt;span class="w-[56px] h-[56px] rounded-full overflow-hidden bg-gray-1000 mr-[24px]">
 &lt;img height="56" width="56" src="https://www.scrapingbee.com/images/testimonials/dominic.jpeg" alt="Dominic Phillips">
 &lt;/span>
 
 &lt;span>
 &lt;strong class="text-[18px] font-bold block mb-[4px]">Dominic Phillips
 
 &lt;/strong>
 
 &lt;span class="text-[15px] block">Co-Founder @ &lt;a href="https://codesubmit.io" class="font-bold underline hover:no-underline" target="_blank">CodeSubmit&lt;/a>&lt;/span>
 
 &lt;/span>
 &lt;/cite>
 &lt;/blockquote>
 &lt;/div>
&lt;/section>

&lt;section class="py-[50px] sm:py-[60px] md:py-[80px] text-gray-200 text-[16px] leading-[1.50]">
 &lt;div class="container max-w-[1292px]">
 &lt;div class="flex flex-wrap -mx-[23px]">
 &lt;div class="w-full md:w-[500px] lg:w-[608px] px-[23px]">
 &lt;div class="pr-[20px]">
 &lt;div class="mb-[48px]">
 &lt;h3 class="text-black-100 mb-[8px]">More than a scraper—just pure data access.&lt;/h3>
 &lt;p>Scrape.do may offer basic scraping, but it limits access to powerful tools. We give you everything you need—no upsells.&lt;/p></description></item><item><title>ScrapeHero alternative for web scraping?</title><link>https://www.scrapingbee.com/scrapehero-alternative/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapehero-alternative/</guid><description>&lt;p>&lt;section class="bg-yellow-100 py-[100px] md:pt-[220px] md:pb-[20px] mb-[80px] relative z-1">
 &lt;div class="container">
 &lt;div class="max-w-[1024px] mx-auto text-center">
 &lt;h1 class="mb-[14px]">ScrapeHero alternative for web scraping?&lt;/h1>
 &lt;p class="mb-[32px]">ScrapingBee is a better alternative to ScrapeHero. Struggling with your current scraping solution? It’s time to switch to a more efficient and affordable alternative.&lt;/p>
 &lt;div class="flex flex-wrap items-center justify-center mb-[33px]">
 &lt;div class="flex flex-col items-start gap-[20px]">
 &lt;a href="https://app.scrapingbee.com/account/register" class="btn px-[39px] mr-[20px]">Try ScrapingBee for Free&lt;/a>
 &lt;div class="flex items-center">
 &lt;a href="https://www.capterra.com/p/195060/ScrapingBee/" target="_blank" class="inline-block"> &lt;img border="0" class="h-[40px] mr-[10px] -ml-[9px]" src="https://brand-assets.capterra.com/badge/8898153e-408a-4cdb-9477-bda37032c670.svg" /> &lt;/a>
 &lt;span class="text-[18px] mr-[10px]">based on 100+ reviews.&lt;/span>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
&lt;/section>

&lt;section class="pt-[50px] sm:pt-101 pb-[50px] sm:pb-[70px] md:pb-[82px]">
 &lt;div class="container max-w-[894px] w-full flex flex-wrap">

 &lt;blockquote class="p-[38px] bg-gray-900 rounded-2xl m-[0] text-black-100 leading-[1.55]">
 &lt;q class="block mb-[35px] text-[24px]">ScrapingBee's &lt;strong>clear documentation, easy-to-use API, and great success rate&lt;/strong> made it a no-brainer.&lt;/q>
 &lt;cite class="avatar flex items-center not-italic">
 
 &lt;span class="w-[56px] h-[56px] rounded-full overflow-hidden bg-gray-1000 mr-[24px]">
 &lt;img height="56" width="56" src="https://www.scrapingbee.com/images/testimonials/dominic.jpeg" alt="Dominic Phillips">
 &lt;/span>
 
 &lt;span>
 &lt;strong class="text-[18px] font-bold block mb-[4px]">Dominic Phillips
 
 &lt;/strong>
 
 &lt;span class="text-[15px] block">Co-Founder @ &lt;a href="https://codesubmit.io" class="font-bold underline hover:no-underline" target="_blank">CodeSubmit&lt;/a>&lt;/span>
 
 &lt;/span>
 &lt;/cite>
 &lt;/blockquote>
 &lt;/div>
&lt;/section>

&lt;section class="py-[50px] sm:py-[60px] md:py-[80px] text-gray-200 text-[16px] leading-[1.50]">
 &lt;div class="container max-w-[1292px]">
 &lt;div class="flex flex-wrap -mx-[23px]">
 &lt;div class="w-full md:w-[500px] lg:w-[608px] px-[23px]">
 &lt;div class="pr-[20px]">
 &lt;div class="mb-[48px]">
 &lt;h3 class="text-black-100 mb-[8px]">No gimmicks. Just web scraping at scale.&lt;/h3>
 &lt;p>ScrapeHero offers scraping, but it's often locked behind complicated plans. Get what you need without all the complexity.&lt;/p></description></item><item><title>ScrapeOwl alternative for web scraping?</title><link>https://www.scrapingbee.com/scrapeowl-alternative/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapeowl-alternative/</guid><description>&lt;p>&lt;section class="bg-yellow-100 py-[100px] md:pt-[220px] md:pb-[20px] mb-[80px] relative z-1">
 &lt;div class="container">
 &lt;div class="max-w-[1024px] mx-auto text-center">
 &lt;h1 class="mb-[14px]">ScrapeOwl alternative for web scraping?&lt;/h1>
 &lt;p class="mb-[32px]">ScrapingBee is a better alternative to ScrapeOwl. Powerful scraping should be straightforward, cost-efficient, and easy to integrate into your workflow—let’s look at better alternatives.&lt;/p>
 &lt;div class="flex flex-wrap items-center justify-center mb-[33px]">
 &lt;div class="flex flex-col items-start gap-[20px]">
 &lt;a href="https://app.scrapingbee.com/account/register" class="btn px-[39px] mr-[20px]">Try ScrapingBee for Free&lt;/a>
 &lt;div class="flex items-center">
 &lt;a href="https://www.capterra.com/p/195060/ScrapingBee/" target="_blank" class="inline-block"> &lt;img border="0" class="h-[40px] mr-[10px] -ml-[9px]" src="https://brand-assets.capterra.com/badge/8898153e-408a-4cdb-9477-bda37032c670.svg" /> &lt;/a>
 &lt;span class="text-[18px] mr-[10px]">based on 100+ reviews.&lt;/span>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
&lt;/section>

&lt;section class="pt-[50px] sm:pt-101 pb-[50px] sm:pb-[70px] md:pb-[82px]">
 &lt;div class="container max-w-[894px] w-full flex flex-wrap">

 &lt;blockquote class="p-[38px] bg-gray-900 rounded-2xl m-[0] text-black-100 leading-[1.55]">
 &lt;q class="block mb-[35px] text-[24px]">ScrapingBee's &lt;strong>clear documentation, easy-to-use API, and great success rate&lt;/strong> made it a no-brainer.&lt;/q>
 &lt;cite class="avatar flex items-center not-italic">
 
 &lt;span class="w-[56px] h-[56px] rounded-full overflow-hidden bg-gray-1000 mr-[24px]">
 &lt;img height="56" width="56" src="https://www.scrapingbee.com/images/testimonials/dominic.jpeg" alt="Dominic Phillips">
 &lt;/span>
 
 &lt;span>
 &lt;strong class="text-[18px] font-bold block mb-[4px]">Dominic Phillips
 
 &lt;/strong>
 
 &lt;span class="text-[15px] block">Co-Founder @ &lt;a href="https://codesubmit.io" class="font-bold underline hover:no-underline" target="_blank">CodeSubmit&lt;/a>&lt;/span>
 
 &lt;/span>
 &lt;/cite>
 &lt;/blockquote>
 &lt;/div>
&lt;/section>

&lt;section class="py-[50px] sm:py-[60px] md:py-[80px] text-gray-200 text-[16px] leading-[1.50]">
 &lt;div class="container max-w-[1292px]">
 &lt;div class="flex flex-wrap -mx-[23px]">
 &lt;div class="w-full md:w-[500px] lg:w-[608px] px-[23px]">
 &lt;div class="pr-[20px]">
 &lt;div class="mb-[48px]">
 &lt;h3 class="text-black-100 mb-[8px]">Not just limited scrapes. Full scraping flexibility.&lt;/h3>
 &lt;p>ScrapeOwl provides basic scraping, but if you need something more powerful and customizable, you need a better alternative.&lt;/p></description></item><item><title>ScraperAPI alternative for web scraping?</title><link>https://www.scrapingbee.com/scraperapi-alternative/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scraperapi-alternative/</guid><description>&lt;p>&lt;section class="bg-yellow-100 py-[100px] md:pt-[220px] md:pb-[20px] mb-[80px] relative z-1">
 &lt;div class="container">
 &lt;div class="max-w-[1024px] mx-auto text-center">
 &lt;h1 class="mb-[14px]">ScraperAPI alternative for web scraping?&lt;/h1>
 &lt;p class="mb-[32px]">ScrapingBee is a better alternative to ScraperAPI. A better web scraping API, for 50% less.&lt;/p>
 &lt;div class="flex flex-wrap items-center justify-center mb-[33px]">
 &lt;div class="flex flex-col items-start gap-[20px]">
 &lt;a href="https://app.scrapingbee.com/account/register" class="btn px-[39px] mr-[20px]">Try ScrapingBee for Free&lt;/a>
 &lt;div class="flex items-center">
 &lt;a href="https://www.capterra.com/p/195060/ScrapingBee/" target="_blank" class="inline-block"> &lt;img border="0" class="h-[40px] mr-[10px] -ml-[9px]" src="https://brand-assets.capterra.com/badge/8898153e-408a-4cdb-9477-bda37032c670.svg" /> &lt;/a>
 &lt;span class="text-[18px] mr-[10px]">based on 100+ reviews.&lt;/span>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
&lt;/section>

&lt;section class="pt-[50px] sm:pt-101 pb-[50px] sm:pb-[70px] md:pb-[82px]">
 &lt;div class="container max-w-[894px] w-full flex flex-wrap">

 &lt;blockquote class="p-[38px] bg-gray-900 rounded-2xl m-[0] text-black-100 leading-[1.55]">
 &lt;q class="block mb-[35px] text-[24px]">ScrapingBee's &lt;strong>clear documentation, easy-to-use API, and great success rate&lt;/strong> made it a no-brainer.&lt;/q>
 &lt;cite class="avatar flex items-center not-italic">
 
 &lt;span class="w-[56px] h-[56px] rounded-full overflow-hidden bg-gray-1000 mr-[24px]">
 &lt;img height="56" width="56" src="https://www.scrapingbee.com/images/testimonials/dominic.jpeg" alt="Dominic Phillips">
 &lt;/span>
 
 &lt;span>
 &lt;strong class="text-[18px] font-bold block mb-[4px]">Dominic Phillips
 
 &lt;/strong>
 
 &lt;span class="text-[15px] block">Co-Founder @ &lt;a href="https://codesubmit.io" class="font-bold underline hover:no-underline" target="_blank">CodeSubmit&lt;/a>&lt;/span>
 
 &lt;/span>
 &lt;/cite>
 &lt;/blockquote>
 &lt;/div>
&lt;/section>

&lt;section class="py-[50px] sm:py-[60px] md:py-[80px] text-gray-200 text-[16px] leading-[1.50]">
 &lt;div class="container max-w-[1292px]">
 &lt;div class="flex flex-wrap -mx-[23px]">
 &lt;div class="w-full md:w-[500px] lg:w-[608px] px-[23px]">
 &lt;div class="pr-[20px]">
 &lt;div class="mb-[48px]">
 &lt;h3 class="text-black-100 mb-[8px]">More, for less!&lt;/h3>
 &lt;p>Compared to ScraperAPI, ScrapingBee offers much more at a way better price!&lt;/p></description></item><item><title>Scrapfly alternative for web scraping?</title><link>https://www.scrapingbee.com/scrapfly-alternative/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapfly-alternative/</guid><description>&lt;p>&lt;section class="bg-yellow-100 py-[100px] md:pt-[220px] md:pb-[20px] mb-[80px] relative z-1">
 &lt;div class="container">
 &lt;div class="max-w-[1024px] mx-auto text-center">
 &lt;h1 class="mb-[14px]">Scrapfly alternative for web scraping?&lt;/h1>
 &lt;p class="mb-[32px]">ScrapingBee is a better alternative to Scrapfly. Need a scraping solution that’s faster, more flexible, and less complicated? There are better options out there.&lt;/p>
 &lt;div class="flex flex-wrap items-center justify-center mb-[33px]">
 &lt;div class="flex flex-col items-start gap-[20px]">
 &lt;a href="https://app.scrapingbee.com/account/register" class="btn px-[39px] mr-[20px]">Try ScrapingBee for Free&lt;/a>
 &lt;div class="flex items-center">
 &lt;a href="https://www.capterra.com/p/195060/ScrapingBee/" target="_blank" class="inline-block"> &lt;img border="0" class="h-[40px] mr-[10px] -ml-[9px]" src="https://brand-assets.capterra.com/badge/8898153e-408a-4cdb-9477-bda37032c670.svg" /> &lt;/a>
 &lt;span class="text-[18px] mr-[10px]">based on 100+ reviews.&lt;/span>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
&lt;/section>

&lt;section class="pt-[50px] sm:pt-101 pb-[50px] sm:pb-[70px] md:pb-[82px]">
 &lt;div class="container max-w-[894px] w-full flex flex-wrap">

 &lt;blockquote class="p-[38px] bg-gray-900 rounded-2xl m-[0] text-black-100 leading-[1.55]">
 &lt;q class="block mb-[35px] text-[24px]">ScrapingBee's &lt;strong>clear documentation, easy-to-use API, and great success rate&lt;/strong> made it a no-brainer.&lt;/q>
 &lt;cite class="avatar flex items-center not-italic">
 
 &lt;span class="w-[56px] h-[56px] rounded-full overflow-hidden bg-gray-1000 mr-[24px]">
 &lt;img height="56" width="56" src="https://www.scrapingbee.com/images/testimonials/dominic.jpeg" alt="Dominic Phillips">
 &lt;/span>
 
 &lt;span>
 &lt;strong class="text-[18px] font-bold block mb-[4px]">Dominic Phillips
 
 &lt;/strong>
 
 &lt;span class="text-[15px] block">Co-Founder @ &lt;a href="https://codesubmit.io" class="font-bold underline hover:no-underline" target="_blank">CodeSubmit&lt;/a>&lt;/span>
 
 &lt;/span>
 &lt;/cite>
 &lt;/blockquote>
 &lt;/div>
&lt;/section>

&lt;section class="py-[50px] sm:py-[60px] md:py-[80px] text-gray-200 text-[16px] leading-[1.50]">
 &lt;div class="container max-w-[1292px]">
 &lt;div class="flex flex-wrap -mx-[23px]">
 &lt;div class="w-full md:w-[500px] lg:w-[608px] px-[23px]">
 &lt;div class="pr-[20px]">
 &lt;div class="mb-[48px]">
 &lt;h3 class="text-black-100 mb-[8px]">No limited plans. No hidden extras.&lt;/h3>
 &lt;p>Scrapfly might offer robust scraping, but it's complicated and pricey. We make it simple—scrape without unnecessary costs.&lt;/p></description></item><item><title>Scraping Fish alternative for web scraping?</title><link>https://www.scrapingbee.com/scrapingfish-alternative/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapingfish-alternative/</guid><description>&lt;p>&lt;section class="bg-yellow-100 py-[100px] md:pt-[220px] md:pb-[20px] mb-[80px] relative z-1">
 &lt;div class="container">
 &lt;div class="max-w-[1024px] mx-auto text-center">
 &lt;h1 class="mb-[14px]">Scraping Fish alternative for web scraping?&lt;/h1>
 &lt;p class="mb-[32px]">ScrapingBee is a better alternative to Scraping Fish. Web scraping should be easy, cost-effective, and hassle-free. It&amp;#39;s time to consider alternatives that better meet your needs.&lt;/p>
 &lt;div class="flex flex-wrap items-center justify-center mb-[33px]">
 &lt;div class="flex flex-col items-start gap-[20px]">
 &lt;a href="https://app.scrapingbee.com/account/register" class="btn px-[39px] mr-[20px]">Try ScrapingBee for Free&lt;/a>
 &lt;div class="flex items-center">
 &lt;a href="https://www.capterra.com/p/195060/ScrapingBee/" target="_blank" class="inline-block"> &lt;img border="0" class="h-[40px] mr-[10px] -ml-[9px]" src="https://brand-assets.capterra.com/badge/8898153e-408a-4cdb-9477-bda37032c670.svg" /> &lt;/a>
 &lt;span class="text-[18px] mr-[10px]">based on 100+ reviews.&lt;/span>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
&lt;/section>

&lt;section class="pt-[50px] sm:pt-101 pb-[50px] sm:pb-[70px] md:pb-[82px]">
 &lt;div class="container max-w-[894px] w-full flex flex-wrap">

 &lt;blockquote class="p-[38px] bg-gray-900 rounded-2xl m-[0] text-black-100 leading-[1.55]">
 &lt;q class="block mb-[35px] text-[24px]">ScrapingBee's &lt;strong>clear documentation, easy-to-use API, and great success rate&lt;/strong> made it a no-brainer.&lt;/q>
 &lt;cite class="avatar flex items-center not-italic">
 
 &lt;span class="w-[56px] h-[56px] rounded-full overflow-hidden bg-gray-1000 mr-[24px]">
 &lt;img height="56" width="56" src="https://www.scrapingbee.com/images/testimonials/dominic.jpeg" alt="Dominic Phillips">
 &lt;/span>
 
 &lt;span>
 &lt;strong class="text-[18px] font-bold block mb-[4px]">Dominic Phillips
 
 &lt;/strong>
 
 &lt;span class="text-[15px] block">Co-Founder @ &lt;a href="https://codesubmit.io" class="font-bold underline hover:no-underline" target="_blank">CodeSubmit&lt;/a>&lt;/span>
 
 &lt;/span>
 &lt;/cite>
 &lt;/blockquote>
 &lt;/div>
&lt;/section>

&lt;section class="py-[50px] sm:py-[60px] md:py-[80px] text-gray-200 text-[16px] leading-[1.50]">
 &lt;div class="container max-w-[1292px]">
 &lt;div class="flex flex-wrap -mx-[23px]">
 &lt;div class="w-full md:w-[500px] lg:w-[608px] px-[23px]">
 &lt;div class="pr-[20px]">
 &lt;div class="mb-[48px]">
 &lt;h3 class="text-black-100 mb-[8px]">Not just proxies. Real scraping solutions.&lt;/h3>
 &lt;p>Scraping Fish focuses on proxies but doesn't give you the flexibility needed for broader scraping. We offer much more than just IPs—get full scraping power without the added cost.&lt;/p></description></item><item><title>ScrapingAnt alternative for web scraping?</title><link>https://www.scrapingbee.com/scrapingant-alternative/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapingant-alternative/</guid><description>&lt;p>&lt;section class="bg-yellow-100 py-[100px] md:pt-[220px] md:pb-[20px] mb-[80px] relative z-1">
 &lt;div class="container">
 &lt;div class="max-w-[1024px] mx-auto text-center">
 &lt;h1 class="mb-[14px]">ScrapingAnt alternative for web scraping?&lt;/h1>
 &lt;p class="mb-[32px]">ScrapingBee is a better alternative to ScrapingAnt. Tired of dealing with complex setups or overblown pricing? Explore alternatives that make web scraping simpler and more affordable.&lt;/p>
 &lt;div class="flex flex-wrap items-center justify-center mb-[33px]">
 &lt;div class="flex flex-col items-start gap-[20px]">
 &lt;a href="https://app.scrapingbee.com/account/register" class="btn px-[39px] mr-[20px]">Try ScrapingBee for Free&lt;/a>
 &lt;div class="flex items-center">
 &lt;a href="https://www.capterra.com/p/195060/ScrapingBee/" target="_blank" class="inline-block"> &lt;img border="0" class="h-[40px] mr-[10px] -ml-[9px]" src="https://brand-assets.capterra.com/badge/8898153e-408a-4cdb-9477-bda37032c670.svg" /> &lt;/a>
 &lt;span class="text-[18px] mr-[10px]">based on 100+ reviews.&lt;/span>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
&lt;/section>

&lt;section class="pt-[50px] sm:pt-101 pb-[50px] sm:pb-[70px] md:pb-[82px]">
 &lt;div class="container max-w-[894px] w-full flex flex-wrap">

 &lt;blockquote class="p-[38px] bg-gray-900 rounded-2xl m-[0] text-black-100 leading-[1.55]">
 &lt;q class="block mb-[35px] text-[24px]">ScrapingBee's &lt;strong>clear documentation, easy-to-use API, and great success rate&lt;/strong> made it a no-brainer.&lt;/q>
 &lt;cite class="avatar flex items-center not-italic">
 
 &lt;span class="w-[56px] h-[56px] rounded-full overflow-hidden bg-gray-1000 mr-[24px]">
 &lt;img height="56" width="56" src="https://www.scrapingbee.com/images/testimonials/dominic.jpeg" alt="Dominic Phillips">
 &lt;/span>
 
 &lt;span>
 &lt;strong class="text-[18px] font-bold block mb-[4px]">Dominic Phillips
 
 &lt;/strong>
 
 &lt;span class="text-[15px] block">Co-Founder @ &lt;a href="https://codesubmit.io" class="font-bold underline hover:no-underline" target="_blank">CodeSubmit&lt;/a>&lt;/span>
 
 &lt;/span>
 &lt;/cite>
 &lt;/blockquote>
 &lt;/div>
&lt;/section>

&lt;section class="py-[50px] sm:py-[60px] md:py-[80px] text-gray-200 text-[16px] leading-[1.50]">
 &lt;div class="container max-w-[1292px]">
 &lt;div class="flex flex-wrap -mx-[23px]">
 &lt;div class="w-full md:w-[500px] lg:w-[608px] px-[23px]">
 &lt;div class="pr-[20px]">
 &lt;div class="mb-[48px]">
 &lt;h3 class="text-black-100 mb-[8px]">Not just proxies or scraping—complete solutions.&lt;/h3>
 &lt;p>ScrapingAnt might offer simple scraping, but we give you everything you need—reliable APIs, proxy support, and advanced features.&lt;/p></description></item><item><title>ScrapingBee alternative for web scraping?</title><link>https://www.scrapingbee.com/scrapingbee-alternative/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapingbee-alternative/</guid><description>&lt;p>&lt;section class="bg-yellow-100 py-[100px] md:pt-[220px] md:pb-[20px] mb-[80px] relative z-1">
 &lt;div class="container">
 &lt;div class="max-w-[1024px] mx-auto text-center">
 &lt;h1 class="mb-[14px]">ScrapingBee alternative for web scraping?&lt;/h1>
 &lt;p class="mb-[32px]">ScrapingBee is the most efficient web scraping API out there. Let&amp;#39;s see how it compares to the other big names.&lt;/p>
 &lt;div class="flex flex-wrap items-center justify-center mb-[33px]">
 &lt;div class="flex flex-col items-start gap-[20px]">
 &lt;a href="https://app.scrapingbee.com/account/register" class="btn px-[39px] mr-[20px]">Try ScrapingBee for Free&lt;/a>
 &lt;div class="flex items-center">
 &lt;a href="https://www.capterra.com/p/195060/ScrapingBee/" target="_blank" class="inline-block"> &lt;img border="0" class="h-[40px] mr-[10px] -ml-[9px]" src="https://brand-assets.capterra.com/badge/8898153e-408a-4cdb-9477-bda37032c670.svg" /> &lt;/a>
 &lt;span class="text-[18px] mr-[10px]">based on 100+ reviews.&lt;/span>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
&lt;/section>

&lt;section class="pt-[50px] sm:pt-101 pb-[50px] sm:pb-[70px] md:pb-[82px]">
 &lt;div class="container max-w-[894px] w-full flex flex-wrap">

 &lt;blockquote class="p-[38px] bg-gray-900 rounded-2xl m-[0] text-black-100 leading-[1.55]">
 &lt;q class="block mb-[35px] text-[24px]">ScrapingBee simplified our &lt;strong>day-to-day marketing and engineering operations a lot&lt;/strong>. We no longer have to worry about managing our own fleet of headless browsers, and we no longer have to spend days sourcing the right proxy provider&lt;/q>
 &lt;cite class="avatar flex items-center not-italic">
 
 &lt;span class="w-[56px] h-[56px] rounded-full overflow-hidden bg-gray-1000 mr-[24px]">
 &lt;img height="56" width="56" src="https://www.scrapingbee.com/images/testimonials/mike.png" alt="Mike Ritchie">
 &lt;/span>
 
 &lt;span>
 &lt;strong class="text-[18px] font-bold block mb-[4px]">Mike Ritchie
 
 &lt;/strong>
 
 &lt;span class="text-[15px] block">CEO @ &lt;a href="https://seekwell.io" class="font-bold underline hover:no-underline" target="_blank">SeekWell&lt;/a>&lt;/span>
 
 &lt;/span>
 &lt;/cite>
 &lt;/blockquote>
 &lt;/div>
&lt;/section>

&lt;section class="py-[50px] sm:py-[60px] md:py-[80px] text-gray-200 leading-[1.50]">
 &lt;div class="container max-w-[1292px]">
 &lt;div class="flex flex-col">
 &lt;div class="-my-[2px] overflow-x-auto sm:-mx-[6px] lg:-mx-[8px]">
 &lt;div class="py-[2px] align-middle inline-block min-w-full sm:px-[24px] lg:px-[8px]">
 &lt;div class="overflow-hidden border-b border-gray-200">
 &lt;table class="min-w-full divide-y divide-gray-200">
 &lt;thead class="bg-black-100 text-white">
 &lt;tr>
 &lt;th scope="col" class="px-[24px] py-[3px] text-left text-xs font-20 text-gray-500 uppercase tracking-wider">
 Service
 &lt;/th>
 &lt;th scope="col" class="px-[24px] py-[3px] text-center text-xs font-20 text-gray-500 uppercase tracking-wider">
 API
 &lt;/th>
 &lt;th scope="col" class="px-[24px] py-[3px] text-center text-xs font-20 text-gray-500 uppercase tracking-wider">
 Proxy Mode
 &lt;/th>
 &lt;th scope="col" class="px-[24px] py-[3px] text-center text-xs font-20 text-gray-500 uppercase tracking-wider">
 Geolocation
 &lt;/th>
 &lt;th scope="col" class="px-[24px] py-[3px] text-center text-xs font-20 text-gray-500 uppercase tracking-wider">
 Price per GB
 &lt;/th>
 &lt;th scope="col" class="px-[24px] py-[3px] text-center text-xs font-20 text-gray-500 uppercase tracking-wider">
 Minimum monthly commitment
 &lt;/th>
 &lt;th scope="col" class="px-[24px] py-[3px] text-center text-xs font-20 text-gray-500 uppercase tracking-wider">
 Success Rate **
 &lt;/th>
 &lt;th scope="col" class="px-[24px] py-[3px] text-center text-xs font-20 text-gray-500 uppercase tracking-wider">
 Average query duration **
 &lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody class="bg-white divide-y divide-gray-200">
 &lt;tr>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap">
 &lt;div class="flex items-center">
 &lt;div class="flex-shrink-0 h-[30px] w-[30px]">
 &lt;img class="h-[30px] w-[30px]" src="https://www.scrapingbee.com/images/favico.png" alt="">
 &lt;/div>
 &lt;div class="ml-[20px]">
 &lt;div class="text-[20px] font-weight-bold text-black-100">
 ScrapingBee
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center text-green-1000">
 &lt;div class="text-center text-green-1000">
 &lt;svg class="h-[35px] w-[35px]" xmlns="http://www.w3.org/2000/svg" style="margin:auto" fill="none" viewBox="0 0 24 24" stroke="currentColor">
 &lt;path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M5 13l4 4L19 7">&lt;/path>
 &lt;/svg>
 &lt;/div>

 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center text-green-1000">
 &lt;div class="text-center text-green-1000">
 &lt;svg class="h-[35px] w-[35px]" xmlns="http://www.w3.org/2000/svg" style="margin:auto" fill="none" viewBox="0 0 24 24" stroke="currentColor">
 &lt;path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M5 13l4 4L19 7">&lt;/path>
 &lt;/svg>
 &lt;/div>

 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center text-green-1000">
 &lt;div class="text-center text-green-1000">
 &lt;svg class="h-[35px] w-[35px]" xmlns="http://www.w3.org/2000/svg" style="margin:auto" fill="none" viewBox="0 0 24 24" stroke="currentColor">
 &lt;path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M5 13l4 4L19 7">&lt;/path>
 &lt;/svg>
 &lt;/div>

 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center text-[20px] font-20 text-green-1000">
 $0*
 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center text-[20px] font-20 text-green-1000">
 $49
 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center text-[20px] font-20 text-green-1000">
 98%
 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center text-[20px] font-20 text-green-1000">
 3.14s
 &lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap">
 &lt;div class="ml-[4px]">
 &lt;div class="text-[20px] font-weight-bold">
 Luminati
 &lt;/div>
 &lt;/div>
 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center">

 &lt;div class="text-center">
 &lt;svg class="h-[30px] w-[30px]" xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24" stroke="currentColor">
 &lt;path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M6 18L18 6M6 6l12 12">&lt;/path>
 &lt;/svg>
 &lt;/div>

 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center">

 &lt;div class="text-center">
 &lt;svg class="h-[30px] w-[30px]" xmlns="http://www.w3.org/2000/svg" style="margin:auto" fill="none" viewBox="0 0 24 24" stroke="currentColor">
 &lt;path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M5 13l4 4L19 7">&lt;/path>
 &lt;/svg>
 &lt;/div>

 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center">

 &lt;div class="text-center">
 &lt;svg class="h-[30px] w-[30px]" xmlns="http://www.w3.org/2000/svg" style="margin:auto" fill="none" viewBox="0 0 24 24" stroke="currentColor">
 &lt;path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M5 13l4 4L19 7">&lt;/path>
 &lt;/svg>
 &lt;/div>

 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center text-md font-20">
 $0.1
 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center text-md font-20">
 $500
 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center text-md font-20">
 95%
 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center text-md font-20">
 5.12s
 &lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap">
 &lt;div class="ml-[4px]">
 &lt;div class="text-[20px] font-weight-bold">
 Netnut
 &lt;/div>
 &lt;/div>
 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center">

 &lt;div class="text-center">
 &lt;svg class="h-[30px] w-[30px]" xmlns="http://www.w3.org/2000/svg" style="margin:auto" fill="none" viewBox="0 0 24 24" stroke="currentColor">
 &lt;path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M5 13l4 4L19 7">&lt;/path>
 &lt;/svg>
 &lt;/div>

 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center">

 &lt;div class="text-center">
 &lt;svg class="h-[30px] w-[30px]" xmlns="http://www.w3.org/2000/svg" style="margin:auto" fill="none" viewBox="0 0 24 24" stroke="currentColor">
 &lt;path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M5 13l4 4L19 7">&lt;/path>
 &lt;/svg>
 &lt;/div>

 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center">

 &lt;div class="text-center">
 &lt;svg class="h-[30px] w-[30px]" xmlns="http://www.w3.org/2000/svg" style="margin:auto" fill="none" viewBox="0 0 24 24" stroke="currentColor">
 &lt;path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M5 13l4 4L19 7">&lt;/path>
 &lt;/svg>
 &lt;/div>

 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center text-md font-20">
 $15
 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center text-md font-20">
 $300
 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center text-md font-20">
 96%
 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center text-md font-20">
 5.13s
 &lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap">
 &lt;div class="ml-[4px]">
 &lt;div class="text-[20px] font-weight-bold">
 Proxyscrape (free)
 &lt;/div>
 &lt;/div>
 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center">

 &lt;div class="text-center">
 &lt;svg class="h-[30px] w-[30px]" xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24" stroke="currentColor">
 &lt;path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M6 18L18 6M6 6l12 12">&lt;/path>
 &lt;/svg>
 &lt;/div>

 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center">

 &lt;div class="text-center">
 &lt;svg class="h-[30px] w-[30px]" xmlns="http://www.w3.org/2000/svg" style="margin:auto" fill="none" viewBox="0 0 24 24" stroke="currentColor">
 &lt;path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M5 13l4 4L19 7">&lt;/path>
 &lt;/svg>
 &lt;/div>

 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center">

 &lt;div class="text-center">
 &lt;svg class="h-[30px] w-[30px]" xmlns="http://www.w3.org/2000/svg" style="margin:auto" fill="none" viewBox="0 0 24 24" stroke="currentColor">
 &lt;path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M5 13l4 4L19 7">&lt;/path>
 &lt;/svg>
 &lt;/div>

 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center text-md font-20">
 $0
 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center text-md font-20">
 $0
 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center text-md font-20">
 45%
 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center text-md font-20">
 13.6s
 &lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap">
 &lt;div class="ml-[4px]">
 &lt;div class="text-[20px] font-weight-bold">
 Freeproxycz (free)
 &lt;/div>
 &lt;/div>
 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center">

 &lt;div class="text-center">
 &lt;svg class="h-[30px] w-[30px]" xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24" stroke="currentColor">
 &lt;path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M6 18L18 6M6 6l12 12">&lt;/path>
 &lt;/svg>
 &lt;/div>

 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center">

 &lt;div class="text-center">
 &lt;svg class="h-[30px] w-[30px]" xmlns="http://www.w3.org/2000/svg" style="margin:auto" fill="none" viewBox="0 0 24 24" stroke="currentColor">
 &lt;path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M5 13l4 4L19 7">&lt;/path>
 &lt;/svg>
 &lt;/div>

 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center">

 &lt;div class="text-center">
 &lt;svg class="h-[30px] w-[30px]" xmlns="http://www.w3.org/2000/svg" style="margin:auto" fill="none" viewBox="0 0 24 24" stroke="currentColor">
 &lt;path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M5 13l4 4L19 7">&lt;/path>
 &lt;/svg>
 &lt;/div>

 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center text-md font-20">
 $0
 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center text-md font-20">
 $0
 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center text-md font-20">
 25%
 &lt;/td>
 &lt;td class="px-[24px] py-[16px] whitespace-nowrap text-center text-md font-20">
 12.73s
 &lt;/td>
 &lt;/tr>

 &lt;/tbody>
 &lt;/table>
 &lt;/div>
 &lt;div class="pt-[30px]">
 &lt;small>* request-based pricing, ** benchmarks available &lt;a href="https://www.scrapingbee.com/blog/rotating-proxies/">here&lt;/a> and &lt;a href="https://www.scrapingbee.com/blog/best-free-proxy-list-web-scraping/">here&lt;/a>, *** 60 IPs offer&lt;/small>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
&lt;/section>

&lt;section class="pt-[50px] sm:pt-101 pb-[50px] sm:pb-[70px] md:pb-[82px]">
 &lt;div class="container max-w-[894px] w-full flex flex-wrap">

 &lt;blockquote class="p-[38px] bg-gray-900 rounded-2xl m-[0] text-black-100 leading-[1.55]">
 &lt;q class="block mb-[35px] text-[24px]">Great SaaS tool for legitimate scraping and data extraction. &lt;strong>ScrapingBee makes it easy to automatically pull down data from the sites&lt;/strong> that publish periodic data in a human-readable format.&lt;/q>
 &lt;cite class="avatar flex items-center not-italic">
 
 &lt;span class="w-[56px] h-[56px] rounded-full overflow-hidden bg-gray-1000 mr-[24px]">
 &lt;img height="56" width="56" src="https://www.scrapingbee.com/images/testimonials/andy.jpeg" alt="Andy Hawkes">
 &lt;/span>
 
 &lt;span>
 &lt;strong class="text-[18px] font-bold block mb-[4px]">Andy Hawkes
 
 &lt;/strong>
 
 &lt;span class="text-[15px] block">Founder @ &lt;a href="https://loadster.app" class="font-bold underline hover:no-underline" target="_blank">Loadster&lt;/a>&lt;/span>
 
 &lt;/span>
 &lt;/cite>
 &lt;/blockquote>
 &lt;/div>
&lt;/section>

&lt;section class="py-[50px] sm:py-[70px] md:py-[91px]">
 &lt;div class="container text-center mb-[36px]">
 &lt;span class="block -mb-[4px] uppercase text-black-100">FEATURES&lt;/span>
 &lt;h3 class="leading-[1.16] mb-[18px] tracking-[0.2px] text-black-100">Conservative pricing. Radical power.&lt;/h3>
 &lt;h4 class="text-gray-200 tracking-[0.2px]">Hassle-free web-scraping API.&lt;/h4>
 &lt;/div>
 &lt;div class="container max-w-[1308px]">
 &lt;div class="flex flex-wrap text-gray-200 text-[16px] leading-[1.50] -my-[19px] -mx-[20px] md:-mx-[36px] pb-[45px]">
 
 &lt;div class="w-full sm:w-1/2 py-[19px] px-[20px] md:px-[36px]">
 &lt;div class="relative pl-[43px]">
 &lt;div class="absolute left-[0] top-[3px] w-[20px] text-center">
 &lt;img src="https://www.scrapingbee.com/images/icons/icon-earth.svg" class="inline-block" width="20" alt="">
 &lt;/div>
 &lt;h4 class="mb-[4px] text-black-100">Smart routing&lt;/h4>
 &lt;p>To ensure a performance rate of 98%, our smart routing algorithms will always pick the right proxies for your needs.&lt;/p></description></item><item><title>Scrapingdog alternative for web scraping?</title><link>https://www.scrapingbee.com/scrapingdog-alternative/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapingdog-alternative/</guid><description>&lt;p>&lt;section class="bg-yellow-100 py-[100px] md:pt-[220px] md:pb-[20px] mb-[80px] relative z-1">
 &lt;div class="container">
 &lt;div class="max-w-[1024px] mx-auto text-center">
 &lt;h1 class="mb-[14px]">Scrapingdog alternative for web scraping?&lt;/h1>
 &lt;p class="mb-[32px]">ScrapingBee is a better alternative to Scrapingdog. Looking for an easier, cheaper, and more reliable scraping solution? There are great alternatives to Scrapingdog out there.&lt;/p>
 &lt;div class="flex flex-wrap items-center justify-center mb-[33px]">
 &lt;div class="flex flex-col items-start gap-[20px]">
 &lt;a href="https://app.scrapingbee.com/account/register" class="btn px-[39px] mr-[20px]">Try ScrapingBee for Free&lt;/a>
 &lt;div class="flex items-center">
 &lt;a href="https://www.capterra.com/p/195060/ScrapingBee/" target="_blank" class="inline-block"> &lt;img border="0" class="h-[40px] mr-[10px] -ml-[9px]" src="https://brand-assets.capterra.com/badge/8898153e-408a-4cdb-9477-bda37032c670.svg" /> &lt;/a>
 &lt;span class="text-[18px] mr-[10px]">based on 100+ reviews.&lt;/span>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
&lt;/section>

&lt;section class="pt-[50px] sm:pt-101 pb-[50px] sm:pb-[70px] md:pb-[82px]">
 &lt;div class="container max-w-[894px] w-full flex flex-wrap">

 &lt;blockquote class="p-[38px] bg-gray-900 rounded-2xl m-[0] text-black-100 leading-[1.55]">
 &lt;q class="block mb-[35px] text-[24px]">ScrapingBee &lt;strong>clear documentation, easy-to-use API, and great success rate&lt;/strong> made it a no-brainer.&lt;/q>
 &lt;cite class="avatar flex items-center not-italic">
 
 &lt;span class="w-[56px] h-[56px] rounded-full overflow-hidden bg-gray-1000 mr-[24px]">
 &lt;img height="56" width="56" src="https://www.scrapingbee.com/images/testimonials/dominic.jpeg" alt="Dominic Phillips">
 &lt;/span>
 
 &lt;span>
 &lt;strong class="text-[18px] font-bold block mb-[4px]">Dominic Phillips
 
 &lt;/strong>
 
 &lt;span class="text-[15px] block">Co-Founder @ &lt;a href="https://codesubmit.io" class="font-bold underline hover:no-underline" target="_blank">CodeSubmit&lt;/a>&lt;/span>
 
 &lt;/span>
 &lt;/cite>
 &lt;/blockquote>
 &lt;/div>
&lt;/section>

&lt;section class="py-[50px] sm:py-[60px] md:py-[80px] text-gray-200 text-[16px] leading-[1.50]">
 &lt;div class="container max-w-[1292px]">
 &lt;div class="flex flex-wrap -mx-[23px]">
 &lt;div class="w-full md:w-[500px] lg:w-[608px] px-[23px]">
 &lt;div class="pr-[20px]">
 &lt;div class="mb-[48px]">
 &lt;h3 class="text-black-100 mb-[8px]">Not just data—structured, actionable results.&lt;/h3>
 &lt;p>Scrapingdog does the job, but if you're after flexible, scalable scraping, we offer better solutions that fit your needs.&lt;/p></description></item><item><title>Screenshot API for Developers</title><link>https://www.scrapingbee.com/features/screenshot/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/features/screenshot/</guid><description>&lt;p>&lt;script type="application/ld+json">
 {
 "@context": "https://schema.org",
 "@type": "Product",
 "name": "ScrapingBee",
 "brand": {
 "@type": "Brand",
 "name": "ScrapingBee"
 },
 "description": "Programmatic Screenshot API for any website with just a simple API call, in seconds.",
 "aggregateRating": {
 "@type": "AggregateRating",
 "ratingValue": "4.9",
 "reviewCount": "38",
 "bestRating": 5
 }
 }
&lt;/script>
&lt;section class="bg-skew-yellow-b pt-[100px] sm:pt-[100px] md:pt-[156px] mb-[120px] relative z-1 ">
 &lt;div class="container">
 &lt;div class="flex flex-wrap items-center -mx-[15px]">
 &lt;div class="w-full sm:w-1/2 px-[15px]">
 &lt;div class="max-w-[542px] leading-[1.77]">
 
 
 
 &lt;h1 class="mb-[14px]">Screenshot API for Developers&lt;/h1>
 &lt;p class="mb-[36px] text-[20px]">Programmatic Screenshot API for any website with just a simple click of a button call, in seconds.&lt;/p></description></item><item><title>Search Archive Scraper API - Free Signup, Credits Included</title><link>https://www.scrapingbee.com/scrapers/search-archive-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/search-archive-api/</guid><description/></item><item><title>Sec Filings Scraper API - Easy Start &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/sec-filings-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/sec-filings-scraper-api/</guid><description/></item><item><title>Sephora API Scraper - Sign Up for Free Credits</title><link>https://www.scrapingbee.com/scrapers/sephora-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/sephora-api/</guid><description/></item><item><title>SerpApi alternative for web scraping?</title><link>https://www.scrapingbee.com/serpapi-alternative/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/serpapi-alternative/</guid><description>&lt;p>&lt;section class="bg-yellow-100 py-[100px] md:pt-[220px] md:pb-[20px] mb-[80px] relative z-1">
 &lt;div class="container">
 &lt;div class="max-w-[1024px] mx-auto text-center">
 &lt;h1 class="mb-[14px]">SerpApi alternative for web scraping?&lt;/h1>
 &lt;p class="mb-[32px]">ScrapingBee is a better alternative to SerpApi. Web scraping shouldn&amp;#39;t cost a fortune—or require a full-time engineer to manage.&lt;/p>
 &lt;div class="flex flex-wrap items-center justify-center mb-[33px]">
 &lt;div class="flex flex-col items-start gap-[20px]">
 &lt;a href="https://app.scrapingbee.com/account/register" class="btn px-[39px] mr-[20px]">Try ScrapingBee for Free&lt;/a>
 &lt;div class="flex items-center">
 &lt;a href="https://www.capterra.com/p/195060/ScrapingBee/" target="_blank" class="inline-block"> &lt;img border="0" class="h-[40px] mr-[10px] -ml-[9px]" src="https://brand-assets.capterra.com/badge/8898153e-408a-4cdb-9477-bda37032c670.svg" /> &lt;/a>
 &lt;span class="text-[18px] mr-[10px]">based on 100+ reviews.&lt;/span>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
&lt;/section>

&lt;section class="pt-[50px] sm:pt-101 pb-[50px] sm:pb-[70px] md:pb-[82px]">
 &lt;div class="container max-w-[894px] w-full flex flex-wrap">

 &lt;blockquote class="p-[38px] bg-gray-900 rounded-2xl m-[0] text-black-100 leading-[1.55]">
 &lt;q class="block mb-[35px] text-[24px]">ScrapingBee &lt;strong>clear documentation, easy-to-use API, and great success rate&lt;/strong> made it a no-brainer.&lt;/q>
 &lt;cite class="avatar flex items-center not-italic">
 
 &lt;span class="w-[56px] h-[56px] rounded-full overflow-hidden bg-gray-1000 mr-[24px]">
 &lt;img height="56" width="56" src="https://www.scrapingbee.com/images/testimonials/dominic.jpeg" alt="Dominic Phillips">
 &lt;/span>
 
 &lt;span>
 &lt;strong class="text-[18px] font-bold block mb-[4px]">Dominic Phillips
 
 &lt;/strong>
 
 &lt;span class="text-[15px] block">Co-Founder @ &lt;a href="https://codesubmit.io" class="font-bold underline hover:no-underline" target="_blank">CodeSubmit&lt;/a>&lt;/span>
 
 &lt;/span>
 &lt;/cite>
 &lt;/blockquote>
 &lt;/div>
&lt;/section>

&lt;section class="py-[50px] sm:py-[60px] md:py-[80px] text-gray-200 text-[16px] leading-[1.50]">
 &lt;div class="container max-w-[1292px]">
 &lt;div class="flex flex-wrap -mx-[23px]">
 &lt;div class="w-full md:w-[500px] lg:w-[608px] px-[23px]">
 &lt;div class="pr-[20px]">
 &lt;div class="mb-[48px]">
 &lt;h3 class="text-black-100 mb-[8px]">Not just search engines. Not just inflated costs.&lt;/h3>
 &lt;p>SerpAPI is built for one job—&lt;a href="https://www.scrapingbee.com/blog/how-to-scrape-google-search-results-data-in-python-easily/">scraping search engines&lt;/a>—but it comes at a steep price. Why pay more for a limited tool when you can scrape the entire web for less?&lt;/p></description></item><item><title>Shein Scraper API - Easy Use &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/shein-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/shein-scraper-api/</guid><description/></item><item><title>Shopee Scraper API Tool - Get Free Credits on Sign Up</title><link>https://www.scrapingbee.com/scrapers/shopee-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/shopee-api/</guid><description/></item><item><title>Shopify Scraper API - Easy Signup &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/shopify-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/shopify-scraper-api/</guid><description/></item><item><title>Smartproxy alternative for web scraping?</title><link>https://www.scrapingbee.com/smartproxy-alternative/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/smartproxy-alternative/</guid><description>&lt;p>&lt;section class="bg-yellow-100 py-[100px] md:pt-[220px] md:pb-[20px] mb-[80px] relative z-1">
 &lt;div class="container">
 &lt;div class="max-w-[1024px] mx-auto text-center">
 &lt;h1 class="mb-[14px]">Smartproxy alternative for web scraping?&lt;/h1>
 &lt;p class="mb-[32px]">Pay the fair price for your web scraping needs.&lt;/p>
 &lt;div class="flex flex-wrap items-center justify-center mb-[33px]">
 &lt;div class="flex flex-col items-start gap-[20px]">
 &lt;a href="https://app.scrapingbee.com/account/register" class="btn px-[39px] mr-[20px]">Try ScrapingBee for Free&lt;/a>
 &lt;div class="flex items-center">
 &lt;a href="https://www.capterra.com/p/195060/ScrapingBee/" target="_blank" class="inline-block"> &lt;img border="0" class="h-[40px] mr-[10px] -ml-[9px]" src="https://brand-assets.capterra.com/badge/8898153e-408a-4cdb-9477-bda37032c670.svg" /> &lt;/a>
 &lt;span class="text-[18px] mr-[10px]">based on 100+ reviews.&lt;/span>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
&lt;/section>

&lt;section class="pt-[50px] sm:pt-101 pb-[50px] sm:pb-[70px] md:pb-[82px]">
 &lt;div class="container max-w-[894px] w-full flex flex-wrap">

 &lt;blockquote class="p-[38px] bg-gray-900 rounded-2xl m-[0] text-black-100 leading-[1.55]">
 &lt;q class="block mb-[35px] text-[24px]">ScrapingBee is helping us scrape &lt;strong>many job boards and company websites&lt;/strong> without having to deal with proxies or chrome browsers. It drastically simplified our data pipeline&lt;/q>
 &lt;cite class="avatar flex items-center not-italic">
 
 &lt;span class="w-[56px] h-[56px] rounded-full overflow-hidden bg-gray-1000 mr-[24px]">
 &lt;img height="56" width="56" src="https://www.scrapingbee.com/images/testimonials/russel.jpeg" alt="Russel Taylor">
 &lt;/span>
 
 &lt;span>
 &lt;strong class="text-[18px] font-bold block mb-[4px]">Russel Taylor
 
 &lt;/strong>
 
 &lt;span class="text-[15px] block">CEO @ HelloOutbound&lt;/span>
 
 &lt;/span>
 &lt;/cite>
 &lt;/blockquote>
 &lt;/div>
&lt;/section>

&lt;section class="py-[50px] sm:py-[60px] md:py-[80px] text-gray-200 text-[16px] leading-[1.50]">
 &lt;div class="container max-w-[1292px]">
 &lt;div class="flex flex-wrap -mx-[23px]">
 &lt;div class="w-full md:w-[500px] lg:w-[608px] px-[23px]">
 &lt;div class="pr-[20px]">
 &lt;div class="mb-[48px]">
 &lt;h3 class="text-black-100 mb-[8px]">Fulfill your web scraping needs, at a better price&lt;/h3>
 &lt;p>Switching from Smartproxy to ScrapingBee could save you some serious money. Especially if you are using their residential proxies.&lt;/p></description></item><item><title>Snapchat Scraper API - Simple Start &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/snapchat-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/snapchat-scraper-api/</guid><description/></item><item><title>Social Media Scraper API - Simple Setup &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/social-media-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/social-media-api/</guid><description/></item><item><title>SoundCloud Scraper API Tool - Start with Free Credits</title><link>https://www.scrapingbee.com/scrapers/soundcloud-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/soundcloud-api/</guid><description/></item><item><title>Spaw.co alternative for web scraping?</title><link>https://www.scrapingbee.com/spaw-co-alternative/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/spaw-co-alternative/</guid><description>&lt;p>&lt;section class="bg-yellow-100 py-[100px] md:pt-[220px] md:pb-[20px] mb-[80px] relative z-1">
 &lt;div class="container">
 &lt;div class="max-w-[1024px] mx-auto text-center">
 &lt;h1 class="mb-[14px]">Spaw.co alternative for web scraping?&lt;/h1>
 &lt;p class="mb-[32px]">ScrapingBee is a better alternative to Spaw.co. When scraping needs to be fast, scalable, and hassle-free, consider alternatives that make web data extraction a breeze.&lt;/p>
 &lt;div class="flex flex-wrap items-center justify-center mb-[33px]">
 &lt;div class="flex flex-col items-start gap-[20px]">
 &lt;a href="https://app.scrapingbee.com/account/register" class="btn px-[39px] mr-[20px]">Try ScrapingBee for Free&lt;/a>
 &lt;div class="flex items-center">
 &lt;a href="https://www.capterra.com/p/195060/ScrapingBee/" target="_blank" class="inline-block"> &lt;img border="0" class="h-[40px] mr-[10px] -ml-[9px]" src="https://brand-assets.capterra.com/badge/8898153e-408a-4cdb-9477-bda37032c670.svg" /> &lt;/a>
 &lt;span class="text-[18px] mr-[10px]">based on 100+ reviews.&lt;/span>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
&lt;/section>

&lt;section class="pt-[50px] sm:pt-101 pb-[50px] sm:pb-[70px] md:pb-[82px]">
 &lt;div class="container max-w-[894px] w-full flex flex-wrap">

 &lt;blockquote class="p-[38px] bg-gray-900 rounded-2xl m-[0] text-black-100 leading-[1.55]">
 &lt;q class="block mb-[35px] text-[24px]">ScrapingBee &lt;strong>clear documentation, easy-to-use API, and great success rate&lt;/strong> made it a no-brainer.&lt;/q>
 &lt;cite class="avatar flex items-center not-italic">
 
 &lt;span class="w-[56px] h-[56px] rounded-full overflow-hidden bg-gray-1000 mr-[24px]">
 &lt;img height="56" width="56" src="https://www.scrapingbee.com/images/testimonials/dominic.jpeg" alt="Dominic Phillips">
 &lt;/span>
 
 &lt;span>
 &lt;strong class="text-[18px] font-bold block mb-[4px]">Dominic Phillips
 
 &lt;/strong>
 
 &lt;span class="text-[15px] block">Co-Founder @ &lt;a href="https://codesubmit.io" class="font-bold underline hover:no-underline" target="_blank">CodeSubmit&lt;/a>&lt;/span>
 
 &lt;/span>
 &lt;/cite>
 &lt;/blockquote>
 &lt;/div>
&lt;/section>

&lt;section class="py-[50px] sm:py-[60px] md:py-[80px] text-gray-200 text-[16px] leading-[1.50]">
 &lt;div class="container max-w-[1292px]">
 &lt;div class="flex flex-wrap -mx-[23px]">
 &lt;div class="w-full md:w-[500px] lg:w-[608px] px-[23px]">
 &lt;div class="pr-[20px]">
 &lt;div class="mb-[48px]">
 &lt;h3 class="text-black-100 mb-[8px]">No limits, no extra charges—just pure scraping power.&lt;/h3>
 &lt;p>Spaw.co offers scraping but limits access to certain features unless you upgrade. Get the full package with no upsells.&lt;/p></description></item><item><title>Spotify Scraper API - Free Credits on Signup</title><link>https://www.scrapingbee.com/scrapers/spotify-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/spotify-api/</guid><description/></item><item><title>Steam Scraper API - Simple Start &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/steam-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/steam-scraper-api/</guid><description/></item><item><title>Stockx Scraper API - Simple Access &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/stockx-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/stockx-scraper-api/</guid><description/></item><item><title>StreetEasy Scraper API - Easy Start &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/streeteasy-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/streeteasy-scraper-api/</guid><description/></item><item><title>Substack Scraper API - Easy Use &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/substack-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/substack-scraper-api/</guid><description/></item><item><title>Supported Countries</title><link>https://www.scrapingbee.com/documentation/country_codes/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/documentation/country_codes/</guid><description>&lt;h2 id="list-of-supported-country-codes">List of supported country codes&lt;/h2>
&lt;p>The following is the list of supported country codes, in &lt;a href="https://en.wikipedia.org/wiki/ISO_3166-1" >ISO 3166-1&lt;/a> format.&lt;/p>
&lt;p>Pass the desired code via the &lt;code>country_code&lt;/code> parameter. Geolocation is only available when &lt;a href="https://www.scrapingbee.com/documentation/#premium-proxy" >premium proxies&lt;/a> are enabled: &lt;code>premium_proxy=true&lt;/code>.&lt;/p>
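As a sketch of how the two parameters fit together (the helper name geo_params below is illustrative, not part of the API; substitute your own API key and target URL):

```python
def geo_params(api_key, target_url, country_code=None):
    """Build query parameters for a geolocated ScrapingBee request."""
    params = {"api_key": api_key, "url": target_url}
    if country_code:
        # Geolocation only works with premium proxies,
        # so the two parameters are enabled together.
        params["premium_proxy"] = "true"
        params["country_code"] = country_code
    return params

# Example: route the request through Germany ("de" from the table below).
params = geo_params("YOUR_API_KEY", "https://example.com", country_code="de")
print(params["country_code"])
```

Sending the resulting dictionary as the query string of a GET request to the API endpoint then fetches the page from the chosen country.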
&lt;table>&lt;thead>&lt;tr>&lt;th style="text-align:left">Country Name&lt;/th>&lt;th style="text-align:left">country_code&lt;/th>&lt;/tr>&lt;/thead>&lt;tbody>&lt;tr>&lt;td style="text-align:left">Afghanistan&lt;/td>&lt;td style="text-align:left">af&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Albania&lt;/td>&lt;td style="text-align:left">al&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Algeria&lt;/td>&lt;td style="text-align:left">dz&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">American Samoa&lt;/td>&lt;td style="text-align:left">as&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Andorra&lt;/td>&lt;td style="text-align:left">ad&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Angola&lt;/td>&lt;td style="text-align:left">ao&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Anguilla&lt;/td>&lt;td style="text-align:left">ai&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Antarctica&lt;/td>&lt;td style="text-align:left">aq&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Antigua &amp;amp; Barbuda&lt;/td>&lt;td style="text-align:left">ag&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Argentina&lt;/td>&lt;td style="text-align:left">ar&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Armenia&lt;/td>&lt;td style="text-align:left">am&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Aruba&lt;/td>&lt;td style="text-align:left">aw&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Australia&lt;/td>&lt;td style="text-align:left">au&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Austria&lt;/td>&lt;td style="text-align:left">at&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Azerbaijan&lt;/td>&lt;td style="text-align:left">az&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Bahamas&lt;/td>&lt;td style="text-align:left">bs&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Bahrain&lt;/td>&lt;td style="text-align:left">bh&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Bangladesh&lt;/td>&lt;td 
style="text-align:left">bd&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Barbados&lt;/td>&lt;td style="text-align:left">bb&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Belarus&lt;/td>&lt;td style="text-align:left">by&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Belgium&lt;/td>&lt;td style="text-align:left">be&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Belize&lt;/td>&lt;td style="text-align:left">bz&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Benin&lt;/td>&lt;td style="text-align:left">bj&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Bermuda&lt;/td>&lt;td style="text-align:left">bm&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Bhutan&lt;/td>&lt;td style="text-align:left">bt&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Bolivia&lt;/td>&lt;td style="text-align:left">bo&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Bosnia and Herzegovina&lt;/td>&lt;td style="text-align:left">ba&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Botswana&lt;/td>&lt;td style="text-align:left">bw&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Bouvet Island&lt;/td>&lt;td style="text-align:left">bv&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Brazil&lt;/td>&lt;td style="text-align:left">br&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">British Indian Ocean Territory&lt;/td>&lt;td style="text-align:left">io&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">British Virgin Islands&lt;/td>&lt;td style="text-align:left">vg&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Brunei Darussalam&lt;/td>&lt;td style="text-align:left">bn&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Bulgaria&lt;/td>&lt;td style="text-align:left">bg&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Burkina Faso&lt;/td>&lt;td style="text-align:left">bf&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Burma (no longer exists)&lt;/td>&lt;td style="text-align:left">bu&lt;/td>&lt;/tr>&lt;tr>&lt;td 
style="text-align:left">Burundi&lt;/td>&lt;td style="text-align:left">bi&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Cambodia&lt;/td>&lt;td style="text-align:left">kh&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Cameroon&lt;/td>&lt;td style="text-align:left">cm&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Canada&lt;/td>&lt;td style="text-align:left">ca&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Cape Verde&lt;/td>&lt;td style="text-align:left">cv&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Cayman Islands&lt;/td>&lt;td style="text-align:left">ky&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Central African Republic&lt;/td>&lt;td style="text-align:left">cf&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Chad&lt;/td>&lt;td style="text-align:left">td&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Chile&lt;/td>&lt;td style="text-align:left">cl&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">China&lt;/td>&lt;td style="text-align:left">cn&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Christmas Island&lt;/td>&lt;td style="text-align:left">cx&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Cocos (Keeling) Islands&lt;/td>&lt;td style="text-align:left">cc&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Colombia&lt;/td>&lt;td style="text-align:left">co&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Comoros&lt;/td>&lt;td style="text-align:left">km&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Congo&lt;/td>&lt;td style="text-align:left">cg&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Cook Islands&lt;/td>&lt;td style="text-align:left">ck&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Costa Rica&lt;/td>&lt;td style="text-align:left">cr&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Croatia&lt;/td>&lt;td style="text-align:left">hr&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Cuba&lt;/td>&lt;td style="text-align:left">cu&lt;/td>&lt;/tr>&lt;tr>&lt;td 
style="text-align:left">Cyprus&lt;/td>&lt;td style="text-align:left">cy&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Czech Republic&lt;/td>&lt;td style="text-align:left">cz&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Czechoslovakia (no longer exists)&lt;/td>&lt;td style="text-align:left">cs&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">C&amp;ocirc;te D'ivoire (Ivory Coast)&lt;/td>&lt;td style="text-align:left">ci&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Democratic Yemen (no longer exists)&lt;/td>&lt;td style="text-align:left">yd&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Denmark&lt;/td>&lt;td style="text-align:left">dk&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Djibouti&lt;/td>&lt;td style="text-align:left">dj&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Dominica&lt;/td>&lt;td style="text-align:left">dm&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Dominican Republic&lt;/td>&lt;td style="text-align:left">do&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">East Timor&lt;/td>&lt;td style="text-align:left">tp&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Ecuador&lt;/td>&lt;td style="text-align:left">ec&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Egypt&lt;/td>&lt;td style="text-align:left">eg&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">El Salvador&lt;/td>&lt;td style="text-align:left">sv&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Equatorial Guinea&lt;/td>&lt;td style="text-align:left">gq&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Eritrea&lt;/td>&lt;td style="text-align:left">er&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Estonia&lt;/td>&lt;td style="text-align:left">ee&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Ethiopia&lt;/td>&lt;td style="text-align:left">et&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Falkland Islands (Malvinas)&lt;/td>&lt;td style="text-align:left">fk&lt;/td>&lt;/tr>&lt;tr>&lt;td 
style="text-align:left">Faroe Islands&lt;/td>&lt;td style="text-align:left">fo&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Fiji&lt;/td>&lt;td style="text-align:left">fj&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Finland&lt;/td>&lt;td style="text-align:left">fi&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">France&lt;/td>&lt;td style="text-align:left">fr&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">French Guiana&lt;/td>&lt;td style="text-align:left">gf&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">French Polynesia&lt;/td>&lt;td style="text-align:left">pf&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">French Southern Territories&lt;/td>&lt;td style="text-align:left">tf&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Gabon&lt;/td>&lt;td style="text-align:left">ga&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Gambia&lt;/td>&lt;td style="text-align:left">gm&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Georgia&lt;/td>&lt;td style="text-align:left">ge&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">German Democratic Republic (no longer exists)&lt;/td>&lt;td style="text-align:left">dd&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Germany&lt;/td>&lt;td style="text-align:left">de&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Ghana&lt;/td>&lt;td style="text-align:left">gh&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Gibraltar&lt;/td>&lt;td style="text-align:left">gi&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Greece&lt;/td>&lt;td style="text-align:left">gr&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Greenland&lt;/td>&lt;td style="text-align:left">gl&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Grenada&lt;/td>&lt;td style="text-align:left">gd&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Guadeloupe&lt;/td>&lt;td style="text-align:left">gp&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Guam&lt;/td>&lt;td 
style="text-align:left">gu&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Guatemala&lt;/td>&lt;td style="text-align:left">gt&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Guinea&lt;/td>&lt;td style="text-align:left">gn&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Guinea-Bissau&lt;/td>&lt;td style="text-align:left">gw&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Guyana&lt;/td>&lt;td style="text-align:left">gy&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Haiti&lt;/td>&lt;td style="text-align:left">ht&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Heard &amp;amp; McDonald Islands&lt;/td>&lt;td style="text-align:left">hm&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Honduras&lt;/td>&lt;td style="text-align:left">hn&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Hong Kong&lt;/td>&lt;td style="text-align:left">hk&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Hungary&lt;/td>&lt;td style="text-align:left">hu&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Iceland&lt;/td>&lt;td style="text-align:left">is&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">India&lt;/td>&lt;td style="text-align:left">in&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Indonesia&lt;/td>&lt;td style="text-align:left">id&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Iraq&lt;/td>&lt;td style="text-align:left">iq&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Ireland&lt;/td>&lt;td style="text-align:left">ie&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Islamic Republic of Iran&lt;/td>&lt;td style="text-align:left">ir&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Israel&lt;/td>&lt;td style="text-align:left">il&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Italy&lt;/td>&lt;td style="text-align:left">it&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Jamaica&lt;/td>&lt;td style="text-align:left">jm&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Japan&lt;/td>&lt;td 
style="text-align:left">jp&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Jordan&lt;/td>&lt;td style="text-align:left">jo&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Kazakhstan&lt;/td>&lt;td style="text-align:left">kz&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Kenya&lt;/td>&lt;td style="text-align:left">ke&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Kiribati&lt;/td>&lt;td style="text-align:left">ki&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Korea, Democratic People's Republic of&lt;/td>&lt;td style="text-align:left">kp&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Korea, Republic of&lt;/td>&lt;td style="text-align:left">kr&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Kuwait&lt;/td>&lt;td style="text-align:left">kw&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Kyrgyzstan&lt;/td>&lt;td style="text-align:left">kg&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Lao People's Democratic Republic&lt;/td>&lt;td style="text-align:left">la&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Latvia&lt;/td>&lt;td style="text-align:left">lv&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Lebanon&lt;/td>&lt;td style="text-align:left">lb&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Lesotho&lt;/td>&lt;td style="text-align:left">ls&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Liberia&lt;/td>&lt;td style="text-align:left">lr&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Libyan Arab Jamahiriya&lt;/td>&lt;td style="text-align:left">ly&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Liechtenstein&lt;/td>&lt;td style="text-align:left">li&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Lithuania&lt;/td>&lt;td style="text-align:left">lt&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Luxembourg&lt;/td>&lt;td style="text-align:left">lu&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Macau&lt;/td>&lt;td style="text-align:left">mo&lt;/td>&lt;/tr>&lt;tr>&lt;td 
style="text-align:left">Madagascar&lt;/td>&lt;td style="text-align:left">mg&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Malawi&lt;/td>&lt;td style="text-align:left">mw&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Malaysia&lt;/td>&lt;td style="text-align:left">my&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Maldives&lt;/td>&lt;td style="text-align:left">mv&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Mali&lt;/td>&lt;td style="text-align:left">ml&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Malta&lt;/td>&lt;td style="text-align:left">mt&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Marshall Islands&lt;/td>&lt;td style="text-align:left">mh&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Martinique&lt;/td>&lt;td style="text-align:left">mq&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Mauritania&lt;/td>&lt;td style="text-align:left">mr&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Mauritius&lt;/td>&lt;td style="text-align:left">mu&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Mayotte&lt;/td>&lt;td style="text-align:left">yt&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Mexico&lt;/td>&lt;td style="text-align:left">mx&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Micronesia&lt;/td>&lt;td style="text-align:left">fm&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Moldova, Republic of&lt;/td>&lt;td style="text-align:left">md&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Monaco&lt;/td>&lt;td style="text-align:left">mc&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Mongolia&lt;/td>&lt;td style="text-align:left">mn&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Montserrat&lt;/td>&lt;td style="text-align:left">ms&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Morocco&lt;/td>&lt;td style="text-align:left">ma&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Mozambique&lt;/td>&lt;td style="text-align:left">mz&lt;/td>&lt;/tr>&lt;tr>&lt;td 
style="text-align:left">Myanmar&lt;/td>&lt;td style="text-align:left">mm&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Namibia&lt;/td>&lt;td style="text-align:left">na&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Nauru&lt;/td>&lt;td style="text-align:left">nr&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Nepal&lt;/td>&lt;td style="text-align:left">np&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Netherlands Antilles&lt;/td>&lt;td style="text-align:left">an&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Netherlands&lt;/td>&lt;td style="text-align:left">nl&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Neutral Zone (no longer exists)&lt;/td>&lt;td style="text-align:left">nt&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">New Caledonia&lt;/td>&lt;td style="text-align:left">nc&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">New Zealand&lt;/td>&lt;td style="text-align:left">nz&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Nicaragua&lt;/td>&lt;td style="text-align:left">ni&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Niger&lt;/td>&lt;td style="text-align:left">ne&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Nigeria&lt;/td>&lt;td style="text-align:left">ng&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Niue&lt;/td>&lt;td style="text-align:left">nu&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Norfolk Island&lt;/td>&lt;td style="text-align:left">nf&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Northern Mariana Islands&lt;/td>&lt;td style="text-align:left">mp&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Norway&lt;/td>&lt;td style="text-align:left">no&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Oman&lt;/td>&lt;td style="text-align:left">om&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Pakistan&lt;/td>&lt;td style="text-align:left">pk&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Palau&lt;/td>&lt;td 
style="text-align:left">pw&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Panama&lt;/td>&lt;td style="text-align:left">pa&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Papua New Guinea&lt;/td>&lt;td style="text-align:left">pg&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Paraguay&lt;/td>&lt;td style="text-align:left">py&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Peru&lt;/td>&lt;td style="text-align:left">pe&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Philippines&lt;/td>&lt;td style="text-align:left">ph&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Pitcairn&lt;/td>&lt;td style="text-align:left">pn&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Poland&lt;/td>&lt;td style="text-align:left">pl&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Portugal&lt;/td>&lt;td style="text-align:left">pt&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Puerto Rico&lt;/td>&lt;td style="text-align:left">pr&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Qatar&lt;/td>&lt;td style="text-align:left">qa&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Romania&lt;/td>&lt;td style="text-align:left">ro&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Russian Federation&lt;/td>&lt;td style="text-align:left">ru&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Rwanda&lt;/td>&lt;td style="text-align:left">rw&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">R&amp;eacute;union&lt;/td>&lt;td style="text-align:left">re&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Saint Lucia&lt;/td>&lt;td style="text-align:left">lc&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Samoa&lt;/td>&lt;td style="text-align:left">ws&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">San Marino&lt;/td>&lt;td style="text-align:left">sm&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Sao Tome &amp;amp; Principe&lt;/td>&lt;td style="text-align:left">st&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Saudi 
Arabia&lt;/td>&lt;td style="text-align:left">sa&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Senegal&lt;/td>&lt;td style="text-align:left">sn&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Seychelles&lt;/td>&lt;td style="text-align:left">sc&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Sierra Leone&lt;/td>&lt;td style="text-align:left">sl&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Singapore&lt;/td>&lt;td style="text-align:left">sg&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Slovakia&lt;/td>&lt;td style="text-align:left">sk&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Slovenia&lt;/td>&lt;td style="text-align:left">si&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Solomon Islands&lt;/td>&lt;td style="text-align:left">sb&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Somalia&lt;/td>&lt;td style="text-align:left">so&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">South Africa&lt;/td>&lt;td style="text-align:left">za&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">South Georgia and the South Sandwich Islands&lt;/td>&lt;td style="text-align:left">gs&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Spain&lt;/td>&lt;td style="text-align:left">es&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Sri Lanka&lt;/td>&lt;td style="text-align:left">lk&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">St. Helena&lt;/td>&lt;td style="text-align:left">sh&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">St. Kitts and Nevis&lt;/td>&lt;td style="text-align:left">kn&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">St. Pierre &amp;amp; Miquelon&lt;/td>&lt;td style="text-align:left">pm&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">St. 
Vincent &amp;amp; the Grenadines&lt;/td>&lt;td style="text-align:left">vc&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Sudan&lt;/td>&lt;td style="text-align:left">sd&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Suriname&lt;/td>&lt;td style="text-align:left">sr&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Svalbard &amp;amp; Jan Mayen Islands&lt;/td>&lt;td style="text-align:left">sj&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Swaziland&lt;/td>&lt;td style="text-align:left">sz&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Sweden&lt;/td>&lt;td style="text-align:left">se&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Switzerland&lt;/td>&lt;td style="text-align:left">ch&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Syrian Arab Republic&lt;/td>&lt;td style="text-align:left">sy&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Taiwan, Province of China&lt;/td>&lt;td style="text-align:left">tw&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Tajikistan&lt;/td>&lt;td style="text-align:left">tj&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Tanzania, United Republic of&lt;/td>&lt;td style="text-align:left">tz&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Thailand&lt;/td>&lt;td style="text-align:left">th&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Togo&lt;/td>&lt;td style="text-align:left">tg&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Tokelau&lt;/td>&lt;td style="text-align:left">tk&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Tonga&lt;/td>&lt;td style="text-align:left">to&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Trinidad &amp;amp; Tobago&lt;/td>&lt;td style="text-align:left">tt&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Tunisia&lt;/td>&lt;td style="text-align:left">tn&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Turkey&lt;/td>&lt;td style="text-align:left">tr&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Turkmenistan&lt;/td>&lt;td 
style="text-align:left">tm&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Turks &amp;amp; Caicos Islands&lt;/td>&lt;td style="text-align:left">tc&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Tuvalu&lt;/td>&lt;td style="text-align:left">tv&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Uganda&lt;/td>&lt;td style="text-align:left">ug&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Ukraine&lt;/td>&lt;td style="text-align:left">ua&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Union of Soviet Socialist Republics (no longer exists)&lt;/td>&lt;td style="text-align:left">su&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">United Arab Emirates&lt;/td>&lt;td style="text-align:left">ae&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">United Kingdom (Great Britain)&lt;/td>&lt;td style="text-align:left">gb&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">United States Minor Outlying Islands&lt;/td>&lt;td style="text-align:left">um&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">United States Virgin Islands&lt;/td>&lt;td style="text-align:left">vi&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">United States&lt;/td>&lt;td style="text-align:left">us&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Uruguay&lt;/td>&lt;td style="text-align:left">uy&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Uzbekistan&lt;/td>&lt;td style="text-align:left">uz&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Vanuatu&lt;/td>&lt;td style="text-align:left">vu&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Vatican City State (Holy See)&lt;/td>&lt;td style="text-align:left">va&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Venezuela&lt;/td>&lt;td style="text-align:left">ve&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Viet Nam&lt;/td>&lt;td style="text-align:left">vn&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Wallis &amp;amp; Futuna Islands&lt;/td>&lt;td 
style="text-align:left">wf&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Western Sahara&lt;/td>&lt;td style="text-align:left">eh&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Yemen&lt;/td>&lt;td style="text-align:left">ye&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Yugoslavia&lt;/td>&lt;td style="text-align:left">yu&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Zaire&lt;/td>&lt;td style="text-align:left">zr&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Zambia&lt;/td>&lt;td style="text-align:left">zm&lt;/td>&lt;/tr>&lt;tr>&lt;td style="text-align:left">Zimbabwe&lt;/td>&lt;td style="text-align:left">zw&lt;/td>&lt;/tr>&lt;/tbody>&lt;/table></description></item><item><title>Suumo Scraper API - Easy Access &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/suumo-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/suumo-scraper-api/</guid><description/></item><item><title>Taobao Scraper - Simple Solution, Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/taobao-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/taobao-api/</guid><description/></item><item><title>Target Scraper API - Free Credits with Signup</title><link>https://www.scrapingbee.com/scrapers/target-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/target-api/</guid><description/></item><item><title>Thank you for your Submission!</title><link>https://www.scrapingbee.com/thank_you/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/thank_you/</guid><description/></item><item><title>The Best Scraper API to Avoid Getting Blocked</title><link>https://www.scrapingbee.com/scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scraper-api/</guid><description/></item><item><title>The 
Best Scraper API to Avoid Getting Blocked</title><link>https://www.scrapingbee.com/web-scraping/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/web-scraping/</guid><description/></item><item><title>The easiest way to make the web LLM-readable</title><link>https://www.scrapingbee.com/features/markdown-scraper/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/features/markdown-scraper/</guid><description>&lt;p>&lt;script type="application/ld+json">
 {
 "@context": "https://schema.org",
 "@type": "Product",
 "name": "ScrapingBee",
 "brand": {
 "@type": "Brand",
 "name": "ScrapingBee"
 },
 "description": "Get Markdown or Plain Text content from any website you want to scrape.",
 "aggregateRating": {
 "@type": "AggregateRating",
 "ratingValue": "4.9",
 "reviewCount": "38",
 "bestRating": 5
 }
 }
&lt;/script>
&lt;section class="bg-skew-yellow-b pt-[100px] sm:pt-[100px] md:pt-[156px] mb-[120px] relative z-1 pb-50 sm:pb-100 md:mb-170">
 &lt;div class="container">
 &lt;div class="flex flex-wrap items-center -mx-[15px]">
 &lt;div class="w-full sm:w-1/2 px-[15px]">
 &lt;div class="max-w-[542px] leading-[1.77]">
 
 
 
 &lt;h1 class="mb-[14px]">The easiest way to make the web LLM-readable&lt;/h1>
 &lt;p class="mb-[36px] text-[20px]">Get Markdown or Plain Text content from any website you want to scrape.&lt;/p></description></item><item><title>The journey to a $1 million ARR SaaS without traditional VCs</title><link>https://www.scrapingbee.com/journey-to-one-million-arr/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/journey-to-one-million-arr/</guid><description>&lt;h2 id="the-early-days">The early days&lt;/h2>
&lt;div class="journey max-w-[728px] md:max-w-none mx-auto text-[16px]">
 
 &lt;div class="row relative flex flex-wrap items-center text-black-100">
 &lt;div class="col relative w-full sm:w-[100px] md:w-1/2 flex pr-[10px] md:px-[48px] mb-[20px] sm:mb-[0]">
 &lt;time class="uppercase text-[20px] lg:text-[24px] leading-[1.50] font-bold">SEP 2006&lt;/time>
 &lt;/div>
 &lt;div class="col relative w-full sm:w-auto flex-1 md:w-1/2 md:px-[48px]">
 &lt;div class="bg-yellow-100 p-[20px] rounded-md text-[18px] leading-[1.50]">
 &lt;div>
 &lt;strong class="block text-[24px] mb-[8px]">14 years ago.&lt;/strong>
 &lt;p>We (Kevin and Pierre) met in high school in a small town located in the south of France.&lt;/p>
 
 &lt;img class="lozad w-full" data-src="https://www.scrapingbee.com/images/about-us/castres.jpeg">
 
 
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 
 &lt;div class="row relative flex flex-wrap items-center text-black-100">
 &lt;div class="col relative w-full sm:w-[100px] md:w-1/2 flex pr-[10px] md:px-[48px] mb-[20px] sm:mb-[0]">
 &lt;time class="uppercase text-[20px] lg:text-[24px] leading-[1.50] font-bold">JUN 2010&lt;/time>
 &lt;/div>
 &lt;div class="col relative w-full sm:w-auto flex-1 md:w-1/2 md:px-[48px]">
 &lt;div class="bg-yellow-100 p-[20px] rounded-md text-[18px] leading-[1.50]">
 &lt;div>
 &lt;strong class="block text-[24px] mb-[8px]">... school ends.&lt;/strong>
 &lt;p>We went to university to study CS.
During that time, we started to learn about &lt;a href="https://www.ycombinator.com" >YC&lt;/a>, &lt;a href="https://www.indiehackers.com" >IndieHackers&lt;/a>, Rob Walling's &lt;a href="https://startupbook.net" >book&lt;/a>, &lt;a href="https://www.thefamily.co" >the family&lt;/a>, and this whole startup/bootstrapping ecosystem.&lt;/p></description></item><item><title>The Web Scraping API for Busy Developers</title><link>https://www.scrapingbee.com/what-you-want/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/what-you-want/</guid><description/></item><item><title>The Web Scraping API for Busy Developers</title><link>https://www.scrapingbee.com/web-scraping-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/web-scraping-api/</guid><description/></item><item><title>The Web Scraping API for Buzzzy Developers</title><link>https://www.scrapingbee.com/buzzy-dev/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/buzzy-dev/</guid><description/></item><item><title>Tiktok Email Scraper API - Simple Use &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/tiktok-email-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/tiktok-email-scraper-api/</guid><description/></item><item><title>TikTok Scraper - Easy Access &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/tiktok-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/tiktok-api/</guid><description/></item><item><title>Tokopedia Scraper API - Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/tokopedia-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/tokopedia-api/</guid><description/></item><item><title>Transfermarkt Scraper API - Simple Start with Free 
Signup</title><link>https://www.scrapingbee.com/scrapers/transfermarkt-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/transfermarkt-api/</guid><description/></item><item><title>Trendyol Scraper API - Free Credits with Signup</title><link>https://www.scrapingbee.com/scrapers/trendyol-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/trendyol-api/</guid><description/></item><item><title>Trulia Scraper API - Free Signup with Credits</title><link>https://www.scrapingbee.com/scrapers/trulia-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/trulia-api/</guid><description/></item><item><title>Tumblr Scraper API - Free Signup Credits &amp; Simplicity</title><link>https://www.scrapingbee.com/scrapers/tumblr-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/tumblr-api/</guid><description/></item><item><title>Udemy Scraper API - Simple Start &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/udemy-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/udemy-scraper-api/</guid><description/></item><item><title>Unsplash Scraper API - Simple Access &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/unsplash-scraper-api-key/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/unsplash-scraper-api-key/</guid><description/></item><item><title>Upwork Scraper - Quick Integration, Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/upwork-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/upwork-api/</guid><description/></item><item><title>Viator Scraper API - Simple Use &amp; Free Signup 
Credits</title><link>https://www.scrapingbee.com/scrapers/viator-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/viator-scraper-api/</guid><description/></item><item><title>Wall Street Journal Scraper API - Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/wsj-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/wsj-api/</guid><description/></item><item><title>Walmart API</title><link>https://www.scrapingbee.com/documentation/walmart/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/documentation/walmart/</guid><description>&lt;p>Our Walmart API allows you to scrape Walmart search results and product details in realtime.&lt;/p>
&lt;p>We provide two endpoints:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Search endpoint&lt;/strong> (&lt;code>/api/v1/walmart/search&lt;/code>) - Fetch Walmart search results&lt;/li>
&lt;li>&lt;strong>Product endpoint&lt;/strong> (&lt;code>/api/v1/walmart/product&lt;/code>) - Fetch structured Walmart product details&lt;/li>
&lt;/ul>
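&lt;p>As a minimal Python sketch of how requests to both endpoints are built (note: the &lt;code>product_id&lt;/code> parameter name is an assumption for illustration; only &lt;code>api_key&lt;/code> and &lt;code>query&lt;/code> are confirmed here &amp;mdash; see the parameter breakdown below for the exact names):&lt;/p>

```python
# Sketch: building requests against the two Walmart endpoints.
# NOTE: "product_id" is a hypothetical parameter name used for
# illustration; check the Product endpoint docs for the exact name.
import requests

BASE = "https://app.scrapingbee.com/api/v1/walmart"

# Search endpoint: fetch Walmart search results
search_req = requests.Request(
    "GET",
    f"{BASE}/search",
    params={"api_key": "YOUR-API-KEY", "query": "iphone"},
).prepare()

# Product endpoint: fetch structured Walmart product details
product_req = requests.Request(
    "GET",
    f"{BASE}/product",
    params={"api_key": "YOUR-API-KEY", "product_id": "123456789"},
).prepare()

print(search_req.url)
print(product_req.url)
# To actually send one: requests.Session().send(search_req)
```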
&lt;div class="doc-row">
&lt;div class="doc-full">
&lt;h2 id="walmart-search-api">Walmart Search API&lt;/h2>
&lt;h3 id="quick-start">Quick start&lt;/h3>
&lt;p>To scrape Walmart search results, you only need two things:&lt;/p>
&lt;ul>
&lt;li>your API key, available &lt;a href="https://app.scrapingbee.com/account/manage/api_key" >here&lt;/a>&lt;/li>
&lt;li>a search query (&lt;a href="#query" >learn more about the search query&lt;/a>)&lt;/li>
&lt;/ul>
&lt;p>Then, simply make the following request.&lt;/p>


&lt;div class="p-1 rounded mb-6 bg-[#F4F0F0] border border-[#1A1414]/10 text-[16px] leading-[1.50]" data-tabs-id="a83016e89cb750e987bf794d3d5c9fd2">

 &lt;div class="md:pl-[30px] xl:pl-[32px] flex items-center justify-end gap-3 py-[10px] px-[17px]" x-data="{ 
 open: false, 
 selectedLibrary: 'python-a83016e89cb750e987bf794d3d5c9fd2',
 libraries: [
 { name: 'Python', value: 'python-a83016e89cb750e987bf794d3d5c9fd2', icon: '/images/icons/icon-python.svg', width: 32, height: 32 },
 { name: 'cURL', value: 'curl-a83016e89cb750e987bf794d3d5c9fd2', icon: '/images/icons/icon-curl.svg', width: 48, height: 32 },
 { name: 'NodeJS', value: 'node-a83016e89cb750e987bf794d3d5c9fd2', icon: '/images/icons/icon-node.svg', width: 26, height: 26 },
 { name: 'Java', value: 'java-a83016e89cb750e987bf794d3d5c9fd2', icon: '/images/icons/icon-java.svg', width: 32, height: 32 },
 { name: 'Ruby', value: 'ruby-a83016e89cb750e987bf794d3d5c9fd2', icon: '/images/icons/icon-ruby.svg', width: 32, height: 32 },
 { name: 'PHP', value: 'php-a83016e89cb750e987bf794d3d5c9fd2', icon: '/images/icons/icon-php.svg', width: 32, height: 32 },
 { name: 'Go', value: 'go-a83016e89cb750e987bf794d3d5c9fd2', icon: '/images/icons/icon-go.svg', width: 32, height: 32 }
 ],
 selectLibrary(value, isGlobal = false) {
 this.selectedLibrary = value;
 this.open = false;
 // Trigger tab switching for this specific instance
 // Use Alpine's $el to find the container
 const container = $el.closest('[data-tabs-id]');
 if (container) {
 container.querySelectorAll('.nice-tab-content').forEach(tab => {
 tab.classList.remove('active');
 });
 const selectedTab = container.querySelector('#' + value);
 if (selectedTab) {
 selectedTab.classList.add('active');
 }
 }
 // Individual snippet selectors should NOT trigger global changes
 // Only the global selector at the top should change all snippets
 },
 getSelectedLibrary() {
 return this.libraries.find(lib => lib.value === this.selectedLibrary) || this.libraries[0];
 },
 init() {
 // Listen for global language changes
 window.addEventListener('languageChanged', (e) => {
 const globalLang = e.detail.language;
 const matchingLib = this.libraries.find(lib => lib.value.startsWith(globalLang + '-'));
 if (matchingLib) {
 this.selectLibrary(matchingLib.value, true);
 }
 });
 // Initialize from global state if available
 const globalLang = window.globalSelectedLanguage || 'python';
 const matchingLib = this.libraries.find(lib => lib.value.startsWith(globalLang + '-'));
 if (matchingLib &amp;&amp; matchingLib.value !== this.selectedLibrary) {
 this.selectLibrary(matchingLib.value, true);
 }
 }
 }" x-on:click.away="open = false" x-init="init()">
 &lt;div class="relative">
 
 &lt;button 
 @click="open = !open"
 type="button"
 class="flex justify-between items-center px-2 py-1.5 bg-white rounded-md border border-[#1A1414]/10 transition-colors hover:bg-gray-50 focus:outline-none min-w-[180px] shadow-sm"
 >
 &lt;div class="flex gap-2 items-center">
 &lt;img 
 :src="getSelectedLibrary().icon" 
 :alt="getSelectedLibrary().name"
 :width="20"
 :height="20"
 class="flex-shrink-0 w-5 h-5"
 />
 &lt;span class="text-black-100 font-medium text-[14px]">
 &lt;span x-text="getSelectedLibrary().name">&lt;/span>
 &lt;/span>
 &lt;/div>
 &lt;svg 
 class="w-3.5 h-3.5 text-gray-400 transition-transform duration-200" 
 :class="{ 'rotate-180': open }"
 fill="none" 
 stroke="currentColor" 
 viewBox="0 0 24 24"
 >
 &lt;path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M19 9l-7 7-7-7">&lt;/path>
 &lt;/svg>
 &lt;/button>
 
 
 &lt;div 
 x-show="open"
 x-transition:enter="transition ease-out duration-200"
 x-transition:enter-start="opacity-0 translate-y-1"
 x-transition:enter-end="opacity-100 translate-y-0"
 x-transition:leave="transition ease-in duration-150"
 x-transition:leave-start="opacity-100 translate-y-0"
 x-transition:leave-end="opacity-0 translate-y-1"
 class="overflow-auto absolute left-0 top-full z-50 mt-1 w-full max-h-[300px] bg-white rounded-md border border-[#1A1414]/10 shadow-lg focus:outline-none"
 style="display: none;"
 >
 &lt;ul class="py-1">
 &lt;template x-for="library in libraries" :key="library.value">
 &lt;li>
 &lt;button
 @click="selectLibrary(library.value)"
 type="button"
 class="flex gap-2 items-center px-2 py-1.5 w-full transition-colors hover:bg-gray-50"
 :class="{ 'bg-yellow-50': selectedLibrary === library.value }"
 >
 &lt;img 
 :src="library.icon" 
 :alt="library.name"
 :width="20"
 :height="20"
 class="flex-shrink-0 w-5 h-5"
 />
 &lt;span class="text-black-100 text-[14px]" x-text="library.name">&lt;/span>
 &lt;span x-show="selectedLibrary === library.value" class="ml-auto text-yellow-400">
 &lt;svg class="w-3.5 h-3.5" fill="currentColor" viewBox="0 0 20 20">
 &lt;path fill-rule="evenodd" d="M16.707 5.293a1 1 0 010 1.414l-8 8a1 1 0 01-1.414 0l-4-4a1 1 0 011.414-1.414L8 12.586l7.293-7.293a1 1 0 011.414 0z" clip-rule="evenodd">&lt;/path>
 &lt;/svg>
 &lt;/span>
 &lt;/button>
 &lt;/li>
 &lt;/template>
 &lt;/ul>
 &lt;/div>
 &lt;/div>
 &lt;div class="flex items-center">
 &lt;span data-seed="a83016e89cb750e987bf794d3d5c9fd2" class="snippet-copy cursor-pointer flex items-center gap-1.5 px-2.5 py-1.5 text-sm text-black-100 rounded-md border border-[#1A1414]/10 bg-white hover:bg-gray-50 transition-colors" title="Copy to clipboard!">
 &lt;span class="icon-copy02 leading-none text-[14px]">&lt;/span>
 &lt;span class="text-[14px]">Copy&lt;/span>
 &lt;/span>
 &lt;/div>
 &lt;/div>

 &lt;div class="bg-[#30302F] rounded-md font-light !font-ibmplex">
 &lt;div id="curl-a83016e89cb750e987bf794d3d5c9fd2"class="text-gray-100 text-[12px] leading-[1.54] nice-tab-content">
 &lt;pre>&lt;code class="language-bash">curl "https://app.scrapingbee.com/api/v1/walmart/search?api_key=YOUR-API-KEY&amp;query=iphone"&lt;/code>&lt;/pre>
 &lt;/div>
 &lt;div id="python-a83016e89cb750e987bf794d3d5c9fd2" class="text-gray-100 text-[12px] leading-[1.54] nice-tab-content active">
 &lt;pre>&lt;code class="language-python"># Install the Python Requests library:
# `pip install requests`
import requests

def send_request():
 response = requests.get(
 url="https://app.scrapingbee.com/api/v1/walmart/search",
 params={
 "api_key": "YOUR-API-KEY",
 "query": "iphone",
 },

 )
 print('Response HTTP Status Code: ', response.status_code)
 print('Response HTTP Response Body: ', response.content)
send_request()&lt;/code>&lt;/pre>
 &lt;/div>
 &lt;div id="node-a83016e89cb750e987bf794d3d5c9fd2" class="text-gray-100 text-[12px] leading-[1.54] nice-tab-content">
 &lt;pre>&lt;code class="language-javascript">// request Axios
const axios = require('axios');

axios.get('https://app.scrapingbee.com/api/v1/walmart/search', {
 params: {
 'api_key': 'YOUR-API-KEY',
 'query': 'iphone',
 }
}).then(function (response) {
 // handle success
 console.log(response.data);
}).catch(function (error) {
 // handle error
 console.error(error);
})&lt;/code>&lt;/pre>
 &lt;/div>
 &lt;div id="java-a83016e89cb750e987bf794d3d5c9fd2" class="text-gray-100 text-[12px] leading-[1.54] nice-tab-content">
 &lt;pre>&lt;code class="language-java">import java.io.IOException;
import org.apache.http.client.fluent.*;

public class SendRequest
{
 public static void main(String[] args) {
 sendRequest();
 }

 private static void sendRequest() {

 // Classic (GET)

 try {

 // Create request
 Content content = Request.Get("https://app.scrapingbee.com/api/v1/walmart/search?api_key=YOUR-API-KEY&amp;query=iphone")



 // Fetch request and return content
 .execute().returnContent();

 // Print content
 System.out.println(content);
 }
 catch (IOException e) { System.out.println(e); }
 }
}&lt;/code>&lt;/pre>
 &lt;/div>
 &lt;div id="ruby-a83016e89cb750e987bf794d3d5c9fd2" class="text-gray-100 text-[12px] leading-[1.54] nice-tab-content">
 &lt;pre>&lt;code class="language-ruby">require 'net/http'
require 'net/https'

# Classic (GET)
def send_request
 uri = URI('https://app.scrapingbee.com/api/v1/walmart/search?api_key=YOUR-API-KEY&amp;query=iphone')

 # Create client
 http = Net::HTTP.new(uri.host, uri.port)
 http.use_ssl = true
 http.verify_mode = OpenSSL::SSL::VERIFY_PEER

 # Create Request
 req = Net::HTTP::Get.new(uri)

 # Fetch Request
 res = http.request(req)
 puts "Response HTTP Status Code: #{ res.code }"
 puts "Response HTTP Response Body: #{ res.body }"
rescue StandardError => e
 puts "HTTP Request failed (#{ e.message })"
end

send_request&lt;/code>&lt;/pre>
 &lt;/div>
 &lt;div id="php-a83016e89cb750e987bf794d3d5c9fd2" class="text-gray-100 text-[12px] leading-[1.54] nice-tab-content">
 &lt;pre>&lt;code class="language-php">&amp;lt;?php

// get cURL resource
$ch = curl_init();

// set url
curl_setopt($ch, CURLOPT_URL, 'https://app.scrapingbee.com/api/v1/walmart/search?api_key=YOUR-API-KEY&amp;query=iphone');

// set method
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'GET');

// return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

// send the request and save response to $response
$response = curl_exec($ch);

// stop if fails
if (!$response) {
 die('Error: "' . curl_error($ch) . '" - Code: ' . curl_errno($ch));
}

echo 'HTTP Status Code: ' . curl_getinfo($ch, CURLINFO_HTTP_CODE) . PHP_EOL;
echo 'Response Body: ' . $response . PHP_EOL;

// close curl resource to free up system resources
curl_close($ch);

?&amp;gt;&lt;/code>&lt;/pre>
 &lt;/div>
 &lt;div id="go-a83016e89cb750e987bf794d3d5c9fd2" class="text-gray-100 text-[12px] leading-[1.54] nice-tab-content">
 &lt;pre>&lt;code class="language-go">package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
)

func sendClassic() {
	// Create client
	client := &amp;http.Client{}

	// Create request
	// Create request
	req, err := http.NewRequest("GET", "https://app.scrapingbee.com/api/v1/walmart/search?api_key=YOUR-API-KEY&amp;query=iphone", nil)
	if err != nil {
		fmt.Println("Failure : ", err)
		return
	}

	// Fetch Request
	resp, err := client.Do(req)

	if err != nil {
		fmt.Println("Failure : ", err)
		return
	}
	defer resp.Body.Close()

	// Read Response Body
	respBody, _ := ioutil.ReadAll(resp.Body)

	// Display Results
	fmt.Println("response Status : ", resp.Status)
	fmt.Println("response Headers : ", resp.Header)
	fmt.Println("response Body : ", string(respBody))
}

func main() {
	sendClassic()
}&lt;/code>&lt;/pre>
 &lt;/div>
 &lt;/div>
&lt;/div>

&lt;p>Here is a breakdown of all the parameters you can use with the Walmart Search API:&lt;/p></description></item><item><title>Walmart API</title><link>https://www.scrapingbee.com/features/walmart/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/features/walmart/</guid><description>&lt;p>&lt;script type="application/ld+json">
 {
 "@context": "https://schema.org",
 "@type": "Product",
 "name": "ScrapingBee",
 "brand": {
 "@type": "Brand",
 "name": "ScrapingBee"
 },
 "description": "Access Walmart\u0027s massive product data catalog with our reliable scraping API. Get pricing, descriptions, and product details with a single API call.",
 "aggregateRating": {
 "@type": "AggregateRating",
 "ratingValue": "4.9",
 "reviewCount": "154",
 "bestRating": 5
 }
 }
&lt;/script>
&lt;section class="bg-skew-yellow-b pt-[100px] sm:pt-[100px] md:pt-[156px] mb-[120px] relative z-1 ">
 &lt;div class="container">
 &lt;div class="flex flex-wrap items-center -mx-[15px]">
 &lt;div class="w-full sm:w-1/2 px-[15px]">
 &lt;div class="max-w-[542px] leading-[1.77]">
 &lt;h1 class="mb-[14px]">Walmart API&lt;/h1>
 &lt;p class="mb-[36px] text-[20px]">Access Walmart&amp;#39;s massive product data catalog with our reliable scraping API. Get pricing, descriptions, and product details with a single API call.&lt;/p></description></item><item><title>Walmart Scraping API</title><link>https://www.scrapingbee.com/scrapers/walmart-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/walmart-scraper-api/</guid><description>&lt;p>&lt;script type="application/ld+json">
 {
 "@context": "https://schema.org",
 "@type": "Product",
 "name": "ScrapingBee",
 "brand": {
 "@type": "Brand",
 "name": "ScrapingBee"
 },
 "description": "Access Walmart\u0027s massive product data catalog with our reliable scraping API. Get pricing, descriptions, and product details with a single API call.",
 "aggregateRating": {
 "@type": "AggregateRating",
 "ratingValue": "4.9",
 "reviewCount": "38",
 "bestRating": 5
 }
 }
&lt;/script>
&lt;section class="bg-skew-yellow-b pt-[100px] sm:pt-[100px] md:pt-[156px] mb-[120px] relative z-1 pb-[50px] sm:pb-[100px] md:mb-[170px]">
 &lt;div class="container">
 &lt;div class="flex flex-wrap items-center -mx-[15px]">
 &lt;div class="w-full sm:w-1/2 px-[15px]">
 &lt;div class="max-w-[542px] leading-[1.77]">
 
 
 
&lt;nav aria-label="Breadcrumb" class="text-[14px] text-black mb-[20px] flex items-center">
 &lt;ol class="flex items-center" itemscope itemtype="https://schema.org/BreadcrumbList">
 &lt;li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
 &lt;a href="https://www.scrapingbee.com/" class="text-black no-underline" itemprop="item">
 &lt;span itemprop="name">Home&lt;/span>
 &lt;/a>
 &lt;meta itemprop="position" content="1" />
 &lt;/li>
 &lt;svg width="14" height="14" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" fill="none" class="mx-[10px] flex-shrink-0">
 &lt;path d="M9 6L15 12L9 18" stroke="black" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"/>
 &lt;/svg>
 &lt;li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
 &lt;a href="https://www.scrapingbee.com/scrapers/" class="text-black no-underline" itemprop="item">
 &lt;span itemprop="name">Scrapers&lt;/span>
 &lt;/a>
 &lt;meta itemprop="position" content="2" />
 &lt;/li>
 &lt;svg width="14" height="14" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" fill="none" class="mx-[10px] flex-shrink-0">
 &lt;path d="M9 6L15 12L9 18" stroke="black" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"/>
 &lt;/svg>
 &lt;li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
 &lt;span class="text-blue-600 font-medium" itemprop="name">
 Walmart Scraping API
 &lt;/span>
 &lt;meta itemprop="position" content="3" />
 &lt;/li>
 &lt;/ol>
&lt;/nav>

 
 
 &lt;h1 class="mb-[14px]">Walmart Scraping API&lt;/h1>
 &lt;p class="mb-[36px] text-[20px]">Access Walmart&amp;#39;s massive product data catalog with our reliable scraping API. Get pricing, descriptions, and product details with a single API call.&lt;/p></description></item><item><title>Washington Post Scraper API - Easy Start &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/washington-post-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/washington-post-scraper-api/</guid><description/></item><item><title>Wayback Machine Scraper - Free Signup Credits Available</title><link>https://www.scrapingbee.com/scrapers/wayback-machine-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/wayback-machine-api/</guid><description/></item><item><title>Wayfair Scraper API - Signup for Free Credits</title><link>https://www.scrapingbee.com/scrapers/wayfair-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/wayfair-api/</guid><description/></item><item><title>Web Scraping Financial Data - Simple Sign Up Credits</title><link>https://www.scrapingbee.com/scrapers/financial-data-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/financial-data-api/</guid><description/></item><item><title>Web Scraping Real Estate Data - Effortless Signup Credits</title><link>https://www.scrapingbee.com/scrapers/real-estate-data-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/real-estate-data-api/</guid><description/></item><item><title>Webmotors Scraper API - Free Credits on Signup</title><link>https://www.scrapingbee.com/scrapers/webmotors-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/webmotors-api/</guid><description/></item><item><title>WebScraper.io alternative for web 
scraping?</title><link>https://www.scrapingbee.com/webscraper-io-alternative/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/webscraper-io-alternative/</guid><description>&lt;p>&lt;section class="bg-yellow-100 py-[100px] md:pt-[220px] md:pb-[20px] mb-[80px] relative z-1">
 &lt;div class="container">
 &lt;div class="max-w-[1024px] mx-auto text-center">
 &lt;h1 class="mb-[14px]">WebScraper.io alternative for web scraping?&lt;/h1>
 &lt;p class="mb-[32px]">ScrapingBee is a better alternative to WebScraper.io. Web scraping should be intuitive and affordable. If you&amp;#39;re facing limitations, there are better alternatives to consider.&lt;/p>
 &lt;div class="flex flex-wrap items-center justify-center mb-[33px]">
 &lt;div class="flex flex-col items-start gap-[20px]">
 &lt;a href="https://app.scrapingbee.com/account/register" class="btn px-[39px] mr-[20px]">Try ScrapingBee for Free&lt;/a>
 &lt;div class="flex items-center">
 &lt;a href="https://www.capterra.com/p/195060/ScrapingBee/" target="_blank" class="inline-block"> &lt;img border="0" class="h-[40px] mr-[10px] -ml-[9px]" src="https://brand-assets.capterra.com/badge/8898153e-408a-4cdb-9477-bda37032c670.svg" /> &lt;/a>
 &lt;span class="text-[18px] mr-[10px]">based on 100+ reviews.&lt;/span>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
&lt;/section>

&lt;section class="pt-[50px] sm:pt-101 pb-[50px] sm:pb-[70px] md:pb-[82px]">
 &lt;div class="container max-w-[894px] w-full flex flex-wrap">

 &lt;blockquote class="p-[38px] bg-gray-900 rounded-2xl m-[0] text-black-100 leading-[1.55]">
 &lt;q class="block mb-[35px] text-[24px]">ScrapingBee &lt;strong>clear documentation, easy-to-use API, and great success rate&lt;/strong> made it a no-brainer.&lt;/q>
 &lt;cite class="avatar flex items-center not-italic">
 
 &lt;span class="w-[56px] h-[56px] rounded-full overflow-hidden bg-gray-1000 mr-[24px]">
 &lt;img height="56" width="56" src="https://www.scrapingbee.com/images/testimonials/dominic.jpeg" alt="Dominic Phillips">
 &lt;/span>
 
 &lt;span>
 &lt;strong class="text-[18px] font-bold block mb-[4px]">Dominic Phillips
 
 &lt;/strong>
 
 &lt;span class="text-[15px] block">Co-Founder @ &lt;a href="https://codesubmit.io" class="font-bold underline hover:no-underline" target="_blank">CodeSubmit&lt;/a>&lt;/span>
 
 &lt;/span>
 &lt;/cite>
 &lt;/blockquote>
 &lt;/div>
&lt;/section>

&lt;section class="py-[50px] sm:py-[60px] md:py-[80px] text-gray-200 text-[16px] leading-[1.50]">
 &lt;div class="container max-w-[1292px]">
 &lt;div class="flex flex-wrap -mx-[23px]">
 &lt;div class="w-full md:w-[500px] lg:w-[608px] px-[23px]">
 &lt;div class="pr-[20px]">
 &lt;div class="mb-[48px]">
 &lt;h3 class="text-black-100 mb-[8px]">Scraping should be flexible, not restricted by templates.&lt;/h3>
 &lt;p>WebScraper.io offers visual scraping, but we give you API-first flexibility that scales better.&lt;/p></description></item><item><title>Website Image Scraper API - Free Credits Signup</title><link>https://www.scrapingbee.com/scrapers/website-image-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/website-image-api/</guid><description/></item><item><title>Whop Scraper API - Easy Start &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/whop-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/whop-scraper-api/</guid><description/></item><item><title>Wikipedia Scraper API - Free Credits on Signup</title><link>https://www.scrapingbee.com/scrapers/wikipedia-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/wikipedia-api/</guid><description/></item><item><title>Woocommerce Scraper API - Easy Use &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/woocommerce-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/woocommerce-scraper-api/</guid><description/></item><item><title>Xing Scraper API - Free Signup Credits Available</title><link>https://www.scrapingbee.com/scrapers/xing-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/xing-api/</guid><description/></item><item><title>Yad2 Scraper API - Free Signup Credits Available</title><link>https://www.scrapingbee.com/scrapers/yad2-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/yad2-api/</guid><description/></item><item><title>Yahoo Search Scraper API - Free Credits &amp; Simplified Start</title><link>https://www.scrapingbee.com/scrapers/yahoo-search-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://www.scrapingbee.com/scrapers/yahoo-search-api/</guid><description/></item><item><title>Yahoo! Ads Scraper API - Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/yahoo-ads-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/yahoo-ads-api/</guid><description/></item><item><title>Yahoo! Finance Scraper API - Simple Start &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/yahoo-finance-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/yahoo-finance-scraper-api/</guid><description/></item><item><title>Yahoo! Images Scraper API - Free Credits on Signup</title><link>https://www.scrapingbee.com/scrapers/yahoo-images-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/yahoo-images-api/</guid><description/></item><item><title>Yahoo! Questions Scraper API - Get Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/yahoo-related-questions-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/yahoo-related-questions-api/</guid><description/></item><item><title>Yandex Reverse Image Scraper API - Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/yandex-reverse-image-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/yandex-reverse-image-api/</guid><description/></item><item><title>Yandex Scraper API - Free Credits with Simple Signup</title><link>https://www.scrapingbee.com/scrapers/yandex-images-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/yandex-images-api/</guid><description/></item><item><title>Yandex Search Scraper API - Free Signup &amp; Effortless Integration</title><link>https://www.scrapingbee.com/scrapers/yandex-search-api/</link><pubDate>Mon, 01 
Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/yandex-search-api/</guid><description/></item><item><title>Yellow Pages Scraper API - Simple Use &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/yellow-pages-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/yellow-pages-scraper-api/</guid><description/></item><item><title>YouTube Ad Results Scraper API - Get Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/youtube-ad-results-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/youtube-ad-results-api/</guid><description/></item><item><title>YouTube API</title><link>https://www.scrapingbee.com/documentation/youtube/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/documentation/youtube/</guid><description>&lt;p>Our YouTube API allows you to scrape YouTube search results, video metadata, transcripts, and trainability information in realtime.&lt;/p>
&lt;p>We provide four endpoints:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Search endpoint&lt;/strong> (&lt;code>/api/v1/youtube/search&lt;/code>) - Fetch YouTube search results&lt;/li>
&lt;li>&lt;strong>Metadata endpoint&lt;/strong> (&lt;code>/api/v1/youtube/metadata&lt;/code>) - Fetch structured YouTube video metadata&lt;/li>
&lt;li>&lt;strong>Transcript endpoint&lt;/strong> (&lt;code>/api/v1/youtube/transcript&lt;/code>) - Fetch YouTube video transcripts&lt;/li>
&lt;li>&lt;strong>Trainability endpoint&lt;/strong> (&lt;code>/api/v1/youtube/trainability&lt;/code>) - Check video transcript availability&lt;/li>
&lt;/ul>
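&lt;p>As a minimal Python sketch mapping the four endpoints above to requests (note: the &lt;code>query&lt;/code> and &lt;code>video_id&lt;/code> parameter names are assumptions mirroring the other structured APIs; see each endpoint&amp;#39;s quick start for the exact parameters):&lt;/p>

```python
# Sketch: building a request for each YouTube endpoint.
# NOTE: "query" and "video_id" are assumed parameter names used
# for illustration only; consult the endpoint docs for the real ones.
import requests

BASE = "https://app.scrapingbee.com/api/v1/youtube"

endpoints = {
    "search": {"query": "web scraping"},       # search results
    "metadata": {"video_id": "VIDEO-ID"},      # structured video metadata
    "transcript": {"video_id": "VIDEO-ID"},    # video transcript
    "trainability": {"video_id": "VIDEO-ID"},  # transcript availability
}

for path, params in endpoints.items():
    prepared = requests.Request(
        "GET",
        f"{BASE}/{path}",
        params={"api_key": "YOUR-API-KEY", **params},
    ).prepare()
    print(prepared.url)
# To actually send one: requests.Session().send(prepared)
```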
&lt;div class="doc-row">
&lt;div class="doc-full">
&lt;h2 id="youtube-search-api">YouTube Search API&lt;/h2>
&lt;h3 id="quick-start">Quick start&lt;/h3>
&lt;p>To scrape YouTube search results, you only need two things:&lt;/p></description></item><item><title>YouTube API</title><link>https://www.scrapingbee.com/features/youtube/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/features/youtube/</guid><description>&lt;p>&lt;script type="application/ld+json">
 {
 "@context": "https://schema.org",
 "@type": "Product",
 "name": "ScrapingBee",
 "brand": {
 "@type": "Brand",
 "name": "ScrapingBee"
 },
 "description": "Scrape YouTube search results, video metadata, transcripts, and trainability information in real-time with structured JSON output.",
 "aggregateRating": {
 "@type": "AggregateRating",
 "ratingValue": "4.9",
 "reviewCount": "154",
 "bestRating": 5
 }
 }
&lt;/script>
&lt;section class="bg-skew-yellow-b pt-[100px] sm:pt-[100px] md:pt-[156px] mb-[120px] relative z-1 ">
 &lt;div class="container">
 &lt;div class="flex flex-wrap items-center -mx-[15px]">
 &lt;div class="w-full sm:w-1/2 px-[15px]">
 &lt;div class="max-w-[542px] leading-[1.77]">
 &lt;h1 class="mb-[14px]">YouTube API&lt;/h1>
 &lt;p class="mb-[36px] text-[20px]">Scrape YouTube search results, video metadata, transcripts, and trainability information in real-time with structured JSON output.&lt;/p></description></item><item><title>Youtube Comment Scraper API - Simple Start &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/youtube-comment-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/youtube-comment-scraper-api/</guid><description/></item><item><title>Youtube Email Scraper API - Easy Start &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/youtube-email-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/youtube-email-scraper-api/</guid><description/></item><item><title>YouTube Search Scraper API - Simplicity &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/youtube-search-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/youtube-search-api/</guid><description/></item><item><title>YouTube Shorts Scraper API - Simple Signup Credits Free</title><link>https://www.scrapingbee.com/scrapers/youtube-shorts-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/youtube-shorts-api/</guid><description/></item><item><title>Youtube Title Scraper API - Easy Access &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/youtube-title-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/youtube-title-scraper-api/</guid><description/></item><item><title>Youtube Transcript Scraper API - Easy Signup &amp; Free Credits</title><link>https://www.scrapingbee.com/scrapers/youtube-transcript-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://www.scrapingbee.com/scrapers/youtube-transcript-scraper-api/</guid><description/></item><item><title>Youtube Video Scraper API - Simple Use &amp; Free Signup Credits</title><link>https://www.scrapingbee.com/scrapers/youtube-video-scraper-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/youtube-video-scraper-api/</guid><description/></item><item><title>Zara Scraper API - Free Credits on Signup</title><link>https://www.scrapingbee.com/scrapers/zara-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/zara-api/</guid><description/></item><item><title>ZenRows alternative for web scraping?</title><link>https://www.scrapingbee.com/zenrows-alternative/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/zenrows-alternative/</guid><description>&lt;p>&lt;section class="bg-yellow-100 py-[100px] md:pt-[220px] md:pb-[20px] mb-[80px] relative z-1">
 &lt;div class="container">
 &lt;div class="max-w-[1024px] mx-auto text-center">
 &lt;h1 class="mb-[14px]">ZenRows alternative for web scraping?&lt;/h1>
 &lt;p class="mb-[32px]">ScrapingBee is a better alternative to ZenRows. Not all scraping APIs are created equal. Here&amp;#39;s how the alternatives compare.&lt;/p>
 &lt;div class="flex flex-wrap items-center justify-center mb-[33px]">
 &lt;div class="flex flex-col items-start gap-[20px]">
 &lt;a href="https://app.scrapingbee.com/account/register" class="btn px-[39px] mr-[20px]">Try ScrapingBee for Free&lt;/a>
 &lt;div class="flex items-center">
 &lt;a href="https://www.capterra.com/p/195060/ScrapingBee/" target="_blank" class="inline-block"> &lt;img border="0" class="h-[40px] mr-[10px] -ml-[9px]" src="https://brand-assets.capterra.com/badge/8898153e-408a-4cdb-9477-bda37032c670.svg" /> &lt;/a>
 &lt;span class="text-[18px] mr-[10px]">based on 100+ reviews.&lt;/span>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
&lt;/section>

&lt;section class="pt-[50px] sm:pt-101 pb-[50px] sm:pb-[70px] md:pb-[82px]">
 &lt;div class="container max-w-[894px] w-full flex flex-wrap">

 &lt;blockquote class="p-[38px] bg-gray-900 rounded-2xl m-[0] text-black-100 leading-[1.55]">
 &lt;q class="block mb-[35px] text-[24px]">ScrapingBee&#8217;s &lt;strong>clear documentation, easy-to-use API, and great success rate&lt;/strong> made it a no-brainer.&lt;/q>
 &lt;cite class="avatar flex items-center not-italic">
 
 &lt;span class="w-[56px] h-[56px] rounded-full overflow-hidden bg-gray-1000 mr-[24px]">
 &lt;img height="56" width="56" src="https://www.scrapingbee.com/images/testimonials/dominic.jpeg" alt="Dominic Phillips">
 &lt;/span>
 
 &lt;span>
 &lt;strong class="text-[18px] font-bold block mb-[4px]">Dominic Phillips
 
 &lt;/strong>
 
 &lt;span class="text-[15px] block">Co-Founder @ &lt;a href="https://codesubmit.io" class="font-bold underline hover:no-underline" target="_blank">CodeSubmit&lt;/a>&lt;/span>
 
 &lt;/span>
 &lt;/cite>
 &lt;/blockquote>
 &lt;/div>
&lt;/section>

&lt;section class="py-[50px] sm:py-[60px] md:py-[80px] text-gray-200 text-[16px] leading-[1.50]">
 &lt;div class="container max-w-[1292px]">
 &lt;div class="flex flex-wrap -mx-[23px]">
 &lt;div class="w-full md:w-[500px] lg:w-[608px] px-[23px]">
 &lt;div class="pr-[20px]">
 &lt;div class="mb-[48px]">
 &lt;h3 class="text-black-100 mb-[8px]">No locked features. No asterisks.&lt;/h3>
 &lt;p>ZenRows offers solid scraping features, but limits access to key capabilities unless you pay more. We don't do that.&lt;/p></description></item><item><title>Zenscrape alternative for web scraping?</title><link>https://www.scrapingbee.com/zenscrape-alternative/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/zenscrape-alternative/</guid><description>&lt;p>&lt;section class="bg-yellow-100 py-[100px] md:pt-[220px] md:pb-[20px] mb-[80px] relative z-1">
 &lt;div class="container">
 &lt;div class="max-w-[1024px] mx-auto text-center">
 &lt;h1 class="mb-[14px]">Zenscrape alternative for web scraping?&lt;/h1>
 &lt;p class="mb-[32px]">ScrapingBee is a better alternative to Zenscrape. Need scraping that’s simple, fast, and cost-effective? Check out alternatives that deliver better value and performance.&lt;/p>
 &lt;div class="flex flex-wrap items-center justify-center mb-[33px]">
 &lt;div class="flex flex-col items-start gap-[20px]">
 &lt;a href="https://app.scrapingbee.com/account/register" class="btn px-[39px] mr-[20px]">Try ScrapingBee for Free&lt;/a>
 &lt;div class="flex items-center">
 &lt;a href="https://www.capterra.com/p/195060/ScrapingBee/" target="_blank" class="inline-block"> &lt;img border="0" class="h-[40px] mr-[10px] -ml-[9px]" src="https://brand-assets.capterra.com/badge/8898153e-408a-4cdb-9477-bda37032c670.svg" /> &lt;/a>
 &lt;span class="text-[18px] mr-[10px]">based on 100+ reviews.&lt;/span>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
&lt;/section>

&lt;section class="pt-[50px] sm:pt-101 pb-[50px] sm:pb-[70px] md:pb-[82px]">
 &lt;div class="container max-w-[894px] w-full flex flex-wrap">

 &lt;blockquote class="p-[38px] bg-gray-900 rounded-2xl m-[0] text-black-100 leading-[1.55]">
 &lt;q class="block mb-[35px] text-[24px]">ScrapingBee&#8217;s &lt;strong>clear documentation, easy-to-use API, and great success rate&lt;/strong> made it a no-brainer.&lt;/q>
 &lt;cite class="avatar flex items-center not-italic">
 
 &lt;span class="w-[56px] h-[56px] rounded-full overflow-hidden bg-gray-1000 mr-[24px]">
 &lt;img height="56" width="56" src="https://www.scrapingbee.com/images/testimonials/dominic.jpeg" alt="Dominic Phillips">
 &lt;/span>
 
 &lt;span>
 &lt;strong class="text-[18px] font-bold block mb-[4px]">Dominic Phillips
 
 &lt;/strong>
 
 &lt;span class="text-[15px] block">Co-Founder @ &lt;a href="https://codesubmit.io" class="font-bold underline hover:no-underline" target="_blank">CodeSubmit&lt;/a>&lt;/span>
 
 &lt;/span>
 &lt;/cite>
 &lt;/blockquote>
 &lt;/div>
&lt;/section>

&lt;section class="py-[50px] sm:py-[60px] md:py-[80px] text-gray-200 text-[16px] leading-[1.50]">
 &lt;div class="container max-w-[1292px]">
 &lt;div class="flex flex-wrap -mx-[23px]">
 &lt;div class="w-full md:w-[500px] lg:w-[608px] px-[23px]">
 &lt;div class="pr-[20px]">
 &lt;div class="mb-[48px]">
 &lt;h3 class="text-black-100 mb-[8px]">Not just scraping—smart, efficient data extraction.&lt;/h3>
 &lt;p>Zenscrape offers scraping but limits key features unless you pay extra. We make all features available without the need for upgrades.&lt;/p></description></item><item><title>Zillow Scraper API - Free Signup Credits Provided</title><link>https://www.scrapingbee.com/scrapers/zillow-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/zillow-api/</guid><description/></item><item><title>Zomato Web Scraper - Simplified Signup + Free Credits</title><link>https://www.scrapingbee.com/scrapers/zomato-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/zomato-api/</guid><description/></item><item><title>ZoomInfo Scraper API - Sign Up for Free Credits</title><link>https://www.scrapingbee.com/scrapers/zoominfo-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/zoominfo-api/</guid><description/></item><item><title>Zoopla Scraper API - Free Credits on Signup</title><link>https://www.scrapingbee.com/scrapers/zoopla-api/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/scrapers/zoopla-api/</guid><description/></item><item><title>Zyte API alternative for web scraping?</title><link>https://www.scrapingbee.com/zyte-api-alternative/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.scrapingbee.com/zyte-api-alternative/</guid><description>&lt;p>&lt;section class="bg-yellow-100 py-[100px] md:pt-[220px] md:pb-[20px] mb-[80px] relative z-1">
 &lt;div class="container">
 &lt;div class="max-w-[1024px] mx-auto text-center">
 &lt;h1 class="mb-[14px]">Zyte API alternative for web scraping?&lt;/h1>
 &lt;p class="mb-[32px]">ScrapingBee is a better alternative to Zyte API. Your web scraping solution doesn’t have to be overpriced or complicated—there are simpler and more affordable alternatives.&lt;/p>
 &lt;div class="flex flex-wrap items-center justify-center mb-[33px]">
 &lt;div class="flex flex-col items-start gap-[20px]">
 &lt;a href="https://app.scrapingbee.com/account/register" class="btn px-[39px] mr-[20px]">Try ScrapingBee for Free&lt;/a>
 &lt;div class="flex items-center">
 &lt;a href="https://www.capterra.com/p/195060/ScrapingBee/" target="_blank" class="inline-block"> &lt;img border="0" class="h-[40px] mr-[10px] -ml-[9px]" src="https://brand-assets.capterra.com/badge/8898153e-408a-4cdb-9477-bda37032c670.svg" /> &lt;/a>
 &lt;span class="text-[18px] mr-[10px]">based on 100+ reviews.&lt;/span>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
 &lt;/div>
&lt;/section>

&lt;section class="pt-[50px] sm:pt-101 pb-[50px] sm:pb-[70px] md:pb-[82px]">
 &lt;div class="container max-w-[894px] w-full flex flex-wrap">

 &lt;blockquote class="p-[38px] bg-gray-900 rounded-2xl m-[0] text-black-100 leading-[1.55]">
 &lt;q class="block mb-[35px] text-[24px]">ScrapingBee&#8217;s &lt;strong>clear documentation, easy-to-use API, and great success rate&lt;/strong> made it a no-brainer.&lt;/q>
 &lt;cite class="avatar flex items-center not-italic">
 
 &lt;span class="w-[56px] h-[56px] rounded-full overflow-hidden bg-gray-1000 mr-[24px]">
 &lt;img height="56" width="56" src="https://www.scrapingbee.com/images/testimonials/dominic.jpeg" alt="Dominic Phillips">
 &lt;/span>
 
 &lt;span>
 &lt;strong class="text-[18px] font-bold block mb-[4px]">Dominic Phillips
 
 &lt;/strong>
 
 &lt;span class="text-[15px] block">Co-Founder @ &lt;a href="https://codesubmit.io" class="font-bold underline hover:no-underline" target="_blank">CodeSubmit&lt;/a>&lt;/span>
 
 &lt;/span>
 &lt;/cite>
 &lt;/blockquote>
 &lt;/div>
&lt;/section>

&lt;section class="py-[50px] sm:py-[60px] md:py-[80px] text-gray-200 text-[16px] leading-[1.50]">
 &lt;div class="container max-w-[1292px]">
 &lt;div class="flex flex-wrap -mx-[23px]">
 &lt;div class="w-full md:w-[500px] lg:w-[608px] px-[23px]">
 &lt;div class="pr-[20px]">
 &lt;div class="mb-[48px]">
 &lt;h3 class="text-black-100 mb-[8px]">No complexity, no unnecessary costs—just efficient web scraping.&lt;/h3>
 &lt;p>Zyte API provides a lot of power but comes with added complexity and high costs. We provide all the same features with simpler, more affordable plans.&lt;/p></description></item></channel></rss>