Selenium WebDriver – Step-by-Step Tutorial

Harish Rajora

February 7, 2024

The word “Selenium” is today used interchangeably with “WebDriver” and “Selenium WebDriver,” even though the original Selenium project did not include WebDriver at all. What started as a tool for remotely passing HTTP commands to a server is now known mainly for its WebDriver technology, which runs automated tests on all major browsers. Every Selenium component has grown over the years, but WebDriver has received the most attention and has become the most widely used part of the Selenium package. Is that reputation just hype, or has WebDriver earned its place as a household name in the testing industry? Let’s find out by examining it both in theory and in practice.

What is Selenium WebDriver?

A WebDriver is a generic tool that drives the web interface. However, when it is combined with Selenium and its technologies, it becomes Selenium WebDriver. Since Selenium works in the web automation department, naturally, the main goal of Selenium WebDriver adoption becomes the automated test case execution on web browsers.

It is worth noting that Selenium is a complete suite of tools that includes Selenium IDE, Selenium Grid, and Selenium WebDriver. Each tool serves a different purpose, and WebDriver is the most widely adopted component among them, specifically built for automating browser-based test execution.

For beginners, understanding what Selenium WebDriver is matters because it defines how Selenium interacts with different browsers.

Selenium WebDriver uses a browser-specific driver for each target browser to execute the test cases. For example, ChromeDriver can be downloaded from its official website and connected to Selenium, which then sends it instructions through the WebDriver API. (Selenese, by contrast, is the command language of Selenium IDE and the older Selenium RC.) This mechanism is part of the Selenium WebDriver architecture. During execution, the tester can watch the test run, because Selenium WebDriver opens an instance of the browser through the driver. WebDriver can do this on the local system without any intermediate server, and on a remote system through Selenium Grid or a remote driver endpoint.

As a tester, one might wonder if they need to learn different methodologies while working on different browser drivers, depending on their internal working. Fortunately, this is not true. WebDriver provides cross-browser compatibility, where the tester can use the same functions across different browser drivers, making the testing process easier and faster.
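As a rough illustration of this point, the sketch below runs the same two calls against three browsers. The class name CrossBrowserSketch and the createDriver helper are hypothetical, and the code assumes that Chrome, Firefox, and Edge are installed locally and that selenium-java is on the classpath.

```java
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.edge.EdgeDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class CrossBrowserSketch {
    // Hypothetical helper: maps a browser name to the matching driver class.
    static WebDriver createDriver(String browser) {
        switch (browser) {
            case "firefox": return new FirefoxDriver();
            case "edge":    return new EdgeDriver();
            default:        return new ChromeDriver();
        }
    }

    public static void main(String[] args) {
        for (String browser : new String[] {"chrome", "firefox", "edge"}) {
            WebDriver driver = createDriver(browser);
            try {
                // These calls are identical regardless of the browser behind the driver.
                driver.get("https://www.testgrid.io");
                System.out.println(browser + ": " + driver.getTitle());
            } finally {
                // Always end the session, even if a step above throws.
                driver.quit();
            }
        }
    }
}
```

Only the line that constructs the driver changes per browser; everything written against the WebDriver interface stays the same.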

The Architecture of Selenium WebDriver

When a tester executes a test through Selenium WebDriver, they are essentially triggering a series of tasks accomplished by various components that work together. These four components together build up the architecture of Selenium WebDriver.

1. Automation Code/ Client Library

Everything starts with writing automation code using a client library provided by Selenium. Selenium is popular for offering official bindings for the major programming languages, including Java, Python, C#, Ruby, and JavaScript. Each binding exposes the same WebDriver commands in the idiom of its language, so the test code looks similar whichever browser it targets.

2. WebDriver Protocol (JSON over HTTP)

Once the test cases are written, the commands need to be communicated to the browser driver. In Selenium WebDriver, each command is sent as an HTTP request with a JSON (JavaScript Object Notation) body. Selenium 3 and earlier used the JSON Wire Protocol for this; Selenium 4 uses the W3C WebDriver protocol, which follows the same request-and-response model. The tester’s system acts as a REST-style client and connects to the HTTP server that each browser driver provides.

Since the communication is established in a client-server infrastructure, it is referred to as a “request-response” system or “command-response” system in the Selenium WebDriver world.
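To make the command-response idea concrete, the sketch below builds (but does not send) two of the HTTP requests a client library would issue: one to open a page and one to read its title. The endpoint paths and payload shape follow the W3C WebDriver specification used by Selenium 4. The class name WireSketch, the driver address (9515 is ChromeDriver's default port), and the session id are illustrative assumptions.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Illustrative only: builds the kind of HTTP requests a client library sends
// to a browser driver. Nothing is sent over the network here.
class WireSketch {
    // Assumed address of a locally running driver (9515 is ChromeDriver's default port).
    static final String DRIVER = "http://localhost:9515";

    // "Navigate To" command: POST /session/{id}/url with the target URL as JSON.
    // The JSON is built by plain concatenation, so the URL must not contain quotes.
    static HttpRequest navigate(String sessionId, String url) {
        String body = "{\"url\": \"" + url + "\"}";
        return HttpRequest.newBuilder(URI.create(DRIVER + "/session/" + sessionId + "/url"))
                .header("Content-Type", "application/json; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    // "Get Title" command: GET /session/{id}/title, no request body.
    static HttpRequest getTitle(String sessionId) {
        return HttpRequest.newBuilder(URI.create(DRIVER + "/session/" + sessionId + "/title"))
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical id; a real one is returned by the driver when a session is created.
        String sessionId = "example-session";
        HttpRequest nav = navigate(sessionId, "https://www.testgrid.io");
        HttpRequest title = getTitle(sessionId);
        System.out.println(nav.method() + " " + nav.uri().getPath());
        System.out.println(title.method() + " " + title.uri().getPath());
    }
}
```

In everyday use the Selenium bindings build and send these requests for you; calls such as get and getTitle are thin wrappers over them.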

3. Browser drivers

The actual browser we use in our daily lives does not let the end-user know its internal workings or code structure. Even if a tester wants to perform simple actions such as selecting an element through a locator, they won’t be able to do that directly. To accomplish this, an intermediary is required that can understand the intent of an automation script and perform that action on the actual browser. This intermediary is a browser driver.

A browser driver communicates with both Selenium and the browser to perform the test execution. Since every browser has its own internal implementation, browser drivers are not generic; each one is developed and maintained by, or together with, the browser’s vendor. Therefore, for each browser, the tester needs a browser driver whose version is compatible with the installed browser. From the tester’s end, the Selenium scripts always target a browser driver rather than the browser itself.

Browser	Browser Driver	Supported OS
Google Chrome	ChromeDriver	Windows, macOS, Linux
Mozilla Firefox	GeckoDriver	Windows, macOS, Linux
Microsoft Edge (Chromium)	EdgeDriver	Windows, macOS, Linux
Safari	SafariDriver	macOS only
Opera	OperaDriver / ChromeDriver	Windows, macOS, Linux
Internet Explorer	IEDriver	Windows only
Microsoft Edge (Legacy)	EdgeDriver (Legacy)	Windows only
Brave	ChromeDriver	Windows, macOS, Linux

4. Browser

A browser, with respect to the architecture, is the actual browser we use for browsing the internet. Once the browser driver has parsed the JSON command, it passes the instruction to the browser through a browser-specific channel (ChromeDriver, for example, uses the Chrome DevTools Protocol). The result travels back the same way and is returned to the client over HTTP, completing the cycle.

These four components together make up the Selenium WebDriver as we know it – fast, robust, and reliable.

Internal working of Selenium WebDriver

In this short section for a quick reference, let’s summarize the working of Selenium WebDriver end-to-end.

1. A tester writes an automation test script targeting a specific browser driver.
2. When they run the script, each WebDriver call is converted into an API request with its data in JSON format.
3. The request is sent to the browser driver over HTTP as a REST-style call (JSON Wire Protocol in Selenium 3 and earlier, the W3C WebDriver protocol in Selenium 4).
4. The browser driver receives the request and, if validation succeeds, passes the action on to the browser.
5. If validation fails, the error is returned to the client.
6. Once the browser has started, the driver performs the actions one by one, much as a manual tester would.
7. Each command is sent over HTTP, and its response comes back to the client over the same protocol.
8. Once all the actions are performed and the script ends the session, the browser shuts down and the final results are available to the client.

It is important to note that the actual browser must be installed on the machine where the browser driver runs, since the driver communicates with it directly.
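The validation step in this cycle can be pictured with a toy model. The sketch below is not real driver code: CommandCycleSketch and its handle method are hypothetical stand-ins that accept two commands, reject a malformed one, and return simplified JSON bodies. The status codes and error names mirror those defined by the W3C WebDriver specification; real responses carry more fields.

```java
// Toy model of the command-response cycle; not real driver code.
class CommandCycleSketch {
    // A response as a driver would return it: HTTP status plus a JSON body.
    static final class Response {
        final int status;
        final String body;

        Response(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    // Stand-in for the browser driver: validates a command and answers it.
    static Response handle(String method, String path, String body) {
        if (method.equals("POST") && path.endsWith("/url")) {
            if (!body.contains("\"url\"")) {
                // Validation failed: the error travels back to the client.
                return new Response(400, "{\"value\": {\"error\": \"invalid argument\"}}");
            }
            // Validation passed: a real driver would now instruct the browser.
            return new Response(200, "{\"value\": null}");
        }
        if (method.equals("GET") && path.endsWith("/title")) {
            return new Response(200, "{\"value\": \"Example title\"}");
        }
        return new Response(404, "{\"value\": {\"error\": \"unknown command\"}}");
    }

    public static void main(String[] args) {
        // Each row is one command: HTTP method, path, JSON body.
        String[][] script = {
            {"POST", "/session/s1/url", "{\"url\": \"https://www.testgrid.io\"}"},
            {"GET", "/session/s1/title", ""},
            {"POST", "/session/s1/url", "{}"} // malformed on purpose
        };
        for (String[] c : script) {
            Response r = handle(c[0], c[1], c[2]);
            System.out.println(c[0] + " " + c[1] + " -> " + r.status + " " + r.body);
        }
    }
}
```

The third command is rejected with a 400 status, which is the point at which a client library would raise an exception in the test script.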

Why should every tester learn about Selenium WebDriver?

Selenium WebDriver is used for automated testing on web browsers. However, some testers work only with scriptless tools and do not pay much attention to Selenium. This can limit them, because a tester can only choose well between options when they can compare the benefits of each and match them to the project requirements. In this respect, Selenium WebDriver has a lot to offer:

Advantages of Selenium WebDriver

Open-source: Selenium WebDriver is open-source and therefore free to use for everyone. It also allows the testers to change the internal code as per their requirements and make the most optimal use of it.

Cross-browser and cross-platform compatible: Selenium WebDriver code can be used on multiple browsers and on multiple platforms. Hence, it enhances cross-browser testing and cross-platform compatibility.

Enhances reusability: The compatibility of the same code with various platforms enhances reusability, reduces the tester’s work, and helps deliver the project faster.

Provides Selenium integration: Selenium WebDriver belongs to the Selenium suite, so it is designed to integrate with the suite’s other tools, such as Selenium Grid and Selenium IDE, giving teams a single, consistent toolset.

Parallelism support: Selenium WebDriver supports parallel test execution on multiple browsers, typically through Selenium Grid or a test runner such as TestNG. This shortens the testing time and the overall testing phase.

No separate server required: For local runs, Selenium WebDriver does not need an intermediate server between the test code and the browser driver, unlike its predecessor, Selenium RC. This makes communication lightweight and faster.

Consumes fewer resources: Since Selenium WebDriver runs with minimal requirements and with simpler technologies, it consumes far fewer resources than its counterparts, such as QTP.

Easy to learn: It is easy to learn to write test cases for Selenium WebDriver. Its methods have readable, English-like names (such as findElement) that are easy to grasp and remember.

Easy to implement: A tester can get started with their Selenium WebDriver journey by just installing an IDE, Selenium WebDriver, required libraries, browser drivers, and browsers. It is easy to set up and maintain in case of updates and modifications.

Large community support: Selenium WebDriver has been in existence since 2007 and has built up a vast community of testers. The community helps with queries, doubts, workarounds, processes, and much more, which ultimately reduces implementation time.

Along with so many advantages, as with every tool, Selenium WebDriver also has certain drawbacks. Many of them are minor and can be worked around, but others, such as minimal support for mobile devices, limited post-testing functionality, and the need to keep browsers and drivers up to date, can be a challenge today. These gaps can be covered with tools like TestGrid, which provide automatic upgrades and functionality beyond test execution, such as reporting and team collaboration.

How to install and set up Selenium WebDriver?

Selenium WebDriver installation depends on the language the tester will be writing the code in. As the popular choice among testers for this task is Java, in this section, we will pick up the same language as well. This approach is often covered in many Selenium WebDriver tutorial guides and explains why Java is commonly paired with Selenium WebDriver.

To set up Selenium WebDriver, we first need the Java Development Kit (JDK). It can be downloaded and installed from its official website, depending on the system in use.

For anyone wondering what Selenium is in Java or how to configure Selenium WebDriver Java libraries, the process starts with a proper Java installation.

The installation can be verified by opening a command prompt and typing “java -version” (with a plain hyphen).
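The same check can be made from inside a program. The minimal sketch below reads the running JVM's feature version and compares it with Java 11, which recent Selenium 4 releases (including the 4.17.0 version used later in this tutorial) require as a minimum. The class and method names are illustrative.

```java
class JavaVersionCheck {
    // Assumption: Selenium 4.14 and later need Java 11 or newer.
    static final int MIN_FEATURE_VERSION = 11;

    // True if the given Java feature version (8, 11, 17, ...) is new enough.
    static boolean isSupported(int featureVersion) {
        return featureVersion >= MIN_FEATURE_VERSION;
    }

    public static void main(String[] args) {
        // Runtime.version().feature() returns e.g. 17 when running on Java 17.
        int feature = Runtime.version().feature();
        System.out.println("Running on Java " + feature);
        System.out.println(isSupported(feature)
                ? "OK for Selenium 4"
                : "Upgrade the JDK to 11 or newer");
    }
}
```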

Now we need an IDE to write the code in. This depends on the choice of the tester. However, we recommend IntelliJ and Eclipse if Java is the preferred choice. IntelliJ can be downloaded and installed from its official website.

This too will depend on the type of platform the tester is working on.

The next step is to download the Selenium WebDriver with language bindings of our choice from the official website.

Now, the last step is to attach Selenium to the IDE project. In a Maven project, this is done by adding the Selenium dependency to the pom.xml file, similar to this:

<!-- https://mvnrepository.com/artifact/org.seleniumhq.selenium/selenium-java -->
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>4.17.0</version>
</dependency>

This concludes the setup and installation for Selenium WebDriver. With all the ingredients in hand, all we need now is to start writing the test cases for browser automation.
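Before writing any tests, it can help to confirm that the dependency actually reached the project. One possible check, sketched below with an illustrative class name, tries to load Selenium's core WebDriver interface by name and reports whether it is on the classpath. It uses only the JDK, so it compiles even when the dependency is missing.

```java
class ClasspathCheck {
    // Returns true if the named class can be loaded from the current classpath.
    static boolean isAvailable(String className) {
        try {
            // 'false' skips static initialization; we only care whether the class exists.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // org.openqa.selenium.WebDriver is the core interface shipped with selenium-java.
        boolean found = isAvailable("org.openqa.selenium.WebDriver");
        System.out.println(found
                ? "Selenium is on the classpath"
                : "Selenium was not found; check the dependency in pom.xml");
    }
}
```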

How to create a Selenium WebDriver test script?

Creating a Selenium script is quicker than we would imagine because of the direct functions that accomplish multiple tasks at once. The motive behind this section is not to initiate a demonstration but to understand the skeleton of the script. If the tester knows what part does what and where it fits, they can experiment and create highly efficient tests.

This also serves as a basic Selenium testing example for anyone starting and looking for Selenium WebDriver examples or sample Selenium code in Java.

Before writing actions, testers need to identify the elements they want to interact with on a webpage. This is done using locators. Locators help Selenium find the right element on a page so that actions like clicks, inputs, or assertions can be performed on it. The commonly used locators in Selenium WebDriver are ID, Name, Class Name, XPath, CSS Selector, and Link Text. In the code example below, By.cssSelector is used as a locator to find the submit button on the page.

To accomplish testing on a browser, the testers need to start an instance of the browser driver. This object will be responsible for calling the functions to the driver.

WebDriver mydriver = new ChromeDriver();

The mydriver object is an instance of Chrome’s browser driver. For other browsers, the tester uses the corresponding class instead, such as FirefoxDriver or EdgeDriver.

Now, we can call a web address on the same browser by passing it in the “get” function:

mydriver.get("https://www.testgrid.io");

This will open up the TestGrid website.

From this point, the tester writes test steps for the actions they wish to take on the page. For instance, the title of the page can be retrieved as:

mydriver.getTitle();

Save it in a variable:

String title = mydriver.getTitle();

Assert it to check if it is expected or not:

Assert.assertEquals("AI-powered end-to-end testing", title);

Similarly, we can find a button using a CSS selector (or, alternatively, XPath). The selector below matches the first button element on the page:

WebElement submit = mydriver.findElement(By.cssSelector("button"));

And then click it using “click”:

submit.click();

This is it! Writing test scripts on Selenium WebDriver and executing them on browser drivers is extremely fast once the tester has their grip on it. These Selenium WebDriver example code snippets form part of any WebDriver Selenium tutorial or Selenium WebDriver quick start guide for Java automation.

There are a lot of functions in its library that the tester must go through once before starting the scripting part. Once done, the browser driver instance should be destroyed:

mydriver.quit();

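This matters for memory as well: quit() closes every browser window opened by the session and ends the driver process, so nothing is left running in the background.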
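Put together, the snippets above might be assembled into one runnable class along these lines. This is a sketch rather than a definitive script: the class name FirstSeleniumScript is made up, the assertion is replaced with a print statement so that no test framework is needed, and the code assumes Chrome is installed and selenium-java is on the classpath.

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

public class FirstSeleniumScript {
    public static void main(String[] args) {
        // Start a browser session through Chrome's driver.
        WebDriver mydriver = new ChromeDriver();
        try {
            // Open the page and read its title.
            mydriver.get("https://www.testgrid.io");
            String title = mydriver.getTitle();
            System.out.println("Page title: " + title);

            // Locate the first <button> element on the page and click it.
            WebElement submit = mydriver.findElement(By.cssSelector("button"));
            submit.click();
        } finally {
            // Runs even if a step above throws, so no browser is left open.
            mydriver.quit();
        }
    }
}
```

Wrapping the steps in try/finally is a design choice, not a Selenium requirement: it guarantees that quit() is called when an element is missing or a page fails to load.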

How to run Selenium WebDriver scripts?

Once our script is ready, we need to select one of the choices for executing these scripts on real browsers through browser drivers. For this, we have two options: either execute the scripts on the local system or in the cloud.

This section complements the previous Selenium scripts example and shows how sample Selenium code can be executed in different environments.

Running WebDriver scripts on the local system

To run Selenium WebDriver on the local system, we need an IDE, a JDK (which includes the Java runtime), the Selenium language bindings added as a project dependency, the target browsers, and their drivers. All of this was covered in the previous sections. This method works, but it has two issues: scaling and maintenance.

Individuals and students can enjoy local system execution as they just need to practice testing and have smaller projects to work on. However, small businesses are always scaling up, and therefore, even if the maintenance is not difficult today, it certainly will be in the future. Similarly, for big organizations, there are so many test cases to execute that even if one fails due to an infrastructure error, it will just waste the time of the team in debugging its cause.

Also, we cannot eliminate the high maintenance overhead around cross-browser issues. To test 20 versions of Google Chrome, we have to download and install 20 versions on the local system. If there are two more browsers to test, each with 20 versions, the total becomes 60 installations! This process consumes a lot of the testers’ time, and since these are billable hours, we end up wasting a lot of money on these overheads. Therefore, it is recommended to adopt a cloud-based system and focus just on the test scripts.

Running WebDriver scripts on TestGrid

Cloud-based solutions do not demand additional work from the testers. All they need is test scripts and a list of target devices to work on. The infrastructure, device installation and management, and scaling issues are all handled by the cloud team. However, not all tools work equally and efficiently, making it necessary for the team to first analyze the tool according to the requirement and then proceed further. This approach works well for executing Selenium WebDriver examples in distributed environments without worrying about setup issues.

One such tool that checks off all the marks of reliability and robustness is TestGrid.

TestGrid is a cloud-based test automation tool that can perform end-to-end testing without ever asking you to install any tool on the local system. It provides real devices and can leverage the power of artificial intelligence to not only move things faster but also make intelligent decisions along the way. To get started, a quick sign-up is required so that all the test-related data is in a single place.

Once done, log in with the credentials, and you will arrive at your personal dashboard.

Select “Real Device Cloud” from the left tool panel to view the list of all devices to run the Selenium WebDriver script on.

Click on the platform icon (Android icon, for example) to open up the values of the Browser URL.

These values should be incorporated in the Selenium WebDriver script, and once the tester runs the script, it will be executed on the desired system. For reference, you can also visit the official documentation to get a deeper insight into these steps.
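In Selenium's Java bindings, pointing a script at a remote endpoint is usually done with RemoteWebDriver instead of ChromeDriver. The sketch below shows the general shape only. The URL is a placeholder that does not resolve, the class name is made up, and any TestGrid-specific capabilities are omitted because they are not covered here; take the real values from the dashboard and the official documentation.

```java
import java.net.URI;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.remote.RemoteWebDriver;

public class RemoteRunSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder: replace with the Browser URL value shown on the dashboard.
        String browserUrl = "https://example.invalid/wd/hub";

        // Describes the browser being requested; add provider-specific
        // capabilities here if the documentation asks for them.
        ChromeOptions options = new ChromeOptions();

        // Commands now travel to the remote endpoint instead of a local driver.
        WebDriver driver = new RemoteWebDriver(URI.create(browserUrl).toURL(), options);
        try {
            driver.get("https://www.testgrid.io");
            System.out.println(driver.getTitle());
        } finally {
            driver.quit();
        }
    }
}
```

Apart from how the driver is constructed, the test steps are the same as for a local run.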

Once we are done, the results are reflected on the dashboard, and a report is generated for analysis. This can be shared with the team or kept in the archive for future use.

Conclusion

Browser testing used to be a manual job, assigned to the people least likely to make mistakes, because it is repetitive and has to cover a large number of browsers, operating systems, and their versions. Even then, there was no guarantee that a reported failure was genuine rather than a false positive. And why spend so much on something that could be done at almost one-tenth of the cost?

This is where Selenium WebDriver jumps in. It does everything that used to be done manually, but without the human errors that repetition invites. All a tester needs is test scripts, a browser driver, and a real browser on their system. It is even better if the tester opts for a cloud-based solution like TestGrid, which frees them from maintenance, scaling issues, and, most of all, downloading tons of software that just consumes resources unnecessarily. Selenium WebDriver has proved its worth for more than a decade now. It is powerful, resilient, supports multiple languages, and is cross-browser and cross-platform compatible. If anything, it should be one of the tools a tester regularly works with, and if they do not, they should at least know about it. With this, we hope it becomes one of the main tools in your suite and helps you accomplish your testing goals efficiently.

Frequently Asked Questions

What is Selenium WebDriver used for?

Selenium WebDriver is a test automation framework used for automating browser-related test cases. It works by converting test cases written in a programming language to actions that are executed on real browsers with the help of browser drivers.

What are locators in Selenium?

Locators are used to locate elements on a webpage so that Selenium WebDriver can execute actions on them. They can be unique or include multiple elements depending on the tester’s intent. Knowledge of HTML is required to use locators in testing.

What is the difference between Selenium and Selenium WebDriver?

Selenium is a suite of tools, and WebDriver is one specific tool for browser automation.

Do I need to install browser drivers manually in Selenium?

Previously, browser drivers such as ChromeDriver and GeckoDriver had to be downloaded and configured manually. However, starting from Selenium 4.6.0, Selenium Manager handles this automatically, making the setup process much simpler for testers.

Can Selenium WebDriver work without a server?

Yes. Unlike its predecessor, Selenium RC, Selenium WebDriver does not require an intermediate server to communicate with the browser. It sends commands directly to the browser driver, which makes test execution faster and the overall setup simpler.

Harish Rajora

Harish is a senior software engineer who likes to contribute to the world by sharing knowledge through his writings. He loves reading books and implementing ideas into reality through a computer.