Read our blogs, tips and tutorials
Try our exercises or test your skills
Watch our tutorial videos or shorts
Take a self-paced course
Read our recent newsletters
License our courseware
Book expert consultancy
Buy our publications
Get help in using our site
551 attributed reviews in the last 3 years
Refreshingly small course sizes
Outstandingly good courseware
Whizzy online classrooms
Wise Owl trainers only (no freelancers)
Almost no cancellations
We have genuine integrity
We invoice after training
Review 30+ years of Wise Owl
View our top 100 clients
Search our website
We also send out useful tips in a monthly email newsletter ...
Posted by Andrew Gould on 15 January 2021
In this video you'll learn how to perform basic web scraping tasks using Selenium Basic and Google Chrome. You'll see how to open Chrome, navigate to a web page and interact with items on the page including typing text into an input box and clicking a button. You'll learn how to check if a web element exists on the page to avoid run time errors and how to set a timeout value. You'll see how to reference a collection of elements and loop over them to identify the ones you're interested in. You'll also learn how to write a table into an Excel worksheet using two different techniques, as well as how to extract hyperlinks from a web table and assign them to cells in an Excel worksheet.
See our full range of VBA training resources, or test your knowledge of VBA with one of our VBA skills assessment tests.
This video has the following accompanying files:
File name | Type | Description |
---|---|---|
Pt57_2 Basic Web Scraping with Selenium and Google Chrome.xlsm | Excel workbook with macros |
Click to download a zipped copy of the above files.
There are no exercises for this video.
You can increase the size of your video to make it fill the screen like this:
Play your video (the icons shown won't appear until you do), then click on the full screen icon which appears as shown above.
When you've finished viewing a video in full screen mode, just press the Esc key to return to normal view.
To improve the quality of a video, first click on the Settings icon:
Make sure yoiu're playing your video so that the icons shown above appear, then click on this gear icon.
Choose to change the video quality:
Click as above to change your video quality.
The higher the number you choose, the better will be your video quality (but the slower the connection speed):
Don't choose the HD option shown unless your connection speed is fast enough to support it!
Is your Wise Owl speaking too slowly (or too quickly)? You can also use the Settings menu above to change your playback speed.
From: | GBurke |
When: | 16 Nov 22 at 00:15 |
Love your videos. Any chance you can add a new video to your resources for scraping an e-commerce site. They tend to not have tables, some have a pop-up discount offer on 1st orders, some require you to confirm your location, or select a Store. Some need you to "load more" (which I think I can now I have watched this video), or select the next page (havent tried that, but think I can based on your video).
From: | Andrew G |
When: | 17 Nov 22 at 08:29 |
Hi! Happy to hear the videos have helped you so far. I don't have plans to create a video specific to e-commerce sites but all of the techniques you need should be covered in some form in the other videos in this category (I'm sure one of them even includes some examples using Amazon). Video 57.4 is probably the most useful - learning how to reference elements and loop through collections of elements will be the most useful skills you'll learn with web-scraping.
I hope it helps!
From: | J.A. Clark |
When: | 14 Jan 22 at 17:47 |
Enjoyed video on web scraping using Selenium and Google Chrome. The Selenium library is a must with most sites no longer compatible with IE so this video was very helpful :-)
I have a two simple question s(at least to begin with):
Is there a way to run the script without the Google Chrome running in the foregrand (or maybe run the Chrome instance minimized and then exit after webscraping completed)?
How can the scraped data be placed below the search screen/button (no new sheet each time search is performed)?
Any assistance would be appreciated.
------------
From: | Andrew G |
When: | 15 Jan 22 at 07:51 |
Hi!
You can open a hidden Chrome window as described here. If you want to close Chrome automatically at the end, declare the ChromeDriver variable within the subroutine - this will close Chrome automatically when the variable goes out of scope at the end of the procedure. If you want to close the browser window explicitly you can apply the Close method to the ChromeDriver variable.
If you want to add new search results to the end of an existing list you can refer to the techniques described here.
If you want to replace previous search results with the new ones you can delete the existing results by applying the Clear method to the range of cells. If you need help with basic cell selection you can refer to the techniques described here.
I hope it helps!
From: | J.A. Clark |
When: | 20 Jan 22 at 21:19 |
Thank you for the response.
Added the headless argument and everything works good.
Just in case I ever need to run Chrome to download a file, what is the Selenium VBA command to run Chrome minimized? The referenced video mentions doing this but no details on what the command would be.
From: | Andrew G |
When: | 21 Jan 22 at 08:26 |
Hi!
Minimizing is a little trickier than maximizing! I usually cheat and just reposition the window off the visible screen using code similar to this:
cd.Window.SetPosition X:=0, Y:=-cd.Window.Size.Height
I hope it helps!
From: | Swaroop |
When: | 14 Jan 22 at 15:39 |
Pls help, i am getting error while running Element not found for name
<input type="text" id="_id2:logon:USERNAME" name="_id2:logon:USERNAME" value="SDN004">
Option Explicit
Private ch As Selenium.ChromeDriver
Sub test()
Set ch = New Selenium.ChromeDriver
ch.Start baseUrl:="http://sapbi.crb.apmoller.net/BOE/CMC"
ch.Get "/"
ch.FindElementByName("_id2:logon:USERNAME").SendKeys "SDN004"
End Sub
From: | Andrew G |
When: | 15 Jan 22 at 07:59 |
Hi!
I don't have access to the page you're using so I can't offer precise help but if finding the element by name isn't working you could try some of the other search strategies described here.
I hope it helps!
From: | Swaroop |
When: | 15 Jan 22 at 14:18 |
Hi Andrew,
Due to iframe i was not able find elements, so i am now able to login to 1st page , when it goes to 2nd page i am again getiing no element error tried same with SwitchToFrame as 1st page but getting error, can see there are multiple iframes how can we solve this, do you have any Vba code to capture how many iframe and elments it has in a page.
From: | Andrew G |
When: | 17 Jan 22 at 12:51 |
Hi! No, sorry we don't have any videos on iframes.
You should be able to use the FindElementsByTag method to return all of the frame elements and you can use the Count property of the collection returned to discover how many frames there are.
You can use the index number of a frame to switch to it, for example:
cd.SwitchToFrame identifier:=0
And that's as much as I know about iframes!
From: | Orejon |
When: | 09 Dec 21 at 21:34 |
Andrew, first let me thank you for your Videos and your easy going persona, which is the embodiment of your websiet (WiseOwl).
I've learned how to scrape websites using selemium basic from you, with a great deal of success. I've run into a bit of a problem which I do not know how to solve, and I've agonized over coming here for help in order not to bother you. But I am out of ideas after a few weeks os researching. So I hope you find the time.
I am simply taking a table from the website and sending it to excel by using (NomPrpVoteTbl.AsTable.ToExcel DS.Cells(lRow, 1).
However I am getting the following error:
Run-time error '-2147417856 (80010100)' System Call Failed. (Exception from HRESULT: 0x80010100 (RPC_E_SYS_CALL_FAILED))
This is in a loop and I am executing the "asTable.toExcel" a few dozen times. If I limit the amount of times in the loop the error does not happen.
Any help, or pointers would greatly be appreciated.
From: | Andrew G |
When: | 10 Dec 21 at 08:28 |
Hello, I appreciate the restraint in requesting help!
Sadly, I'm afraid that I don't have any insight into why this is happening or know of a solution to the problem. Is it possible to break your procedure into multiple smaller ones to reduce the number of tables you're processing each time?
From: | Orejon |
When: | 10 Dec 21 at 15:28 |
Andrew, thanks for the reply.
I feel like such an idiot for bothering you, especially since I just discovered (accidentally) that this is a "threading" issue with windows. I am calling this process from within another macro workbook, if I call it from within its itself the error does not occur, no matter how many iterations. Now all I have to do is figure out how threading works on windows and how to manipulate within VBA I am all set :(.
Again, thanks for the quick reply and have a wonderful holiday season.
From: | Andrew G |
When: | 11 Dec 21 at 08:46 |
Interesting! And I don't think that I would kick myself for not recognising what is a fairly esoteric problem!
As I understand it there's no native way for VBA to control threading. This might help to give you a primer on the topic https://analystcave.com/excel-multithreading-vba-vs-vbscript-vs-c-net/
I hope it helps!
From: | prtm1988 |
When: | 08 Jun 21 at 12:28 |
Thanks for sharing such great informative training videos. Even toughest of the tough topic are made easy to understand. Even a person without any coding knowledge can digest your topics and become an expert..
Im getting a
Run-time error '1004': Exception from HRESULT: 0x800A03EC
Table.AsTable.ToExcel ThisWorkbook.Worksheets("DirStatus").Cells(2, 10)
Suprisingly the same code was running fine for 30-35 days, however offlate from past two days without any changes being done to the code it authomatically started throwing up this error. If possible kindly give your thoughts on the same..
Thank again for sharing all your training vidoes.
From: | Andrew G |
When: | 08 Jun 21 at 13:40 |
Hi there!
I don't know the answer to that one but here are some suggested causes and fixes:
https://www.codeproject.com/Questions/470089/Exception-from-HRESULT-0x800A03EC-Error
https://stackoverflow.com/questions/7099770/hresult-0x800a03ec-on-worksheet-range
I hope it helps!
From: | Stennettc |
When: | 28 May 21 at 01:32 |
Hi Andrew,
Subsequent to my earier post, i made some adjustments. I am still stuck at how to get the output data into one results sheet in the pivot table format. Your assistance would be greatly appreciated. The sample input data is valid for this code.
Thanks much!
Option Explicit
Private chromebrowser As Selenium.ChromeDriver ' declaration at the module level. note we changed DIM to Private
Sub LoopScrapePropTaxWebsiteNew()
Dim FindBy As New Selenium.By ' declaration of an auto instancing BY object to query any changes to HTML class name
Dim PropertyOwnerQueryResults As Selenium.WebElements ' Declaration of a variable that can take multiple web elements
Dim PropertyTaxResults As Selenium.WebElements ' Declaration of a variable that can take only one web element
Dim PropertyOwnerQueryResult As Selenium.WebElement
Dim PropertyTaxResult As Selenium.WebElement
Dim PropInfoTable As Selenium.WebElement
Dim PropTaxtable As Selenium.WebElement
Dim ValNo As String
Dim StratNo As String
Set chromebrowser = New Selenium.ChromeDriver
chromebrowser.Start 'The base URL in google chrome
chromebrowser.Get "https://ptsqueryonline.fsl.org.jm/PTSOnlineWeb/ptsquery.jsp" ' access the home page. If we wanted to access any page, we would just indicate the page name after the forward slash
Application.ScreenUpdating = False 'This line will turn off the screen flickering
Worksheets("PropTaxData").Activate ' Go back to the property Tax Sheet
Range("A2").Select
'-----------Loop to download the Property Tax Information
Do Until ActiveCell.Value = ""
ValNo = ActiveCell.Value
StratNo = ActiveCell.Offset(0, 1).Value
If chromebrowser.IsElementPresent(FindBy.Name("valuationno"), 5000) = False Then ' this is to account for any changes in class name as time passes. the number is the timeout period in millisecond, ie 3000 milliseconds is 3 sec
chromebrowser.Quit ' code to exit the chrome browser
MsgBox "Could not find search input box", vbExclamation ' Code to diplay message box with error
Exit Sub ' Exit sub routine
End If ' end if statement
chromebrowser.FindElementByName("valuationno").SendKeys ValNo
If chromebrowser.IsElementPresent(FindBy.Name("stratalotno"), 5000) = False Then ' this is to account for any changes in class name as time passes. the number is the timeout period in millisecond, ie 3000 milliseconds is 3 sec
chromebrowser.Quit ' code to exit the chrome browser
MsgBox "Could not find search input box", vbExclamation ' Code to diplay message box with error
Exit Sub ' Exit sub routine
End If ' end if statement
chromebrowser.FindElementByName("stratalotno").SendKeys StratNo
If chromebrowser.IsElementPresent(FindBy.Class("btn"), 5000) = False Then ' this is to account for any changes button name as time passes. the number is the timeout period in millisecond, ie 3000 milliseconds is 3 sec
chromebrowser.Quit ' code to exit the chrome browser
MsgBox "Could not find submit button", vbExclamation ' Code to diplay message box with error
Exit Sub ' Exit sub routine
End If ' end if statement
chromebrowser.FindElementByClass("btn", 3000).Click
'---------------Retriving the results of the property tax query
Set PropertyOwnerQueryResults = chromebrowser.FindElementsByClass("form-horizontal")
If PropertyOwnerQueryResults.Count = 0 Then ' Condition to account for an empty result set
'chromebrowser.Quit
MsgBox "No Results Found" & " " & "for Valuation No" & " " & ValNo, vbExclamation
Exit Sub
End If
For Each PropertyOwnerQueryResult In PropertyOwnerQueryResults ' loops through the web elements
Set PropInfoTable = PropertyOwnerQueryResult.FindElementByTag("tbody")
'Debug.Print PropInfoTable.Text ' print the webeleemnt texts to the immediate window
Next PropertyOwnerQueryResult
ProcessTable PropInfoTable
chromebrowser.FindElementById("back").Click 'Select the back button
Worksheets("PropTaxData").Activate
ActiveCell.Offset(1, 0).Select ' Go to the next cell
Loop
'Application.ScreenUpdating = True
End Sub
Sub ProcessTable(TableToProcess As Selenium.WebElement)
Dim AllRows As Selenium.WebElements
Dim SingleRow As Selenium.WebElement
Dim AllRowCells As Selenium.WebElements
Dim SingleCell As Selenium.WebElement
Dim OutputSheet As Worksheet
Dim RowNum As Long, ColNum As Long
Dim TargetCell As Range
Set OutputSheet = ThisWorkbook.Worksheets("Results")
Set AllRows = TableToProcess.FindElementsByTag("tr")
For Each SingleRow In AllRows
RowNum = RowNum + 1
Set AllRowCells = SingleRow.FindElementsByTag("td")
If AllRowCells.Count = 0 Then
Set AllRowCells = SingleRow.FindElementsByTag("th")
End If
For Each SingleCell In AllRowCells
ColNum = ColNum + 1
Set TargetCell = OutputSheet.Cells(RowNum, ColNum)
TargetCell.Value = SingleCell.Text
'Debug.Print SingleCell.Text
Next SingleCell
ColNum = 0
Next SingleRow
End Sub
From: | Stennettc |
When: | 27 May 21 at 21:55 |
Hi Andrew,
I am loving your content. The videos are insightful, simple and easy to understand. Using Selenium, I am writing a webscraping script to loop over an excel array of values to login a website, copy the particulars from that website and write the contents to an excel sheet. The data is returned across as a table. I know the script works as the data is generated in the immediate window. I need your guidance how to write the details to an excel file in a pivot table format with the appropriate headers.
Here is some sample values to test the script
16402008114
17006012198
00604005040
06901022020
031D5Y27115
19002019045
031D5Y27129
031D5Y27134
105D5Z02004
031D5Y27109
16301002229
Looking forward to your kind support
Option Explicit
Private chromebrowser As Selenium.ChromeDriver ' declaration at the module level. note we changed DIM to Private
Sub LoopScrapePropTaxWebsite()
Dim FindBy As New Selenium.By ' declaration of an auto instancing BY object to query any changes to HTML class name
Dim PropertyOwnerQueryResults As Selenium.WebElements ' Declaration of a variable that can take multiple web elements
Dim PropertyTaxResults As Selenium.WebElements ' Declaration of a variable that can take only one web element
Dim PropertyOwnerQueryResult As Selenium.WebElement
Dim PropertyTaxResult As Selenium.WebElement
Dim PropInfoTable As Selenium.WebElement
Dim PropTaxtable As Selenium.WebElement
Dim ValNo As String
Dim StratNo As String
Set chromebrowser = New Selenium.ChromeDriver
chromebrowser.Start 'The base URL in google chrome
chromebrowser.Get "https://ptsqueryonline.fsl.org.jm/PTSOnlineWeb/ptsquery.jsp" ' access the home page. If we wanted to access any page, we would just indicate the page name after the forward slash
Application.ScreenUpdating = False 'This line will turn off the screen flickering
Worksheets("PropTaxData").Activate ' Go back to the property Tax Sheet
Range("A2").Select
'-----------Loop to download the Property Tax Information
Do Until ActiveCell.Value = ""
ValNo = ActiveCell.Value
StratNo = ActiveCell.Offset(0, 1).Value
If chromebrowser.IsElementPresent(FindBy.Name("valuationno"), 5000) = False Then ' this is to account for any changes in class name as time passes. the number is the timeout period in millisecond, ie 3000 milliseconds is 3 sec
chromebrowser.Quit ' code to exit the chrome browser
MsgBox "Could not find search input box", vbExclamation ' Code to diplay message box with error
Exit Sub ' Exit sub routine
End If ' end if statement
chromebrowser.FindElementByName("valuationno").SendKeys ValNo
If chromebrowser.IsElementPresent(FindBy.Name("stratalotno"), 5000) = False Then ' this is to account for any changes in class name as time passes. the number is the timeout period in millisecond, ie 3000 milliseconds is 3 sec
chromebrowser.Quit ' code to exit the chrome browser
MsgBox "Could not find search input box", vbExclamation ' Code to diplay message box with error
Exit Sub ' Exit sub routine
End If ' end if statement
chromebrowser.FindElementByName("stratalotno").SendKeys StratNo
If chromebrowser.IsElementPresent(FindBy.Class("btn"), 5000) = False Then ' this is to account for any changes button name as time passes. the number is the timeout period in millisecond, ie 3000 milliseconds is 3 sec
chromebrowser.Quit ' code to exit the chrome browser
MsgBox "Could not find submit button", vbExclamation ' Code to diplay message box with error
Exit Sub ' Exit sub routine
End If ' end if statement
chromebrowser.FindElementByClass("btn", 3000).Click
'---------------Retriving the results of the property tax query
Set PropertyOwnerQueryResults = chromebrowser.FindElementsByClass("topInfo")
If PropertyOwnerQueryResults.Count = 0 Then ' Condition to account for an empty result set
'chromebrowser.Quit
MsgBox "No Results Found" & " " & "for Valuation No" & " " & ValNo, vbExclamation
Exit Sub
End If
For Each PropertyOwnerQueryResult In PropertyOwnerQueryResults ' loops through the web elements
Debug.Print PropertyOwnerQueryResult.Text ' print the webeleemnt texts to the immediate window
Next PropertyOwnerQueryResult
Set PropertyTaxResults = chromebrowser.FindElementsByClass("middleTable")
If PropertyTaxResults.Count = 0 Then ' Condition to account for an empty result set
chromebrowser.Quit
MsgBox "No Results Found", vbExclamation
Exit Sub
End If
For Each PropertyTaxResult In PropertyTaxResults ' This For Loop goes through the elements of the results table and prints the text in the immediate window
Debug.Print PropertyTaxResult.Text
Next PropertyTaxResult
chromebrowser.FindElementById("back").Click 'Select the back button
Worksheets("PropTaxData").Activate
ActiveCell.Offset(1, 0).Select ' Go to the next cell
Loop
End Sub
From: | Andrew G |
When: | 28 May 21 at 06:22 |
Hi there!
I think that the best thing to do is take a look at some of our videos on creating pivot tables! You can see the list of pivot table videos here https://www.wiseowl.co.uk/vba-macros/videos/vba-pivot-tables-charts/
I hope it helps!
From: | Stennettc |
When: | 28 May 21 at 07:17 |
Hi Andrew,
Thanks for the prompt response. I asked the wrong question. I need guidance on how to consolidate the data from the website to one sheet. The data is currently downloaded to multiple tabs.
Thanks
From: | Andrew G |
When: | 29 May 21 at 19:11 |
OK, that makes more sense! There are plenty of ways to build a list by finding the end of the current list (your suggestion is a good one) and we even have a video dedicated to discussing different techniques! https://www.wiseowl.co.uk/vba-macros/videos/vba-advanced/last-row-column-cell/
A quick way to avoid testing if the sheet is empty is to start at the bottom of column A, end upwards (this will stop at the first cell containing a value or cell A1 if the column is empty) and then offset one row downwards.
Range("A1048576").End(xlUp).Offset(1, 0)
This will leave row 1 of the worksheet empty but you could use that for column headings or just delete the row at the end.
I hope it helps!
From: | Stennettc |
When: | 29 May 21 at 13:52 |
Hi Andrew,
I took out the code that adds a new work sheet and replaced it with this line:
Set OutputSheet = ThisWorkbook.Worksheets("Results").
The challenge now is to prevent the results from being overwritten each time a new data set is added.
In other words, I need a dynamic way for the results sheet to be populated with all the results. I was thinking to code the output sheet using the following conditions:
Condition 1:-
If the results sheet is blank, then start writing the results to the output sheet starting at cell A1
Condition 2:-
If the output sheet is not blank, then go to the first empty cell in column A and then beginning writing the results.
I have been getting an errors each time I tried writing the code.
Your guidance would be greatly appreciated
From: | Andrew G |
When: | 29 May 21 at 06:47 |
Hi! I was going to say that you can remove the code which creates the new worksheet for each table that you're copying but I can't see anything in the two sets of sample code you've posted which does this. How are you creating the new worksheets that your tables are being written to?
From: | DutchWoody |
When: | 12 Mar 21 at 07:24 |
Hi Andrew,
Thank you so much for your reply to my post under Excel VBA Part 47.1. I've managed to apply and adjust the code scroll through all pages and tabs on the AustralianSuper site. When I tried to use a similar approach on another website, I seem to be running into a challenge. On this page https://www.pfzw.nl/over-ons/zo-beleggen-we/waarin-we-beleggen/overzicht-aandelen.html I want to copy the table (like with the AustralianSuper). But all of these FindElementByXXX don't seem to find the table, while in the inspector I can see it there clearly:
Set PFZWTable = ch.FindElementByClass("data-table style-scope p-data-table")
Set PFZWTable = ch.FindElementByCss(".data-table.style-scope.p-data-table")
Set PFZWTable = ch.FindElementByTag("Table")
Set PFZWTable = ch.FindElementsByXPath("//*[@id='tabledatasource-642984829']//div/div/div/div/table")
Inspector element: <table class="data-table style-scope p-data-table" style="min-width: 830px">, XPath //*[@id="tabledatasource-642984829"]//div/div/div/div/table
I've also given the page a little while to load (ch.Wait "3000"), just in case it needed time to load. I seem to be struggling with FindElementBy the above options in general. On other pages, the only FindElement that works for me is FindElementByLinkText, and in one case ch.FindElementByCss(".next") to click the Next button.
What am I doing wrong? Am I missing something? If an element seems to have the tag 'table' why does ch.FindElementByTag("Table") when there is only one table tag on the page? The same for FindElementByClass("data-table style-scope p-data-table"). I've watch a lot of (but admittedly not all of) your video 57.4, but still can't seem find what I'm doing wrong.
Your guidance on this is really appreciated!
Best,
From: | Andrew G |
When: | 12 Mar 21 at 17:40 |
Well I've learnt something about web design that I didn't previously know! Working code below but you might find this useful as background information first
https://developer.mozilla.org/en-US/docs/Web/Web_Components/Using_shadow_DOM
Here's the code which writes the table you're trying to get into a new worksheet:
Private ch As Selenium.ChromeDriver
Sub ScrapePFZW()
Dim ShadowHost As Selenium.WebElement
Dim ShadowRoot As Selenium.WebElement
Dim PFZWTable As Selenium.WebElement
Dim ws As Worksheet
Set ch = New Selenium.ChromeDriver
ch.Start
ch.Get "https://www.pfzw.nl/over-ons/zo-beleggen-we/waarin-we-beleggen/overzicht-aandelen.html"
'Get a reference to the parent element of the shadow root
Set ShadowHost = ch.FindElementByCss("p-data-table")
'Return a reference to the shadow root of the parent element
Set ShadowRoot = ch.ExecuteScript("return arguments[0].shadowRoot", ShadowHost)
'Search within the shadow DOM using normal FindElement methods
Set PFZWTable = ShadowRoot.FindElementByCss("table")
Set ws = ThisWorkbook.Worksheets.Add
PFZWTable.AsTable.ToExcel ws.Range("A1")
End Sub
Hope that helps!
Kingsmoor House
Railway Street
GLOSSOP
SK13 2AA
Landmark Offices
99 Bishopsgate
LONDON
EC2M 3XD
Holiday Inn
25 Aytoun Street
MANCHESTER
M1 3AE
© Wise Owl Business Solutions Ltd 2024. All Rights Reserved.