Read our blogs, tips and tutorials
Try our exercises or test your skills
Watch our tutorial videos or shorts
Take a self-paced course
Read our recent newsletters
License our courseware
Book expert consultancy
Buy our publications
Get help in using our site
537 attributed reviews in the last 3 years
Refreshingly small course sizes
Outstandingly good courseware
Whizzy online classrooms
Wise Owl trainers only (no freelancers)
Almost no cancellations
We have genuine integrity
We invoice after training
Review 30+ years of Wise Owl
View our top 100 clients
Search our website
We also send out useful tips in a monthly email newsletter ...
Posted by Andrew Gould on 14 November 2016
Scraping a web page in VBA involves getting a reference to an HTML document and then processing its elements to extract the useful parts and write them to another location in a readable format. This video shows you how to do that using both Internet Explorer and basic XML HTTP requests. You'll learn a bit about the Document Object Model, as well as how to identify HTML elements by name and by tag type. You'll also see how to loop through various collections of HTML elements and their children using an example involving exchange rate tables.
See our full range of VBA training resources, or test your knowledge of VBA with one of our VBA skills assessment tests.
This video has the following accompanying files:
File name | Type | Description |
---|---|---|
Scraping Web Pages.xlsm | Excel workbook with macros |
Click to download a zipped copy of the above files.
There are no exercises for this video.
You can increase the size of your video to make it fill the screen like this:
Play your video (the icons shown won't appear until you do), then click on the full screen icon which appears as shown above.
When you've finished viewing a video in full screen mode, just press the Esc key to return to normal view.
To improve the quality of a video, first click on the Settings icon:
Make sure yoiu're playing your video so that the icons shown above appear, then click on this gear icon.
Choose to change the video quality:
Click as above to change your video quality.
The higher the number you choose, the better will be your video quality (but the slower the connection speed):
Don't choose the HD option shown unless your connection speed is fast enough to support it!
Is your Wise Owl speaking too slowly (or too quickly)? You can also use the Settings menu above to change your playback speed.
From: | Doclocko |
When: | 21 Feb 23 at 10:38 |
Hi there
I'm working in Google Chrome, using Selenium and Excel VBA to interact with websites. One website I'm scraping has a number of element Id's such as:
<a id="ucRequoteViewReportTop_lnkRequote" href="javascript:__doPostBack('ucRequoteViewReportTop$lnkRequote','')">Requote</a>
I can't get my VBA code to click the button that has the ID above, and wondered if anyone could help me find a way to do this please?
Thank you, Paul.
From: | Andrew G |
When: | 21 Feb 23 at 11:02 |
Hi Paul!,
I'd take a look at video 57.4 - Finding Web Elements in Selenium (you should be able to see a link on the left of this page). Look for the parent button element in which your anchor (a) element is nested and apply the click method to it.
I hope that helps!
From: | Doclocko |
When: | 21 Mar 23 at 10:05 |
Hi Andrew.
Thanks again for the links you sent me re websites logging the user off due to inactivity. Unfortunately, I still haven't found a way to resolve this one.
I was wondering if there's a way for Selenium to continuously monitor for the existence of a web element, so that once it appears on the page (in my case a web element in the pop-up telling me that I am to be logged off shortly) I can detect its existence and run appropriate code to deal with it?
I can't search for the element at a particular point in the code because once the log off pop up appears the code stops running. Therefore, I'm looking for a way to continually look for the existence of the web element in the pop up by some other means. The closest analogy I can think of is "Private Sub Worksheet_Change(ByVal Target As Range)" in VBA.
Any suggestions - as always - very much appreciated.
Thanks!
Paul.
From: | Andrew G |
When: | 22 Mar 23 at 07:29 |
Hi Paul,
I'm really not sure what the solution is here. I know that Selenium has event listeners but I'm not sure how you'd make use of them with VBA. You can see the Event Listeners tab in the panel in Chrome that you use to inspect elements. That might be a starting point? I hope it helps!
From: | Doclocko |
When: | 12 Mar 23 at 15:04 |
Hi Andrew.
Here's another one that's got me stumped - if there's any advice you can offer it would be much appreciated.
The website I'm working on automatically logs me out after 15 minutes of inactivity. It sees me as inactive while my macro is running, even though the macro is filling in taxt boxes, clicking on different tabs in the iframe etc. The only way I've managed to overcome this is to add code that closes the web browser every 14 minutes, then re-opens the web browser and logs in to the website again, which is quite time-consuming.
Do you know if there's a way to 'trick' the website into thinking I'm active on it while the macro is running?
Thanks in advance!
Paul.
From: | Andrew G |
When: | 13 Mar 23 at 07:38 |
Hi Paul!
I've not encountered that issue but it seems that you're not the only one experiencing it
There are some suggestions for working around the problem in the first link but it's unclear whether they worked. Perhaps it will point you in the right direction!
From: | Doclocko |
When: | 27 Feb 23 at 14:27 |
Hi Andrew
iframe worked! I can't thank you enough for taking the time to look at my problem and for giving me appropriate advice.
If anything else crops up I'll drop you another message if that's okay?
In the meantime, please keep up the fantastic work - I'm sure there are thousands of people out there who appreciate your website, your YouTube videos and your help regarding specific problems as much as I do.
Best wishes, Paul.
From: | Andrew G |
When: | 27 Feb 23 at 16:21 |
That's great Paul! Happy to hear that it helped and thanks for the kind words!
From: | Doclocko |
When: | 27 Feb 23 at 13:58 |
Hi Andrew.
Thanks again for your helpful comments. I'm pretty sure the elements have time to load as I pause the code just before they are searched for and then step through the code.
I'll take a look at the ifrrame now. Fingers crossed - I'll let you know how I get on!
Cheers, Paul.
From: | Doclocko |
When: | 25 Feb 23 at 16:58 |
Hi Andrew
I've done everything I can think of to resolve the issue I have, but no luck! The closest I can get to identifying the issue is by working upwards through the elements (starting from the one I mentioned in my original post):
<a id="ucRequoteViewReportTop_lnkRequote" href="javascript:__doPostBack('ucRequoteViewReportTop$lnkRequote','')">Requote</a>
and trying to see if Selenium can find if the elements above (and including) this one exist, using XPath searches such as:
If cd.IsElementPresent(FindBy.XPath("//*[@id='ucRequoteViewReportTop_lnkRequote']")) Then
MsgBox "found"
Else
MsgBox "not found"
End If
As I work up through the elements, the code returns "not found" up to the following element:
< html xmlns="http://www.w3.org/1999/xhtml">
This element is found, as is each element above it, but elements below this element are not found.
Do you have any ideas why the elements beneath < html xmlns="http://www.w3.org/1999/xhtml"> do not exist according to Selenium?
Any thoughts very much appreciated.
Thanks, Paul.
From: | Andrew G |
When: | 27 Feb 23 at 07:37 |
Hi Paul,
It's tricky to know without seeing the page but the first things I'd check are that the elements you're trying to find have enough time to load and that they're not contained in an iframe. For the former you can try stepping through your code at a slower pace (if this proves to be the cause you can add code to wait longer as shown in video 57.5). For the latter you can add code to switch to the iframe before trying to locate the other elements.
I hope that helps!
From: | Mctabish |
When: | 06 Aug 22 at 16:28 |
When running the code as is, I did get an ACCESS DENIED at the
"XMLPage.send"
in sub GetExchangeRates(FromCurrency.
This is due to the site is not an "https" (secured site) instead of "HTTP" .The fix is easy.
Change the line from:
URL = "http://x-rates.com/table/?from=" & FromCurrency & "&amount=" & Amount
to
URL = "https://x-rates.com/table/?from=" & FromCurrency & "&amount=" & Amount 'had to change http:/ to https:/ (initials & date)
Thanks much for the training!
From: | Andrew G |
When: | 08 Aug 22 at 07:24 |
Indeed, of course at the time the video was recorded the site wasn't using https.
Thanks for watching and for taking the time to post!
From: | a_sh |
When: | 07 Apr 22 at 22:22 |
I'm writing a code in vba to parse data from some websites using xmlhttp request but I noticed that getElementsByTagName method works just for some specific tags.
I prepared the following simple code to test it. The code works with some specific tags like "span", "div", "ls", "a" & "b" but not for others! (you can change the TestTag variable and test it)(The code doesn't work with "lx", "pos", "c" and ... for me.)
I don't know the reason. I searched for the answer and faced with some content about namespace, I read about it but I couldn't find any relation between my problem and this. Please help me to understand the problem.
I changed the code to late binding to be easily used and tested without setting reference objects.(both have the same results.)
Sub TEST()
'Dim HTMLDoc As New MSHTML.HTMLDocument
Dim HTMLDoc As Object
Set HTMLDoc = CreateObject("htmlfile")
'Dim HTMLDiv As MSHTML.IHTMLElement
Dim HTMLDiv As Object
Dim TestTag As String
TestTag = "div"
HTMLDoc.body.innerHTML = "<BODY><" & TestTag & _
"><FIRSTTAG><SPAN class=firstclass>FirstContent<SECTAG>-SecContent</SECTAG></SPAN></FIRSTTAG> </" & _
TestTag & "></BODY>"
Set HTMLDiv = HTMLDoc.getElementsByTagName(TestTag).Item(0)
MsgBox HTMLDiv.innerText
End Sub
From: | Andrew G |
When: | 08 Apr 22 at 08:21 |
Hi! I'm not familiar with the lx, pos or c HTML tags. Here's a list of valid HTML tags https://developer.mozilla.org/en-US/docs/Web/HTML/Element
I hope it helps!
From: | ThaiBinh |
When: | 28 Oct 21 at 13:39 |
Hi Andrew,
First, I want to thank you for your great videos that are very usefull for me. It's a shame that I didn't Wiseowl before. But better late than never. ^^
Second, I have a problem as below (sorry my bad English, if I can't explain it well):
As you said, we should use XML tech to get data from website because it's much more quicker. And I totally agree with that, but in this case I can't figure out how to use it.
In this video you show us how to get data from a new webpage, but my problem is that I have to get data from an existed webpage frequently (per 10 minutes) because it will be much quicker than open a new webpage. And the most important that one day I just only have five times to enter my username and password to log in that page, that's very inconvinence so I can't use the code that open the new webpage each time.
Therefore, could you be kind to give me some advise to help me solve that problem?
How can I loop through the whole windows and and define the existed webpage by using XML tech?
I'm using shellobject to do that, but I just only know use InternetExplorer in this case so it takes quite a long time to get all the data. :(
I hope to hear from you soon,
Have a great day!
Best regards,
ThaiBinh
From: | Andrew G |
When: | 29 Oct 21 at 07:37 |
Hi!
I'm pleased to hear that you've found the videos useful! Sadly I don't have any good suggestions on how to achieve what you want to do here. Perhaps one of our other readers has some ideas?
From: | DeepakVBA |
When: | 29 May 21 at 06:50 |
Hi Andrew,
I need to extract the data from multiple HTML files into excel using VBA. need to extract paramname and Paramvalue to excel. can you help me out this with a code.
From: | Andrew G |
When: | 29 May 21 at 07:29 |
Hi Deepak!
Is there something specific that isn't working for you? What code have you written so far?
From: | DeepakVBA |
When: | 29 May 21 at 13:19 |
Hi Andrew,
The HTML files are present in local folder, Till now i just done a code to locate the files in the folder, after that dont have a clue to proceed further.
Could you please help me on this with the code to extract the data. TIA
From: | Andrew G |
When: | 29 May 21 at 19:25 |
Hi Deepak! Can I offer a couple of links which might help you get the HTML file loaded into an object:
https://stackoverflow.com/a/59957320
https://stackoverflow.com/questions/51114836/parse-saved-html-file-vba
From that point you can then work with the HTML document object just as with the websites shown in the video. You'll need to find elements, loop through the ones you're interested in and extract the InnerText of the elements to get the values you need. These techniques are shown in the video, you'll just need to adapt them to your specific example.
I hope it helps!
From: | sivuddu |
When: | 22 May 21 at 17:21 |
I am able execute code written by WiseOwl, but i am trying execute code for site - https://fedai.org.in even i am unable to find elements by id or by name etc...
Tab "Rates" > Report as under
*Revaluation Rates
Lates 2 excel reports to be extracted
Need to create folder and report name as Revaluation Rates with respective date.
getting error at object variable or withblock variable not set at HTMLButtons(0).Click because need to close pop-up screen and then need to extract the files
Option Explicit
Sub BrowseToSiteToGetXML1()
Dim IE As New SHDocVw.InternetExplorer
Dim XetraHTMLDoc As MSHTML.HTMLDocument
Dim HTMLButtons As MSHTML.IHTMLElementCollection
Dim HTMLInput As MSHTML.IHTMLElement
IE.Visible = True
IE.Navigate "https://www.fedai.org.in/"
Do While IE.ReadyState <> READYSTATE_COMPLETE Or IE.Busy 'Wait until site is ready
Loop
Set XetraHTMLDoc = IE.Document
Set HTMLButtons = XetraHTMLDoc.getElementsByTagName("button")
HTMLButtons(0).Click
End Sub
From: | Andrew G |
When: | 23 May 21 at 15:01 |
Hi there! I'm sorry I don't know the answer to this one I'm afraid. You may find it easier to use Google Chrome rather than Internet Explorer. We have some videos on how to use Chrome starting at Part 57.1 in this playlist https://www.wiseowl.co.uk/vba-macros/videos/vba-scrape-websites/
I hope it helps!
From: | Stennettc |
When: | 17 May 21 at 21:16 |
Hello, The code failed at the last line ie the code failed to populate the text box...Help Please
I am using the 64 bit version of the 2010 Office Professional Plus
Option Explicit
Sub BrowseToSite()
Dim IE As SHDocVw.InternetExplorer
Dim HTMLDoc As MSHTML.HTMLDocument
Dim HTMLInput As MSHTML.IHTMLElement
IE.Visible = True
IE.Navigate "https://www.wiseowl.co.uk/"
Do While IE.ReadyState <> READYSTATE_COMPLETE 'Pausing Excel while the website is loading
Loop
Set HTMLDoc = IE.Document
Set HTMLInput = HTMLDoc.getElementById("what")
HTMLInput.Value = "VBA"
End Sub
From: | Andrew G |
When: | 17 May 21 at 21:50 |
Hi there!
The input box you're attempting to access doesn't have an id attribute (although it may have had at some time in the past!). The input box has both a class and a name attribute so you could do this using either:
Set HTMLInput = HTMLDoc.getElementsByName("what")(0)
Or
Set HTMLInput = HTMLDoc.getElementsByClassName("search")(0)
I hope that helps!
From: | Stennettc |
When: | 17 May 21 at 22:28 |
Hi Andrew,
Thanks for the prompt response. Loving the WiseOwl content!!
Unfortunately both suggestions didn’t work. The Run-time error '91'. "Object variable or with block variable not set" came up under both scenarios. The code is in debug mode and line 11 is highlighted. I made the following observations:-
1) Each time I hovered over line 3, Dim HTMLInput As MSHTML.IHTMLElement, i see a message box stating HTMLInput = Nothing
2) Each time I hovered over Set line 10 HTMLInput = HTMLDoc.getElementsByClassName("search")(0), i see a message box stating HTMLInput = Nothing
3) Each time I hovered line 5, IE.Visible = True, i see a message box stating the remote server machine does not exist or is unavailable
Do you have any recommendations on how to resolve these errors?
The Internet controls, HTML object library under VBA Project references and all macros have been enabled.
From: | Andrew G |
When: | 19 May 21 at 06:06 |
Excellent! Happy to hear that you got it working!
From: | Stennettc |
When: | 18 May 21 at 16:27 |
Thanks again for the prompt response! The Selenium solution worked beautifully!
From: | Andrew G |
When: | 18 May 21 at 07:38 |
Hmm, I'm not sure, both of the alternatives I suggested work for me so I can't replicate the problem.
If memory serves, this issue can sometimes be resolved by declaring your IE variable As InternetExplorerMedium (I believe it changes the security level of the application).
Alternatively, I wonder if you might have more success using Google Chrome and the SeleniumBasic library to do this? We have some videos on that topic starting at part 57.1 in this list of videos https://www.wiseowl.co.uk/vba-macros/videos/vba-scrape-websites/
From: | kibra |
When: | 20 Mar 21 at 16:42 |
Hi Andrew,
Thank you for sharing these tutorials – they are really great.
I am trying to use the search button on https://www.boerse-frankfurt.de/en to find a specific stock by using code similar to the tutorial.
I can get the text into the search box, but the functionality on the site, when you type in manually, is that a comboBox is made with the link to the matching stock.
The problems seems to be that the search button will not bring you to the link without the comboBox. Is there a way to fix this with a code?
You help will be really appreciated.
Best regards, Kim
Sub BrowseToSiteToGetXML()
Dim IE As New SHDocVw.InternetExplorer
Dim XetraHTMLDoc As MSHTML.HTMLDocument
Dim HTMLButtons As MSHTML.IHTMLElementCollection
Dim HTMLInput As MSHTML.IHTMLElement
IE.Visible = True
IE.navigate "https://www.boerse-frankfurt.de/en"
Do While IE.ReadyState <> READYSTATE_COMPLETE Or IE.Busy 'Wait until site is ready
Loop
Set XetraHTMLDoc = IE.Document
Set HTMLInput = XetraHTMLDoc.getElementById("mat-input-0")
HTMLInput.Click
HTMLInput.Value = "IQQH"
Do While IE.ReadyState <> READYSTATE_COMPLETE Or IE.Busy 'Wait until site is ready
Loop
Set HTMLButtons = XetraHTMLDoc.getElementsByTagName("button")
HTMLButtons(0).Click
End Sub
From: | Andrew G |
When: | 21 Mar 21 at 20:08 |
Hi Kim,
I'm not even going to pretend that I know why this is working but here are some notes before the code:
1) I'm using the SeleniumBasic library to control Google Chrome rather than IE. I suspect that you can get this to work in IE as well but Selenium is easier. We have several videos on using Selenium starting at part 57.1 on this page.
2) I don't know why, but the key to getting things to update on the website you're using appears to be activating a different application window. In the code below I've done this simply by activating the Excel application window. It apparently also works by adding code to open Notepad, PowerPoint, Word, etc. As long as the browser window isn't active it appears to update. I don't know enough about the topic to understand why!
3) Even after activating a different application it seems that the link in the combobox doesn't appear on the screen and so can't be interacted with. I've used the ExecuteScript method to execute the javascript code to click the link.
4) I've added a simple loop to process all the tables on the page you land on after following the link. Each table goes into a new worksheet so you'll end up with 14 new worksheets (just to warn you!).
Here's the code (of course it relies on having Selenium set up and configured as per the videos I linked to above).
Option Explicit
Private cd As Selenium.ChromeDriver
Sub GetTablesForSelectedItem()
Dim NameInput As Selenium.WebElement
Dim SuggestionsList As Selenium.WebElement
Dim Suggestions As Selenium.WebElements
Dim AllTables As Selenium.WebElements
Dim SingleTable As Selenium.WebElement
Set cd = New Selenium.ChromeDriver
cd.Start
cd.Get "https://www.boerse-frankfurt.de/en"
Set NameInput = cd.FindElementByCss("#mat-input-0")
NameInput.SendKeys "IQQH"
cd.Wait 2000
Set SuggestionsList = cd.FindElementByCss("#mat-autocomplete-0")
Set Suggestions = SuggestionsList.FindElementsByCss("mat-option")
Debug.Print Suggestions.Count
If Suggestions.Count > 0 Then
cd.ExecuteScript _
Script:="arguments[0].click();", _
arguments:=Suggestions(1)
End If
AppActivate ThisWorkbook.Name
cd.Wait 2000
Set AllTables = cd.FindElementsByCss("table")
Debug.Print AllTables.Count
For Each SingleTable In AllTables
SingleTable.AsTable.ToExcel Worksheets.Add.Range("A1")
Next SingleTable
End Sub
If you ever work out why the site behaves that way I'd love to hear about it!
From: | kibra |
When: | 22 Mar 21 at 18:42 |
Thank you Andrew for the quick reply.
Selenium works beautiful :-)
The AppActivate did not work for me. It gave me a run-time error: '5'
Instead I added some code that click the decline button in the the pop-up about coockies.
Then it works for me. See code below.
Now, I will actually like to work with the HTMLDocument without opening the browser. Is that also possible with Selenium?
Best regards, Kim
Option Explicit
Private cd As Selenium.ChromeDriver
Sub GetTablesForSelectedItem()
Dim NameInput As Selenium.WebElement
Dim SuggestionsList As Selenium.WebElement
Dim Suggestions As Selenium.WebElements
Dim PopupButton As Selenium.WebElement
Dim AllTables As Selenium.WebElements
Dim SingleTable As Selenium.WebElement
Set cd = New Selenium.ChromeDriver
cd.Start
cd.Get "https://www.boerse-frankfurt.de/en"
Set NameInput = cd.FindElementByCss("#mat-input-0")
NameInput.SendKeys "IQQH"
cd.Wait 2000
Set SuggestionsList = cd.FindElementByCss("#mat-autocomplete-0")
Set Suggestions = SuggestionsList.FindElementsByCss("mat-option")
Debug.Print Suggestions.Count
If Suggestions.Count > 0 Then
cd.ExecuteScript _
Script:="arguments[0].click();", _
arguments:=Suggestions(1)
End If
Set PopupButton = cd.FindElementById("cookie-hint-btn-decline")
If PopupButton.IsDisplayed Then
PopupButton.Click
End If
'AppActivate ThisWorkbook.Name
cd.Wait 2000
Set AllTables = cd.FindElementsByCss("table")
Debug.Print AllTables.Count
'For Each SingleTable In AllTables
' SingleTable.AsTable.ToExcel Worksheets.Add.Range("A1")
'Next SingleTable
End Sub
From: | Andrew G |
When: | 22 Mar 21 at 20:41 |
Hi Kim, Selenium is great isn't it?!
Funny, I had added code to click the Decline cookies button as the first thing I did but it had no effect on the ability to click the option in the combobox so I removed the code. I'm pleased you got it to work!
You can run an invisible instance of Chrome by using Headless Chrome but you'll still need the browser running in order to create all the elements that are dynamically generated by the javascript for the website you're using. Here's a basic example:
Sub GetTableOfMovies()
Dim cd As Selenium.ChromeDriver
Dim t As Selenium.WebElement
Set cd = New Selenium.ChromeDriver
cd.AddArgument "--headless"
cd.Start
cd.Get "https://en.wikipedia.org/wiki/List_of_highest-grossing_films"
Set t = cd.FindElementsByCss("table")(1)
t.AsTable.ToExcel Worksheets.Add.Range("A1")
cd.Quit
End Sub
You can also send http requests without using a browser using XMLHTTP requests - we have a video on that topic which you can see here.
The problem is that all you get back is the response text from the server. If that includes javascript code that is designed to run at the client to produce the elements on the page then you have no effective way to access those elements without using a browser.
It works great as long as the response text includes the elements you want to get:
Sub MovieTableUsingXMLHTTP()
Dim XMLRequest As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim HTMLTable As MSHTML.HTMLTable
XMLRequest.Open "GET", "https://en.wikipedia.org/wiki/List_of_highest-grossing_films", False
XMLRequest.Send
If XMLRequest.Status <> 200 Then
MsgBox XMLRequest.Status & " - " & XMLRequest.statusText
Exit Sub
End If
HTMLDoc.body.innerHTML = XMLRequest.responseText
Set HTMLTable = HTMLDoc.getElementsByTagName("table")(0)
WriteTableToWorksheet HTMLTable
End Sub
Sub WriteTableToWorksheet(TableToProcess As MSHTML.HTMLTable)
Dim TableRow As MSHTML.HTMLTableRow
Dim TableCell As MSHTML.HTMLTableCell
Dim RowNum As Long, ColNum As Long
Dim OutputSheet As Worksheet
Set OutputSheet = ThisWorkbook.Worksheets.Add
For Each TableRow In TableToProcess.Rows
RowNum = RowNum + 1
For Each TableCell In TableRow.Cells
ColNum = ColNum + 1
OutputSheet.Cells(RowNum, ColNum).Value = TableCell.innerText
Next TableCell
ColNum = 0
Next TableRow
End Sub
I hope that helps!
From: | DutchWoody |
When: | 08 Mar 21 at 15:25 |
Hi Andrew,
You tutorial is fantastically useful! It made a once daunting task appear to be within the realms of possibility!
However, I'm struggling to get the code to work for a specific website. The getElementsByTagName isn't picking up the table on this site. Could you let me know what I'm doing wrong? Is it because the site is .aspx? Essentially, my next step will hopefully be to get the data per row ("tr"), but I can't even get the table to work.
Currently I have this:
Sub AustralianSuperRetriever()
Dim XMLPage As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
' XMLPage.Open "Get", "https://x-rates.com/table/?from=GBP&amount=5", False
XMLPage.Open "Get", "https://portal.australiansuper.com/investments-and-performance/what-we-invest-in/superannuation-premixed-investment-options/high-growth.aspx", False
XMLPage.send
HTMLDoc.body.innerHTML = XMLPage.responseText
ProcessHTMLPage HTMLDoc
End Sub
Sub ProcessHTMLPage(HTMLPage As MSHTML.HTMLDocument)
Dim HTMLTable As MSHTML.IHTMLElement
Dim HTMLTables As MSHTML.IHTMLElementCollection
Set HTMLTables = HTMLPage.getElementsByTagName("Table") '- this doesn't work
'Set HTMLTables = HTMLPage.getElementsByTagName("tr") 'this one doesn't work. TR refers to table row.
'Set HTMLTables = HTMLPage.getElementsByClassName("fn-table-feeder-trigger") '- this one works!
'Set HTMLTables = HTMLPage.getElementsByClassName("fn-table-container table-container") '- this one also seems to work
For Each HTMLTable In HTMLTables
Debug.Print HTMLTable.className, HTMLTable.ID, HTMLTable.innerText, HTMLTable.tagName, HTMLTable.Title 'noIDs
Next HTMLTable
End Sub
Any help will be really appreciated!
Best,
From: | Andrew G |
When: | 08 Mar 21 at 16:42 |
Hi Woody!
Good try with the XMLHTTP request but sadly most of this particular page uses javscript functions to generate its content. If you check the contents of the responseText property you won't find any table elements at all. One solution is to use a web browser to allow the javascript code to generate the content and then scrape the page once this has happened. I'd recommend using the SeleniumBasic library so that you can use a modern browser like Chrome rather than IE. Here's some basic code to write the table on the page you've linked to into a new worksheet:
Option Explicit
Private ch As Selenium.ChromeDriver
Sub ScrapeAustralianSuper()
Dim AussieTable As Selenium.WebElement
Dim ws As Worksheet
Set ch = New Selenium.ChromeDriver
ch.Start
ch.Get "https://portal.australiansuper.com/investments-and-performance/what-we-invest-in/superannuation-premixed-investment-options/high-growth.aspx"
Set AussieTable = ch.FindElementByCss(".tablesorter.complexTable")
Set ws = ThisWorkbook.Worksheets.Add
AussieTable.AsTable.ToExcel ws.Range("A1")
End Sub
We recently added some videos on working with SeleniumBasic starting with Part 57.1 on this page https://www.wiseowl.co.uk/vba-macros/videos/vba-scrape-websites/
I hope that helps!
From: | DutchWoody |
When: | 09 Mar 21 at 15:18 |
Hi Andrew,
Thank you so much for your quick response and great explanation! I will have a look at your tutorials on Selenium and Chrome, apply the code you gave, and give it another go!
Thanks again!
Best,
Ward
From: | Andrew G |
When: | 10 Mar 21 at 06:48 |
You're very welcome! Have fun and good luck!
From: | Bakir |
When: | 07 Sep 20 at 02:46 |
Hi,
I'm testing this code but somehow it doesn't work for me, coming up with an Error (Expected variable or procedure, not module). Below is my code. Your hlep would be appreciated.
Sub BrowseToExchangeRatesWithQueryStringAndXML()
Dim XMLPage As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
XMLPage.Open "GET", "https://x-rates.com/table/?from=GBP&amount=5", False
XMLPage.send
HTMLDoc.body.innerHTML = XMLPage.responseText
ProcessHTMLPage HTMLDoc
End Sub
_______________________
Sub ProcessHTMLPage(HTMLPage As MSHTML.HTMLDocument)
Dim HTMLTable As MSHTML.IHTMLElement
Dim HTMLTables As MSHTML.IHTMLElementCollection
Dim HTMLRow As MSHTML.IHTMLElement
Set HTMLTables = HTMLPage.getElementsByTagName("table")
For Each HTMLTable In HTMLTables
Debug.Print HTMLTable.className
For Each HTMLRow In HTMLTable.getElementsByTagName("tr")
Debug.Print vbTab & HTMLRow.innerText
Next HTMLRow
Next HTMLTable
End Sub
From: | Andrew G |
When: | 07 Sep 20 at 07:58 |
Hi Bakir,
It sounds as though you've given the same name to both a module and a subroutine. Try changing the module name to something different.
I hope that helps!
From: | srikanth |
When: | 16 May 20 at 20:19 |
Hello Andrew, this is Srikanth here, I am trying to get the data from a website "www.gst.gov.in". Once the login page is loaded the Internet Explorer ready state is complete but there are other things load after the Internet Explorer's ready state is complete due to which the I am getting error in finding the elements. How to wait untill the animation is complete and inner page is loaded? I don't want to use the wait function. InternetExplorer is not able to identify that the inner page is not yet loaded so its ready state shows as completed.
Below is my code, please help me in this.
Sub WebScraping()
Dim IE As New SHDocVw.InternetExplorer
Dim Documents As MSHTML.HTMLDocument
Dim Element As MSHTML.IHTMLElement
Dim Elements As MSHTML.IHTMLElementCollection
Dim UserName As MSHTML.IHTMLElement
Dim Password As MSHTML.IHTMLElement
IE.Visible = True
IE.navigate "https://www.gst.gov.in/"
Do Until IE.readyState = READYSTATE_COMPLETE
Loop
Set Documents = IE.document
'For Login Page Click
Set Elements = Documents.getElementsByTagName("a")
For Each Element In Elements
If Element.getAttribute("target") = "_self" And Element.getAttribute("href") = "//services.gst.gov.in/services/login" Then
Element.Click
Exit For
End If
Next Element
'UserID
Set UserName = Documents.getElementById("username")
UserName.Value = "Username"
End Sub
From: | Andrew G |
When: | 17 May 20 at 06:45 |
Hi Srikanth,
You could try this:
Do While IE.Busy Or IE.ReadyState <> READYSTATE_COMPLETE
Otherwise, the Wait method might be your best option.
I hope that helps!
From: | srikanth |
When: | 17 May 20 at 07:05 |
Hey Andrew, Thank you so much for your response. Since i didnt wanted the wait function I found below solution for this.
Do
Set UserName = Documents.getElementById("username")
Loop While UserName Is Nothing
UserName.Value = "Username"
Thank you once again for your wonderful course and your teaching technique.
Regards, Srikanth
From: | Andrew G |
When: | 18 May 20 at 06:24 |
Hi Srikanth, that's great! Happy to hear that you found a solution that works for you!
From: | JayK |
When: | 20 Apr 20 at 21:13 |
I am solving the following issue: I want to scrape a title of website when link inserted in column A and put this value to relevant cell (next to it) in column B. The issue seems to be that once I paste the website in column A, the code reruns the entire list from column A2 to "last row" as defined in the code. Is there any way to only modify column B once a single column A is modified? I.e. if I paste a link in column A36 I get a title in B36, regardless of whether the cell is in the middle of the used range or at the very bottom (i.e. only that very cells gets affected..) I would like to use this without having to re-run multiple inputs as it currently stands; (i.e. the loop "for i =2 to last row")? Also, I would like to change the below from Modular macro i.e. sub to private sub reacting to change (i.e. intersect function) where the 'target' is any cell from A:A range. Many thanks!
the code is as follows:
Sub get_title_header()
Dim wb As Object
Dim doc As Object Dim sURL As String
Dim lastrow As Long
lastrow = Sheet1.Cells(Rows.Count, "A").End(xlUp).row
For i = 2 To lastrow
Set wb = CreateObject("internetExplorer.Application")
sURL = Cells(i, 1) wb.navigate sURL
wb.Visible = False
While wb.Busy
DoEvents Wend ''HTML document
Set doc = wb.document
Cells(i, 2) = doc.Title
On Error GoTo err_clear
Cells(i, 3) = doc.GetElementsByTagName("h1")(0).innerText
err_clear: If Err <> 0 Then Err.Clear Resume
Next
End If
wb.Quit Range(Cells(i, 1), Cells(i, 3)).Columns.AutoFit
Next i
End Sub
From: | Andrew G |
When: | 21 Apr 20 at 07:16 |
Hi Jay, I think you will find event procedures useful in this case https://www.wiseowl.co.uk/vba-macros/videos/vba-basics/events/
You can use the Worksheet_Change event to detect when the value of a cell on a worksheet has changed and take the appropriate action.
Here's the Microsoft documentation for the Worksheet_Change event https://docs.microsoft.com/en-us/office/vba/api/excel.worksheet.change
I hope that helps!
From: | Smithy58 |
When: | 29 Feb 20 at 17:17 |
Hi Andrew
I am very impressed with the web scraping course and it is presented very well. i am however a bit stuck on the code where it stops on the For HTMLButton ... loop stating an object is required. I have been over the code and cannot see anything different to what you have in the video, can you please help on this?
Sub GetHTMLDocument()
Dim IE As New SHDocVw.InternetExplorer
Dim HTMLDoc As MSHTML.HTMLDocument
Dim HTMLInput As MSHTML.IHTMLElement
Dim HTMLButtons As MSHTML.IHTMLElementCollection
Dim HTMLButton As MSHTML.IHTMLElement
IE.Visible = True
IE.navigate "wiseowl.co.uk"
Do While IE.ReadyState <> READYSTATE_COMPLETE
Loop
Set HTMLDoc = IE.Document
Set HTMLInput = HTMLDoc.getElementById("what")
HTMLInput.Value = "Excel VBA"
Set HTMLBittons = HTMLDoc.getElementsByTagName("button")
For Each HTMLButton In HTMLButtons
Debug.Print HTMLButton.className, HTMLButton.tagName, HTMLButton.ID, HTMLButton.innerText
Next HTMLButton
End Sub
Thanks in advance
Colin
From: | Andrew G |
When: | 02 Mar 20 at 07:41 |
Hi Colin,
It appears to be a typo in this line:
Set HTMLBittons = HTMLDoc.getElementsByTagName("button")
If you change HTMLBittons to HTMLButtons it will solve the issue.
Before you do that, try adding Option Explicit to the top of the module, above any subroutines you have written. When you try to run the code, it will highlight misspelt variable names.
I hope that helps!
From: | M G Joshi |
When: | 08 Oct 19 at 14:53 |
Dear Sir
Firstly thank you for your excellent videos
For the Example shown in Video 47 both your procedures …
1. Sub BrowseToExchangeRatesWithQueryString() and
2. Sub BrowseToExchangeRatesWithQueryStringAndXML()
produce the same results for the statement for the url, https://www.x-rates.com/table/?from=EUR&amount=3 … namely …
Debug.Print HTMLTables.Length --> displays 2 for the number of tags with the name “table” for both procedures.
However, for the following url, https://www.wsj.com/market-data/commodities with has 3 tables with the tag name “table”
Debug.Print HTMLTables.Length --> displays 3 for only the QueryString (procedure 1) and shows 0 for the QueryStringAndXML (procedure 2)
Could you kindly explain why this is the case? Thank you
From: | Andrew G |
When: | 11 Oct 19 at 10:33 |
Hi,
From a brief look at the wsj site it appears as though the tables are being generated by javascript. The javascript code is executed by a web browser, in this case to dynamically produce the tables and charts on the page. When you don't use the browser method, the javascript code doesn't run and so you return only the basic response text. Here's a more indepth explanation which you may find interesting https://stackoverflow.com/questions/5317190/how-do-browsers-execute-javascript
I hope that helps!
From: | dragoom_doc |
When: | 09 Mar 19 at 22:41 |
I have a problem with the code. The website I am trying to use is a ".jsp" site and there is no data to parse using HTMLDoc properties.
From: | Bumana |
When: | 22 Feb 19 at 03:17 |
Great video - just one thing ever since I went from windows 7 to windows ten I can’t figure out why I am getting a run object error- even when following your basic example of entering document object model in Wikipedia..... run into this yet?
From: | Andrew G |
When: | 25 Feb 19 at 07:10 |
I haven't encountered an error with Internet Explorer that's specifically related to Windows 10. What's the exact message you receive and which line of code causes it?
From: | Neil J |
When: | 07 Oct 18 at 07:18 |
I am very much enjoying your excellent video (Part 47 on scraping websites) but have encountered a problem.
I have referenced the workbook to Microsoft Internet Controls and Microsoft HTML Object Library.
I can compile the following code without a proble. It opens the web page but fails on the final line with a Runtime 13 mismatch error:
Sub GetHTMLDocument()
Dim IE As New SHDocVw.InternetExplorer
Dim HTMLDoc As MSHTML.HTMLDocument
IE.Visible = True
IE.navigate "wiseowl.co.uk"
Do While IE.ReadyState <> READYSTATE_COMPLETE
Loop
Set HTMLDoc = IE.Document
End Sub
I can't for the life of me see how to solve this.
Your help would be much appreciated
Thanks
Neil
From: | Andrew G |
When: | 07 Oct 18 at 19:47 |
Hi Neil,
I copy-pasted your code into a new workbook, set references to the two libraries you mentioned and executed the procedure with no errors. The only things I can suggest are providing a more complete URL:
IE.navigate "https://www.wiseowl.co.uk"
You may also consider adding a further delay to allow the page to load before attempting to capture a reference to it using this code:
Application.Wait Now + TimeValue("00:00:02")
You can adjust the delay as required. I hope that helps!
From: | Neil J |
When: | 10 Oct 18 at 23:27 |
Andrew
I did as suggested (removed any reference to Microsoft Internet Controls and ran your code).
It errored at "Set HTMLDoc = IE.Document" with the same error message.
I'm afraid I have currently run out of ideas on this problem
Sorry for delay in reply.
Regards
Neil
From: | Andrew G |
When: | 11 Oct 18 at 07:27 |
Hi Neil,
Thanks for the reply, that's really frustrating. I'll attempt to investigate further and get back to you if I discover anything which may be of use.
From: | Neil J |
When: | 08 Oct 18 at 10:46 |
Andrew
Many thanks for the response. I've tried both your suggestions, with the same results.
From what I can see, on my PC (running Windows 10 and Office 10) the conflict seems to be caused (I'm not experienced enough to understand why) in setting the following:
Dim IE As New SHDocVw.InternetExplorer
Dim HTMLDoc As MSHTML.HTMLDocument
In the Watch window, HTMLDoc type is set as document with no value and IE.Document type is set as Object/HTMLDocument with a value of "Object HTMLDocument"
Separately, I note that the priority of references can also impact on code; references priority on this workbook is set as:
Visual Basic For Applications
Microsoft Excel 14.0 Object Library
OLE Automation
Microsoft Office 14.0 Object Library
Microsoft Internet Controls
Microsoft HTML Object Library
Microsoft Forms 2.0 Object Library
There are queries relating to this problem on numerous webpages but I've yet to see a solution. I will do my best to get to the bottom of this. If you could possibly help I'd much appreciate it but would understand if you don't have the time.
Neil
From: | Andrew G |
When: | 08 Oct 18 at 13:17 |
Hi Neil,
The issue is less to do with time and more to do with the fact that I can't replicate the problem! One way we could elminate the object library reference as the issue is to use late-binding. Remove the reference to the Microsoft Internet Controls library from the project and alter your subroutine so that it looks like this:
Public Const READYSTATE_COMPLETE As Integer = 4
Sub GetHTMLDocument()
Dim IE As Object
Dim HTMLDoc As MSHTML.HTMLDocument
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
IE.navigate "https://www.wiseowl.co.uk"
Do While IE.ReadyState <> READYSTATE_COMPLETE
Loop
Set HTMLDoc = IE.Document
Debug.Print HTMLDoc.Title
End Sub
The suggestion here is that the issue may be due to incompatibilities between 32 and 64 bit editions of the software. You may well encounter further issues even if the above code fixes the current one. Let me know what happens!
From: | RicardoM |
When: | 14 Sep 18 at 20:39 |
Thanks by the excellent videos, I have a problem, when I try to access an application of our company I receive the following message:
"Automation error"
"The object invoked has disconnected from its clients"
My procedure is the following
Sub Browse()
Dim IE As New SHDocVw.InternetExplorer
IE.Visible = True
IE.Navigate "www.idis.ups.com/login.asp?SystemID=UPSDIS"
Do While IE.ReadyState <> READYSTATE_COMPLETE
Loop
Debug.Print IE.LocationName, IE.LocationURL
IE.Document.forms("login").elements("userid").Value = "xxxxxx"
IE.Document.forms("login").elements("password").Value = "xxxxxx"
End Sub
Could you please, tell me what the problem is?
Thank you for your help
From: | Andrew G |
When: | 15 Sep 18 at 09:33 |
If it's an internal website you might try the following:
Dim ie As New SHDocVw.InternetExplorerMedium
There are other suggestions here https://www.vba-market.com/2017/03/04/the-object-invoked-has-disconnected-from-its-clients/
I hope it helps!
From: | RicardoM |
When: | 17 Sep 18 at 05:02 |
Thanks Andrew for the information, with your suggestion I could solve the problem, I'm trying several methods to identify the "login" button, but I could not, Could you tell me how I refer to this button if I need to do click to enter this page?
Thanks
From: | Andrew G |
When: | 17 Sep 18 at 07:14 |
Unfortunately I can't access the page that you've linked to so I can't offer any other suggestions than those shown in the video.
From: | tony95 |
When: | 18 Jan 18 at 17:48 |
Hi Andrew
I have had great success scraping websites after studying your excellent videos but i have come across a line where I cannot get the element either by TagName or ClassName. I want to get "50" from the following:
<td class="ui-table__cell">50</td>
Any ideas?
Best regards
Tony95
From: | Andrew G |
When: | 19 Jan 18 at 08:16 |
Hi Tony,
I'm happy to hear that you've been having success with web scraping so far! I'm not quite sure why that particular element isn't working for you. Just to check, it looks as though you have two underscores in "ui-table__cell", was that intentional?
If you're having trouble getting at the specific element that you're interested in, perhaps you try looping through all the <td> tags as shown at ~ 1:04:45. You may end up with a lot of information that you don't need but you could always add a bit of code to tidy up at the end.
I hope some of that helps!
From: | tony95 |
When: | 19 Jan 18 at 14:06 |
Hi Andrew
Thanks for your prompt reply. I have tried:-
Set HTMLTables = HTMLPage.getElementsByTagName("td")
Debug.Print HTMLTables.Length
The return in the immediate window is 0
Greyed out at the end of the line <td class="ui-table__cell">50</td> is ==$0
Would that be the problem?
Best regards
Tony95
From: | ashishgarg |
When: | 15 Nov 17 at 12:18 |
Hi Andy, I learnt so much from your vba classes and want to thank you so much for such great videos. I need your help to sove one of the issue is that i am running loop to open multiple pages on differebt tabs in internet explorer using vba and checking 2 option on website but after i navigate to second page it still again checking the option on first page please check the code given below for your reference.
On my excel sheet range("A2") = BOMDXB, range("B2") = 1-dec-17, range("C2") = 10-dec17
Sub mmtlink()
Dim IE As New SHDocVw.InternetExplorer
Dim st As String
Dim htmldoc As MSHTML.HTMLDocument
Dim all As MSHTML.IHTMLElementCollection
Dim one As MSHTML.IHTMLElement
Dim i As Long, d As Long, b As Long
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
d = Range("c2") - Range("b2") + 1
For i = 1 To d
st = Application.WorksheetFunction.Text(Range("b2") + i - 1, "ddmmmyyyy")
If i = 1 Then
IE.navigate "https://www.makemytrip.com/air/search?tripType=O&itinerary=" _
& Left(Range("A2"), 3) & "-" & Right(Range("A2"), 3) _
& "-D-" & st & "&paxType=A-1&cabinClass=E"
Else
IE.Navigate2 "https://www.makemytrip.com/air/search?tripType=O&itinerary=" _
& Left(Range("A2"), 3) & "-" & Right(Range("A2"), 3) _
& "-D-" & st & "&paxType=A-1&cabinClass=E", 2048
End If
Do While IE.Busy = True Or IE.readyState <> 4
Loop
Set htmldoc = IE.document
htmldoc.getElementById("Non stop").Click
Set all = htmldoc.getElementsByClassName("sortbytype")
st = "from " & Left(Range("A2"), 3) & " (early)"
For Each one In all
If one.getAttribute("Innertext") = st Then
one.Click
Exit For
End If
Next one
Next i
End Sub
From: | that_guy |
When: | 30 Jun 17 at 19:54 |
Hi
Taken from this web page:http://www.skysports.com/racing/racecards/beverley/27-06-2017/781785/watch-live-racing-and-bet-with-racing-uk-handicap
I have the following :
<p class="print-hide">
<span class="v5-txt-high">Dutch Artist (IRE) </span>
5-1 (9-7) Held up and always towards rear, never on terms, last of 11, 22 1/4l behind Dark Crystal (8-12) at Ayr 7f hcp (5) sft in Jun.
</p>
and can get either the name ("Dutch Artist") or the name & text in one cell , but cannot cannot split the two, ie, name in column A & text in column B
Can you help?
From: | Andrew G |
When: | 03 Jul 17 at 10:17 |
You could always read the entire contents into a cell and use some string functions to separate the two parts. Everything to the left of the first ( is the name and everything to the right is the text. Something along these lines:
Dim s As String, sName As String, sText As String
s = Range("A1").Value
sName = Left(s, InStr(s, "(") - 1)
sText = Mid(s, InStr(s, "("))
Debug.Print sName, sText
I hope that helps!
From: | axel |
When: | 28 Jun 17 at 00:20 |
Hi Andy, I was following your lecture 47 on web scraping. I am pretty certain the first time i ran it everthing worked fine however now i keep getting a runtime error 13 type mismatch on the following line :
Set HTMLDoc = IE.Document
I have tried both excel 2010 & 2016, and get the same problem. Here is the code:
Option Explicit
Sub BrowseTo()
Dim IE As New SHDocVw.InternetExplorer
Dim HTMLDoc As New MSHTML.HTMLDocument
IE.Visible = True
IE.Navigate ("http://x-rates.com")
Do While IE.ReadyState <> READYSTATE_COMPLETE
Loop
Debug.Print IE.Document, HTMLDoc 'produces output: [object HTMLDocument] [object]
Set HTMLDoc = IE.Document
End Sub
In debug mode if I hover the mouse over IE.Document it displays:
<The remote server machine does not exist or is unavailable> usually in excel 2016 it displays <Automation error The interface is unknown> or the first message.
This problem is driving me nuts as the code is so simple, but just will not work. Any help would be greatly appreciated.
Thanks Axel
From: | Andrew G |
When: | 28 Jun 17 at 06:56 |
Hi Axel,
I'm unable to replicate the problem after copy-pasting your code in Excel 2010 and 2016. Microsoft has this to say about the error message - is this possibly a case of bad timing? Could either your network or the XRates website have been experiencing issues at the point you were testing your code?
From: | axel |
When: | 28 Jun 17 at 12:30 |
Hi Andrew, I apprecaite your prompt reply. I don't think that there is a problem with the website as i have tried a few different sites, and retired many times. Also the page appears in the exploere window that opens up, so i presume it should just be referencing something that has already been loaded. The example following this using XML HTTP requests seems to work fine. I thought it might be antivirus(kaspersky), but again the page is loaded by the internet explorer. Is it possible that it is something do with the enviroment.
From: | Andrew G |
When: | 29 Jun 17 at 08:07 |
Hi Axel, it sounds like you've done all the sensible things to eliminate the potential simple causes. I'm afraid that I'm at a loss - I simply can't replicate the issue that you're having even with a straight copy-paste of your code. Other than Googling the exact text of the message and trawling the various forums to find people who've had the same issue I'm not sure what to suggest.
From: | zimone |
When: | 25 May 17 at 15:15 |
I'm trying to get information on all pages of a website, but I can only get it from the first page. The pagination uses JavaScript links like this:
<a href="javascript:__doPostBack('ctl00$cphCenter$gvCompany','Page$1')"> 1 </a>
<a href="javascript:__doPostBack('ctl00$cphCenter$gvCompany','Page$1')"> 2 </a>
Could you please teach me how to click and follow these links to scape the information on the next page (57 pages in total)? Thank you!
From: | Doclocko |
When: | 20 Feb 23 at 00:20 |
Hi Zimone. I saw something you posted on 26 May 2017 where you described how you resolved an issue relating to javascript, VBA and doPostBack.
I have an issue that I think is quite similar to the one you had. I'm working in Google Chrome, using Selenium and Excel VBA to interact with websites. One website I'm scraping has a number of element Id's such as:
<a id="ucRequoteViewReportTop_lnkRequote" href="javascript:__doPostBack('ucRequoteViewReportTop$lnkRequote','')">Requote</a>
I can't get my VBA code to click the button that has the ID above, and wondered if you could help me find a way to do this please?
Thank you, Paul.
From: | Andy B |
When: | 25 May 17 at 20:33 |
I'm not sure if you'll be able to do this. The website was built using ASP.NET. If you click on one of the paging buttons, it will submit another request to the server, which will then build a new page of HTML and send this back to you. Have a look at the URL shown in the address bar of your browser. If this changes, you'll be ableo to run your VBA scraping code on this page also, but if it doesn't everything is being done at the client side, and your VBA code won't be able to get at the information on the different pages.
From: | zimone |
When: | 26 May 17 at 15:25 |
Thanks Andy for your reply, I found the solution for my example, I have to execute javascript code from VBA by calling execScript command:
i = 2
For i = 2 To 57
Call objIE.Document.parentWindow.execScript("javascript:__doPostBack('ctl00$cphCenter$gvCompany','Page$" & i & "');", "JavaScript")
Next i
I discovered that this code can only run without error if I do not use any comment. If I set any comment in this code, the execution will give me error 80020101: Automation error. I really don't understand why the code is affected by comments. :)
From: | Andy B |
When: | 26 May 17 at 15:47 |
Thanks for taking the time to post the reply - a really good solution, which I'm sure will help others in the same situation.
From: | fabiojoa |
When: | 29 Apr 17 at 02:09 |
Hello Andrew! Greetings from Rio, Brazil!
I perfectly rounded your macro and was very happy with the results. But I found a wall that I could not overcome.
How do I pull data from a table that returns fields surrounded by {{field name}}?
Do you have a solution for this case? Ever seen something like that?
From: | Andrew G |
When: | 02 May 17 at 08:50 |
Hi! I'm sorry but I'm not sure - which web page are you attempting to get your data from?
From: | tony95 |
When: | 11 Apr 17 at 14:46 |
Hi Andrew
I have been watching your videos for some time and must say thanks for the very
informative content. I am trying to adapt the code from part 47 but I only want to loop
over 1 table. I have tried adding (0) to the end of line "Set HTMLTables = HTMLPage.getElementsByTagName("table")
but it produces an error. Any ideas please?
Best regards
Tony95
From: | Andrew G |
When: | 11 Apr 17 at 21:44 |
Hi Tony,
When you add (0) to the end of the line you're changing the type returned by the instruction. The (0) indicates that you're returning a single item but the HTMLTables variable has a type of IHTMLElementCollection. Just change the type of variable to IHTMLElement (or use the HTMLTable variable instead), something like this:
Sub ProcessHTMLPage(HTMLPage As MSHTML.HTMLDocument)
Dim HTMLTable As MSHTML.IHTMLElement
Dim HTMLRow As MSHTML.IHTMLElement
Dim HTMLCell As MSHTML.IHTMLElement
Dim RowNum As Long, ColNum As Integer
Set HTMLTable = HTMLPage.getElementsByTagName("table")(0)
Worksheets.Add
Range("A1").Value = HTMLTable.className
Range("B1").Value = Now
RowNum = 2
For Each HTMLRow In HTMLTable.getElementsByTagName("tr")
ColNum = 1
For Each HTMLCell In HTMLRow.Children
Cells(RowNum, ColNum) = HTMLCell.innerText
ColNum = ColNum + 1
Next HTMLCell
RowNum = RowNum + 1
Next HTMLRow
End Sub
Kingsmoor House
Railway Street
GLOSSOP
SK13 2AA
Landmark Offices
99 Bishopsgate
LONDON
EC2M 3XD
Holiday Inn
25 Aytoun Street
MANCHESTER
M1 3AE
© Wise Owl Business Solutions Ltd 2024. All Rights Reserved.