Many thanks for a great post, which I've now successfully modified to scrape two sites I need data from. I can get the innerText, but I'm struggling to get additional data such as an href giving the URL.
In the past, I have copied the HTML into a spreadsheet and parsed the HTML lines there to get what I want. I now want to automate this, hence my interest in your post. I have two questions:
Downloading the HTML of one line so I can manually parse it
Using element.innerText, I can extract the text from the HTML below, but what I want is to get the URL and each piece of text individually, so that I can put them in separate fields in Excel:
<h3 class="title">
<a title="I need a freelance data entry &amp; admin " class="job js-paragraph-crop" data-height="65" href="https://www.peopleperhour.com/job/i-need-a-freelance-data-entry-admin-1619501">I need a freelance data entry & admin </a>
<span class="job-etiquettes">
<span class="etiquette orange">Urgent</span>
</span>
</h3>
<ul class="clearfix member-info horizontal crop">
<li class="hidden-phone">
<i class="fpph fpph-clock-wall"></i>
<span class="hidden-xs">Posted</span>
<time class="crop value" title="26 June 2017">3 hours ago</time>
</li>
<li class="hidden-xs job-location crop js-tooltip" title="The Job can be done remotely from any location">
<i class="fpph fpph-location"></i>
<span class="">Remote</span>
</li>
<li class="hidden-phone">
<i class="fa fa-dot-circle-o"></i>Proposals<span class="value proposal-count">19</span>
</li>
The innerText shows the title, Urgent, how long ago the job was posted, Remote, the number of proposals, and so on.
Is there a command I can use to get all the source HTML of the <a> line so that I can parse it myself? Alternatively, can you tell me how to access the individual fields, including the URL?
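In the DOM that the VBA code is already using, the usual members for this are an element's outerHTML property (its own source, including the tag) and getAttribute("href") for the link. To show the "individual fields" approach end to end, here is a minimal sketch in Python instead of VBA, using the standard-library html.parser on the sample listing above (with the broken closing tag repaired). The names JobListingParser, parse_job and the field keys are hypothetical, and the class names it matches (job, etiquette, job-location, proposal-count) are assumptions read off this one sample, so a different site would need different rules.

```python
from html.parser import HTMLParser

# Sample listing from the question, with the broken "/span>" tag repaired.
SAMPLE = """
<h3 class="title">
<a title="I need a freelance data entry &amp; admin " class="job js-paragraph-crop" data-height="65" href="https://www.peopleperhour.com/job/i-need-a-freelance-data-entry-admin-1619501">I need a freelance data entry & admin </a>
<span class="job-etiquettes">
<span class="etiquette orange">Urgent</span>
</span>
</h3>
<ul class="clearfix member-info horizontal crop">
<li class="hidden-phone">
<i class="fpph fpph-clock-wall"></i>
<span class="hidden-xs">Posted</span>
<time class="crop value" title="26 June 2017">3 hours ago</time>
</li>
<li class="hidden-xs job-location crop js-tooltip" title="The Job can be done remotely from any location">
<i class="fpph fpph-location"></i>
<span class="">Remote</span>
</li>
<li class="hidden-phone">
<i class="fa fa-dot-circle-o"></i>Proposals<span class="value proposal-count">19</span>
</li>
</ul>
"""


class JobListingParser(HTMLParser):
    """Hypothetical parser that pulls separate fields out of one job listing.

    The class names it looks for are assumptions taken from the sample HTML.
    """

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._capture = None  # field that the next non-blank text belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "job" in classes:
            # The URL is an attribute, so it never appears in innerText.
            self.fields["url"] = attrs.get("href")
            # Raw source of the <a ...> start tag, exactly as it appears in the page.
            self.fields["raw_link_tag"] = self.get_starttag_text()
            self._capture = "title"
        elif tag == "span" and "etiquette" in classes:
            self._capture = "label"
        elif tag == "time":
            # The full date is in the title attribute; the text is "x hours ago".
            self.fields["posted_date"] = attrs.get("title")
            self._capture = "posted_ago"
        elif tag == "li" and "job-location" in classes:
            self._capture = "location"
        elif tag == "span" and "proposal-count" in classes:
            self._capture = "proposals"

    def handle_data(self, data):
        # Store the first non-blank text that follows a tag we are interested in.
        text = data.strip()
        if self._capture and text:
            self.fields[self._capture] = text
            self._capture = None


def parse_job(html):
    """Return a dict of the fields found in one job listing."""
    parser = JobListingParser()
    parser.feed(html)
    parser.close()
    return parser.fields


for name, value in parse_job(SAMPLE).items():
    print(f"{name}: {value}")
```

Each value in the returned dict (url, title, label, posted_date, posted_ago, location, proposals) maps naturally onto one Excel column; raw_link_tag holds the unparsed <a ...> source for anyone who would rather keep parsing it by hand.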
Downloading the Full HTML into Excel
As I mentioned, I can manipulate HTML once I paste it into Excel. If I can't easily do the above, is there a way of getting all the HTML from a web page into Excel so that I can use my manual methods?
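For the "whole page into Excel" route, one hedged option is to save the page source as a one-column CSV file, one HTML line per row, which Excel opens directly. The sketch below does this in Python with urllib.request and csv. The function names are hypothetical, and it assumes the page's HTML is served directly by the server: anything a site adds later with JavaScript will not be in the downloaded source.

```python
import csv
import urllib.request


def fetch_html(url):
    """Download the page source as text (server-sent HTML only, no JavaScript output)."""
    with urllib.request.urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def html_to_rows(html):
    """Split page source into trimmed, non-blank lines (one per spreadsheet row)."""
    return [line.strip() for line in html.splitlines() if line.strip()]


def save_html_as_csv(html, path):
    """Write each HTML line to column A of a CSV file and return the row count."""
    rows = html_to_rows(html)
    # "utf-8-sig" writes a byte-order mark, which helps Excel recognise UTF-8.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        for line in rows:
            writer.writerow([line])
    return len(rows)
```

Typical use would be save_html_as_csv(fetch_html(url), "page_source.csv"), with url set to the page you want. If a site serves its HTML on only a few very long lines, one row per line will not split it up much, and parsing the fields directly is likely to be more reliable.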
Many thanks in advance for your help; I have spent many hours trawling the Internet for answers but couldn't find anything!
Nick
© Wise Owl Business Solutions Ltd 2024. All Rights Reserved.