Web scraping and web automation using the Julia language
This AI bot takes screenshots and downloads documents through a web browser. It uses the headless Chrome web driver for proxy browsing, and the Gumbo and Cascadia packages to crawl and read web content.
- Related article: How to parse XML/JSON files and convert them into ORM
- Related article: How to read scanned images and parse PDF files
The first step is to download the Chrome web driver. Make sure the web driver version matches your Chrome version
(open Chrome -> Help -> About Chrome -> check the version).
Download the version that matches your machine's operating system and unzip/extract it to a local folder.
Open a terminal window, change to the directory where the web driver was extracted, and start the web driver from there.
Once the website is open in your Chrome browser window, right-click the input or button you want to search for or click on. Make sure the element is selected in the inspector, then click Copy -> Copy XPath.
This is the XPath you pass to the Element method.
//*[@id="home-search-other-query"]
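To show where the copied XPath ends up, here is a minimal sketch that assumes the WebDriver.jl package and a web driver already listening on `localhost:9515`, possibly started with `--url-base=/wd/hub` so that the endpoint matches what the package expects. The host, port, URL, and the `fill_search_box!` and `main` helpers are illustrative assumptions, not part of the original walkthrough.

```julia
using WebDriver

# XPath copied from Chrome (Copy -> Copy XPath), as shown above.
const SEARCH_BOX_XPATH = """//*[@id="home-search-other-query"]"""

# Hypothetical helper: locate the search box by XPath and type `query` into it.
function fill_search_box!(session, query::AbstractString)
    box = Element(session, "xpath", SEARCH_BOX_XPATH)
    click!(box)                 # focus the input
    element_keys!(box, query)   # send the keystrokes
    return box
end

function main()
    capabilities = Capabilities("chrome")
    # Assumed host and port; adjust to wherever your web driver is listening.
    wd = RemoteWebDriver(capabilities, host = "localhost", port = 9515)
    session = Session(wd)
    try
        # Placeholder URL: replace with the page that contains the element.
        navigate!(session, "https://example.com")
        fill_search_box!(session, "julia")
    finally
        delete!(session)        # always close the browser session
    end
end

# Run only when executed as a script, not when included.
if abspath(PROGRAM_FILE) == @__FILE__
    main()
end
```

Wrapping the session in `try ... finally` means the browser is closed even if the element is not found, which is a common failure when the XPath was copied from a different page state.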
Web scraping
Gumbo.jl is a Julia wrapper around Google's gumbo library for parsing HTML, and the Cascadia.jl package provides CSS selectors for extracting information from the parsed HTML pages.
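As a rough illustration of how the two packages fit together, the sketch below parses a small inline HTML string with Gumbo.jl and selects nodes with a Cascadia.jl CSS selector. The HTML snippet, the `a.item` selector, and the `results` name are made up for the example; a real page would need its own selectors.

```julia
using Gumbo      # parsehtml, getattr
using Cascadia   # Selector, nodeText

# Made-up HTML standing in for a downloaded page.
html = """
<html><body>
  <h1 class="title">Julia datasets</h1>
  <ul>
    <li><a class="item" href="/data/a.csv">Dataset A</a></li>
    <li><a class="item" href="/data/b.csv">Dataset B</a></li>
  </ul>
</body></html>
"""

# Parse the string into a document tree.
doc = parsehtml(html)

# Select every <a class="item"> element under the document root.
links = eachmatch(Selector("a.item"), doc.root)

# Collect the visible text and the href attribute of each link.
results = [(label = nodeText(a), href = getattr(a, "href")) for a in links]

for r in results
    println(r.label, " -> ", r.href)
end
```

The same pattern applies to HTML fetched from the network: obtain the page body as a `String`, pass it to `parsehtml`, and query `doc.root` with whatever selectors match the page structure.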
Download data from web page