Web Scraping and Web Automation Using the Julia Language

This AI bot takes screenshots and downloads documents from a web browser. It uses a headless Chrome web-driver for proxy browsing, and the Gumbo and Cascadia packages to crawl and read web content.

Mar 25, 2022

- Related Article: How to parse XML/JSON files and convert them into an ORM
- Related Article: How to read scanned images and parse PDF files

The first step is to download the Chrome web-driver. Make sure the web-driver version matches your Chrome version
(open Chrome -> Help -> About Chrome and check the version).
Download the build that matches your machine's OS and unzip it to a local folder.
Then open a terminal window, change to the directory where the web-driver was extracted, and start the web-driver from there.

Next, import the Julia packages and create a session (webdriver.jl).
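The original code listing for this step is not reproduced here, so the following is only a minimal sketch of how a session might be created with the WebDriver.jl package. It assumes the web-driver is already running locally on chromedriver's default port 9515 and was started with `--url-base=/wd/hub`, the path WebDriver.jl is assumed to connect to. The helper name `open_session` and the example URL are hypothetical.

```julia
using WebDriver

# Hypothetical helper: connect to a chromedriver that is already running,
# e.g. one started with `./chromedriver --port=9515 --url-base=/wd/hub`
# (host, port, and URL base are assumptions; adjust them to your setup).
function open_session(; host = "localhost", port = 9515)
    capabilities = Capabilities("chrome")
    wd = RemoteWebDriver(capabilities, host = host, port = port)
    return Session(wd)
end

# Usage (needs a running web-driver):
# session = open_session()
# navigate!(session, "https://example.com")   # placeholder URL
# println(length(source(session)))            # size of the page source
# delete!(session)                            # close the browser session
```

Keeping the connection details in keyword arguments makes it easy to point the same script at a remote or containerized driver.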

Once the website is open in your Chrome browser window, right-click the input or button you want to type into or click, and choose Inspect. Make sure the element is selected in the developer tools, then click Copy -> Copy XPath.

This is the XPath you pass to the Element method.

//*[@id="home-search-other-query"]
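As a rough illustration of how that XPath might be used, the sketch below looks up the element, clicks it, and types a query into it. It assumes an open WebDriver.jl `session`. The helper `search!` and the constant name are hypothetical, not part of the package.

```julia
using WebDriver

# XPath copied from Chrome's "Copy XPath". The triple-quoted string lets
# the inner double quotes stay unescaped.
const SEARCH_BOX_XPATH = """//*[@id="home-search-other-query"]"""

# Hypothetical helper: focus the search box and type a query into it.
function search!(session, query::AbstractString)
    box = Element(session, "xpath", SEARCH_BOX_XPATH)  # locate by XPath
    click!(box)                                        # give it focus
    element_keys!(box, query)                          # send the keystrokes
    return box
end

# Usage (needs an open session):
# search!(session, "Chicago, IL")
```

The same `Element` call can be used with a button's XPath, followed by `click!`, to submit a form.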

Web scraping

Gumbo.jl is a Julia wrapper around Google's gumbo library for parsing HTML, and Cascadia.jl is a CSS selector library used to extract information from the parsed HTML pages.
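To show how the two packages fit together, here is a small sketch that parses an inline HTML snippet with Gumbo and queries it with a Cascadia selector. The markup, class names, and variable names are made up for illustration; a real page would need selectors matched to its own structure.

```julia
using Gumbo, Cascadia

# Made-up HTML standing in for a downloaded page.
html = """
<html><body>
  <h1 class="title">Listings</h1>
  <ul>
    <li class="home"><a href="/home/1">Oak Street</a></li>
    <li class="home"><a href="/home/2">Elm Avenue</a></li>
  </ul>
</body></html>
"""

doc = parsehtml(html)                                # Gumbo: build the DOM tree
links = eachmatch(Selector("li.home a"), doc.root)   # Cascadia: CSS query

names = [nodeText(a) for a in links]          # visible text of each link
hrefs = [getattr(a, "href") for a in links]   # value of each href attribute

for (name, href) in zip(names, hrefs)
    println(name, " -> ", href)
end
```

Because the selector syntax is plain CSS, a selector tested in the browser's developer tools can usually be reused here unchanged.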

Download data from a web page
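The original listing for this step is not available, so the following is only one possible sketch. It fetches a page with Julia's standard `Downloads` library, collects the links that point to documents, and saves them locally. The helper names, the `.pdf` filter, and the URLs are all assumptions made for illustration.

```julia
using Downloads, Gumbo, Cascadia

# Hypothetical helper: return the href of every link ending in `ext`.
function document_links(html::AbstractString; ext = ".pdf")
    doc = parsehtml(html)
    anchors = eachmatch(Selector("a[href]"), doc.root)
    hrefs = [getattr(a, "href") for a in anchors]
    return filter(h -> endswith(lowercase(h), ext), hrefs)
end

# Hypothetical helper: fetch a page and return its body as a String.
function fetch_page(url::AbstractString)
    buf = IOBuffer()
    Downloads.download(url, buf)   # write the response body into the buffer
    return String(take!(buf))
end

# Usage with placeholder URLs (needs network access):
# html = fetch_page("https://example.com/reports")
# for href in document_links(html)
#     Downloads.download("https://example.com" * href, basename(href))
# end
```

For pages that only render their content in a browser, the HTML passed to `document_links` could instead come from the page source of the browser session.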

Written by Amit Shukla
