Web Scraping and Web Automation Using the Julia Language

This AI bot takes screenshots and downloads documents through a web browser. It uses the Chrome web driver, optionally in headless mode, to browse on your behalf. The Gumbo and Cascadia packages are used to crawl and read web content.

Amit Shukla
Mar 25, 2022

- Related article: How to parse XML/JSON files and convert them into an ORM
- Related article: How to read scanned images and parse PDF files

The first step is to download the Chrome web driver (chromedriver). Make sure the driver version matches your Chrome version
(open Chrome -> Help -> About Chrome to check the version).
Download the build for your operating system and unzip it to a local folder.
Open a terminal window, change to the directory where the driver was extracted, and start it with:

./chromedriver --url-base=/wd/hub
Next, import the Julia packages and create a session.
webdriver.jl
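
As a rough sketch of the session step, the snippet below assumes the WebDriver.jl package is installed and that chromedriver is listening on its default port 9515 with the /wd/hub base path set above. The helper name `capture_page`, the output file name, and the port are illustrative assumptions, not part of the original post.

```julia
using WebDriver   # assumes the WebDriver.jl package is installed
using Base64      # standard library, used to decode the screenshot

# Hypothetical helper: open `url` in Chrome and save a screenshot to `out`.
# Assumes chromedriver is running on host:port with --url-base=/wd/hub.
function capture_page(url::AbstractString; host = "localhost", port = 9515, out = "page.png")
    capabilities = Capabilities("chrome")
    wd = RemoteWebDriver(capabilities, host = host, port = port)
    session = Session(wd)
    try
        navigate!(session, url)
        println("Now at: ", current_url(session))
        # screenshot returns the PNG image as a base64-encoded string
        write(out, base64decode(screenshot(session)))
    finally
        delete!(session)   # always close the browser session
    end
    return out
end
```

Calling `capture_page("https://julialang.org")` should open the page, write `page.png` to the current directory, and close the session even if navigation fails.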

Once the website is open in your Chrome browser window, right-click the input or button you want to type into or click, and choose Inspect. Make sure the element is selected in the Elements panel, then right-click it and choose Copy -> Copy XPath.

This is the XPath to pass to the Element method.

//*[@id="home-search-other-query"]
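
A minimal sketch of how such an XPath might be used with WebDriver.jl follows. The helpers `xpath_by_id` and `type_and_click` are hypothetical names introduced for illustration, and the sketch assumes an already open `session`. Note that the double quotes inside the XPath must be escaped in a Julia string.

```julia
using WebDriver   # assumes the WebDriver.jl package is installed

# Build an XPath that matches any element with the given id attribute.
xpath_by_id(id::AbstractString) = "//*[@id=\"$id\"]"

# Hypothetical helper: type `text` into the input located by `input_xpath`,
# then click the element located by `button_xpath`.
function type_and_click(session, input_xpath, text, button_xpath)
    input = Element(session, "xpath", input_xpath)
    element_keys!(input, text)
    button = Element(session, "xpath", button_xpath)
    click!(button)
end
```

For example, `type_and_click(session, xpath_by_id("home-search-other-query"), "julia", button_xpath)`, where `button_xpath` is whatever XPath you copied for the site's search button.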
headless browsing
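
One way to request a headless browser is to pass the `--headless` argument to Chrome when the session is created. The sketch below does not use WebDriver.jl for this; as an alternative, it talks to chromedriver directly over the W3C WebDriver protocol using the HTTP.jl and JSON.jl packages. The function names, host, and port are assumptions made for illustration.

```julia
using HTTP, JSON   # assumes the HTTP.jl and JSON.jl packages are installed

# Build the W3C "new session" payload asking Chrome to run without a window.
function headless_payload()
    Dict("capabilities" => Dict("alwaysMatch" => Dict(
        "browserName" => "chrome",
        "goog:chromeOptions" => Dict("args" => ["--headless"]),
    )))
end

# Hypothetical helper: create a headless session and return its session id.
# Assumes chromedriver was started with --url-base=/wd/hub on port 9515.
function new_headless_session(; host = "localhost", port = 9515, base = "/wd/hub")
    url = "http://$host:$port$base/session"
    resp = HTTP.post(url, ["Content-Type" => "application/json"],
                     JSON.json(headless_payload()))
    return JSON.parse(String(resp.body))["value"]["sessionId"]
end
```

The returned session id can then be used in later protocol calls such as `POST /session/{id}/url`.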

Web scraping

Gumbo.jl is a Julia wrapper around Google's gumbo library for parsing HTML, and the Cascadia.jl package provides CSS selectors for extracting information from the parsed HTML pages.

Adding web scraping packages
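
To make the division of labor concrete, here is a small sketch that parses an inline HTML string with Gumbo and queries it with a Cascadia selector. The HTML snippet and the `div.post a` selector are invented for illustration. Real pages need their own selectors.

```julia
# Install once from the Julia REPL:
#   using Pkg; Pkg.add(["Gumbo", "Cascadia"])
using Gumbo, Cascadia

# Made-up HTML standing in for a downloaded page.
html = """
<html><body>
  <div class="post"><a href="/a">First</a></div>
  <div class="post"><a href="/b">Second</a></div>
</body></html>
"""

doc = parsehtml(html)                                    # Gumbo builds the document tree
links = eachmatch(Selector("div.post a"), doc.root)      # Cascadia finds matching nodes

titles = [nodeText(link) for link in links]              # text inside each <a>
hrefs  = [getattr(link, "href") for link in links]       # value of each href attribute
```

Gumbo only builds the tree, and Cascadia only selects nodes from it, so the same pattern applies to HTML obtained from any source, including a page loaded through the web driver.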

Download data from web page

webpage data download
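
A download step could look roughly like the following, assuming the HTTP.jl package for fetching the page. The helper names `extract_links` and `fetch_links` are hypothetical, and collecting every link is just one example of what to extract.

```julia
using HTTP, Gumbo, Cascadia   # assumes these packages are installed

# Return (text, href) pairs for every <a> element that has an href attribute.
function extract_links(html::AbstractString)
    doc = parsehtml(html)
    anchors = eachmatch(Selector("a[href]"), doc.root)
    return [(nodeText(a), getattr(a, "href")) for a in anchors]
end

# Hypothetical helper: download `url` and extract its links.
function fetch_links(url::AbstractString)
    resp = HTTP.get(url)
    return extract_links(String(resp.body))
end
```

Keeping the parsing in `extract_links` separate from the download makes it easy to try selectors on a saved HTML file before pointing the scraper at a live site.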
Browsing a Stack Overflow web page
