【Python(Automation)】How to do web scraping using Selenium


Hi, I’m Yuki (@engineerblog_Yu), a student engineer.

Are you interested in taking on web scraping jobs with Python?

In this article, I would like to explain Selenium for those who want to do scraping in Python.

With Selenium, you can automatically open a browser and collect information from a site.

If you save the code for tasks you perform often, you can repeat them later in an instant just by running the program, which is very convenient.

If you want to work on Python projects or reduce your workload with Python, this article is worth a look.

Scraping jobs are especially plentiful, so if you want to find Python work, this is a good place to start.


Basic Scraping Flow

The basic flow of scraping is:

1. Check the website's HTML with the browser's “Inspect” tool.

2. Store the information you want as a list using find_elements.

3. Make a table using pandas.

4. Output the table as a CSV or Excel file.

Each step is explained below.
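The four steps can be strung together into a single function. This is a minimal sketch, assuming a browser object created as in the sections that follow is passed in; the function name scrape_to_csv and its parameters are hypothetical. Step 1 (finding the id or class with Inspect) still has to be done by hand before running it.

```python
import pandas as pd


def scrape_to_csv(browser, url, by, value, path):
    """Collect the text of every element matching (by, value) on url and save it as CSV.

    browser is any Selenium WebDriver (e.g. webdriver.Chrome()).
    by is a locator strategy such as By.CLASS_NAME (its value is the string "class name"),
    and value is the class name or id you found with Inspect in step 1.
    """
    browser.get(url)

    # Step 2: find_elements returns a list of matching elements (empty if none match)
    elems = browser.find_elements(by, value)
    values = [elem.text for elem in elems]

    # Step 3: turn the list into a one-column table
    df = pd.DataFrame({"value": values})

    # Step 4: write the table to a CSV file without the index column
    df.to_csv(path, index=False)
    return df


# Example (requires Chrome; the URL and class name are placeholders):
# from selenium import webdriver
# from selenium.webdriver.common.by import By
# browser = webdriver.Chrome()
# scrape_to_csv(browser, "https://example.com", By.CLASS_NAME, "title", "output.csv")
# browser.quit()
```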

import

from selenium import webdriver
from selenium.webdriver.common.by import By

How to open a browser

This code will automatically open Google Chrome.

browser = webdriver.Chrome()
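A browser opened this way keeps running until it is closed, so it is worth calling browser.quit() when the work is done, even if an error occurs partway through. Below is a minimal sketch using try/finally; the helper name run_with_browser is hypothetical, and make_browser is assumed to be any function that returns a WebDriver, such as webdriver.Chrome. Selenium 4's WebDriver can also be used as a context manager (with webdriver.Chrome() as browser:), which quits the browser in the same way.

```python
def run_with_browser(make_browser, task):
    """Create a browser, run task(browser), and always quit the browser afterwards."""
    browser = make_browser()
    try:
        return task(browser)
    finally:
        # quit() closes all windows and ends the driver process, even if task raised
        browser.quit()


# Example (requires Chrome):
# from selenium import webdriver
#
# def get_title(browser):
#     browser.get("https://example.com")
#     return browser.title
#
# print(run_with_browser(webdriver.Chrome, get_title))
```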

Open the web site

Put the URL of the website you want to open inside the parentheses of browser.get() and run it.

browser.get('URL')

Some websites prohibit scraping, so check the site's terms of service and robots.txt before you start.
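One thing you can check programmatically is the site's robots.txt file, which lists paths that crawlers are asked not to access. Python's standard library can read these rules with urllib.robotparser. This is a minimal sketch; the helper name is_allowed is hypothetical, and robots.txt is only part of the picture, since a site's terms of service may forbid scraping even where robots.txt does not.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# For a live site, you would download the file instead:
# parser = RobotFileParser("https://example.com/robots.txt")
# parser.read()
# parser.can_fetch("*", "https://example.com/some/page")
```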

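Getting an element by id

elem = browser.find_element(By.ID, 'id')

An id is an attribute of an HTML tag that names a specific element on the page, for example <div id="price">. (By is imported with from selenium.webdriver.common.by import By.) Older tutorials write browser.find_element_by_id('id'), but the find_element_by_* methods were removed in Selenium 4.3, so they raise an error in current versions.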

This may be difficult because it requires knowledge of HTML as well as knowledge of Python.

If you are using Google Chrome, you can view a website's HTML by right-clicking the page and selecting “Inspect”.

In your code, enter the id of the tag that contains the information you want to extract.

(In valid HTML an id should be unique on a page, but if several elements share the same id, find_element returns the first one and stores it in elem.)

If you want to get every matching element as a list, use find_elements instead:

elems = browser.find_elements(By.ID, 'id')

The matching elements can then be accessed as elems[0], elems[1], and so on.
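The two methods also behave differently when nothing matches: find_element raises NoSuchElementException, while find_elements simply returns an empty list. A small sketch that uses this difference to get the first match or None without exception handling; the helper name first_or_none is hypothetical.

```python
def first_or_none(browser, by, value):
    """Return the first element matching (by, value), or None if there is no match.

    find_elements returns an empty list when nothing matches, so no
    try/except around NoSuchElementException is needed here.
    """
    elems = browser.find_elements(by, value)
    return elems[0] if elems else None


# Example:
# from selenium.webdriver.common.by import By
# elem = first_or_none(browser, By.ID, "id")
# if elem is not None:
#     print(elem.text)
```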

Getting elements by class name

To get elements by their class name, pass By.CLASS_NAME in the same way:

elem = browser.find_element(By.CLASS_NAME, 'class')
elems = browser.find_elements(By.CLASS_NAME, 'class')
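By.CLASS_NAME expects a single class name. When a tag's class attribute lists several classes separated by spaces (for example class="item price"), passing the whole string does not match as expected. A common workaround is a CSS selector that chains the classes, such as ".item.price". A minimal sketch; the helper class_selector is hypothetical and assumes the class names contain no characters that need CSS escaping.

```python
def class_selector(class_attr):
    """Turn a class attribute such as "item price" into the CSS selector ".item.price".

    The resulting selector matches elements that have all of the listed classes.
    """
    return "".join("." + name for name in class_attr.split())


# Example:
# from selenium.webdriver.common.by import By
# elems = browser.find_elements(By.CSS_SELECTOR, class_selector("item price"))
```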

Getting an element's text

After storing an element in elem as shown above, you can get its text with the text property.

elem.text

To collect the text of every element in elems, loop over them and use the append method to add each text to a list named values, in order.

values = []

for elem in elems:
    value = elem.text      # visible text of this element
    values.append(value)   # add it to the end of the list
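The same loop is often written as a list comprehension. Note that the text property returns only text that is actually displayed, so hidden elements come back as an empty string (elem.get_attribute('textContent') is a common alternative when hidden text is needed). The sketch below also strips surrounding whitespace and drops blank entries; the helper name collect_texts and that cleanup step are illustrative choices, not part of the original loop.

```python
def collect_texts(elems, skip_blank=True):
    """Return the stripped text of each element, optionally dropping empty strings."""
    # Equivalent to the for/append loop, plus whitespace stripping
    values = [elem.text.strip() for elem in elems]
    if skip_blank:
        # Hidden elements give "", so filter those out
        values = [value for value in values if value]
    return values


# Example:
# values = collect_texts(elems)
```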

How to make a list into a table

import pandas as pd

df = pd.DataFrame()
df['value']=values
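If you scrape more than one kind of information, for example a product name and a price from each item, you can pass several lists at once as a dictionary, and each key becomes a column. The lists must all be the same length. The helper make_table, the column names, and the sample values below are made up for illustration.

```python
import pandas as pd


def make_table(names, prices):
    """Combine two equal-length lists into a two-column DataFrame."""
    if len(names) != len(prices):
        raise ValueError("names and prices must have the same length")
    return pd.DataFrame({"name": names, "price": prices})


# Hypothetical scraped values
df = make_table(["apple", "banana"], ["100", "80"])
print(df)
```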

Output as CSV or Excel file

df.to_csv('output.csv', index=False)
df.to_excel('output.xlsx', index=False)
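Two practical points about these lines. First, to_excel needs an Excel writer library such as openpyxl to be installed (pip install openpyxl). Second, if the scraped text contains non-ASCII characters such as Japanese, Excel may show garbled text when it opens a plain UTF-8 CSV; writing with encoding="utf-8-sig" adds a byte-order mark that Excel uses to detect UTF-8. A minimal sketch of the CSV part; the helper name save_csv_for_excel is hypothetical.

```python
import pandas as pd


def save_csv_for_excel(df, path):
    """Write df to CSV with a UTF-8 byte-order mark so Excel detects the encoding."""
    df.to_csv(path, index=False, encoding="utf-8-sig")


# Example:
# df = pd.DataFrame({"value": ["りんご", "バナナ"]})
# save_csv_for_excel(df, "output.csv")
```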

Summary

In this article, we introduced Selenium, which is used for web scraping.

If you are interested, you can take a course on Udemy.


Thank you for reading!
