Extended xpath locator techniques for web automation

Posted by ctimmer on Tue, 08 Mar 2022 14:04:01 +0100

1, Introduction to xpath

xpath is the XML Path Language. It identifies elements in an XML document by describing a path through the document tree. HTML is not strictly XML, but an HTML page is parsed into the same kind of tree, so xpath works on web pages too and is a very powerful locator method.

1. Formula: //tagname[@attribute='value']

//*[@id="kw"] -- relative path

/html/body/div[1]/div[2]/div[5]/div[1]/div/form/span[1]/input -- absolute path
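
The formula above can be turned into a small string builder. This is only a sketch: the helper names xpath_literal and xpath_by_attr are made up for illustration and are not part of Selenium. The quoting rule relies on the fact that xpath 1.0 has no escape character inside string literals.

```python
def xpath_literal(value: str) -> str:
    """Quote a Python string as an xpath 1.0 string literal (hypothetical helper)."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # xpath 1.0 has no escape character, so a value that contains both
    # quote types has to be assembled with concat().
    parts = value.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in parts) + ")"


def xpath_by_attr(tag: str, attr: str, value: str) -> str:
    """Build //tag[@attr='value'], the relative form of the formula."""
    return f"//{tag}[@{attr}={xpath_literal(value)}]"


print(xpath_by_attr("input", "id", "kw"))  # //input[@id='kw']
print(xpath_by_attr("*", "id", "kw"))      # //*[@id='kw']
```

The resulting string can be passed as the second argument of driver.find_element("xpath", ...).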

expression    explanation
/             Absolute positioning: select from the root node
//            Relative positioning: select matching nodes anywhere in the document, regardless of their location. More stable and concise than an absolute path
.             Select the current node
..            Select the parent of the current node
@             Select an attribute, e.g. @class='xxx', @id='xxx' or @name='xxx'; the attribute test goes inside brackets []
*             Wildcard: match any element, e.g. //*
@*            Wildcard: match any attribute, e.g. //*[@*='WORD']
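
To try these expressions without opening a browser, here is a minimal sketch that uses Python's standard xml.etree.ElementTree module. Two assumptions to keep in mind: the markup is invented (it only loosely imitates a search form), and ElementTree implements just a subset of xpath, so paths start with . and functions such as contains() are not available.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup that loosely imitates a search form.
PAGE = """
<html>
  <body>
    <form id="form">
      <span id="s_kw_wrap">
        <input id="kw" name="wd" class="s_ipt"/>
      </span>
      <span id="s_btn_wrap">
        <input id="su" type="submit"/>
      </span>
    </form>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# // : match at any depth; [@attr='value'] : filter by attribute value
by_tag = root.find(".//input[@id='kw']")

# * : wildcard for the tag name
by_wildcard = root.find(".//*[@id='kw']")

# .. : step up to the parent of the matched node
parent = root.find(".//input[@id='kw']/..")

# . : start at the current node, then go down one level per /
by_steps = root.find("./body/form/span/input")

print(by_tag.get("name"))      # wd
print(by_wildcard is by_tag)   # True
print(parent.get("id"))        # s_kw_wrap
print(by_steps.get("id"))      # kw
```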

2, Extended xpath locator patterns

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://www.baidu.com")

#Locate the input box through the xpath expression according to the id attribute
driver.find_element("xpath","//input[@id='kw']")

#Locate the input box through the xpath expression according to the class attribute
driver.find_element("xpath","//input[@class='s_ipt']")

#Locate the input box through the xpath expression according to the name attribute
driver.find_element("xpath","//input[@name='wd']")

#Combine several attribute conditions to locate a Baidu hyperlink; the two forms below are equivalent
driver.find_element("xpath","//a[@target='_blank'][@href='http://news.baidu.com']")
driver.find_element("xpath","//a[@target='_blank' and @href='http://news.baidu.com']")

#Locate a Baidu hyperlink by its exact text ('map' stands in for the link text actually shown on the page)
driver.find_element("xpath","//a[text()='map']")

#Locate an element when only part of its text is known, using contains() ('smell' stands in for a fragment of the real link text)
driver.find_element("xpath","//a[contains(text(),'smell')]")

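#Locate by index (xpath indexes start at 1). The parentheses make [1] apply to the whole //input result set;
#without them, //input[1] matches every input that is the first input child of its parent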
driver.find_element("xpath","(//input)[1]")

#Find a child through its parent, e.g. //div/span/input; each / steps down one level
driver.find_element("xpath","//span[@id='s_kw_wrap']/input[@id='kw']")

#Find a descendant through an ancestor; // matches at any depth below the ancestor
driver.find_element("xpath","//form[@id='form']//input[@id='kw']")

#Find the parent through a child
driver.find_element("xpath","//input[@id='kw']/..")

#When the other methods fail, use xpath axes as a last resort
#Find preceding siblings (elements before the node, under the same parent)
driver.find_element("xpath","//input[@id='kw']/preceding-sibling::span")
#Find following siblings (elements after the node, under the same parent)
driver.find_element("xpath","//input[@id='kw']/following-sibling::span")
#Find ancestors
driver.find_element("xpath","//input[@id='kw']/ancestor::span")
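
To make clear which nodes the three axes select, here is a rough emulation in plain Python. It is a sketch under assumptions: the markup is invented, the helpers ancestors and siblings are hypothetical names, and the standard xml.etree.ElementTree module is used only because it does not support these axes itself, so the selection has to be spelled out by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: an <input> with one <span> before and one after it.
FORM = ET.fromstring(
    "<form id='form'>"
    "<span id='wrap'>"
    "<span id='before'/><input id='kw'/><span id='after'/>"
    "</span>"
    "</form>"
)

# ElementTree elements do not know their parent, so build a child -> parent map.
PARENT = {child: parent for parent in FORM.iter() for child in parent}


def ancestors(node):
    """All ancestors of node, nearest first (what ancestor::* covers)."""
    found = []
    while node in PARENT:
        node = PARENT[node]
        found.append(node)
    return found


def siblings(node):
    """Return (preceding, following) siblings of node in document order."""
    family = list(PARENT[node])
    i = family.index(node)
    return family[:i], family[i + 1:]


target = FORM.find(".//input[@id='kw']")
before, after = siblings(target)

print([e.get("id") for e in before])             # ['before']
print([e.get("id") for e in after])              # ['after']
print([e.get("id") for e in ancestors(target)])  # ['wrap', 'form']
```

In a real xpath expression the name after :: narrows the result further, so ancestor::span keeps only the ancestors that are span elements.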

3, Advantages and disadvantages of each locator method (not covered in the previous article, added here)

1. id and name:

1. Advantages: elements are easy to locate. The attribute value is unique in most cases, and an id should be unique within a page

2. Disadvantages: many elements have neither an id nor a name attribute

2. class_name and tag_name:

1. Advantages: every element has a tag name, and almost all have a class

2. Disadvantages: class and tag values are usually shared by many elements, so it is hard to pinpoint a single element

3. link_text and partial_link_text:

1. Features: they can only be used for <a> tags

2. Difference:

    1, link_text: for <a> tags whose text is short; the whole text must match

    2, partial_link_text: for <a> tags whose text is long; a distinctive part of the text is enough to locate the link

4. xpath and css_selector:

1. Features: xpath and css_selector can locate almost any element on a page, and both can be generated automatically (for example, copied from the browser's developer tools). The generated expressions do not always work, however, so in some cases you still need to write the xpath or css_selector by hand

2. Any attribute can be used to locate an element: put the attribute test inside a pair of square brackets, e.g. //input[@name='wd']
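
The trade-offs above suggest one possible order of preference: id, then name, then link text for <a> tags, then xpath on whatever attribute is left. The sketch below encodes that order. Both the order and the helper choose_locator are assumptions made for illustration, not a Selenium feature; the strategy strings are the same ones Selenium exposes as By.ID, By.NAME, By.LINK_TEXT, By.XPATH and By.TAG_NAME.

```python
def choose_locator(tag, attrs, text=""):
    """Return a (strategy, value) pair for an element (hypothetical helper).

    tag   -- the element's tag name, e.g. "input"
    attrs -- dict of the element's HTML attributes
    text  -- visible text, only relevant for <a> tags
    """
    if attrs.get("id"):
        return ("id", attrs["id"])
    if attrs.get("name"):
        return ("name", attrs["name"])
    if tag == "a" and text:
        return ("link text", text)
    if attrs:
        # Fall back to xpath on the first remaining attribute.
        # Simplification: assumes the value contains no single quote.
        key, value = next(iter(attrs.items()))
        return ("xpath", f"//{tag}[@{key}='{value}']")
    # Nothing else to go on; a tag name is rarely unique.
    return ("tag name", tag)


print(choose_locator("input", {"id": "kw", "name": "wd"}))  # ('id', 'kw')
print(choose_locator("input", {"class": "s_ipt"}))          # ('xpath', "//input[@class='s_ipt']")
```

Because find_element takes the strategy and the value as its two arguments, the pair can be unpacked directly: driver.find_element(*choose_locator("input", {"id": "kw"})).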

Topics: Python Selenium xml