1, Introduction to xpath
xpath (XML Path Language) is a query language that locates nodes in an XML document by describing a path through its element tree. A browser parses an HTML page into the same kind of tree, so xpath expressions work on web pages too, which makes xpath a very powerful location method.
1. Formula: //tag_name[@attribute='attribute value']
//*[@id="kw"] -- relative path
/html/body/div[1]/div[2]/div[5]/div[1]/div/form/span[1]/input -- absolute path
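
The difference between the two path styles can be tried without a browser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`, which implements only a subset of xpath and searches from an element, so `//` is written `.//` and the absolute path is spelled relative to the `<html>` root. The markup is a hypothetical, heavily simplified stand-in for a search page, not Baidu's real HTML, and `locate` is a helper invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified page (an assumption, not Baidu's real markup).
PAGE = """
<html>
  <body>
    <div>
      <form id="form">
        <span id="s_kw_wrap"><input id="kw" name="wd" class="s_ipt"/></span>
      </form>
    </div>
  </body>
</html>
""".strip()

# The same page after a redesign wraps the form in one more <div>.
REDESIGNED = PAGE.replace("<form", "<div><form").replace("</form>", "</form></div>")

RELATIVE = ".//*[@id='kw']"              # ElementTree spelling of //*[@id="kw"]
ABSOLUTE = "./body/div/form/span/input"  # /html/body/div/form/span/input, seen from <html>


def locate(markup, path):
    """Return the first element matching path, or None if nothing matches."""
    root = ET.fromstring(markup)  # root is the <html> element
    return root.find(path)


for label, markup in (("original", PAGE), ("redesigned", REDESIGNED)):
    relative_hit = locate(markup, RELATIVE) is not None
    absolute_hit = locate(markup, ABSOLUTE) is not None
    print(label, "relative:", relative_hit, "absolute:", absolute_hit)
```

In this sketch the relative expression still finds the input after the extra `<div>` is added, while the absolute path no longer matches, which is the usual argument for preferring relative paths.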
| Expression | Description |
| --- | --- |
| / | Absolute positioning: select from the root node |
| // | Relative positioning: select matching nodes anywhere in the document, regardless of their location. More stable and concise |
| . | Select the current node |
| .. | Select the parent of the current node |
| @ | Select an attribute, e.g. @class='xxx', @id='xxx' or @name='xxx'; the attribute test goes inside square brackets [] |
| * | Wildcard: match any element, e.g. //* |
| @* | Wildcard: match any attribute, e.g. //*[@*='WORD'] |
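
Most of the expressions in the table can be exercised with Python's standard `xml.etree.ElementTree`. This is only a sketch: ElementTree supports a subset of xpath (it has no `@*` wildcard, for instance), paths start from an element so `//` becomes `.//`, and the markup below is a hypothetical example rather than a real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup used only for this illustration.
root = ET.fromstring(
    "<html><body><form id='form'>"
    "<span id='s_kw_wrap'><input id='kw' name='wd'/></span>"
    "<a href='/news'>news</a>"
    "</form></body></html>"
)

# "."  selects the current node, here the <html> element itself.
current = root.find(".")

# "*"  is the element wildcard; ".//*" collects every descendant in document order.
all_tags = [element.tag for element in root.findall(".//*")]

# "@"  tests an attribute inside square brackets.
by_name = root.find(".//input[@name='wd']")
has_href = root.find(".//*[@href]")  # any element that carries an href attribute

# ".." steps from a node to its parent.
parent = root.find(".//input[@id='kw']/..")

print(current.tag, all_tags, by_name.get("id"), has_href.tag, parent.get("id"))
```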
2, More ways to locate elements with xpath
    from selenium import webdriver

    driver = webdriver.Chrome()
    driver.get("https://www.baidu.com")

    # Locate the input box by its id attribute
    driver.find_element("xpath", "//input[@id='kw']")
    # Locate the input box by its class attribute
    driver.find_element("xpath", "//input[@class='s_ipt']")
    # Locate the input box by its name attribute
    driver.find_element("xpath", "//input[@name='wd']")
    # Combine several attributes to locate a Baidu hyperlink
    driver.find_element("xpath", "//a[@target='_blank'][@href='http://news.baidu.com']")
    driver.find_element("xpath", "//a[@target='_blank' and @href='http://news.baidu.com']")
    # Locate a Baidu hyperlink by its exact text (the value must match the text shown on the page)
    driver.find_element("xpath", "//a[text()='map']")
    # Locate by a known part of a value with contains()
    driver.find_element("xpath", "//a[contains(text(), 'smell')]")
    # Locate by index; the parentheses raise the precedence so that [1] applies to the whole result set
    driver.find_element("xpath", "(//input)[1]")
    # Find a child through its parent, e.g. //div/span/input; each / is one level of the hierarchy
    driver.find_element("xpath", "//span[@id='s_kw_wrap']/input[@id='kw']")
    # Find a descendant through an ancestor
    driver.find_element("xpath", "//form[@id='form']//input[@id='kw']")
    # Find the parent through a child
    driver.find_element("xpath", "//input[@id='kw']/..")
    # When the other approaches fail, fall back on axes as a last resort
    # Find preceding siblings
    driver.find_element("xpath", "//input[@id='kw']/preceding-sibling::span")
    # Find following siblings
    driver.find_element("xpath", "//input[@id='kw']/following-sibling::span")
    # Find ancestors
    driver.find_element("xpath", "//input[@id='kw']/ancestor::span")
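
When the text in an expression such as `//a[text()='...']` comes from a variable, quoting becomes a practical problem: XPath 1.0 string literals have no escape character, so a value containing both quote types has to be assembled with `concat()`. The helper names `xpath_literal` and `link_by_text` below are invented for this sketch and are not part of Selenium.

```python
def xpath_literal(text):
    """Return text as an XPath 1.0 string literal, whatever quotes it contains."""
    if "'" not in text:
        return "'" + text + "'"
    if '"' not in text:
        return '"' + text + '"'
    # Both quote types are present: split on single quotes and rejoin the
    # pieces with concat(), writing each single quote as the literal "'".
    pieces = []
    for index, part in enumerate(text.split("'")):
        if index > 0:
            pieces.append('"\'"')
        if part:
            pieces.append("'" + part + "'")
    return "concat(" + ", ".join(pieces) + ")"


def link_by_text(text):
    """Build an xpath that matches an <a> element whose text equals text."""
    return "//a[text()=" + xpath_literal(text) + "]"


print(link_by_text("map"))
print(link_by_text("it's"))
print(link_by_text('say "it\'s"'))
```

The resulting string can be passed to Selenium in the usual way, for example `driver.find_element("xpath", link_by_text("map"))`.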
3, Advantages and disadvantages of each element location method (not mentioned in the previous article, added here)
1. id and name:
   1. Advantages: elements are easy to locate, because the attribute value is unique in most cases. An id is supposed to be unique within a page.
   2. Disadvantages: many elements have no id or name attribute.
2. class_name and tag_name:
   1. Advantages: every element has a tag name, and almost every element has a class.
   2. Disadvantages: class and tag values are often not unique, so it is hard to pinpoint a single element.
3. link_text and partial_link_text:
   1. Features: they only work for <a> tags.
   2. Difference:
      1. link_text: for <a> tags with short text; the whole text must match.
      2. partial_link_text: for <a> tags with long text; a distinctive part of the text is enough to locate the link.
4. xpath and css_selector:
   1. Features: xpath and css_selector can locate almost any page element, and the expressions can be generated directly by tools. Generated expressions are not 100% reliable, though, so in some cases you still need to write the xpath or css_selector by hand.
   2. Any attribute can be used to locate an element: just wrap the attribute test in square brackets, for example [@name='wd'] in xpath or [name='wd'] in css_selector.
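
To see why xpath and css_selector cover almost every case, it helps to write the other strategies as xpath or CSS expressions. The sketch below does that with plain string templates. The strategy labels are the ones used in this article, the functions `to_xpath` and `to_css` are invented for illustration, the expressions are approximate equivalents rather than what Selenium runs internally, and values containing single quotes are not handled.

```python
# Approximate xpath equivalents of the simpler location strategies.
XPATH_TEMPLATES = {
    "id": "//*[@id='{v}']",
    "name": "//*[@name='{v}']",
    # Match one class among several space-separated classes.
    "class_name": "//*[contains(concat(' ', normalize-space(@class), ' '), ' {v} ')]",
    "tag_name": "//{v}",
    "link_text": "//a[normalize-space()='{v}']",
    "partial_link_text": "//a[contains(., '{v}')]",
}

# CSS selectors cannot match on text, so the link strategies have no entry.
CSS_TEMPLATES = {
    "id": "[id='{v}']",
    "name": "[name='{v}']",
    "class_name": ".{v}",
    "tag_name": "{v}",
}


def to_xpath(strategy, value):
    """Return an approximate xpath expression for the given strategy."""
    return XPATH_TEMPLATES[strategy].format(v=value)


def to_css(strategy, value):
    """Return an approximate CSS selector, or None when CSS cannot express it."""
    template = CSS_TEMPLATES.get(strategy)
    return template.format(v=value) if template else None


for strategy, value in (("id", "kw"), ("class_name", "s_ipt"), ("link_text", "news")):
    print(strategy, "->", to_xpath(strategy, value), "|", to_css(strategy, value))
```

The missing CSS entries for link_text and partial_link_text reflect a real limitation: standard CSS selectors have no way to match an element by its text, whereas xpath does.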