Self developed configuration table inspection tool

Posted by halojoy on Tue, 25 Jan 2022 20:33:27 +0100

preface:

Many project teams also have their own inspection methods. I have seen two different practices of the two project teams, but I don't agree with them. I comprehensively considered another approach (I think it's good, and I like it very much at present), but for some reasons, I didn't promote the realization of my idea in the original project.

This idea came into being two years ago, but I haven't had a chance to practice it. In April 2020, we finally had an opportunity to put our ideas into practice. Looking back now, I also feel a lot of emotion. Write down how I did it.

get ready:

Some points I need to consider:

Table configuration changes may be frequent, including but not limited to adding and deleting header, changing data format, etc
There may be multiple folders of tables with the same name (SLG has multiple seasons, there will be folders of season1 and season2 in different seasons, and there may also be tables with the same name under season)
When there are many configuration tables and rules, do you need to consider efficiency. I can accept the inspection results in a few minutes, but more than 10 is unacceptable.
It is simple, convenient and fast. It supports rule configuration and rule addition and deletion
Simple and clear output error problem
Do you want to consider platform
......

Try to achieve:

I don't know what to think. At that time, I had an idea to check the allocation table through rule table mapping (just to meet many of my needs). The specific measures are:

New rule check folder
Copy the table to be checked, delete the table content, and configure rules in the corresponding column
After the implementation of this practice, there are also some small problems to be continuously optimized, which will be mentioned later. Let's start with the realization of the core practice:

main function

if __name__ == '__main__':
    table = Table()
    //xlsx is used as the rule configuration table here because the automatic writing function is optimized later. xlsx is more convenient than xls
    excel = file.File.get_file_dir(config.RULE_DIR, file_type=".xlsx")
    d_excel = file.File.get_file_dir(config.DOC_DIR, file_type=".xls")
    dd_excel = [i+"x" for i in d_excel]
    print("Tables in the planning configuration table but not in the rule table (please supplement):", list(set(dd_excel).difference(set(excel))))
    print("Tables in the rule table but not in the planning configuration table (can be deleted):", list(set(excel).difference(set(dd_excel))))
    for every_excel in excel:
        # if every_excel !=  "\ winter secret table. xlsx":
        #     continue
        check_excel(every_excel)
check_excel Method core code
for head_name in head_list:
    if head_name == '':
        continue
    if head_name not in d_head_list:
        print(head_name, excel_name, "The header of the rule table does not exist in the header of the configuration table")
        continue
    d_col = d_head_list.index(head_name)
    col = head_list.index(head_name)
    for i in range(config.ROW_START, sheet.nrows):
        if sheet.cell_value(i, col) is "":
            continue
        cell = str(sheet.cell_value(i, col)).replace('\r', '').replace('\n', '').replace('\\', '\\\\')
        try:
            # value_dict = json.loads(cell)
            value_dict = json.loads(cell, strict=False)
        except ValueError:
            print("@@@@@@@@@@@@@@@@@@@Rule configuration error{}".format(cell), excel_name)
        (key, value), = value_dict.items()
        # Pass parameter: information about the checked table
        getattr(rule, config.switch[key])(value, excel_name=excel_name[:-1], sheet_name=sheet_name, col_index=d_col)

Rule definition and code implementation
I hope to classify the rules separately, so that it will be very easy to find my corresponding rules. How many class rules will I configure In config, such as In class rules, which generally means that column A has column B, and when A=X, column B has column C Such rules; Match rules include two rules: column A satisfies regular expression and column B satisfies regular expression when column A=X.

switch = {
    "_Only": "Only",
    "_In": "Ainb",
    "_Equal": "Equal",
    "_Match": "Match",
    "_Dict": "Dictionary",
    "_Condition": "Condition",
    "_Increasing": "Increasing",
}

Next, let me take the In class as an example to talk about my approach.

class Ainb(object):
    def __init__(self, *args, **kwargs):
        # **kwargs records information about the form being checked
        # r_wbk refers to the table corresponding to the rule table in the data configuration table (that is, the data table)
        # d_wbk refers to the table in the data configuration table that is pointed to by the rule configured in the corresponding rule table (that is, the table pointed to in the rule I configured)
        self.excel_name = kwargs["excel_name"]
        self.sheet_name = kwargs["sheet_name"]
        self.col_index = kwargs["col_index"]
        self.args = args
        self.r_wbk = table.open_workbook(config.DOC_DIR, self.excel_name)
        self.r_sheet = self.r_wbk.sheet_by_name(self.sheet_name)
        r_head = self.r_sheet.row_values(config.HEAD_CN)
        self.col_index_name = r_head[self.col_index]
        temp = {"All": "all_in",
                "Assign": "assign_in",
                "Part": "part_in",
                "Sequence": "sequence",
                "Whenassign": "whenassign",
                "Whenpoint": "whenpoint",
                "Every": "every_in",
                "Neighbor": "neighbor",
                "Allpoint": "allpoint"}
        if isinstance(self.args[0], dict):
            (key, value), = self.args[0].items()
            self.__getattribute__(temp[key])(value)

    # Each cell is a collection of the specified columns in the specified table
    # {"_In":{"All": ["/ winter's secret. xls", "winter's secret -- sheet", "winter's secret -- colname]}}
    def all_in(self, _value):
        d_wbk = table.open_workbook(config.DOC_DIR, _value[0])
        d_sheet = d_wbk.sheet_by_name(_value[1])
        d_head = d_sheet.row_values(config.HEAD)
        d_index = d_head.index(_value[2])
        d_col_list = table.get_content_by_col(
            config.DOC_DIR, _value[0], _value[1], d_index)
        d_col_list1 = [str(j) for j in d_col_list]
        for i in range(config.ROW_START, self.r_sheet.nrows):
            if not agent.pass_empty(i, self.r_sheet, self.col_index):
                continue
            temp = Common.change_format(
                self.r_sheet.cell_value(
                    i, self.col_index))
            if str(temp) not in d_col_list1:
                agent.pri_excel(
                    self.excel_name,
                    self.sheet_name,
                    i,
                    self.col_index_name)
                print("Rule: every cell in The set of specified columns in the specified table. data:::{}--{}". format(
                    str(self.r_sheet.cell_value(i, self.col_index)), _value))

According to the above code, in is a large class, and a large class includes a small class. In type rules, including All type rules (each cell has a set of specified columns in the specified table), Assign type rules (each cell exists in the set you set), and so on (other rules will not be described in detail). self.__getattribute__(temp[key])(value) will execute the corresponding method according to the mapping relationship to check. all_in method will print wrong rules and wrong data; pri_excel method is responsible for general printing, mainly printing the wrong position. An example of the print result is given below.

    @staticmethod
    def pri_excel(name, sheet, row='', col=''):
        if row == '' and col == '':
            print(
                "Table location----------------------------------{}-------------------------{}".format(name, sheet,))
        else:
            print(
                "Table location----------------------------------{}--------------------------{}--that 's ok{}--column{}".format(
                    name,
                    sheet,
                    row,
                    col))
//Rule: each cell is a collection of the specified columns in the specified table. Data::: 208 -- ['/ winter secret - table. xls',' winter secret - Sheet1 ',' winter secret - column ']
//Table location -------------------------------- \ XXX table Xls -------------------------------- Sheet1 -- row 712 -- column XXX (the secret of winter -- column name, note that it is not the index of the column, oh, column name, which directly allows you to find the position quickly)
//According to the above output, we can immediately know that 712 rows, the column name is the secret of winter, and the data content is 208. It should have existed in the secret table of winter, the secret sheet of winter, and the secret of winter in this column, but it does not exist, so we need to check the data of 208.

Overall idea:
Traversal rule table
According to the rules configured under the corresponding column of the rule table, find the corresponding configuration table to be checked and another configuration table (if any) in the rule that needs to be checked
Check according to the rules and print error output

Practical application and Optimization:

Good aspects:

At present, I have rule classes. As can be seen from the switch above, there are seven categories. There may be an indefinite number of rules under each category. Generally speaking, 99% meet my daily needs.
The code structure supports custom rules at any time. You can add or delete rules every minute. The rules can be as good as your idea.
The matching rules are simple. How simple is it? Find the location of the corresponding rule table, open and write the rules. It's done. Are the rules complicated? It depends on your definition. My daily practice: copy the rules, change the three parameters of excel name, sheet name and col name in the rules, and finish.
The output results are clear at a glance. What's wrong, which line, and what's the column name? Because what rules are wrong, there are all of them, and some people can't understand them? If I can't understand, then I'm a real Buddha.

Optimization:

Problem 1: the header is often changed, and the column name is also often changed. The header of the rule table has to be changed manually to ensure that it is the same as the configuration table (I am OCD, I want to be the same, so it is easy to find the location). For example, the ABCD column of the original configuration table and the rule table are also ABCD columns. Later, the table with multiple iterations becomes an ABCFG column. I have to manually delete D and manually supplement FG. Trouble. So the automatic synchronization header configuration script came into being.

What does this script mainly do? Help me delete D and create FG for me. If there is A small three between AB, column B (and all rules under column B) will be moved back automatically, and the small three will be placed after A in order. In short, it is to synchronize the header and move the rules under the corresponding header along with the header.

# Only xlsx is supported
def write_excel(excel, df):
    # Clear the format before writing it
    pd.io.formats.excel.header_style = None
    with pd.ExcelWriter(excel) as writer:
        df.to_excel(writer, sheet_name='Sheet1', startrow=0, index=False, header=False)
        workbook = writer.book
        worksheets = writer.sheets
        worksheet = worksheets['Sheet1']
        worksheet.set_column('A:AD', 20)
        format1 = workbook.add_format({
            # 'bold': True,  # Bold font
            # 'border': 20,  # Cell border width
            # 'align': 'left',  # Horizontal alignment
            # 'valign': 'vcenter',  # Vertical alignment
            'fg_color': '#FFC000',  # Cell background color
            # 'text_wrap': True,  # Auto wrap
        })
        for k in range(2, 5):
            worksheet.set_row(k, 20, format1)
        format2 = workbook.add_format({'fg_color': '#5B9BD5'})
        worksheet.set_row(1, 20, format2)
        for j in range(5, 15):
            worksheet.set_row(j, 20)

As like as two peas, one family is exactly the same, and the background is the same.

Question 2: platform?

I didn't plan or recommend it, so I cut this optimization. I have the following considerations:

People in the game industry will prefer to deal with excel

There are too many headers, the web is difficult to operate, and the style is not as good as excel. Excel is as long as the planning configuration table. It is easy to form habits and is conducive to rule configuration.

If it is a platform, it needs a database. Then we have to re optimize the problem (we have to write a script to automatically synchronize the database header)

I'm lazy. I don't want to build a platform or a front-end. Just lazy, what's the matter.

Problem 3: when the tool is running, the compatibility of some rules is poor, and an error is reported, which blocks the subsequent inspection.

For example, you configure a Dict class rule. By default, the table content can be converted to json or python dict for inspection, but there is a problem with the table data, resulting in error reporting and blocking the subsequent process (then someone will come to me and say error reporting). My current approach is to continuously optimize the rule code, increase compatibility, check the necessary data format in advance, or use try for error prone codes. Now, after continuous optimization, error reporting and blocking are rare. One word, beautiful (three words).

Question 4: efficiency?

At present, there are many rules configured for 100 + tables, but the running time does not need to exceed 1 minute. So leave it alone for the time being.

In fact, I also look forward to the efficiency of the tool in the case of more table data and rule data. I also foresee some optimization schemes in advance to improve efficiency.

Question 5: rule maintenance - will there be many tables for the same rule, and then you need to change the writing of these rules one by one?

I don't think it will happen. Before defining rules, generality has been considered. The writing method of the original rules will not be changed in the future, and only new rules will be added at most. Therefore, the old rules never need to be changed, and it is not recommended to change. If you really want to change, it is recommended to change the code to be compatible with the original rule writing. Anyway, I haven't changed it so far.

Others: I kind of want to sort out the rules taken out, then getattr a class, run the inspection of one class, and then do the next class, instead of mixed inspection now (in short, the inspection order follows the major classes). However, it's good to check the order and think about following the table. The output results can also follow the table order, which is also very good. I also want to join the collaborative process, but it involves the problem of open ing the same table. It needs to be locked. It's troublesome. Forget it. First, the result will be given in tens of seconds, and the efficiency is acceptable.

Postscript:

Different people have different opinions. I hope this tool will become a better configuration table inspection scheme. I also hope someone can tell me a better scheme.

Programmer Think