Extract tables from two word s and merge them into one excel -- POI use

Posted by Moocat on Sun, 26 Dec 2021 07:46:44 +0100

1. Demand

There are two documents. Document A has A table with multiple items, which contains A large number of rows, but there is no "quantity" column in the table.
There are fewer rows in the table of document B, and there is more "quantity" column than in the table of document A.
The data in AB needs to be output to excel, including the column "quantity".

2. Ideas

Steps:

  1. Find tools or methods to extract tables from documents.
  2. Find tools or methods to write data into excel.
  3. Judge that the item name in the table of document A is the same as that in the table of document B, and output the item and its quantity value.

implement

1. Find tools or methods to extract tables from documents.

After multiple searches, Apache POI was found to provide read-write operations to office. Note 4.0 Version 1 and above only supports java version 8 and above.

For the method of extracting tables from word, find https://www.cnblogs.com/fanwenhao/p/11096596.html The supplied data appears to be in the Office 2007 + XML You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data
Because HWPFDocument is applicable to office 2003, the suffix is Document of doc (suffix is. doc, but it's not good if it's actually a docx document). Xwpfddocument is used for office 2007 and above.

For XWPFDocument, the following article describes it in detail: https://www.cnblogs.com/unruly/p/7479518.html
Therefore, according to the introduction in the text, I extracted the table code in word as follows:

 			FileInputStream in = new FileInputStream(filePath); // Load document
            XWPFDocument xwpf = new XWPFDocument(in);
            //Get all tables
            List<XWPFTable> tables = xwpf.getTables();
            System.out.println(tables);
            List<XWPFTableRow> rows;
            List<XWPFTableCell> cells;

            for (XWPFTable table : tables) {
                //Table properties
                CTTblPr pr = table.getCTTbl().getTblPr();
                //Get the row corresponding to the table
                rows = table.getRows();
                for (XWPFTableRow row : rows) {
                    //Get the cell corresponding to the row
                    cells = row.getTableCells();
                    for (XWPFTableCell cell : cells) {
                        System.out.println(cell.getText());
                    }
                }
            }

But in the experiment, I reported found interface org apache. poi. util. Pologger, but class was expected error. The search is caused by the inconsistency of dependent versions.

2. Find tools or methods to write data into excel.

It also uses the classes in POI, which can be referenced https://www.cnblogs.com/warrior4236/p/5263951.html . I replaced the HSSFWorkbook class with XSSFWorkbook to better adapt to the commonly used version of office. Combined with the above table extracted from word, the table is extracted from word and written into excel as follows:

        StringBuilder sb = new StringBuilder();
        FileOutputStream fos = null;
        try {
            FileInputStream in = new FileInputStream(filePath); // Load document
            XWPFDocument xwpf = new XWPFDocument(in);
            List<XWPFTable> tables = xwpf.getTables();
            System.out.println(tables);
            List<XWPFTableRow> rows;
            List<XWPFTableCell> cells;
            
            XSSFWorkbook workbook = new XSSFWorkbook();
            //Sheet sheet name
            XSSFSheet sheet = workbook.createSheet("Student information");

            for (XWPFTable table : tables) {
                //Table properties
                CTTblPr pr = table.getCTTbl().getTblPr();
                //Get the row corresponding to the table
                rows = table.getRows();
                for (XWPFTableRow row : rows) {
                    //Get the cell corresponding to the row
                    cells = row.getTableCells();
                    XSSFRow excelRow = sheet.createRow(rows.indexOf(row));
                    for (XWPFTableCell cell : cells) {
                        XSSFCell excelCell = excelRow.createCell(cells.indexOf(cell));
                        // 6: Write to cell
                        excelCell.setCellValue(cell.getText());
                    }
                }
            }
            //Output stream
            fos = new FileOutputStream("E:\\WordAndExcel\\test.xls");//output location
            workbook.write(fos);
            fos.flush();

        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (null != fos) {
                try {
                    fos.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }

3. Judge that the item name in the table of document A is the same as that in the table of document B, and output the item and its quantity value.

Idea:

  1. Create A class C whose attributes are all the columns of tables A and B.
  2. Then read all the data in table B and store them in a map, < item name, C >.
  3. Read the data in table A. if the names are the same, store the quantity in table a into the value of the key corresponding to the map.
  4. Traverse the map and store the data in the new excel for output.

4. There are problems.

When there are merged cells in the table in the document, there will be problems with the output data of the corresponding row, and no solution has been found yet.

Topics: Java