1. Demand
There are two documents. Document A has A table with multiple items, which contains A large number of rows, but there is no "quantity" column in the table.
There are fewer rows in the table of document B, and there is more "quantity" column than in the table of document A.
The data in AB needs to be output to excel, including the column "quantity".
2. Ideas
Steps:
- Find tools or methods to extract tables from documents.
- Find tools or methods to write data into excel.
- Judge that the item name in the table of document A is the same as that in the table of document B, and output the item and its quantity value.
implement
1. Find tools or methods to extract tables from documents.
After multiple searches, Apache POI was found to provide read-write operations to office. Note 4.0 Version 1 and above only supports java version 8 and above.
For the method of extracting tables from word, find https://www.cnblogs.com/fanwenhao/p/11096596.html The supplied data appears to be in the Office 2007 + XML You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data
Because HWPFDocument is applicable to office 2003, the suffix is Document of doc (suffix is. doc, but it's not good if it's actually a docx document). Xwpfddocument is used for office 2007 and above.
For XWPFDocument, the following article describes it in detail: https://www.cnblogs.com/unruly/p/7479518.html
Therefore, according to the introduction in the text, I extracted the table code in word as follows:
FileInputStream in = new FileInputStream(filePath); // Load document XWPFDocument xwpf = new XWPFDocument(in); //Get all tables List<XWPFTable> tables = xwpf.getTables(); System.out.println(tables); List<XWPFTableRow> rows; List<XWPFTableCell> cells; for (XWPFTable table : tables) { //Table properties CTTblPr pr = table.getCTTbl().getTblPr(); //Get the row corresponding to the table rows = table.getRows(); for (XWPFTableRow row : rows) { //Get the cell corresponding to the row cells = row.getTableCells(); for (XWPFTableCell cell : cells) { System.out.println(cell.getText()); } } }
But in the experiment, I reported found interface org apache. poi. util. Pologger, but class was expected error. The search is caused by the inconsistency of dependent versions.
2. Find tools or methods to write data into excel.
It also uses the classes in POI, which can be referenced https://www.cnblogs.com/warrior4236/p/5263951.html . I replaced the HSSFWorkbook class with XSSFWorkbook to better adapt to the commonly used version of office. Combined with the above table extracted from word, the table is extracted from word and written into excel as follows:
StringBuilder sb = new StringBuilder(); FileOutputStream fos = null; try { FileInputStream in = new FileInputStream(filePath); // Load document XWPFDocument xwpf = new XWPFDocument(in); List<XWPFTable> tables = xwpf.getTables(); System.out.println(tables); List<XWPFTableRow> rows; List<XWPFTableCell> cells; XSSFWorkbook workbook = new XSSFWorkbook(); //Sheet sheet name XSSFSheet sheet = workbook.createSheet("Student information"); for (XWPFTable table : tables) { //Table properties CTTblPr pr = table.getCTTbl().getTblPr(); //Get the row corresponding to the table rows = table.getRows(); for (XWPFTableRow row : rows) { //Get the cell corresponding to the row cells = row.getTableCells(); XSSFRow excelRow = sheet.createRow(rows.indexOf(row)); for (XWPFTableCell cell : cells) { XSSFCell excelCell = excelRow.createCell(cells.indexOf(cell)); // 6: Write to cell excelCell.setCellValue(cell.getText()); } } } //Output stream fos = new FileOutputStream("E:\\WordAndExcel\\test.xls");//output location workbook.write(fos); fos.flush(); } catch (Exception e) { e.printStackTrace(); } finally { if (null != fos) { try { fos.close(); } catch (IOException e) { e.printStackTrace(); } } }
3. Judge that the item name in the table of document A is the same as that in the table of document B, and output the item and its quantity value.
Idea:
- Create A class C whose attributes are all the columns of tables A and B.
- Then read all the data in table B and store them in a map, < item name, C >.
- Read the data in table A. if the names are the same, store the quantity in table a into the value of the key corresponding to the map.
- Traverse the map and store the data in the new excel for output.
4. There are problems.
When there are merged cells in the table in the document, there will be problems with the output data of the corresponding row, and no solution has been found yet.