Parsing CSV files that have a BOM header
Background
Compatible parsing
Summary
Background
This post continues from "Installing and configuring SFTP and accessing it through Java". The file we upload is a standard CSV file generated by our program. Party B, however, summarizes the outbound call results by hand, creates a TXT file, and then changes the extension to .csv. This causes problems when our program parses the file, for example the BOM file header problem (they work on Windows, and in our case the BOM header only appears when a TXT file is renamed to CSV on Windows), which leads to parsing errors. Of course, as programmers with principles and ambition, we will not imitate their manual approach. Instead, we make our program parse CSV files with a BOM header in a compatible way.
Compatible parsing
Add the dependency

    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-csv</artifactId>
        <version>1.5</version>
    </dependency>
1. Parsing a regular CSV file

    List<T> resultList = new ArrayList<>();
    BufferedReader bufferedReader = null;
    InputStreamReader inputStreamReader = null;
    ByteArrayInputStream byteArrayInputStream = null;
    CSVParser parser = null;
    try {
        byteArrayInputStream = new ByteArrayInputStream(bytes);
        // No charset is given, so the platform default charset is used
        inputStreamReader = new InputStreamReader(byteArrayInputStream);
        bufferedReader = new BufferedReader(inputStreamReader);
        // withFirstRecordAsHeader() reads the column names from the first line,
        // overriding the names passed to withHeader(...)
        parser = CSVFormat.DEFAULT
                .withHeader("a", "b")
                .withFirstRecordAsHeader()
                .parse(bufferedReader);
        for (CSVRecord record : parser.getRecords()) {
            // convert the record into a row object
            T row = ...
            log.info("read data from ftp;row={}", row);
            resultList.add(row);
        }
    } catch (UnsupportedEncodingException e) {
        log.error("occur error;filePath={}", filePath, e);
    } catch (IOException e) {
        log.error("occur error;filePath={}", filePath, e);
    } catch (Exception e) {
        log.error("occur error;filePath={}", filePath, e);
    } finally {
        IOUtils.closeQuietly(byteArrayInputStream);
        IOUtils.closeQuietly(inputStreamReader);
        IOUtils.closeQuietly(bufferedReader);
        IOUtils.closeQuietly(parser);
    }
This code parses regular CSV files without problems, but it cannot handle files with a BOM header: the BOM bytes are decoded as part of the first header name, so that column can no longer be found by its expected name. CSV is, after all, just a plain text file. A generated TXT file may have been turned into a CSV file by changing its extension, and a CSV file created by hand on Windows may carry a BOM header. If you view such a file from the command line, you will see garbled characters at the very start of the file.
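To make the failure concrete, here is a minimal JDK-only sketch. It does not use commons-csv; the class `BomHeaderDemo` and its helper methods are hypothetical names made up for this illustration. It only shows that decoding UTF-8 bytes in Java keeps the BOM as the character U+FEFF in front of the first header name.

```java
import java.nio.charset.StandardCharsets;

final class BomHeaderDemo {

    /** Prefixes the given bytes with the UTF-8 BOM (EF BB BF). */
    static byte[] withUtf8Bom(byte[] content) {
        byte[] result = new byte[content.length + 3];
        result[0] = (byte) 0xEF;
        result[1] = (byte) 0xBB;
        result[2] = (byte) 0xBF;
        System.arraycopy(content, 0, result, 3, content.length);
        return result;
    }

    /** Decodes the bytes as UTF-8 and returns the first column name of the header line. */
    static String firstHeader(byte[] csvBytes) {
        String text = new String(csvBytes, StandardCharsets.UTF_8);
        String headerLine = text.split("\r?\n", 2)[0];
        return headerLine.split(",", 2)[0];
    }

    public static void main(String[] args) {
        byte[] plain = "a,b\n1,2\n".getBytes(StandardCharsets.UTF_8);
        String header = firstHeader(withUtf8Bom(plain));
        // The header looks like "a" when printed, but it is U+FEFF followed by "a"
        System.out.println("equals \"a\": " + header.equals("a"));
        System.out.println("length: " + header.length());
    }
}
```

Because the first header name is no longer exactly "a", any lookup by that name fails even though the file looks fine in most editors.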
2. Parsing compatibly with a BOM stream

    List<T> resultList = new ArrayList<>();
    BufferedReader bufferedReader = null;
    InputStreamReader inputStreamReader = null;
    ByteArrayInputStream byteArrayInputStream = null;
    BOMInputStream bomInputStream = null;
    CSVParser parser = null;
    try {
        byteArrayInputStream = new ByteArrayInputStream(bytes);
        // BOMInputStream (org.apache.commons.io.input, from commons-io) handles CSV files with a BOM header;
        // include=false means the BOM is not passed on to the reader
        bomInputStream = new BOMInputStream(byteArrayInputStream, false,
                ByteOrderMark.UTF_16LE, ByteOrderMark.UTF_16BE, ByteOrderMark.UTF_8);
        // Fall back to UTF-8 when the file has no BOM
        String charset = "UTF-8";
        if (bomInputStream.hasBOM()) {
            charset = bomInputStream.getBOMCharsetName();
        }
        inputStreamReader = new InputStreamReader(bomInputStream, Charset.forName(charset));
        bufferedReader = new BufferedReader(inputStreamReader);
        parser = CSVFormat.DEFAULT
                .withHeader("a", "b")
                .withFirstRecordAsHeader()
                .parse(bufferedReader);
        for (CSVRecord record : parser.getRecords()) {
            // convert the record into a row object
            T row = ...
            log.info("read data from ftp;row={}", row);
            resultList.add(row);
        }
    } catch (UnsupportedEncodingException e) {
        log.error("occur error;filePath={}", filePath, e);
    } catch (IOException e) {
        log.error("occur error;filePath={}", filePath, e);
    } catch (Exception e) {
        log.error("occur error;filePath={}", filePath, e);
    } finally {
        IOUtils.closeQuietly(byteArrayInputStream);
        IOUtils.closeQuietly(bomInputStream);
        IOUtils.closeQuietly(inputStreamReader);
        IOUtils.closeQuietly(bufferedReader);
        IOUtils.closeQuietly(parser);
    }
The principle is that BOMInputStream detects the BOM header at the start of the stream and excludes it from the bytes handed to the reader, so the parser never sees it.
3. Parsing compatibly with UnicodeReader
The code is similar to the above:

    UnicodeReader ur = new UnicodeReader(fis, "utf-8");
    bufferedReader = new BufferedReader(ur);
UnicodeReader detects and filters out the BOM automatically by combining a PushbackInputStream with an InputStreamReader. When no BOM is detected, the bytes that were read ahead are pushed back into the stream and the content is read with the encoding passed to the constructor. Otherwise, the encoding that corresponds to the detected BOM is used.
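The push-back mechanism can be sketched with the JDK alone. This is not the UnicodeReader source: the class `BomAwareReaderSketch` and its methods are hypothetical names for this illustration, and it only handles the UTF-8, UTF-16BE and UTF-16LE BOMs (a UTF-32 BOM is not recognized here).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class BomAwareReaderSketch {

    // The UTF-8 BOM (3 bytes) is the longest BOM handled in this sketch
    private static final int MAX_BOM_LENGTH = 3;

    /** Opens a reader that skips a leading BOM, or uses the fallback charset if none is found. */
    static Reader open(InputStream in, Charset fallback) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, MAX_BOM_LENGTH);
        byte[] head = new byte[MAX_BOM_LENGTH];
        int n = 0;
        while (n < head.length) {
            int r = pushback.read(head, n, head.length - n);
            if (r < 0) {
                break; // the stream is shorter than a BOM
            }
            n += r;
        }

        Charset charset = fallback;
        int bomLength = 0;
        if (n >= 3 && (head[0] & 0xFF) == 0xEF && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            charset = StandardCharsets.UTF_8;
            bomLength = 3;
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            charset = StandardCharsets.UTF_16BE;
            bomLength = 2;
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            charset = StandardCharsets.UTF_16LE;
            bomLength = 2;
        }

        // Push back every byte that is not part of the BOM so the reader still sees it
        if (n > bomLength) {
            pushback.unread(head, bomLength, n - bomLength);
        }
        return new InputStreamReader(pushback, charset);
    }

    /** Convenience helper for in-memory data: decodes all bytes into a String. */
    static String decode(byte[] bytes, Charset fallback) {
        try (Reader reader = open(new ByteArrayInputStream(bytes), fallback)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[256];
            int len;
            while ((len = reader.read(buf)) != -1) {
                sb.append(buf, 0, len);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A reader obtained from `open(...)` could be wrapped in a BufferedReader and handed to the CSV parser in the same way as in the earlier sections.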
Summary
Of approaches 2 and 3 in the previous section, approach 3 is lighter and more powerful. It is also more transparent: you can modify the source code to fit your needs.
UnicodeReader reference: http://akini.mbnet.fi/java/unicodereader/