A Strange Exception

Exception in thread "main" java.lang.NumberFormatException: For input string: "5155"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:580)
    at java.lang.Integer.parseInt(Integer.java:615)

5155 is obviously a number, so why is an exception thrown?

The code is as follows:

    private static void parseLine() throws Exception{
        File storeFile = new File("STORE.txt");
        List<String> rows = FileUtils.readLines(storeFile);
        String firstRow = rows.get(0);
        String[] columns = firstRow.split("\\t");
        String idStr = columns[0];
        Integer id = Integer.valueOf(idStr);
        System.out.println(id);
    }

The file contents are as follows:

5155    据点01    001 5155店001机       
5310    据点02    001 5310店001机

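Neither intuition nor a read-through of the code turned up anything wrong, so I started debugging.

Other rows

I tried other rows and found that they parse without any problem.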

 String firstRow = rows.get(3);
 or
 String firstRow = rows.get(5);

trim()

I tried calling trim() on the row:

String firstRow = rows.get(0).trim();

It still fails.

I could not shake the feeling that something unclean was hiding in the first row.

Print the lengths

String testStr = "5155";
String idStr = columns[0];
System.out.println("5155 len: " + testStr.length());
System.out.println("idStr len: " + idStr.length());

Result

5155 len: 4
idStr len: 5

Sure enough, there is an extra character in there.
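The length only says that something extra is present. One way to find out what it is: print every code point of the field in hex. The following is a minimal sketch. The class name `CodePointDump` and the helper `describe` are hypothetical, and the hard-coded sample string is an assumption standing in for the value of `columns[0]` read from the file.

```java
public class CodePointDump {
    /** Formats every code point of s as U+XXXX, separated by spaces. */
    public static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: stands in for columns[0]; the leading character is invisible when printed.
        String idStr = "\uFEFF5155";
        System.out.println(describe(idStr)); // U+FEFF U+0035 U+0031 U+0035 U+0035
    }
}
```

Any code point in the output that is not a digit is the culprit.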

Character set

Only then did I notice that the file is encoded as UTF-8 with BOM.

BOM

What is a BOM?

BOM = Byte Order Mark

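The BOM is the method the Unicode standard recommends for marking byte order. Take UTF-16 as an example: if the receiver sees a BOM of FEFF, the byte stream is big-endian; if it sees FFFE, the byte stream is little-endian.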
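The byte-order rule can be checked with the JDK's own UTF-16 charsets. This is a small sketch, not part of the original debugging session; the class name `Utf16BomDemo` and the helper `hex` are hypothetical. It encodes the character "5" with an explicit byte order, then decodes two byte sequences whose leading BOM tells the generic UTF-16 decoder which order to use.

```java
import java.nio.charset.StandardCharsets;

public class Utf16BomDemo {
    /** Renders bytes as space-separated uppercase hex pairs. */
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Explicit byte order: no BOM is written.
        System.out.println(hex("5".getBytes(StandardCharsets.UTF_16BE))); // 00 35
        System.out.println(hex("5".getBytes(StandardCharsets.UTF_16LE))); // 35 00

        // Generic UTF-16 decoding: the BOM selects the byte order and is consumed.
        byte[] bigEndian = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x35};
        byte[] littleEndian = {(byte) 0xFF, (byte) 0xFE, 0x35, 0x00};
        System.out.println(new String(bigEndian, StandardCharsets.UTF_16));    // 5
        System.out.println(new String(littleEndian, StandardCharsets.UTF_16)); // 5
    }
}
```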

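UTF-8 does not need a BOM to indicate byte order, but a BOM can be used to signal that the data is UTF-8 encoded. The UTF-8 encoding of the BOM is EF BB BF (open the file in UltraEdit and switch to hex mode to see it). So a receiver that sees a byte stream starting with EF BB BF knows that it is UTF-8.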
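This also explains the symptoms seen earlier. Java's UTF-8 decoder does not drop a leading EF BB BF; it hands it back as the character U+FEFF. `trim()` only removes characters up to U+0020, so the BOM survives it, and `Integer.parseInt` rejects it as a non-digit. A minimal sketch that reproduces this without a file; the class name `Utf8BomDemo` and its helpers are hypothetical, and the byte array stands in for the start of STORE.txt:

```java
import java.nio.charset.StandardCharsets;

public class Utf8BomDemo {
    /** Decodes bytes as UTF-8; a leading BOM is kept as U+FEFF. */
    public static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Returns true if Integer.parseInt accepts the string. */
    public static boolean parses(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // EF BB BF followed by the ASCII digits "5155".
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '5', '1', '5', '5'};
        String idStr = decode(withBom);

        System.out.println(idStr.length());              // 5
        System.out.println(idStr.charAt(0) == '\uFEFF'); // true
        System.out.println(idStr.trim().length());       // 5, trim() leaves U+FEFF alone
        System.out.println(parses(idStr));               // false
        System.out.println(parses(idStr.substring(1)));  // true
    }
}
```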

Solution

~~Just strip the BOM marker from the first row with a replace call~~
Use hutool's BOMInputStream
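For readers who would rather not add a dependency, the same idea can be sketched with the JDK alone: look at the first three bytes, and push them back if they are not a BOM. This is not hutool's implementation, only a stand-in under stated assumptions: the class name `BomSkipSketch` and its methods are hypothetical, only the UTF-8 BOM is handled, and the file is assumed to have at least one row.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class BomSkipSketch {
    /** Wraps the stream so that a leading UTF-8 BOM (EF BB BF), if present, is consumed. */
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int read = 0;
        while (read < 3) {
            int n = pushback.read(head, read, 3 - read);
            if (n < 0) {
                break; // stream is shorter than a BOM
            }
            read += n;
        }
        boolean hasBom = read == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && read > 0) {
            pushback.unread(head, 0, read); // not a BOM: give the bytes back
        }
        return pushback;
    }

    /** Parses the first tab-separated column of the first row as an int. */
    public static int firstId(byte[] fileBytes) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                skipUtf8Bom(new ByteArrayInputStream(fileBytes)), StandardCharsets.UTF_8))) {
            return Integer.parseInt(reader.readLine().split("\\t")[0]);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: stands in for the bytes of a UTF-8-with-BOM file.
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '5', '1', '5', '5', '\t', 'x', '\n'};
        System.out.println(firstId(withBom)); // 5155
    }
}
```

Handling the BOM at the stream level, rather than patching the first row after the fact, means every consumer of the file sees clean text and files without a BOM keep working unchanged.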