XML file parsing

Tech 2025-11-06 19

XML parsing

I recently had to parse XML messages in a project, so I read quite a lot about XML parsing. Here is a summary.

There are four ways to parse XML; only the first three are covered here:

1. DOM parsing;

2. SAX parsing;

3. DOM4J parsing;

4. JDOM parsing.

bookstore.xml (the content is arbitrary; no need to pay attention to it)

```xml
<?xml version="1.0" encoding="utf-8" ?>
<bookstore>
    <book id="1">
        <name>三国演义</name>
        <author>罗贯中</author>
        <year>1998</year>
        <price>59</price>
    </book>
    <book id="2">
        <name>水浒传</name>
        <author>施耐庵</author>
        <year>1997</year>
        <price>46</price>
    </book>
    <book id="3">
        <name>西游记</name>
        <author>吴承恩</author>
        <year>2013</year>
        <price>59</price>
    </book>
    <book id="4">
        <name>红楼梦</name>
        <author>曹雪芹</author>
        <year>1996</year>
        <price>56</price>
    </book>
</bookstore>
```

Book.java

```java
public class Book {
    private String id;
    private String name;
    private String author;
    private String year;
    private double price;
    // generate the getters/setters yourself
}
```

1. DOM parsing:

Full name: Document Object Model.

A DOM-based XML parser converts an XML document into an object model (i.e. a DOM tree);
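To see what the DOM tree actually contains, here is a small sketch that parses an XML string and walks every node recursively. The class name `DomTreeWalk` and the `depth:type:name` output format are hypothetical choices for illustration, not part of any API. It shows that the whitespace between tags is kept as `#text` nodes, which is why `getChildNodes()` on a `<book>` element reports more children than it has sub-elements.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DomTreeWalk {

    // Recursively visit every node and record "depth:type:name" for each one.
    static void walk(Node node, int depth, List<String> out) {
        String type;
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE: type = "DOCUMENT"; break;
            case Node.ELEMENT_NODE:  type = "ELEMENT";  break;
            case Node.TEXT_NODE:     type = "TEXT";     break;
            default:                 type = "OTHER";    break;
        }
        out.add(depth + ":" + type + ":" + node.getNodeName());
        NodeList children = node.getChildNodes(); // attributes are not child nodes
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), depth + 1, out);
        }
    }

    // Parse the whole string into an in-memory DOM tree, then describe it.
    public static List<String> describe(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> out = new ArrayList<>();
        walk(doc, 0, out);
        return out;
    }

    public static void main(String[] args) throws Exception {
        describe("<book id=\"1\">\n  <year>1998</year>\n</book>").forEach(System.out::println);
    }
}
```

For that input, `<book>` has three children: a whitespace text node, the `<year>` element, and another whitespace text node.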

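Pros:

The tree structure is easy to understand and to code against; because the tree stays in memory during and after parsing, it is easy to modify.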
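Because the whole tree lives in memory, modifying the document amounts to editing nodes and serializing the result. A minimal sketch, assuming we want to raise every `<price>` by a fixed amount (the class `DomModify` and method `raisePrices` are hypothetical names); the JDK's `Transformer` writes the modified tree back out as XML:

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DomModify {

    // Parse into a DOM tree, change every <price> in place, then serialize the tree.
    public static String raisePrices(String xml, int delta) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList prices = doc.getElementsByTagName("price");
        for (int i = 0; i < prices.getLength(); i++) {
            Element price = (Element) prices.item(i);
            int old = Integer.parseInt(price.getTextContent().trim());
            price.setTextContent(String.valueOf(old + delta)); // edit the in-memory tree
        }
        // Identity transform: DOM tree -> XML text, without the <?xml ...?> declaration.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<bookstore><book id=\"1\"><price>59</price></book></bookstore>";
        System.out.println(raisePrices(xml, 10));
    }
}
```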

Cons:

It reads the entire file into memory at once, so memory consumption is high; large XML files can easily cause an OutOfMemoryError, so it is not recommended for them.

```java
// Imports needed at the top of the enclosing class:
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// DOM parsing; path: path to your own XML file
public static void domXmlParse(String path) throws ParserConfigurationException, IOException, SAXException {
    // create the factory
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    // create the builder
    DocumentBuilder builder = factory.newDocumentBuilder();
    // load the whole XML file (e.g. bookstore.xml) into an in-memory DOM tree
    Document document = builder.parse(new File(path));
    // get all <book> nodes
    NodeList bookList = document.getElementsByTagName("book");
    System.out.println("There are " + bookList.getLength() + " books in total");
    for (int i = 0; i < bookList.getLength(); i++) {
        Node book = bookList.item(i); // each <book> node
        NamedNodeMap attributes = book.getAttributes(); // attributes of this <book>
        System.out.println("Book " + (i + 1) + " has " + attributes.getLength() + " attribute(s)");
        for (int j = 0; j < attributes.getLength(); j++) {
            Node attr = attributes.item(j);
            System.out.println("Attribute name: " + attr.getNodeName());
            System.out.println("Attribute value: " + attr.getNodeValue());
        }
        // children of <book>, including the whitespace text nodes between tags
        NodeList childNodes = book.getChildNodes();
        System.out.println("Book " + (i + 1) + " has " + childNodes.getLength() + " child nodes");
        for (int j = 0; j < childNodes.getLength(); j++) {
            Node child = childNodes.item(j);
            // skip text nodes; only element nodes carry a tag name and value here
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                System.out.println("Node " + (j + 1) + " name: " + child.getNodeName());
                // getTextContent() also copes with empty elements, where getFirstChild() is null
                System.out.println("--node value: " + child.getTextContent());
            }
        }
        System.out.println("-----------------finished book " + (i + 1));
    }
}
```

2. SAX parsing:

Full name: Simple API for XML.

Unlike DOM, it processes the document sequentially and is a fast way to read XML data.

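Pros:

It is event-driven: as the SAX parser reads the XML document it fires a series of events and invokes the corresponding handler methods, and the application accesses the document through those handlers. It is well suited to cases where you only need to read the data in the XML file.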
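The event-driven model can be observed directly by recording every callback. In this sketch (the `SaxEventTrace` class and its `trace` method are hypothetical names), the handler stores only a label per event, so the resulting list shows the order in which the parser calls it:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventTrace {

    // Records each callback the parser fires, in the order it fires them.
    static class TraceHandler extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override public void startDocument() { events.add("startDocument"); }
        @Override public void endDocument()   { events.add("endDocument"); }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            events.add("start " + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("end " + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            String text = new String(ch, start, length).trim();
            if (!text.isEmpty()) events.add("text " + text); // skip whitespace-only chunks
        }
    }

    public static List<String> trace(String xml) throws Exception {
        TraceHandler handler = new TraceHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        trace("<book id=\"1\"><year>1998</year></book>").forEach(System.out::println);
    }
}
```

For `<book id="1"><year>1998</year></book>` the sequence is startDocument, start book, start year, text 1998, end year, end book, endDocument: the parser never builds a tree, it just reports what it sees as it goes.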

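Cons:

The code is more complex and harder to write; it is hard to access different parts of the XML document at the same time, because each part is seen only once as the parser passes over it.

```java
// Imports needed at the top of the enclosing class:
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;

// SAX parsing
public static void saxXmlParse(String path) {
    // get the factory
    SAXParserFactory factory = SAXParserFactory.newInstance();
    try {
        // get the parser
        SAXParser parser = factory.newSAXParser();
        // custom handler class SAXParserHandler
        SAXParserHandler handler = new SAXParserHandler();
        // start parsing; the parser calls back into the handler as it reads the file
        parser.parse(new File(path), handler);
        System.out.println("There are " + handler.getBookList().size() + " books in total");
        for (Book book : handler.getBookList()) {
            System.out.println(book.getAuthor());
            System.out.println(book.getId());
            System.out.println(book.getName());
            System.out.println(book.getPrice());
            System.out.println(book.getYear());
            System.out.println("----finish-----");
        }
    } catch (ParserConfigurationException | SAXException | IOException e) {
        e.printStackTrace();
    }
}
```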

SAXParserHandler.java

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SAXParserHandler extends DefaultHandler {
    // text of the element being read; characters() may deliver it in several chunks
    private final StringBuilder value = new StringBuilder();
    private Book book = null;
    private final List<Book> bookList = new ArrayList<>();
    private int bookIndex = 0;

    public List<Book> getBookList() {
        return bookList;
    }

    // marks the start of parsing
    @Override
    public void startDocument() throws SAXException {
        System.out.println("SAX parsing started-----------");
    }

    // marks the end of parsing
    @Override
    public void endDocument() throws SAXException {
        System.out.println("SAX parsing finished-----------");
    }

    // called at every opening tag
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        value.setLength(0); // start collecting text for this element
        if ("book".equals(qName)) {
            bookIndex++;
            book = new Book(); // a new Book for each <book> element
            System.out.println("Start of book " + bookIndex + "---------------");
            // read the attributes of <book>
            for (int i = 0; i < attributes.getLength(); i++) {
                System.out.println("Attribute " + (i + 1) + " of <book>: " + attributes.getQName(i));
                System.out.println("Attribute value: " + attributes.getValue(i));
                if ("id".equals(attributes.getQName(i))) {
                    book.setId(attributes.getValue(i));
                }
            }
        } else if (!"bookstore".equals(qName)) {
            System.out.println("Node name: " + qName + "---------");
        }
    }

    // called at every closing tag; the element's text is complete at this point
    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        String text = value.toString().trim();
        if ("book".equals(qName)) {
            bookList.add(book);
            book = null;
            System.out.println("End of book----------------");
        } else if ("name".equals(qName)) {
            book.setName(text);
        } else if ("author".equals(qName)) {
            book.setAuthor(text);
        } else if ("year".equals(qName)) {
            book.setYear(text);
        } else if ("price".equals(qName)) {
            book.setPrice(Double.parseDouble(text));
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        value.append(ch, start, length);
        String chunk = new String(ch, start, length);
        if (!chunk.trim().isEmpty()) {
            System.out.println("Node value: " + chunk);
        }
    }
}
```

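3. DOM4J parsing:

Features:

1. An intelligent branch of JDOM that adds many features beyond basic XML document representation;

2. Uses interfaces and abstract classes;

3. Excellent performance, good flexibility, powerful features, and very easy to use;

4. It is an open-source library.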

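```java
// Requires the dom4j library on the classpath (it is not part of the JDK).
// Imports needed at the top of the enclosing class:
import java.io.File;
import java.util.Iterator;
import java.util.List;
import org.dom4j.Attribute;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

// DOM4J parsing
public static void dom4jXmlParse(String path) {
    SAXReader reader = new SAXReader();
    try {
        // fully qualified to avoid a clash with org.w3c.dom.Document from the DOM example
        org.dom4j.Document document = reader.read(new File(path));
        Element bookStore = document.getRootElement(); // <bookstore>
        Iterator<?> it = bookStore.elementIterator();
        while (it.hasNext()) {
            System.out.println("--------start of a book----------");
            Element book = (Element) it.next();
            List<Attribute> bookAttrs = book.attributes();
            for (Attribute attr : bookAttrs) {
                System.out.println("Attribute name: " + attr.getName() + ", attribute value: " + attr.getValue());
            }
            Iterator<?> itt = book.elementIterator();
            while (itt.hasNext()) {
                Element bookChild = (Element) itt.next();
                System.out.println("Node name: " + bookChild.getName() + ", node value: " + bookChild.getStringValue());
            }
            System.out.println("--------end of a book----------");
        }
    } catch (DocumentException e) {
        e.printStackTrace();
    }
}
```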

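Of the three, DOM4J is the one I recommend most: it is what I currently use for XML parsing in my own development, and in my experience it also performs best. If you dig into how Hibernate reads its XML configuration files, you will find that it uses DOM4J as well.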

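DOM parsing, by contrast, performs reasonably well for files under roughly 10 MB; beyond that it may run out of memory (the exact limit depends on the JVM heap size). On the plus side, DOM is a language-independent model used in many languages; in JavaScript, for example, it is the usual way to access document objects.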

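SAX performs well because of its event-driven parsing: a SAX parser processes the incoming XML stream without loading the whole document into memory (although, of course, parts of the document are held in memory temporarily while the stream is being read).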
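To illustrate the low-memory point, here is a sketch that counts the books and totals the prices in a bookstore-style document without building a tree or a list of `Book` objects; the handler keeps only a counter, a running sum, and the text of the current element. The class `SaxPriceTotal` and its members are hypothetical names. A `StringBuilder` accumulates the text because `characters()` may deliver one text node in several chunks.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxPriceTotal {

    // State is O(1) in the number of books: a count, a sum, and the current text.
    static class TotalHandler extends DefaultHandler {
        int books = 0;
        double total = 0;
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            text.setLength(0); // forget the previous element's text
            if ("book".equals(qName)) books++;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("price".equals(qName)) total += Double.parseDouble(text.toString().trim());
        }
    }

    public static TotalHandler sum(String xml) throws Exception {
        TotalHandler handler = new TotalHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<bookstore>"
                + "<book id=\"1\"><price>59</price></book>"
                + "<book id=\"2\"><price>46</price></book>"
                + "</bookstore>";
        TotalHandler h = sum(xml);
        System.out.println(h.books + " books, total price " + h.total);
    }
}
```

With the four prices from bookstore.xml (59, 46, 59, 56) the handler would report 4 books and a total of 220.0; the memory it uses does not grow with the number of books.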
