Monday, July 4, 2011

XML parsing with Java

As Java programmers, we sometimes need to read information from XML documents. This post shows how to parse an XML document in Java using the DOM parser and XPath.

xmlfile.xml

The XML document below is fairly long, but don't worry: parsing a long XML document is no harder than parsing a short one :)

<a>
<e>
<hotels class="array">
   <e class="object">
    <addressLine1 type="string">4 Rue du Mont-Thabor</addressLine1>
    <amenities class="array">
     <e type="string">24</e>
     <e type="string">31</e>
     <e type="string">42</e>
     <e type="string">52</e>
     <e type="string">9</e>
    </amenities>
    <brandCode type="string">69</brandCode>
    <cachedPrice type="number">935</cachedPrice>
    <city type="string">Paris</city>
    <country type="string">US</country>
    <geoPoint class="array">
     <e type="number">48.86536</e>
     <e type="number">2.329584</e>
    </geoPoint>
    <hotelRateIndicator type="string">2</hotelRateIndicator>
    <id type="number">56263</id>
    <name type="string">Renaissance Paris Vendome Hotel</name>
    <neighborhood type="string" />
    <popularity type="number">837</popularity>
    <starRating type="string">5</starRating>
    <state type="string">IdF</state>
    <telephoneNumbers class="array">
     <e type="string" />
    </telephoneNumbers>
    <thumbnailUrl type="string">http://www.orbitz.com//public/hotelthumbnails/53/97/85397/85397_TBNL_1246535840051.jpg
    </thumbnailUrl>
    <total type="number">250</total>
    <ypid type="string">YN10001x300073304</ypid>
   </e>
   <e class="object">
    <addressLine1 type="string">39 Avenue de Wagram</addressLine1>
    <amenities class="array">
     <e type="string">24</e>
     <e type="string">31</e>
     <e type="string">42</e>
     <e type="string">9</e>
    </amenities>
    <brandCode type="string">69</brandCode>
    <cachedPrice type="number">633</cachedPrice>
    <city type="string">Paris</city>
    <country type="string">US</country>
    <geoPoint class="array">
     <e type="number">48.877106</e>
     <e type="number">2.297451</e>
    </geoPoint>
    <hotelRateIndicator type="string">3</hotelRateIndicator>
    <id type="number">112341</id>
    <name type="string">Renaissance Paris Arc de Triomphe Hotel</name>
    <neighborhood type="string" />
    <popularity type="number">796</popularity>
    <starRating type="string">5</starRating>
    <state type="string">IdF</state>
    <telephoneNumbers class="array">
     <e type="string" />
    </telephoneNumbers>
    <thumbnailUrl type="string">http://www.orbitz.com//public/hotelthumbnails/21/72/302172/302172_TBNL_1246535872514.jpg
    </thumbnailUrl>
    <total type="number">250</total>
    <ypid type="string">YN10001x300073331</ypid>
   </e>
   <e class="object">
    <addressLine1 type="string">35 Rue de Berri</addressLine1>
    <amenities class="array">
     <e type="string">24</e>
     <e type="string">31</e>
     <e type="string">42</e>
     <e type="string">9</e>
    </amenities>
    <brandCode type="string">82</brandCode>
    <cachedPrice type="number">706</cachedPrice>
    <city type="string">Paris</city>
    <country type="string">US</country>
    <geoPoint class="array">
     <e type="number">48.873684</e>
     <e type="number">2.306411</e>
    </geoPoint>
    <hotelRateIndicator type="string">3</hotelRateIndicator>
    <id type="number">108606</id>
    <name type="string">Crowne Plaza Hotel PARIS-CHAMPS ELYSÉES</name>
    <neighborhood type="string" />
    <popularity type="number">796</popularity>
    <starRating type="string">5</starRating>
    <state type="string">IdF</state>
    <telephoneNumbers class="array">
     <e type="string" />
    </telephoneNumbers>
    <thumbnailUrl type="string">http://www.orbitz.com//public/pegsimages/CP/thumb_PARAT.jpg
    </thumbnailUrl>
    <total type="number">250</total>
    <ypid type="string">YN10001x300161106</ypid>
   </e>
   </hotels>
 </e>
</a>   


XMLsample.java

This code uses the DOM parser and the Java XPath API (javax.xml.xpath) to query the XML document. It extracts hotel information (hotel name, geo coordinates, star rating) from xmlfile.xml.

package com.eviac.blog;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XMLsample {

 public static void main(String[] args) {

  try {
   // loading the xml document into DOM Document object
   DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
   domFactory.setNamespaceAware(true);
   DocumentBuilder builder = domFactory.newDocumentBuilder();
   Document doc = builder.parse("xmlfile.xml");

   // XPath object using XPathFactory
   XPath xpath = XPathFactory.newInstance().newXPath();
   
   // XPath query: the union (|) of three paths, compiled with compile()
   XPathExpression expr = xpath.compile("//hotels/e/name | //hotels/e/starRating | //hotels/e/geoPoint/e/text()");
   Object result = expr.evaluate(doc, XPathConstants.NODESET);
   NodeList nodes = (NodeList) result;
   for (int i = 0; i < nodes.getLength(); i++) {
    System.out.println(nodes.item(i).getTextContent());
   }
  } catch (Exception e) {
   e.printStackTrace();
  }
 }
}


Output
48.86536
2.329584
Renaissance Paris Vendome Hotel
5
48.877106
2.297451
Renaissance Paris Arc de Triomphe Hotel
5
48.873684
2.306411
Crowne Plaza Hotel PARIS-CHAMPS ELYSÉES
5
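
The output above lists the values in the order they occur in the document, so the coordinates of each hotel appear before its name and star rating. If you would rather have one line per hotel, one option is to select each hotel element first and then evaluate short relative XPath expressions against it. The following is only a sketch under a few assumptions: the class name HotelSummary and the helper summarize are made up for illustration, and the XML is a trimmed inline copy of xmlfile.xml so the example is self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class HotelSummary {

    // Hypothetical helper (not part of the original post):
    // builds one "name | stars | lat, lon" line per hotel.
    public static List<String> summarize(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        // Select every hotel element, i.e. each <e> directly under <hotels>.
        NodeList hotels = (NodeList) xpath.evaluate("//hotels/e", doc, XPathConstants.NODESET);

        List<String> lines = new ArrayList<>();
        for (int i = 0; i < hotels.getLength(); i++) {
            Node hotel = hotels.item(i);
            // Relative paths are evaluated with the current hotel as context node.
            String name = xpath.evaluate("name", hotel);
            String stars = xpath.evaluate("starRating", hotel);
            String lat = xpath.evaluate("geoPoint/e[1]", hotel);
            String lon = xpath.evaluate("geoPoint/e[2]", hotel);
            lines.add(name + " | " + stars + " | " + lat + ", " + lon);
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // Trimmed copy of xmlfile.xml: only the fields queried above are kept.
        String xml =
              "<a><e><hotels class=\"array\">"
            + "<e class=\"object\">"
            + "<geoPoint class=\"array\"><e type=\"number\">48.86536</e><e type=\"number\">2.329584</e></geoPoint>"
            + "<name type=\"string\">Renaissance Paris Vendome Hotel</name>"
            + "<starRating type=\"string\">5</starRating>"
            + "</e>"
            + "<e class=\"object\">"
            + "<geoPoint class=\"array\"><e type=\"number\">48.877106</e><e type=\"number\">2.297451</e></geoPoint>"
            + "<name type=\"string\">Renaissance Paris Arc de Triomphe Hotel</name>"
            + "<starRating type=\"string\">5</starRating>"
            + "</e>"
            + "</hotels></e></a>";

        for (String line : summarize(xml)) {
            System.out.println(line);
        }
    }
}
```

With the trimmed XML above, this should print "Renaissance Paris Vendome Hotel | 5 | 48.86536, 2.329584" followed by the matching line for the second hotel. To run it against the full file, you could build the Document with builder.parse("xmlfile.xml") as in XMLsample.java and keep the rest unchanged.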

9 comments :

  1. Hi,

    Great post. One suggestion: why not use Axiom rather than DOM? As far as I know, Axiom is a pull parser, so the whole tree is not built when you request a particular node, whereas a DOM parser builds the whole tree up front. I believe the Axis core uses Axiom to handle SOAP messages.

    Overall very well written article. Thx for sharing :)

  2. Nice post. I was just doing a Groovy version (see below) and noticed there is a missing closing tag for "hotels".

    import javax.xml.parsers.*
    import javax.xml.xpath.*

    def builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    def doc = builder.parse("xmlfile.xml")
    def xpath = XPathFactory.newInstance().newXPath()
    def query = "//hotels/e/name | //hotels/e/starRating | //hotels/e/geoPoint/e/text()"
    xpath.evaluate(query, doc, XPathConstants.NODESET).each{ println it.textContent }

  3. Hi all,

    Thanks a lot for your valuable comments.

    BTW I corrected the xml, thanks for pointing it out :)

  4. cool stuff :)) and great post too...

  5. Wow...nice article...this was very very helpful for me...Thanks..

  6. @Sameera I'm glad it helped you and thanks a lot for the comment :)
