HTML5 Microdata Parser Library

by laurentm (modified: 2015 Aug 04)

This library is a part of the EiffelStudio package.


This library parses HTML content and builds an MD_DOCUMENT tree that represents the microdata found in the HTML5 content (cf. itemscope, itemprop, ...).

Microdata is a WHATWG HTML specification used to nest semantics within existing content on web pages. Search engines, web crawlers, and browsers can extract and process Microdata from a web page and use it to provide a richer browsing experience for users. Search engines benefit greatly from direct access to this structured data because it lets them understand the information on web pages and provide more relevant results to users. Microdata uses a supporting vocabulary to describe an item, and name-value pairs to assign values to its properties. Microdata is an attempt to provide a simpler way of annotating HTML elements with machine-readable tags than the similar approaches of RDFa and Microformats.
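To make the "item" and "name-value pair" vocabulary concrete, here is a minimal sketch in Python. It is not this library's API and it is not Eiffel code. It only illustrates the kind of extraction described above, using the standard html.parser module. The sample markup, the class name MicrodataSketch, and the function extract_items are assumptions made up for the illustration. The sketch handles only flat, well-formed items, ignores nested itemscope elements and multi-valued itemprop attributes, and reads property values from content, href, or src when present, and from the element text otherwise.

```python
from html.parser import HTMLParser

# Hypothetical sample page fragment: one item with three properties.
SAMPLE_HTML = """
<div itemscope itemtype="https://schema.org/Person">
  <span itemprop="name">Jane Doe</span>
  <span itemprop="jobTitle">Engineer</span>
  <a itemprop="url" href="https://example.org/jane">Home page</a>
</div>
"""

# Elements that never have an end tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class MicrodataSketch(HTMLParser):
    """Illustrative extractor for flat microdata items (assumption: well-formed HTML)."""

    def __init__(self):
        super().__init__()
        self.items = []           # extracted items, in document order
        self._stack = []          # currently open non-void elements
        self._current = None      # item whose properties are being collected
        self._scope_depth = None  # stack depth at which the current item opened
        self._text = None         # (property name, text chunks, depth) while reading text

    def _add(self, name, value):
        self._current["properties"].setdefault(name, []).append(value)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        if "itemscope" in attrs and self._current is None:
            # A new top-level item: remember its vocabulary type, if any.
            self._current = {"type": attrs.get("itemtype"), "properties": {}}
            self.items.append(self._current)
            self._scope_depth = len(self._stack)
        elif prop and self._current is not None:
            # Prefer an attribute value; otherwise fall back to the element text.
            for name in ("content", "href", "src"):
                if name in attrs:
                    self._add(prop, attrs[name])
                    break
            else:
                if tag not in VOID:
                    self._text = (prop, [], len(self._stack))
        if tag not in VOID:
            self._stack.append(tag)

    def handle_data(self, data):
        if self._text is not None:
            self._text[1].append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not self._stack:
            return
        self._stack.pop()
        depth = len(self._stack)
        if self._text is not None and self._text[2] == depth:
            name, chunks, _ = self._text
            self._add(name, "".join(chunks).strip())
            self._text = None
        if self._current is not None and depth == self._scope_depth:
            self._current = None  # the element carrying itemscope has closed


def extract_items(html):
    """Return a list of {"type": ..., "properties": {...}} dictionaries."""
    parser = MicrodataSketch()
    parser.feed(html)
    parser.close()
    return parser.items


if __name__ == "__main__":
    for item in extract_items(SAMPLE_HTML):
        print(item["type"], item["properties"])
```

Each extracted item carries its vocabulary type (the itemtype URL) and a mapping from property names to lists of values, which mirrors the item and name-value structure described above. A full parser such as this library would also need to follow the nesting and value rules of the specification.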

Notes: future evolutions could include URL consumption (i.e., getting the content from the network). The library may also provide a kind of command-line browser, but these are only ideas for future tasks. The library does not yet offer a way to "generate" HTML5 microdata; it only consumes microdata.