|
|
|
|
|
The Web was originally built for human
consumption |
|
Web pages are machine-readable but not machine-understandable |
|
Example: A bibliography entry in HTML |
|
<UL> |
|
<LI> |
|
R. Goldman, J. McHugh, and J. Widom. |
|
<A
href="ftp://db.stanford.edu/pub/papers/xml.ps"> |
|
From Semistructured Data to XML: Migrating the
Lore Data Model and Query Language |
|
</A>. |
|
Proceedings of the 2nd International Workshop on
the Web and Databases (WebDB '99), Philadelphia, Pennsylvania, June 1999. |
|
</UL> |
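A minimal sketch of what "machine-readable but not machine-understandable" means in practice, using Python's standard `html.parser`. The class name `TagCollector` is made up for illustration. A program can recover the tags and the link target, but nothing in the markup says which text is an author, a title, or a venue.

```python
from html.parser import HTMLParser

HTML = """<UL>
<LI>R. Goldman, J. McHugh, and J. Widom.
<A href="ftp://db.stanford.edu/pub/papers/xml.ps">From Semistructured Data to XML:
Migrating the Lore Data Model and Query Language</A>.
Proceedings of the 2nd International Workshop on the Web and Databases
(WebDB '99), Philadelphia, Pennsylvania, June 1999.
</UL>"""


class TagCollector(HTMLParser):
    """Records every start tag and every href value it encounters."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")


collector = TagCollector()
collector.feed(HTML)

# The only structure available is presentational: a list, an item, a link.
print(collector.tags)
print(collector.links)
```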
|
|
|
|
|
“XML is like HTML, where you make up your own
tags.” |
|
Provides a uniform method for describing and
exchanging data, for example over the HTTP protocol |
|
HTML enables a universal method for displaying
data |
|
Tag a word to be displayed in bold or italic |
|
XML provides a universal method for describing
data |
|
Declare data to be a retail price, a sales tax,
a book title, ... |
|
Data is made up of characters or unparsed
“entities” |
|
Subset of Standard Generalized Markup Language
(SGML) |
|
Defined by the World Wide Web Consortium (W3C) |
|
|
|
|
|
|
XML document |
|
Optional prolog |
|
One or more elements (a single root element that contains all others) |
|
Optional miscellany |
|
Comments |
|
Processing instructions |
|
E.g., |
|
<?xml version="1.0"?> |
|
<greeting>Hello, world!</greeting> |
|
<!-- A simple XML document. --> |
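As a sketch, the example document can be parsed with Python's standard `xml.etree.ElementTree`. The second snippet, with two top-level elements, is an assumption added here to show that a document needs exactly one root element.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<greeting>Hello, world!</greeting>
<!-- A simple XML document. -->"""

# Prolog, one root element, then a trailing comment (miscellany).
root = ET.fromstring(DOC)
print(root.tag, "->", root.text)

# Two top-level elements are not a well-formed document.
try:
    ET.fromstring("<greeting>Hello</greeting><greeting>again</greeting>")
    two_roots_ok = True
except ET.ParseError:
    two_roots_ok = False
print("two roots accepted:", two_roots_ok)
```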
|
|
|
|
|
|
|
Non-empty element |
|
Start tag |
|
Element name |
|
Optional attribute specifications |
|
Attribute name |
|
Quoted string |
|
Content |
|
Elements |
|
Character data |
|
Entity or character references |
|
Character data sections |
|
Processing instructions |
|
Comments |
|
End tag |
|
|
|
|
<Publication
URL="ftp://db.stanford.edu/pub/papers/xml.ps" |
|
Authors="RG JM JW"> |
|
<Title>From Semistructured Data ... Language</Title> |
|
<Published>Proceedings of the ... Databases</Published> |
|
<Location> |
|
<City>Philadelphia</City> |
|
<State>Pennsylvania</State> |
|
</Location> |
|
<Date> |
|
<Month>June</Month> |
|
<Year>1999</Year> |
|
</Date> |
|
</Publication> |
|
<Author ID="RG">R. Goldman</Author> |
|
<Author ID="JM">J. McHugh</Author> |
|
<Author ID="JW">J. Widom</Author> |
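A sketch of how a program might use this markup: the `Authors` attribute holds a space-separated list of references to the `ID` attributes of the `Author` elements. The `<Bibliography>` wrapper element is an assumption added here, because the fragment above has several top-level elements and a document needs a single root. The elided title text is kept as written.

```python
import xml.etree.ElementTree as ET

DOC = """<Bibliography>
<Publication URL="ftp://db.stanford.edu/pub/papers/xml.ps" Authors="RG JM JW">
  <Title>From Semistructured Data ... Language</Title>
  <Location><City>Philadelphia</City><State>Pennsylvania</State></Location>
  <Date><Month>June</Month><Year>1999</Year></Date>
</Publication>
<Author ID="RG">R. Goldman</Author>
<Author ID="JM">J. McHugh</Author>
<Author ID="JW">J. Widom</Author>
</Bibliography>"""

root = ET.fromstring(DOC)

# Index the authors by ID, then resolve the references on the publication.
authors_by_id = {author.get("ID"): author.text for author in root.findall("Author")}
publication = root.find("Publication")
author_names = [authors_by_id[ref] for ref in publication.get("Authors").split()]

# Unlike the HTML version, each piece of data can be addressed by name.
year = publication.findtext("Date/Year")
city = publication.findtext("Location/City")
print(author_names, city, year)
```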
|
|
|
|
<weather-report> |
|
<date>March 25, 1998</date> |
|
<time>08:00</time> |
|
<area> |
|
<city>Seattle</city> |
|
<state>WA</state> |
|
<region>West Coast</region> |
|
<country>USA</country> |
|
</area> |
|
<measurements> |
|
<skies>partly cloudy</skies> |
|
<temperature>46</temperature> |
|
... |
|
</measurements> |
|
</weather-report> |
|
|
|
|
<p xml:lang="en">The quick brown
fox jumps over the lazy dog.</p> |
|
<p xml:lang="en-GB">What colour
is it?</p> |
|
<p xml:lang="en-US">What color
is it?</p> |
|
<sp who="Faust" desc='leise'
xml:lang="de"> |
|
<l>Habe nun, ach! Philosophie,</l> |
|
<l>Juristerei, und Medizin</l> |
|
<l>und leider auch Theologie</l> |
|
<l>durchaus studiert mit heißem Bemüh'n.</l> |
|
</sp> |
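A small sketch of reading `xml:lang` from Python. The `xml` prefix is predefined and bound to `http://www.w3.org/XML/1998/namespace`, so `ElementTree` reports the attribute under that expanded name. The `<doc>` wrapper element is an assumption added to give the paragraphs a single root.

```python
import xml.etree.ElementTree as ET

# Expanded name of the xml:lang attribute as ElementTree reports it.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

DOC = """<doc>
<p xml:lang="en">The quick brown fox jumps over the lazy dog.</p>
<p xml:lang="en-GB">What colour is it?</p>
<p xml:lang="en-US">What color is it?</p>
</doc>"""

root = ET.fromstring(DOC)

# Map each language tag to the text marked with it.
text_by_lang = {p.get(XML_LANG): p.text for p in root.findall("p")}
print(text_by_lang["en-GB"])
```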
|
|
|
|
|
|
Describes the syntax of a class of XML documents |
|
Which elements are present |
|
Structural relationships between the elements |
|
Contains or points to markup declarations |
|
Element type declarations |
|
Attribute-list declarations |
|
Entity declarations |
|
Notation declarations |
|
Examples |
|
<!DOCTYPE greeting SYSTEM "http://www. ..."> |
|
<!DOCTYPE greeting |
|
[<!ELEMENT greeting (#PCDATA)> ]> |
|
|
|
|
|
|
<!ELEMENT Name Content-Spec> |
|
Content spec |
|
EMPTY |
|
ANY |
|
Character data, optionally interspersed with
child elements |
|
Child elements |
|
Using a simple grammar governing the allowed
types of the child elements and the order in which they may appear |
|
Examples |
|
<!ELEMENT MEMO
(TO,FROM,SUBJECT,BODY,SIGN)> |
|
<!ELEMENT BODY (P+)> |
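The "simple grammar" for child elements is essentially a regular expression over element names. The sketch below is not a DTD validator (Python's standard XML parser is non-validating); it only translates an element-content model into a Python regex and checks an element's children against it. The helper names `content_model_to_regex` and `children_match` are hypothetical, and `EMPTY`, `ANY`, and `#PCDATA` models are not handled.

```python
import re
import xml.etree.ElementTree as ET


def content_model_to_regex(model):
    """Translate a model such as '(TO,FROM,SUBJECT,BODY,SIGN)' into a regex."""
    body = re.sub(r"\s+", "", model)
    # Each element name becomes a group that matches "NAME ".
    body = re.sub(
        r"[A-Za-z_:][\w.\-:]*",
        lambda m: "(?:" + re.escape(m.group(0)) + " )",
        body,
    )
    # ',' means sequence, which is plain concatenation in a regex.
    # '|', '?', '*', '+', '(' and ')' already have the same meaning.
    return re.compile(body.replace(",", ""))


def children_match(element, model):
    """True if the element's child names, in order, satisfy the model."""
    names = "".join(child.tag + " " for child in element)
    return content_model_to_regex(model).fullmatch(names) is not None


memo = ET.fromstring(
    "<MEMO><TO/><FROM/><SUBJECT/><BODY><P/><P/></BODY><SIGN/></MEMO>"
)
wrong_order = ET.fromstring("<MEMO><FROM/><TO/><SUBJECT/><BODY/><SIGN/></MEMO>")

print(children_match(memo, "(TO,FROM,SUBJECT,BODY,SIGN)"))
print(children_match(memo.find("BODY"), "(P+)"))
print(children_match(wrong_order, "(TO,FROM,SUBJECT,BODY,SIGN)"))
```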
|
|
|
|
|
|
<!ATTLIST Name Attribute-Definition* > |
|
Attribute definition |
|
Name Attribute-Type Default-Declaration |
|
Attribute type |
|
String type |
|
A set of tokenized types |
|
Enumerated types |
|
Default declaration |
|
#REQUIRED |
|
#IMPLIED (no default value provided) |
|
Attribute value (character data) |
|
Examples |
|
<!ATTLIST MEMO importance (HIGH|MEDIUM|LOW)
"LOW"> |
|
<!ATTLIST SIGN signatureFile CDATA #IMPLIED |
|
email CDATA #REQUIRED> |
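A sketch of what the default declarations mean for a processor. The `ATTLISTS` table is a hand transcription of the two examples above into a made-up Python structure, and `effective_attributes` is a hypothetical helper, not part of any library: it fills in a declared default, rejects a missing `#REQUIRED` attribute, and checks enumerated values.

```python
import xml.etree.ElementTree as ET

# Hand-transcribed from the ATTLIST examples; the dict layout is an assumption.
ATTLISTS = {
    "MEMO": {
        "importance": {"type": ("HIGH", "MEDIUM", "LOW"), "default": "LOW"},
    },
    "SIGN": {
        "signatureFile": {"type": "CDATA", "default": "#IMPLIED"},
        "email": {"type": "CDATA", "default": "#REQUIRED"},
    },
}


def effective_attributes(element):
    """Return the element's attributes with declared defaults applied."""
    result = dict(element.attrib)
    for name, decl in ATTLISTS.get(element.tag, {}).items():
        if name not in result:
            if decl["default"] == "#REQUIRED":
                raise ValueError(f"{element.tag}: missing required attribute {name!r}")
            if decl["default"] != "#IMPLIED":
                result[name] = decl["default"]  # supply the declared default
        elif isinstance(decl["type"], tuple) and result[name] not in decl["type"]:
            raise ValueError(f"{element.tag}: {name}={result[name]!r} is not an allowed value")
    return result


print(effective_attributes(ET.fromstring("<MEMO/>")))
print(effective_attributes(ET.fromstring('<MEMO importance="HIGH"/>')))

try:
    effective_attributes(ET.fromstring('<SIGN signatureFile="sig.txt"/>'))
    missing_email_rejected = False
except ValueError:
    missing_email_rejected = True
print("missing email rejected:", missing_email_rejected)
```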
|
|
|
|
|
|
<!ENTITY Name Entity-Definition > |
|
Entity definition |
|
Entity value |
|
External ID |
|
SYSTEM URL |
|
PUBLIC Public-identifier URL |
|
(External-ID NDATA Name) |
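A minimal sketch of the simplest case, an internal entity with a literal entity value. The entity name `who` and its value are made up for illustration. Python's standard `ElementTree` parser expands internal entities declared in the document's own DOCTYPE; external and unparsed (`NDATA`) entities are not covered by this sketch.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE greeting [
  <!ENTITY who "world">
]>
<greeting>Hello, &who;!</greeting>"""

# The reference &who; is replaced by the declared entity value during parsing.
root = ET.fromstring(DOC)
print(root.text)
```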
|
|
|
|
<!ELEMENT MEMO (TO,FROM,SUBJECT,BODY,SIGN)> |
|
<!ATTLIST MEMO importance (HIGH|MEDIUM|LOW) "LOW"> |
|
<!ELEMENT TO (#PCDATA)> |
|
<!ELEMENT FROM (#PCDATA)> |
|
<!ELEMENT SUBJECT (#PCDATA)> |
|
<!ELEMENT BODY (P+)> |
|
<!ELEMENT P (#PCDATA)> |
|
<!ELEMENT SIGN (#PCDATA)> |
|
<!ATTLIST SIGN signatureFile CDATA #IMPLIED |
|
email CDATA #REQUIRED> |
|
|
|
|
<!DOCTYPE MEMO SYSTEM "http://www. ..."> |
|
<MEMO importance="HIGH"> |
|
<TO>Jones</TO> |
|
<FROM>S.Smith</FROM> |
|
<SUBJECT>Project Plan</SUBJECT> |
|
<BODY> |
|
<P> … </P> |
|
<P> … </P> |
|
</BODY> |
|
<SIGN email="SSMITH.CS.STANFORD.EDU"> |
|
S. Smith |
|
</SIGN> |
|
</MEMO> |
|
|
|
|
<!ELEMENT novel |
|
(preface,chapter+,biography?)> |
|
<!ELEMENT preface (paragraph+)> |
|
<!ELEMENT chapter
(title,paragraph+,section+)> |
|
<!ELEMENT section (title,paragraph+)> |
|
<!ELEMENT biography (title,paragraph+)> |
|
<!ELEMENT paragraph (#PCDATA|keyword)*> |
|
<!ELEMENT title (#PCDATA|keyword)*> |
|
<!ELEMENT keyword (#PCDATA)> |
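The `(#PCDATA|keyword)*` model is mixed content: character data interleaved with `keyword` elements. A sketch of how such a paragraph looks after parsing with `ElementTree`, where text before the first child is in `.text` and text after a child is in that child's `.tail`. The sample sentence is made up for illustration.

```python
import xml.etree.ElementTree as ET

paragraph = ET.fromstring(
    "<paragraph>XML is a <keyword>markup</keyword> language.</paragraph>"
)

keyword = paragraph[0]
print(repr(paragraph.text))   # character data before the child element
print(repr(keyword.text))     # content of the keyword element
print(repr(keyword.tail))     # character data after the child element

# itertext() walks text and tails in document order.
full_text = "".join(paragraph.itertext())
print(full_text)
```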
|
|
|
|
|
|
URI (Uniform Resource Identifier) |
|
The Web is an information space |
|
The URIs are the points in that space |
|
URI: name or address that refers to a resource |
|
URL (Uniform Resource Locator): URI that
includes explicit instructions on how to access the resource on the
internet |
|
XML namespace |
|
Set of names used as element types and attribute
names |
|
Identified by a URI |
|
Universally unique |
|
Qualified name |
|
A universally unique identifier |
|
Syntax: Prefix ':' LocalPart, where the prefix is bound to a namespace name (URI) |
|
|
|
|
|
A namespace is declared with an attribute named xmlns (the default
namespace) or an attribute whose name begins with the prefix xmlns: |
|
Example:
<x xmlns:edi = 'http://ecommerce.org/schema'> |
|
<!-- the "edi" prefix is bound to |
|
http://ecommerce.org/schema for the "x" |
|
element and contents --> |
|
</x> |
|
A declaration applies to the element where it is
specified and to all elements within the content of that element, unless
overridden by another namespace declaration with the same attribute name |
|
Example:
<!-- both namespace prefixes are available |
|
throughout --> |
|
<bk:book xmlns:bk = 'urn:loc.gov:books' |
|
xmlns:isbn = 'urn:ISBN:0-395-36341-6'> |
|
<bk:title> Cheaper by the Dozen </bk:title> |
|
<isbn:number> 1568491379 </isbn:number> |
|
</bk:book> |
|
|
|
|
|
Example of multiple namespace prefixes in an
element |
|
<bk:book xmlns:bk = 'urn:loc.gov:books' |
|
xmlns:isbn = 'urn:ISBN:0-395-36341-6'> |
|
<bk:title> Cheaper by the Dozen </bk:title> |
|
<isbn:number> 1568491379 </isbn:number> |
|
</bk:book> |
|
Example of default namespace in an element |
|
<book xmlns = 'urn:loc.gov:books' |
|
xmlns:isbn = 'urn:ISBN:0-395-36341-6'> |
|
<title> Cheaper by the Dozen </title> |
|
<isbn:number> 1568491379 </isbn:number> |
|
</book> |
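A sketch showing that the prefixed form and the default-namespace form describe the same names. `ElementTree` reports each element as `{namespace URI}local-part`, so the prefix itself does not matter. The helper name `expanded_names` is made up, and the padding spaces inside the element content are dropped here.

```python
import xml.etree.ElementTree as ET

PREFIXED = """<bk:book xmlns:bk='urn:loc.gov:books'
         xmlns:isbn='urn:ISBN:0-395-36341-6'>
  <bk:title>Cheaper by the Dozen</bk:title>
  <isbn:number>1568491379</isbn:number>
</bk:book>"""

DEFAULTED = """<book xmlns='urn:loc.gov:books'
      xmlns:isbn='urn:ISBN:0-395-36341-6'>
  <title>Cheaper by the Dozen</title>
  <isbn:number>1568491379</isbn:number>
</book>"""


def expanded_names(document):
    """Return the expanded name of every element, in document order."""
    return [element.tag for element in ET.fromstring(document).iter()]


prefixed_names = expanded_names(PREFIXED)
defaulted_names = expanded_names(DEFAULTED)

for name in prefixed_names:
    print(name)
print("same names:", prefixed_names == defaulted_names)
```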
|
|
|
|
|
“XML is like HTML, where you make up your own
tags.” |
|
Provides a uniform method for describing and
exchanging data, for example over the HTTP protocol |
|
HTML enables a universal method for displaying
data |
|
Tag a word to be displayed in bold or italic |
|
XML provides a universal method for describing
data |
|
Declare data to be a retail price, a sales tax,
a book title, ... |
|
Data is made up of characters or unparsed
“entities” |
|
Provides a “syntactic schema” |
|
Provides no means of specifying semantics |
|