C Language Help, C Language Tutorials, C Language Programming, C Language Tricks
The DTD
Validations in XML
So far, we have only read an XML file, without catering to two special cases:
a document that uses an entity, and data that has to be validated against the
rules for its element. The XmlTextReader class is the best choice for reading an
XML file, except where data has to be validated or an entity has to be replaced
with its value. For such purposes, the XmlValidatingReader class is better
suited. This class is derived from XmlReader, and it conducts three types of
validation: DTD, XDR and XSD schema validation. It is used when the primary task
is to validate data, to resolve general entities, or to provide support for
default attributes.
a.cs
using System;
using System.IO;
using System.Xml;
public class zzz
{
public static void Main()
{
XmlValidatingReader r = null;
XmlParserContext p;
p = new XmlParserContext(null, null, "vijay", null, null, "<!ENTITY pr '100'>", "", "", XmlSpace.None);
r = new XmlValidatingReader("<vijay mukhi='great' price='Rs &pr;'></vijay>", XmlNodeType.Element, p);
r.ValidationType = ValidationType.None;
r.MoveToContent();
while (r.MoveToNextAttribute())
{
Console.WriteLine("{0} = {1}", r.Name, r.Value);
}
r.Close();
}
}
Output
mukhi = great
price = Rs 100
To create the object p of type XmlParserContext, the constructor with nine
parameters of XmlParserContext class is called. The nine parameters are as
follows:
• The first parameter is of type XmlNameTable. It has a value of null.
• The second parameter is of type XmlNamespaceManager. It also has a value of
null.
• The third parameter is the DocType name, i.e. the root tag 'vijay'.
• The fourth parameter is the pubid for the external DTD file.
• The fifth parameter is the sysid for the external DTD file.
• The sixth parameter is the internal DTD, where an ENTITY declaration <!ENTITY
pr '100'> has been created. This simply states that the word 'pr', when preceded
by a '&' and followed by a semi-colon, must be replaced with the string '100'.
• The seventh parameter in sequence is the location from where the fragment is
to be loaded, i.e. the base URI.
• The eighth parameter stands for the xml:lang scope.
• The ninth parameter stands for the xml:space scope.
The parameters to the constructor of the XmlValidatingReader class are similar
to those of the XmlTextReader, which we encountered earlier. This class is
derived from XmlReader and also implements the IXmlLineInfo interface.
There are five different values that the ValidationType property can be set to:
1. The first is Auto, which validates only when DTD or schema information is
found. This is the default.
2. The second is DTD, which validates based on the declarations found in the DTD.
3. The third is None, which creates an XML 1.0 non-validating parser. It reports
default attributes and resolves general entities, but does not validate against
the DOCTYPE. Thus, if the root tag is changed from 'vijay' to 'vijay1', no
errors will be generated. Placing the ValidationType statement within comments
restores the default of Auto, and the same change then generates the following
exception:
"Unhandled Exception: System.Xml.Schema.XmlSchemaException: The root element
name must match the DocType name. An error occurred at (1, 2)."
4. The fourth option is Schema, which validates as per the XSD schemas.
5. The fifth option is XDR, which validates as per the XDR schemas.
In our program, we have set this property to a value of None.
Once the required properties are set, the MoveToContent function is used to move
to the first element, 'vijay'. The next function, MoveToNextAttribute, returns a
value of True when there are attributes remaining to be read. Otherwise, it
returns a value of False. On its first call, when the reader is positioned on
the element, it behaves like the MoveToFirstAttribute function.
The while loop repeats twice, since there are two attributes. The Name and Value
properties for the first attribute are displayed as 'mukhi' and 'great'. This is
very similar to what we have observed in the earlier program. The name for the
second attribute is displayed as 'price'. However, its value is not the same as
the text in the source, because it contains the entity reference &pr;. The
XmlValidatingReader replaces the entity pr with the string '100', prior to
displaying the value. Therefore, the output is displayed as 'price' and
'Rs 100'.
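The same substitution can be sketched with the XmlReader.Create factory, which
the .NET documentation recommends in place of XmlValidatingReader from .NET
Framework 2.0 onwards. This is only an illustration under stated assumptions:
the entity is declared in an internal DOCTYPE subset instead of an
XmlParserContext, and the names EntityDemo and ReadAttributes are made up for
this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class EntityDemo
{
    // Hypothetical helper: returns "name = value" for each attribute of the
    // root element, with general entities expanded by the reader.
    public static List<string> ReadAttributes(string xml)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,  // the default, Prohibit, rejects a DOCTYPE
            ValidationType = ValidationType.None  // resolve entities, do not validate
        };
        var result = new List<string>();
        using (XmlReader r = XmlReader.Create(new StringReader(xml), settings))
        {
            r.MoveToContent();                    // skips the DOCTYPE node
            while (r.MoveToNextAttribute())
            {
                result.Add(r.Name + " = " + r.Value);
            }
        }
        return result;
    }

    public static void Main()
    {
        string xml = "<!DOCTYPE vijay [<!ENTITY pr '100'>]>" +
                     "<vijay mukhi='great' price='Rs &pr;'></vijay>";
        foreach (string line in ReadAttributes(xml))
        {
            Console.WriteLine(line);
        }
    }
}
```

DtdProcessing.Parse should only be enabled for documents from a trusted source,
since entity expansion can be abused by hostile input.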
a.cs
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;
class zzz
{
public static void Main()
{
XmlTextReader r = new XmlTextReader("b.xml");
XmlValidatingReader v = new XmlValidatingReader(r);
v.ValidationType = ValidationType.DTD;
v.ValidationEventHandler += new ValidationEventHandler (abc);
while(v.Read());
}
public static void abc(object s, ValidationEventArgs a)
{
Console.WriteLine("Severity:{0}", a.Severity);
Console.WriteLine("Message:{0}", a.Message);
}
}
b.xml
<?xml version="1.0" ?>
<!DOCTYPE vijay1 >
<vijay>
</vijay>
Output
Severity:Error
Message:The root element name must match the DocType name. An error occurred at
file:///c:/csharp/b.xml(3, 2).
Severity:Error
Message:The 'vijay' element is not declared. An error occurred at file:///c:/csharp/b.xml(3,
2).
In the above program, to begin with, an object r of type XmlTextReader is
created, and then, it is passed to the constructor of XmlValidatingReader,
while object v is being created. The ValidationType of the object v is set
to DTD. The function abc is attached to the ValidationEventHandler event, so it
gets called whenever a validation error occurs. The while loop calls the Read
function until the entire XML file has been read and validated, and the function
abc is notified whenever an error is chanced upon.
In the function abc, the values contained in the properties - Severity and
Message, of the ValidationEventArgs parameter 'a', are printed. The Severity
property reveals whether it is an error or warning, whereas, the Message
property contains the precise text of the error or warning.
In the above case, an error is generated because the DOCTYPE expects the root
element to be 'vijay1', whereas, it has been specified as 'vijay'. When no error
message is displayed, it may be inferred that no errors have been found.
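For readers on a newer framework, the same flow can be sketched with
XmlReaderSettings. This is a minimal sketch and not the program above: the
class name ValidateDemo and the helper Validate are assumptions introduced for
illustration, the document is passed as a string instead of being read from
b.xml, and messages are collected in a list so that they can be inspected.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class ValidateDemo
{
    // Hypothetical helper: validates an XML string against its own DOCTYPE
    // and returns every validation message instead of printing it.
    public static List<string> Validate(string xml)
    {
        var messages = new List<string>();
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,  // allow the DOCTYPE to be read
            ValidationType = ValidationType.DTD
        };
        // Plays the role of the function abc in the program above.
        settings.ValidationEventHandler +=
            (s, a) => messages.Add(a.Severity + ":" + a.Message);
        using (XmlReader v = XmlReader.Create(new StringReader(xml), settings))
        {
            while (v.Read()) { }  // reading the whole document drives validation
        }
        return messages;
    }

    public static void Main()
    {
        string xml = "<?xml version='1.0' ?><!DOCTYPE vijay1 ><vijay></vijay>";
        foreach (string m in Validate(xml))
        {
            Console.WriteLine(m);
        }
    }
}
```

The exact wording of the messages depends on the framework version, so the
sketch prints whatever the validator reports.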
The DTD
Using the above C# program, we shall now create our own DTD file. Therefore, we
shall modify only the b.xml and b.dtd files.
b.xml
<?xml version="1.0" ?>
<!DOCTYPE vijay SYSTEM "b.dtd" >
<vijay />
b.dtd
<!ELEMENT vijay >
A DTD is generally very long. So, an internal DTD is rarely used. If it is
used, its contents have to be placed within [] brackets inside the DOCTYPE
declaration. To use an external DTD, we use the word SYSTEM followed by the name
of the DTD file, which is b.dtd, in this case.
In b.dtd, an element 'vijay' is created by inserting the reserved characters
'<!', followed by ELEMENT, and finally by the element name 'vijay'. When we run
the C# program 'a', the following error is generated:
Output
Unhandled Exception: System.Xml.XmlException: This is an invalid content model.
Line 1, position 17.
An error in the DTD file has resulted in the generation of an un-handled
exception. The error occurred due to an incomplete ELEMENT statement.
b.dtd
<!ELEMENT vijay EMPTY>
The addition of the word EMPTY salvages the situation. By specifying the word
EMPTY, it is amply clear that the element named 'vijay' is an empty element.
b.xml
<?xml version="1.0" ?>
<!DOCTYPE vijay SYSTEM "b.dtd" >
<vijay>
</vijay>
Output
Severity:Error
Message:Element 'vijay' has invalid child element '#PCDATA'. An error occurred
at file:///c:/csharp/b.xml(3, 8).
The DTD file states, with absolute clarity, that the ELEMENT 'vijay' is EMPTY.
However, an open tag <vijay> and a close tag </vijay> have been added to the XML
file, with a line break between them. That whitespace counts as content, which
an EMPTY element may not have. Therefore, an error message is generated, which,
as usual, is far from self-explanatory.
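The difference between the two forms of the tag can be checked with a small
sketch. It assumes the XmlReaderSettings route to DTD validation and an
internal DOCTYPE subset in place of b.dtd; the names EmptyDemo and CountErrors
are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

public static class EmptyDemo
{
    // Hypothetical helper: counts the validation events raised for a
    // document whose DTD declares 'vijay' as EMPTY.
    public static int CountErrors(string body)
    {
        string xml = "<!DOCTYPE vijay [<!ELEMENT vijay EMPTY>]>" + body;
        int count = 0;
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,
            ValidationType = ValidationType.DTD
        };
        settings.ValidationEventHandler += (s, a) => { count++; };
        using (XmlReader r = XmlReader.Create(new StringReader(xml), settings))
        {
            while (r.Read()) { }
        }
        return count;
    }

    public static void Main()
    {
        // A single self-closing tag has no content at all.
        Console.WriteLine(CountErrors("<vijay />"));
        // The line break between the tags is content, so this is reported.
        Console.WriteLine(CountErrors("<vijay>\n</vijay>"));
    }
}
```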
Instead of using tags such as 'vijay', let us consider a DTD that has been
implemented in real life. This one is used for the WML, or the Wireless Markup
Language. The rules or syntax of WML are available as a DTD.
In our book titled 'WML and WMLScript', we have endeavoured to elucidate the
concept of a DTD. You are at liberty to refer to the book. However, we must
caution you that, the approach and the explanation used here is entirely at
variance with the one used in the earlier book.
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
</wml>
b.dtd
<!ELEMENT wml EMPTY>
Output
Severity:Error
Message:Element 'wml' has invalid child element '#PCDATA'. An error occurred at
file:///C:/csharp/b.xml(3, 6).
The word 'vijay' has merely been replaced by the word 'wml'. The error generated
is akin to the earlier one. At this juncture, we introduce a 'card' into the DTD
file.
b.dtd
<!ELEMENT wml (card)>
Output
Severity:Error
Message:Element 'wml' has incomplete content. Expected 'card'. An error occurred
at file:///c:/csharp/b.xml(4, 3).
Every WML document must commence with the root tag 'wml'. In the DTD file, we
have placed the word 'card' within round brackets, along with wml. This
signifies that the wml tag must contain a tag or an element called 'card'. Since
there is no card in the XML file, an error is reported, stating that a card is
expected, and on account of its unavailability, the wml element is incomplete.
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card />
</wml>
Output
Severity:Error
Message:The 'card' element is not declared. An error occurred at file:///c:/csharp/b.xml(4,
2)
We add the card tag as a single tag to our XML file, in an endeavour to
eliminate the error. But, as we have not specified 'card' as a valid element in
the DTD file, yet another error message is displayed. Unless 'card' appears as
an ELEMENT in the DTD file, it is not possible to use it in the XML file.
Therefore, we now include 'card' as an EMPTY element in b.dtd.
b.dtd
<!ELEMENT wml (card)>
<!ELEMENT card EMPTY>
Now, all the errors just vanish. In the DTD file, we had affirmed that the
element 'card' shall be empty i.e. it will not have any content.
The XML file depicted below displays an error, because the 'card' tag is not a
single tag any longer.
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card>
</card>
</wml>
Output
Severity:Error
Message:Element 'card' has invalid child element '#PCDATA'. An error occurred at
file:///C:/csharp/b.xml(4, 7).
The error message displayed here is very similar to the one seen earlier with
the wml tag, which stated that the element 'wml' has an invalid child element
'#PCDATA'.
A slight modification to the XML file is desirable, before we endeavour to
eliminate the error.
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card>
hi
</card>
</wml>
Output
Severity:Error
Message:Element 'card' has invalid child element 'Text'. An error occurred at
file:///c:/csharp/b.xml(4, 7).
Inserting the word 'hi' between the card tags results in a slightly altered
error message. In place of #PCDATA, we get to see Text. With the following
modification to the DTD file, both error messages can be eliminated.
b.dtd
<!ELEMENT wml (card)>
<!ELEMENT card (#PCDATA)>
To eradicate the errors, the EMPTY word is replaced with #PCDATA, enclosed
within round brackets. The word PCDATA is an acronym for Parsed Character
Data. In plain English, it represents text that can be entered from the
keyboard. Thus, we are at liberty to write as many lines of text as we want,
within the card tag. Even if the word 'hi' is removed from within the tags, no
error is generated.
Our DTD expects a root tag or starting tag of wml. Only a card tag can be
inserted within this tag, and the card tag is capable of containing any amount
of text. Insertion of anything else in the wml tag is a sure recipe for
disaster.
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card>
</card>
<card>
</card>
</wml>
Output
Severity:Error
Message:Element 'wml' has invalid content. Expected ''. An error occurred at
file:///c:/csharp/b.xml(6, 2).
The above error has occurred because, the DTD clearly specifies that the root
tag wml must have one, and only one, occurrence of the tag called 'card' within
it. Here, we have created two tags, thereby, causing the error.
b.dtd
<!ELEMENT wml (card)*>
<!ELEMENT card (#PCDATA)>
The * symbol, placed after the round brackets, indicates that the bracketed
content may occur zero or more times. Thus, the XML file can now have either
zero or countless card elements. If you do not give credence to this
statement of ours, you may either delete all the card elements from the XML
file, or add numerous cards. Either way, no error will be generated.
b.dtd
<!ELEMENT wml (card)+>
<!ELEMENT card (#PCDATA)>
Replacing the symbol * with a + transforms the meaning from 'zero or more'
to 'one or more'. The only difference between the * symbol and the + symbol
is that the + sign mandates at least one occurrence of the element, whereas the
* sign makes it optional. Thus, in the above XML file, at least a single card
element is required.
b.dtd
<!ELEMENT wml (card)?>
<!ELEMENT card (#PCDATA)>
The last of the special characters is the symbol ?, which restricts the number
of elements to 'zero or one'. Thus, in the XML file, we may have either one
card element or none at all. The presence of two or more cards will generate an
error. You should try out various possible combinations for each of the symbols
*, + and ?.
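One way to try the combinations without editing b.dtd each time is sketched
below. It assumes the XmlReaderSettings route to DTD validation and builds the
DTD as an internal subset; the names OccurrenceDemo and IsValid are
hypothetical.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class OccurrenceDemo
{
    // Hypothetical helper: true when a wml element holding the given number
    // of card elements satisfies the content model, e.g. "(card)*".
    public static bool IsValid(string model, int cards)
    {
        var xml = new StringBuilder();
        xml.Append("<!DOCTYPE wml [<!ELEMENT wml " + model + ">");
        xml.Append("<!ELEMENT card (#PCDATA)>]><wml>");
        for (int i = 0; i < cards; i++)
        {
            xml.Append("<card></card>");
        }
        xml.Append("</wml>");

        bool ok = true;
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,
            ValidationType = ValidationType.DTD
        };
        settings.ValidationEventHandler += (s, a) => { ok = false; };
        using (XmlReader r = XmlReader.Create(new StringReader(xml.ToString()), settings))
        {
            while (r.Read()) { }
        }
        return ok;
    }

    public static void Main()
    {
        foreach (string model in new[] { "(card)*", "(card)+", "(card)?" })
        {
            for (int cards = 0; cards <= 2; cards++)
            {
                Console.WriteLine("{0} with {1} card(s): {2}",
                    model, cards, IsValid(model, cards) ? "valid" : "error");
            }
        }
    }
}
```

Only two of the nine combinations are rejected: (card)+ with no card, and
(card)? with two cards.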
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card>
<p> hi </p>
</card>
</wml>
b.dtd
<!ELEMENT wml (card)*>
<!ELEMENT card (p)>
<!ELEMENT p (#PCDATA)>
No error is generated because, in the DTD file, we have now stated that, the
card element can have a tag p, which can contain any text. We have, however,
done away with the provision of placing any text within the card tag.
Add in a new modification to the file.
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card>
<p> <b/> </p>
</card>
</wml>
b.dtd
<!ELEMENT wml (card)*>
<!ELEMENT card (p)>
<!ELEMENT p (br | b)>
<!ELEMENT br EMPTY>
<!ELEMENT b EMPTY>
The DTD now appears considerably more complicated. The p tag is now capable of
containing only two tags, br and b. Text is not allowed any more. The | sign
signifies the OR condition, which implies that either tag b or tag br is
allowed. The two aforesaid tags are defined as EMPTY tags. To summarise, our DTD
states that the p tag can contain a single tag of either b or br.
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card>
<p> <b/> <br/></p>
</card>
</wml>
Output
Severity:Error
Message:Element 'p' has invalid content. Expected ''. An error occurred at
file:///c:/csharp/b.xml(5, 11).
All is not well, because the content model (br | b) allows either a 'b' or a
'br', but not both together. To remedy the situation, we place a * symbol after
the content model of the p tag, and also after that of the card tag.
b.dtd
<!ELEMENT wml (card)*>
<!ELEMENT card (p)*>
<!ELEMENT p (br | b)*>
<!ELEMENT br EMPTY>
<!ELEMENT b EMPTY>
The above DTD provides us the flexibility of having multiple p tags within any
number of cards. These, in turn, may have as many b or br tags as desired.
By replacing the b tag with #PCDATA, a p tag is in a position to accommodate
multiple br tags, as well as an indefinite amount of text. In such a mixed
content model, #PCDATA must be listed first.
<!ELEMENT p (#PCDATA | br)*>
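A mixed content model, written with #PCDATA first as the XML specification
requires, can be exercised with the sketch below. It assumes the
XmlReaderSettings route to DTD validation and an internal DOCTYPE subset; the
names MixedDemo and IsValid are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

public static class MixedDemo
{
    // Hypothetical helper: validates the given content of a wml element
    // against a DTD in which p mixes text with br tags.
    public static bool IsValid(string content)
    {
        string xml = "<!DOCTYPE wml [" +
                     "<!ELEMENT wml (card)*>" +
                     "<!ELEMENT card (p)*>" +
                     "<!ELEMENT p (#PCDATA | br)*>" +
                     "<!ELEMENT br EMPTY>" +
                     "]><wml>" + content + "</wml>";
        bool ok = true;
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,
            ValidationType = ValidationType.DTD
        };
        settings.ValidationEventHandler += (s, a) => { ok = false; };
        using (XmlReader r = XmlReader.Create(new StringReader(xml), settings))
        {
            while (r.Read()) { }
        }
        return ok;
    }

    public static void Main()
    {
        // Text and br tags may be interleaved freely inside p.
        Console.WriteLine(IsValid("<card><p>hi<br/>there<br/></p></card>"));
        // Text placed directly inside card is still not allowed.
        Console.WriteLine(IsValid("<card>hi</card>"));
    }
}
```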
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card />
<head />
</wml>
b.dtd
<!ELEMENT wml (card,head)>
<!ELEMENT card EMPTY>
<!ELEMENT head EMPTY>
The above DTD file permits the wml tag to contain a card tag, which is then to
be strictly followed by a head tag. The comma signifies that one tag is to be
followed by the other. If we refrain from using the head tag in the XML file,
the following error message will be generated:
Output
Severity:Error
Message:Element 'wml' has incomplete content. Expected 'head'. An error occurred
at file:///C:/csharp/b.xml(5, 3).
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<head />
<card />
</wml>
Output
Severity:Error
Message:Element 'wml' has invalid content. Expected 'card'. An error occurred at
file:///c:/csharp/b.xml(4, 2).
If the order of the tags is interchanged, an error is thrown. The card tag must
be followed by the head tag. Besides, there is a restriction imposed that there
can be only one insertion of each tag. If there are multiple insertions, it will
result in an error.
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card />
<card />
<head />
</wml>
b.dtd
<!ELEMENT wml (card+,head?)>
<!ELEMENT card EMPTY>
<!ELEMENT head EMPTY>
When the plus sign is placed after card, it allows one or more card tags in the
file. The ? sign denotes 'zero or one' insertions of the head tag. Thus, we can
have one or more card tags, followed by either a single head tag or none at
all. If the head tag is present, it must be placed after the card tags, since
the order of the tags is sacrosanct.
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card />
<head />
<card />
</wml>
Output
Severity:Error
Message:Element 'wml' has invalid content. Expected ''. An error occurred at
file:///c:/csharp/b.xml(6, 2).
The Draconian restrictions imposed by the DTD file prohibit us from altering the
sequence of the above tags. The card tag has to come first, followed by the head
tag. We cannot interchange a head tag with a card tag. So, the only solution to
this problem is to abide by the stipulated sequence.
b.dtd
<!ELEMENT wml (card+,head?,template*)*>
<!ELEMENT card EMPTY>
<!ELEMENT head EMPTY>
<!ELEMENT template EMPTY>
In the DTD file, we have added a * symbol to the entire set of tags, which make
up the wml element. The set consists of the following individual elements in a
sequential order:
• One or more card tags.
• Zero or one head tag.
• Zero to many template tags.
This set can be repeated any number of times, and each repetition can be any
combination of the above conditions, in the specified order. Thus, the card and
head can appear together, or the card can appear by itself without the head tag,
or the template tag may not be present at all, and so on. Every occurrence,
however, needs to begin with a card tag.
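The effect of the comma and of the outer * can be compared side by side with
the sketch below. It assumes the XmlReaderSettings route to DTD validation and
an internal DOCTYPE subset; the names SequenceDemo and IsValid are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

public static class SequenceDemo
{
    // Hypothetical helper: true when the given children of wml satisfy the
    // content model. All three child elements are declared as EMPTY.
    public static bool IsValid(string model, string children)
    {
        string xml = "<!DOCTYPE wml [" +
                     "<!ELEMENT wml " + model + ">" +
                     "<!ELEMENT card EMPTY>" +
                     "<!ELEMENT head EMPTY>" +
                     "<!ELEMENT template EMPTY>" +
                     "]><wml>" + children + "</wml>";
        bool ok = true;
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,
            ValidationType = ValidationType.DTD
        };
        settings.ValidationEventHandler += (s, a) => { ok = false; };
        using (XmlReader r = XmlReader.Create(new StringReader(xml), settings))
        {
            while (r.Read()) { }
        }
        return ok;
    }

    public static void Main()
    {
        string once = "(card+,head?)";
        string repeated = "(card+,head?,template*)*";
        string[] samples =
        {
            "<card/><head/>",
            "<card/><head/><card/>",
            "<head/><card/>"
        };
        foreach (string sample in samples)
        {
            Console.WriteLine("{0}  once: {1}  repeated: {2}",
                sample, IsValid(once, sample), IsValid(repeated, sample));
        }
    }
}
```

The second sample is the interesting one: a card after a head is rejected by
the single sequence, but accepted once the whole group may repeat, because the
trailing card starts a new repetition.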
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card aa="hi"/>
</wml>
b.dtd
<!ELEMENT wml (card)>
<!ELEMENT card EMPTY>
<!ATTLIST card aa CDATA #IMPLIED>
In the above example, the card tag has an attribute called aa initialized to
'hi'. To implement an attribute, we include the word ATTLIST, which is a short
form for 'a list of attributes', in the DTD file. This is followed by the name
of the tag that the attribute is associated with. Then, the actual name of the
attribute aa is specified, followed by the datatype it will hold, which is
character data, in our case. The last parameter, #IMPLIED permits the attribute
aa to be optional. Therefore, even if you remove it, no error will be generated.
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card />
</wml>
b.dtd
<!ELEMENT wml (card)>
<!ELEMENT card EMPTY>
<!ATTLIST card aa CDATA #IMPLIED bb CDATA #REQUIRED>
Output
Severity:Error
Message:The required attribute 'bb' is missing. An error occurred at file:///c:/csharp/b.xml(4,
2).
The error message clearly mentions that the attribute bb is missing.
#REQUIRED demands the presence of the attribute bb whenever the card tag is
used. Further, in the ATTLIST declaration, the attributes are declared one after
the other. However, the order in which they are placed in the tag is not
significant.
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card bb="no"/>
</wml>
No errors are generated since the attribute bb, which is mandatory, has been
specified. You can omit aa, since it is declared as #IMPLIED.
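The behaviour of #REQUIRED and #IMPLIED can be confirmed with the sketch below.
It assumes the XmlReaderSettings route to DTD validation and an internal DOCTYPE
subset in place of b.dtd; the names AttributeDemo and CountErrors are
hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

public static class AttributeDemo
{
    // Hypothetical helper: counts validation events for one card tag, where
    // aa is optional (#IMPLIED) and bb is mandatory (#REQUIRED).
    public static int CountErrors(string cardTag)
    {
        string xml = "<!DOCTYPE wml [" +
                     "<!ELEMENT wml (card)>" +
                     "<!ELEMENT card EMPTY>" +
                     "<!ATTLIST card aa CDATA #IMPLIED bb CDATA #REQUIRED>" +
                     "]><wml>" + cardTag + "</wml>";
        int count = 0;
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,
            ValidationType = ValidationType.DTD
        };
        settings.ValidationEventHandler += (s, a) => { count++; };
        using (XmlReader r = XmlReader.Create(new StringReader(xml), settings))
        {
            while (r.Read()) { }
        }
        return count;
    }

    public static void Main()
    {
        Console.WriteLine(CountErrors("<card />"));                  // bb is missing
        Console.WriteLine(CountErrors("<card bb='no'/>"));           // aa may be omitted
        Console.WriteLine(CountErrors("<card bb='no' aa='hi'/>"));   // order is free
    }
}
```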
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card aa="no"/>
</wml>
b.dtd
<!ELEMENT wml (card)>
<!ELEMENT card EMPTY>
<!ATTLIST card aa (hi | bye ) "bye">
Output
Severity:Error
Message:'no' is not in the enumeration list. An error occurred at file:///c:/csharp/b.xml(4,
7).
The values assigned to attributes can be restricted to specific values. This can
be achieved by specifying the values along with ATTLIST in the DTD file and
using the OR sign (|) as the separator. The attribute aa can only be assigned
the value of either 'hi' or 'bye'. Specifying any other value would result in an
error.
If the attribute is not initialized, it assumes the default value of 'bye'.
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card aa="hi"/>
</wml>
The error disappears because the attribute has been assigned a value of 'hi'.
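The default value can also be observed from code, because a reader that has
processed the DTD reports the defaulted attribute as if it had been written in
the tag. The sketch below assumes the XmlReaderSettings route to DTD validation
and an internal DOCTYPE subset; the names DefaultDemo and ReadAa are
hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class DefaultDemo
{
    // Hypothetical helper: returns the value the reader reports for the
    // attribute aa of the card tag, and records any validation messages.
    public static string ReadAa(string cardTag, List<string> errors)
    {
        string xml = "<!DOCTYPE wml [" +
                     "<!ELEMENT wml (card)>" +
                     "<!ELEMENT card EMPTY>" +
                     "<!ATTLIST card aa (hi | bye) 'bye'>" +
                     "]><wml>" + cardTag + "</wml>";
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,
            ValidationType = ValidationType.DTD
        };
        settings.ValidationEventHandler += (s, a) => errors.Add(a.Message);
        string value = null;
        using (XmlReader r = XmlReader.Create(new StringReader(xml), settings))
        {
            while (r.Read())
            {
                if (r.NodeType == XmlNodeType.Element && r.Name == "card")
                {
                    value = r.GetAttribute("aa");
                }
            }
        }
        return value;
    }

    public static void Main()
    {
        foreach (string tag in new[] { "<card/>", "<card aa='hi'/>", "<card aa='no'/>" })
        {
            var errors = new List<string>();
            string value = ReadAa(tag, errors);
            Console.WriteLine("{0} -> aa = {1}, messages: {2}", tag, value, errors.Count);
        }
    }
}
```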
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card aa="hi"/>
</wml>
b.dtd
<!ELEMENT wml (card)*>
<!ELEMENT card EMPTY>
<!ATTLIST card aa ID #IMPLIED>
We have created an attribute aa, with a data type of ID. This does not result in
any error.
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card aa="hi"/>
<card aa="hi"/>
</wml>
Output
Severity:Error
Message:'hi' is already used as an ID. An error occurred at file:///c:/csharp/b.xml(5,
7).
The card tag can be used multiple times, due to the presence of the * sign in
the DTD file. By associating the type ID with the attribute aa, it is
guaranteed that the same value is never used twice as an ID within the
document. The error message conveys that 'hi' has already been used as an ID,
and hence, it cannot be used again.
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card aa="hi"/>
<card aa="hi1"/>
</wml>
If we assign a different value to the attribute, the error is dispensed with.
Thus, a data type of ID guarantees that the attribute shall never have a
duplicate value.
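The uniqueness rule can be checked with the sketch below. It assumes the
XmlReaderSettings route to DTD validation and an internal DOCTYPE subset in
place of b.dtd; the names IdDemo and CountErrors are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

public static class IdDemo
{
    // Hypothetical helper: counts validation events for the given cards,
    // where the attribute aa of card is declared with the type ID.
    public static int CountErrors(string cards)
    {
        string xml = "<!DOCTYPE wml [" +
                     "<!ELEMENT wml (card)*>" +
                     "<!ELEMENT card EMPTY>" +
                     "<!ATTLIST card aa ID #IMPLIED>" +
                     "]><wml>" + cards + "</wml>";
        int count = 0;
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,
            ValidationType = ValidationType.DTD
        };
        settings.ValidationEventHandler += (s, a) => { count++; };
        using (XmlReader r = XmlReader.Create(new StringReader(xml), settings))
        {
            while (r.Read()) { }
        }
        return count;
    }

    public static void Main()
    {
        Console.WriteLine(CountErrors("<card aa='hi'/><card aa='hi'/>"));   // duplicate ID
        Console.WriteLine(CountErrors("<card aa='hi'/><card aa='hi1'/>"));  // distinct IDs
    }
}
```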
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card>
Hi &sonal;
</card>
</wml>
b.dtd
<!ELEMENT wml (card)*>
<!ELEMENT card (#PCDATA)*>
<!ENTITY sonal "hi" >
Entities have been touched upon earlier. Here, &sonal; in the XML file will be
replaced with 'hi'. This is called an Entity Reference. The DTD file requires an
ENTITY declaration that gives the name 'sonal' and the value 'hi'.
Wednesday, December 26, 2007