Visit our Sponsor   Visit our Sponsor
delphi3000.com - the free delphi knowledge platform
delphi3000.com - the free delphi knowledge platform
Have a look at your member-status

connecting people's knowledge


  - Recent ArticlesRSS feed for Recent Articles on delphi3000.com
  - List of All Articles
  - Top Viewed Articles
  - Articles (+Attachem.)
  - Articles Of Interest
  - Categories
  - Top Uploader
  - Search
  - Index

  - My Home
  - Submit an Article
  - My Articles
  - My Personal Data
  - My Bookmarks
  - Activities
  - Login/Logout

  - Sign Up
  - Why Sign Up
  - Newsletter

  - Press
  - Advertise

  - Contact
  - Feedback





Community
Borland
ClubeDelphi
Dr. Bob
UK-BUG
Delphi Meetings
Planeta Delphi







Startblatt.de






Share this article with friendsShare this article with friends
Rate this articleRate this article - to keep the quality of delphi3000.com !
Comment this article or read through previous comments (2)


Building an Easy-to-Use Parser/Parsing Framework (Part I)Component available for this articleFormat this article printer-friendly!Bookmark function is only available for registered users!
Introduction
Product:
Delphi all versions
Category:
OO-related
Skill Level:
Scoring:
Last Update:
09/26/2006
Search Keys:
delphi delphi3000 article borland vcl code-snippet xml,dtd,parser,parsing,framework
Times Scored:
7
Visits:
5865
Uploader: Marc Hoffmann
Company:
Reference: N/A
Component Download: http://downloads.kaju74.de/delphi3000/ParserFramework.zip
 
Question/Problem/Abstract:
How to create a simple parsing framework to parse any kind of data?
Answer:



A second article was released on 29.01.2002 with a more detailed example:

Building an
Easy-to-Use Parser/Parsing Framework (Part II)


Today, we wonna speak about "how to create a simple parser framework" in
Delphi. Our goal will be a class solutions which helps up to parse any
kind of data and store all relevant informations in an easy-to- access
object model.

At the end of this article, we've developed a small utility, which
generates a simple object model of a .dtd file and output it's xml pendant
from a given starting node. In other words, we're using the parsing
framework to create a parser, which is able to parse a .dtd file, extract
all neccessary tags, store them in the object model and generates the xml
output. Note: This utility uses a simply dtd- parser model, which don't
include all routines to parse all kinds of dtd datas - it's up to you to
include those features.

Our claims to the framework and object model are:

  - easy to use.
  - save/loadable object trees.
  - integrated error reporting.
  - expandable.

Okay, now let's start to develope the main parsing engine. Delphi comes
with a unit called CopyPrsr which includes the simple stream parser
object TCopyParser. Try to take a look into that file to understand
how it works - it's located under
$(DELPHI)\Source\Internet\CopyPrsr.pas. Our framework parser is
derived from that idea, but uses a simple string instead of the stream and
includes some additional functions:

The boiler plate:
=========================================================
unit StringParser;

interface

uses
  Classes;

const
{ Additional Parser special tokens }

  toEOL = char(6);
  toBOF = char(7);

type
{ TSysCharSet }

  TSysCharSet = set of Char;

{ TStringParser }

  TStringParser = class
  private
    { Private declarations }
    FParseString: string;
    FLineTokens: Integer;
    FSourceLine: Integer;
    FSourcePos: Integer;
    FTokenPos: Integer;
    FToken: Char;
    procedure SkipBlanks;
    function GetParseString: string;
    function GetSourcePos: Integer;
    function GetTokenString: string;
  protected
    { Protected declarations }
  public
    { Public declarations }
    constructor Create;
    function LoadFromFile(const FileName: string): Boolean;
    function LoadFromStream(const Stream: TStream): Boolean;
    function SkipToEOF: string;
    function SkipToEOL: string;
    function SkipToken: Char;
    function SkipTokenString: string;
    function SkipToToken(const AToken: Char): string; overload;
    function SkipToToken(const AToken: TSysCharSet): string; overload;
    function SkipToTokenString(const ATokenString: string): string;
    property ParseString: string read GetParseString;
    property SourceLine: Integer read FSourceLine;
    property SourcePos: Integer read GetSourcePos;
    property Token: Char read FToken;
    property TokenString: string read GetTokenString;
  end;


As you can see, there are many public helper functions which you can use
to parse the data. The main functions are LoadFromFile and
LoadFromStream, which needs the name of the file to be parsed as
the only parameter. Both functions loads the content of the file and store
it to the string FParseString which can be accessed through the
denominator property:

LoadFromFile/LoadFromStream:
=========================================================
function TStringParser.LoadFromFile(const FileName: string): Boolean;
var
  Stream: TMemoryStream;
begin
  Result := False;
  if not FileExists(FileName) then
    Exit;
  Stream := TMemoryStream.Create;
  try
    Stream.LoadFromFile(FileName);
    Result := LoadFromStream(Stream);
  finally
    Stream.Free;
  end;
end;

function TStringParser.LoadFromStream(const Stream: TStream): Boolean;
var
  MemStream: TMemoryStream;
begin
  Result := False;
  if not(assigned(Stream)) then
    Exit;
  MemStream := TMemoryStream.Create;
  try
    Stream.Seek(0, soFromBeginning);
    MemStream.CopyFrom(Stream, Stream.Size);
    FParseString := StrPas(MemStream.Memory);
    SetLength(FParseString, MemStream.Size);
    FParseString := Concat(FParseString, toEOF);
    FToken := toBOF;
    Result := True;
  finally
    MemStream.Free;
  end;
end;


The main functionality of the parsing engine is the extraction of so-
called tokens. A token can be a seperator (like CR, LF or EOF) or a
symbol, which can be a keyword if you plan to parse a programing language.
The following functions are used to skip blank characters (which are used
to seperate symbols and aren't relevant) and to extract/skip the next
token symbol:

Token related functions (pullout only):
=========================================================
procedure TStringParser.SkipBlanks;
begin
  while True do
  begin
    FToken := FParseString[FTokenPos];
    case FToken of
      #10:
        begin
          Inc(FSourceLine);
          FLineTokens := FTokenPos;
        end;
      toEOF, #33..#255:
        Exit;
    end;
    Inc(FTokenPos);
  end;
end;

function TStringParser.SkipToken: Char;
const
  KeySet = ['A'..'Z', 'a'..'z', '0'..'9', '_'];
begin
  SkipBlanks;
  FSourcePos := FTokenPos;
  if FParseString[FTokenPos] = toEOF then
    FToken := toEOF
  else
  if FParseString[FTokenPos] in KeySet then
  begin
    while FParseString[FTokenPos] in KeySet do
      Inc(FTokenPos);
    FToken := toSymbol;
  end
  else
  begin
    FToken := FParseString[FTokenPos];
    Inc(FTokenPos);
  end;
  Result := FToken;
end;

function TStringParser.SkipToToken(const AToken: TSysCharSet): string;
begin
  FSourcePos := FTokenPos;
  while not ((FParseString[FTokenPos] = toEOF) or (FParseString[FTokenPos] in AToken)) do
  begin
    if FParseString[FTokenPos] = #10 then
    begin
      Inc(FSourceLine);
      FLineTokens := FTokenPos;
    end;
    Inc(FTokenPos);
  end;
  if FParseString[FTokenPos] = toEOF then
    FToken := toEOF
  else
    FToken := FParseString[FTokenPos];
  Result := GetTokenString;
  if FToken <> toEOF then
    SkipToken;
end;


The absent functions includes alternativ possibilities to extract or skip
the tokens, like SkipToTokenString or SkipToEof. Well, the
next step is to create the object model, which holds all our parsed
informations. As I mentioned earlier, the goal of this article it to
create a simple dtd parser, so we'll create an object model to store dtd
informations.

A dtd file is used to descripe the syntax rules of a xml file, like:

DTD example:
=========================================================

key CDATA #REQUIRED
value CDATA #REQUIRED
>



This example demonstrated the simplest way of a dtd structure. It's not
the purpose of this article to develope a highly flexible dtd parser which
spots all dtd grammas, so we only include the weightly ones. Root of each
object model is the document, which holds all other objects as
collections:

The Root Document:
=========================================================
{ TDTDDocument }

  TDTDDocument = class(TPersistent)
  private
    { Private declarations }
    FEntities: TDTDEntities;
    FElements: TDTDElements;
    procedure SetEntities(Value: TDTDEntities);
    procedure SetElements(Value: TDTDElements);
  public
    { Public declarations }
    constructor Create;
    destructor Destroy; override;
    procedure Assign(Source: TPersistent); override;
  published
    { Published declarations }
    property Entities: TDTDEntities read FEntities write SetEntities;
    property Elements: TDTDElements read FElements write SetElements;
  end;


As you can see, our document gives us the access of some other types of
data: Entities and Elements. Entities are very hard to
parse, so it's a good lesson for you to include that feature. Parsing
elements is quite easier, so this type of data is better to explain here.
Look at the dtd example some rows above this. You can see, that a dtd
element is descripted as followed:



Our object model needs some extra fields to store such element
informations. If you are not familiar with dtd or xml, look at
W3CSchools - it's a good starting
point to learn more about that topic. So, take a look at the following
object structure:

TDTDDocument
  |
  o--TDTDEntities
  |
  o--TDTElements
    |
    o--TDTDElementTyp
    |
    o--TDTDAttributes
      |
      o--TDTDAttributeTyp
      o--TDTDAttributeStatus
      o--Default: string
      o--TDTDEnums
    o--TDTDChild
      |
      o--TDTDTyp
      o--TDTDChilds

I've tried to "pack" the dtd grammars into an easy object model as you can
see above:

  Each document contains a collection of elements. Each
  element is descripted by an elementtype and containes in turn a
  collection of attributes and childs. Each attribute again
  is descripted by an attributetype and contains a collection of
  enum(erations) and so forth. Followed a code cantle from the
  element implementation, it's not suggestive to show you the whole code
  here - it's quit long and a little bit more complex:

TDTDElement:
=========================================================
unit DTD_Document;

interface

uses
  Classes;

type

  ...

{ TDTDElementTyp }

  TDTDElementTyp =
    (etAny, etEmpty, etData, etContainer);

{ TDTDElementStatus }

  TDTDElementStatus =
    (esRequired, esRequiredSeq, esOptional, esOptionalSeq);

{ TDTDItem }

  TDTDItem = class(TCollectionItem)
  private
    { Private declarations }
    FName: string;
  public
    { Public declarations }
    procedure Assign(Source: TPersistent); override;
  published
    { Published declarations }
    property Name: string read FName write FName;
  end;

{ TDTDItems }

  TDTDItems = class(TCollection)
  private
    { Private declarations }
    function GetItem(Index: Integer): TDTDItem;
    procedure SetItem(Index: Integer; Value: TDTDItem);
  public
    { Public declarations }
    function Add: TDTDItem;
    function Find(const Name: string): TDTDItem;
    property Items[Index: Integer]: TDTDItem read GetItem write SetItem; default;
  end;

  ...

{ TDTDElement }

  TDTDElement = class(TDTDProperty)
  private
    { Private declarations }
    FTyp: TDTDElementTyp;
    FAttributes: TDTDAttributes;
    FChilds: TDTDChilds;
    procedure SetAttributes(Value: TDTDAttributes);
    procedure SetChilds(Value: TDTDChilds);
  public
    { Public declarations }
    constructor Create(Collection: TCollection); override;
    destructor Destroy; override;
    procedure Assign(Source: TPersistent); override;
  published
    { Published declarations }
    property Typ: TDTDElementTyp read FTyp write FTyp;
    property Attributes: TDTDAttributes read FAttributes write SetAttributes;
    property Childs: TDTDChilds read FChilds write SetChilds;
  end;

{ TDTDElements }

  TDTDElements = class(TDTDProperties)
  private
    { Private declarations }
    function GetItem(Index: Integer): TDTDElement;
    procedure SetItem(Index: Integer; Value: TDTDElement);
  public
    { Public declarations }
    function Add: TDTDElement;
    function Find(const Name: string): TDTDElement;
    property Items[Index: Integer]: TDTDElement read GetItem write SetItem; default;
  end;

  ...

implementation

  ...

{ TDTDItem }

procedure TDTDItem.Assign(Source: TPersistent);
begin
  if Source is TDTDItem then
  begin
    Name := TDTDItem(Source).Name;
    Exit;
  end;
  inherited Assign(Source);
end;

{ TDTDItems }

function TDTDItems.Add: TDTDItem;
begin
  Result := TDTDItem(inherited Add);
end;

function TDTDItems.Find(const Name: string): TDTDItem;
var
  i: Integer;
begin
  Result := nil;
  for i := 0 to Count - 1 do
    if CompareStr(Items[i].Name, Name) = 0 then
    begin
      Result := Items[i];
      Break;
  end;
end;

function TDTDItems.GetItem(Index: Integer): TDTDItem;
begin
  Result := TDTDItem(inherited GetItem(Index));
end;

procedure TDTDItems.SetItem(Index: Integer; Value: TDTDItem);
begin
  inherited SetItem(Index, Value);
end;

...

{ TDTDElement }

constructor TDTDElement.Create(Collection: TCollection);
begin
  inherited Create(Collection);
  FAttributes := TDTDAttributes.Create(TDTDAttribute);
  FChilds := TDTDChilds.Create(TDTDChild);
end;

destructor TDTDElement.Destroy;
begin
  FAttributes.Free;
  FChilds.Free;
  inherited Destroy;
end;

procedure TDTDElement.Assign(Source: TPersistent);
begin
  if Source is TDTDElement then
  begin
    Typ := TDTDElement(Source).Typ;
    Attributes.Assign(TDTDElement(Source).Attributes);
    Childs.Assign(TDTDElement(Source).Childs);
  end;
  inherited Assign(Source);
end;

procedure TDTDElement.SetAttributes(Value: TDTDAttributes);
begin
  FAttributes.Assign(Value);
end;

procedure TDTDElement.SetChilds(Value: TDTDChilds);
begin
  FChilds.Assign(Value);
end;

{ TDTDElements }

function TDTDElements.Add: TDTDElement;
begin
  Result := TDTDElement(inherited Add);
end;

function TDTDElements.Find(const Name: string): TDTDElement;
begin
  Result := TDTDElement(inherited Find(Name));
end;

function TDTDElements.GetItem(Index: Integer): TDTDElement;
begin
  Result := TDTDElement(inherited GetItem(Index));
end;

procedure TDTDElements.SetItem(Index: Integer; Value: TDTDElement);
begin
  inherited SetItem(Index, Value);
end;

...


The advantage of this object model is, that you're able to easily add the
document to a standard component and use Delphi's internal streaming
technology to load and save the object contents of a parsed file.

The next step will be the developing of the real dtd parser. Do you
remember the TStringParser descripted at the top of this article?
We'll using this class to build up our parser. But, we don't want to
create a parser from scratch each time we're about to parse a new kind of
data - it's not mind of a framework. So first, we'll develope a small
parser class from which we will inherit our dtd parser. This parent class
should include the error reporting and accessable fields to some other
informations:

The Private Parser class:
=========================================================
unit PrivateParser;

interface

uses
  Classes, SysUtils, StringParser;

type
{ TParserError }

  TParserError = class(TCollectionItem)
  private
    { Private declarations }
    FFileName: string;
    FLine: Integer;
    FMessage: string;
    FPosition: Integer;
  public
    { Public declarations }
    procedure Assign(Source: TPersistent); override;
  published
    { Published declarations }
    property FileName: string read FFileName write FFileName;
    property Line: Integer read FLine write FLine;
    property Message: string read FMessage write FMessage;
    property Position: Integer read FPosition write FPosition;
  end;

{ TParserErrors }

  TParserErrors = class(TCollection)
  private
    { Private declarations }
    function GetItem(Index: Integer): TParserError;
    procedure SetItem(Index: Integer; Value: TParserError);
  public
    { Public declarations }
    function Add: TParserError;
    property Items[Index: Integer]: TParserError read GetItem write SetItem; default;
  end;

{ TValidationParser }

  TValidationParser = class
  private
    { Private declarations }
    FErrors: TParserErrors;
    procedure SetErrors(const Value: TParserErrors);
  public
    { Public declarations }
    constructor Create;
    destructor Destroy; override;
    procedure AddError(const AMessage: string; Parser: TStringParser; const AFileName: string = '');
    procedure AddErrorFmt(const AMessage: string; Params: array of const; Parser: TStringParser; const AFileName: string = '');
    property Errors: TParserErrors read FErrors write SetErrors;
  end;

implementation

{ TParserError }

procedure TParserError.Assign(Source: TPersistent);
begin
  if Source is TParserError then
  begin
    Line := TParserError(Source).Line;
    Message := TParserError(Source).Message;
    Position := TParserError(Source).Position;

    Exit;
  end;

  inherited Assign(Source);
end;

{ TParserErrors }

function TParserErrors.Add: TParserError;
begin
  Result := TParserError(inherited Add);
end;

function TParserErrors.GetItem(Index: Integer): TParserError;
begin
  Result := TParserError(inherited GetItem(Index));
end;

procedure TParserErrors.SetItem(Index: Integer; Value: TParserError);
begin
  inherited SetItem(Index, Value);
end;

{ TValidationParser }

constructor TValidationParser.Create;
begin
  inherited Create;
  FErrors := TParserErrors.Create(TParserError);
end;

destructor TValidationParser.Destroy;
begin
  FErrors.Free;
  inherited Destroy;
end;

procedure TValidationParser.SetErrors(const Value: TParserErrors);
begin
  FErrors.Assign(Value);
end;

procedure TValidationParser.AddErrorFmt(const AMessage: string; Params: array of const; Parser: TStringParser; const AFileName: string = '');
begin
  with FErrors.Add do
  begin
    FileName := AFileName;
    Line := Parser.SourceLine;
    Message := Format(AMessage, Params);
    Position := Parser.SourcePos;
  end;
end;

procedure TValidationParser.AddError(const AMessage: string; Parser: TStringParser; const AFileName: string = '');
begin
  AddErrorFmt(AMessage, [], Parser, AFileName);
end;

end.


Now we can start developing the real parser by inheriting it from the
TValidationParser. Again, I don't want to show you the whole
sourcecode here, so I pick up only the sapid one. Our dtd parser is a so-
called two-way parser, i.e. it uses the first pass to parse the
elements and the second pass to parse the attributes. This is useful,
because an attibute can refer to an element which is placed below it and
otherwise we'll get some unneeded errors. The main method of our parse is
Parse, which  needs the name of the file to be parsed as the first
parameter, and a pre-initialized TDTDDocument as the second
parameter. A sample call should looks like:

Sample Call:
=========================================================
// Create DTDDocument.
DTDDocument := TDTDDocument.Create;
try
  // Create DTDParser.
  DTDParser := TDTDParser.Create;
  try
    // Parse File.
    DTDParser.Parse(FileName, DTDDocument);

    // Display possible Errors.
    if DTDParser.Errors.Count > 0 then
    begin
      for i := 0 to DTDParser.Errors.Count - 1 do
      with DTDParser.Errors[i] do
        WriteLn(Format('Error in Line %d, Pos %d: %s...', [Line, Position, Message]));
      Exit;
    end;

    ...

  // Free DTDParser.
  finally
    DTDParser.Free;
  end;

// Free DTDDocument.
finally
  DTDDocument.Free;
end;


But now, let's take a look at some sourcecode lines of the parser
implementation. The first think we had to do is to inherited our parser
from the parent class:

Parser Implementation (Snippet):
=========================================================
type
{ EDTDParser }

  EDTDParser = class(Exception);

{ TDTDParser }

  TDTDParser = class(TValidationParser)
  private
    { Private declarations }
    procedure ParseElement(Parser: TStringParser; Document: TDTDDocument; const Pass: Integer);
    procedure ParseAttlist(Parser: TStringParser; Document: TDTDDocument);
    procedure ParseFile(const FileName: string; Document: TDTDDocument; const Pass: Integer = 0);
  public
    { Public declarations }
    procedure Parse(const FileName: string; var Document: TDTDDocument);
  end;


Afterwards we implement the Parse method which calls the internal
method ParseFile on her part:

Method "Parse":
=========================================================
procedure TDTDParser.Parse(const FileName: string; var Document: TDTDDocument);
var
  TmpDocument: TDTDDocument;
begin
  if not assigned(Document) then
    raise EDTDParser.Create('Document not assigned!');
  TmpDocument := TDTDDocument.Create;
  try
    ParseFile(FileName, TmpDocument);
    if Errors.Count = 0 then
      Document.Assign(TmpDocument);
  finally
    TmpDocument.Free;
  end;
end;


As you can see, we create a special temporar document to store the parsed
objects in. I've done this because I don't want to return the document if
it is full of errors - I assign a exact copy of the objects only, if no
errors occured. The method ParseFile implements the proper parsing
calls to the StringParser and creates the real objects. Followed a code
snippet of the method body:

Method "ParseFile":
=========================================================
procedure TDTDParser.ParseFile(const FileName: string;
  Document: TDTDDocument; const Pass: Integer = 0);
var
  Parser: TStringParser;
begin
  Parser := TStringParser.Create;
  try
    if not Parser.LoadFromFile(FileName) then
    begin
      AddErrorFmt('File "%s" not found', [FileName], Parser);
      Exit;
    end;
    while True do
    begin
      while not (Parser.Token in [toEOF, '<']) do
        Parser.SkipToken;
      if Parser.Token = toEOF then
        Break;
      Parser.SkipToken;
      if Parser.Token <> '!' then
      begin
        if not(Parser.Token in ['?']) and (Pass = 1) then
          AddError('InvalidToken', Parser);
        Continue;
      end;
      if Parser.SkipToken <> toSymbol then
      begin
        if (Parser.Token <> '-') and (Pass = 1) then
          AddError('InvalidToken', Parser);
        Continue;
      end;
      if UpperCase(Parser.TokenString) = 'ENTITY' then
        Continue;
      if UpperCase(Parser.TokenString) = 'ELEMENT' then
        ParseElement(Parser, Document, Pass)
      else
      if UpperCase(Parser.TokenString) = 'ATTLIST' then
      begin
        if Pass = 1 then
          ParseAttlist(Parser, Document);
      end
      else
      if Pass = 1 then
        AddErrorFmt('Invalid Symbol "%s"', [Parser.TokenString], Parser);
    end;
    if Pass = 0 then
      ParseFile(FileName, Document, 1);
  finally
    Parser.Free;
  end;
end;


This method calls some other functions (ParseElement and
ParseAttlist) which parses the internal structures of an element or
an attribute. Look at the whole sourceode to understand.

What's next??

Well, this article has shown you how easy it is to write a customizeable
parser which can parse any kind of data - it's up to you, how complex it
should be. The main benefit in using this kind of parsing is, that you
don't need to incorporate in complex systems like LexParser.

Continue reading my second article:
Building an Easy-to-Use Parser/Parsing Framework (Part II)


Thank you very much for you regard.
M. Hoffmann





Please rate this article!
Skill level:
BeginnerExpert

Useful:
No!Very!

Overall rating:
PoorExcellent



Comments to this article
Write a new comment
SourceForge
    Mike Heydon (Jan 30 2002 11:23AM)

Nice article. However there are missing snippets such as the code for "GetTokenString" etc., and I cannot seem to find any source downloads on the SourceForge link you have provided ?
Respond

RE: SourceForge
Marc Hoffmann (Jan 30 2002 12:19PM)

You're right! I've currently problems to check-in any files to the cvs tree of sourceforge. I don't know if it's a problem on the sourceforge site or on me. I'll include the files plus the sample as an component to this articel as soon as possible.

Greets,
Marc
Respond














 
Sign up to consume product discounts for Bronze memberships !

read more


  Visit our Sponsor

 

  Community Ad of
R. Lefter
 
   














 







     
  Copyright © 2000 - 2007 delphi3000.com - All rights reserved. Terms of use. || Privacy
delphi3000.com is a service by bluestep.com IT-Services GmbH (Vienna)