[ Top page ]

« Perl database (tie) generation from text | Main | A program that "kicks" a UDP port using Perl »


Simplified XML interface using Perl -- conversion of table-style data representation between hash and XML

A good method for building exact XML interface is use XML API. However, it is easier to use Perl's pattern macth function to create a simpler XML (like) interface. In an exact XML interface, a text line may contain any number of tags, and a content enclosed by a beginning and ending tags. However, if we can add a restriction to this syntax, we can handle XML documents in an easier method. Such a simplified XML processing might not be encouraged, but it is convenient when building a simple prototype.

There are many APIs to handle XML, such as Sax or Xerces. It is rather easy to use such API as Sax, but still it takes some time and effort to remember the usage of the API and to link the library to the program. You may want to reduce the time and effort when building a simple prototype.

In such cases, I often use simple (pseudo) XML interface by using the pattern match function of Perl. Examples of such a requester and responder program fragments are shown below. In this program, the requester sends a request by using an HTML-like protocol called XXP. (The essense of this program is to restrict the data format to a table.)

A value of the table can be represented, for example, as follows.


The table has its type -- "table" is the type name. This means a table as follows.


However, the first line (col1, col2) represents the title (i.e., it is not part of the content).


make_xml() inputs an array that contains hashes and converts it to XML, and returns it ($result) as the function value. For example, you can input the following array to generate the XML text shown above.

[{col1 => value11, col2 => value12}, {col1 => value21, col2 => value22}]

send_reply($result) returns a responce message that contains an XML text by XXP. The request message part is omitted here, but it is a responce to an XXP-based request. The table type name is TAG here.

### make_xml(%hash)
sub make_xml(\%) {
    my ($group) = @_;
    my $result = "<TAGs>\r\n";
    foreach my $key (sort keys %group) {
	my $item = $group{$key};
	if (ref($item) eq 'HASH') {
	    $result .= "<TAG>\r\n";
	    foreach my $key (keys %$item) {
		$result .= "<${key}>$item->{$key}</${key}>\r\n";
	    $result .= "</TAG>\r\n";
    $result .= "</TAGs>\r\n";
    return $result;
# -- Whole message is created on memory.

### send_reply($result)
sub send_reply($) {
    my ($result) = @_;
    my $protocol = "XXP/1.0 ";
    if ($result < 0) {
	put($Client, $protocol . (400-$result) . " $error_message{-$result}\r\n");
	put($Client, "Content-length: 0\r\n\r\n");
    } else {
	if ($result) {
	    $result = "<?xml version=\"1.0\" encoding=\"shift_jis\"?>\r\n" .
	# Put XXP header
	put($Client, $protocol . "200 OK\r\n");
	put($Client, "Content-length: " . length($result) . "\r\n\r\n");

	# Put XXP body
	put($Client, $result);


The receiver first calls get_header($file_descriptor) to read the response header of XXP, and calls get_body($file_descriptor, $table_type, $body_length) to read the response body. It is assumed that the table type ($table_type) is decided by the request message in XXP. get_body() converts the XML text to hashes. This is the reverse function of make_xml().

### get_header($file)
#   get XXP reply message header
#   returns $body_length;
sub get_header($) {
    my ($file) = @_;
    my $line = <$file>;
    if ($line =~ / 200 /) {	# OK
	my $body_length = 0;
	while (($line = <$file>) !~ /^\s*$/) {	# Until empty line read
	    if ($line =~ /Content-length:\s+(\d+)\s/i) {
		$body_length = $1;		# Use content-length header only
	return $body_length;
    } else {	# Error
	return -1;

### get_body($file, $table_type, $body_length)
#   get XXP reply message body
sub get_body($$$) {
    my ($file, $table_type, $body_length) = @_;
    my $length = 0;
    my $table = [];
    my $record = {};
    while ($length < $body_length) {
	my $line = <$file>;
	$length += length($line);
	if ($line =~ /<\s*(\S+)\s*>(.*)<\s*\/\s*(\S+)\s*>/) {
	    my $stag = $1;
	    my $content = $2;
	    my $etag = $3;
	    if ($stag ne $etag) {
		system_error("Illegal tag syntax: <$stag$gt;...$lt;/$etag$gt;");
	    $record->{$stag} = $content;
	} elsif ($line =~ /<\s*${table_type}\s*>/) {		# record start tag
	    $record = {};		# create a new record
	} elsif ($line =~ /<\s*\/\s*${table_type}\s*>/) {	# record end tag
	    push(@$table, $record);	# push the record into the table
	} elsif ($line =~ /<\s*(\/\s*)?${table_type}s\s*>/) {	# collection tag
	    # ignore
	} elsif ($line =~ /<\s*\?\s*xml(.*)\?\s*>/) {	# first line
	    # ignore
	} elsif ($line =~ /\s*/) {
	    # ignore
	} else {
	    print STDERR "Illegal line: $line ($length)\n";
    return $table;

Warning: The above program is based on a tested program, but it was rewritten and was not yet tested again.



TrackBack URL for this entry:

Post a comment


This page contains a single entry from the blog posted on November 16, 2007 12:23 AM.

Many more can be found on the main index page or by looking through the archives.

Creative Commons License
This weblog is licensed under a Creative Commons License.
Powered by
Movable Type 3.36