[ba-unrev-talk] Fwd: [xml-dev] fwd: metaphorical Web



>From: "Simon St.Laurent" <simonstl@simonstl.com>
>To: xml-dev@lists.xml.org
>
>I don't normally forward newsletters to xml-dev, but this one has a very
>interesting report on Web Services and questions about things like
>binary representations of XML infosets.  The XSLT 2.0 piece that follows
>may also be interesting to people with no interest in the Don Box story.
>
>Thanks to Kurt Cagle, both for writing this and for saying it was okay
>to redistribute it.
>-----------------------------------------------------
>****************************
>Kurt Cagle's
>Metaphorical Web
>****************************
>Wednesday, October 16, 2002
>http://www.kurtcagle.net
>kurt@kurtcagle.net
>****************************
>
>==========================================
>Out of the Box
>Don Box and Microsoft's XML Architecture
>==========================================
>I had the pleasure last night of listening to Don Box, one of the
>principal architects of SOAP and, as of January of this year, the
>Program Director for Microsoft's XML Architecture Group. A tall,
>energetic man with a salt-and-pepper beard and owlish glasses, Don held
>the audience of developers at the Seattle Dot Net Users Group rapt with
>his discussion of what is usually a deadly dull topic: technical
>standards.
>
>Don is in charge of the group within Microsoft that deals with the pipes
>and plumbing of Microsoft's .NET/web services strategies. He is, in
>essence, sitting at the very epicenter of the most profound changes that
>have taken place within the company since the heady days of the browser
>wars in 1995 through 1997. The roadmap that he is laying out now will
>likely shape application development at the software giant for the
>next five to ten years.
>
>The strategy that Don laid out last night was, to say the least,
>audacious: push through standards that will rebuild the Internet from
>the ground (or perhaps more accurately, the sockets) on up, replacing
>not just the HTTP layer but potentially the TCP/IP infrastructure. In
>its place would be a more stateful web, utilizing variable-length SOAP
>messages that would be more conducive to web service architectures than
>the current unreliable, packet-based system.
>
>To get an idea of what that will likely entail (and why it may have such
>a high payoff for Microsoft ... if they don't fail), it's necessary to
>understand how sockets currently work. In the early 1980s the Berkeley
>Sockets interface was built to make it possible to stream content
>between two computers using the Transmission Control Protocol, or TCP,
>which breaks the stream into packets that can each hold only a limited
>number of bytes. TCP in turn runs on top of the Internet Protocol (IP),
>which routes the individual packets, while TCP reassembles them into an
>ordered stream at the other end. Most operating systems have integrated
>the Berkeley Sockets architecture and have built networks using TCP/IP,
>to the extent that older architectures such as Banyan VINES and Novell's
>IPX are becoming anachronisms.
>
>The WS-Routing specification, in an effort spearheaded by Microsoft and
>IBM, would break packets along SOAP boundaries rather than at preset
>lengths, and as such would allow for the efficient transmission of
>complete SOAP commands, though it would rely upon TCP/IP packets and
>even HTTP for the transmission of non-XML attachments such as images,
>sounds or multimedia. To do this effectively, every single operating
>system would have to adopt the WS-Routing architecture or be shut out of
>the process. The danger here is that you would end up, for a while, with
>a two-tier Internet where much of the world is not on WS-Routing. The
>very real consequence is that TCP/IP-HTTP bridging solutions would need
>to be built, actually decreasing the efficiency of the networks over the
>few years that such a changeover would take. It also assumes a
>willingness to modify or even replace billions of lines of code that
>have been built on the TCP/IP architecture in order to go to this
>supposed next stage.
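TCP presents applications with an undivided byte stream, so any protocol that wants to carry whole messages (a SOAP envelope, say) over a socket has to mark the message boundaries itself. Purely as an illustration of that idea, and not as anything taken from WS-Routing, here is a minimal Python sketch of length-prefix framing. The helper names are hypothetical, and a local stream socket pair stands in for a real TCP connection.

```python
import socket
import struct


def send_message(sock: socket.socket, payload: bytes) -> None:
    """Send one message, prefixed with its length as a 4-byte big-endian integer."""
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_exactly(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; a single recv() call may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed in the middle of a message")
        buf.extend(chunk)
    return bytes(buf)


def recv_message(sock: socket.socket) -> bytes:
    """Read the 4-byte length header, then exactly that many payload bytes."""
    (length,) = struct.unpack(">I", recv_exactly(sock, 4))
    return recv_exactly(sock, length)


# A connected pair of stream sockets stands in for a TCP connection here.
sender, receiver = socket.socketpair()
send_message(sender, b"<Envelope>first</Envelope>")
send_message(sender, b"<Envelope>second</Envelope>")
print(recv_message(receiver))  # b'<Envelope>first</Envelope>'
print(recv_message(receiver))  # b'<Envelope>second</Envelope>'
sender.close()
receiver.close()
```

A single recv() may return part of one message or pieces of two, which is why recv_exactly() loops; the length prefix is what restores the boundaries.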
>
>Don talked about a number of the other standards that Microsoft is
>currently trying to develop, either through their own auspices or in
>conjunction with IBM, Ariba, and others. These include distributed
>agreement protocols (WS-Coordination and WS-Transaction) for performing
>stateful transactions, federation-oriented security (which includes an
>alphabet soup of protocols), and ubiquitous metadata for handling policy
>data. In some cases (such as with security) these efforts are being
>coordinated with OASIS, and in others they are being proposed through
>WS-I (the Web Services Interoperability Organization), an industry
>consortium that Microsoft co-founded. Significantly, Microsoft is working
>only grudgingly with the W3C on the base web services specifications,
>SOAP and WSDL, ironically the two standards that seem to be the most
>solid and widely adopted. Whether that is an anomaly or a central data
>point may ultimately determine the fate of Microsoft's .NET efforts.
>
>One other facet of Don's talk that I think may point to some
>significant innovation was his discussion of the XML "stack". XML
>actually refers to three different concepts. The first, and the one
>most people are familiar with, is the syntactic expression of "frozen"
>XML: the angle-bracket tag and attribute syntax. Above this are the
>conceptual underpinnings of XML, the XML Infoset, which is basically the
>abstraction of a named tree structure with multiple types of nodes. The
>infoset really doesn't care about the syntactic representation of XML;
>it is instead an abstract document model that different systems may
>represent internally in any number of different but congruent ways
>(i.e., the way that Java and .NET represent XML in memory is almost
>certain to differ, but the two are equivalent in terms of the abstract
>model, the infoset).
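To make the syntax/infoset distinction concrete, here is a rough Python sketch using the standard library's ElementTree. It reduces each parsed element to its name, attributes, text and children, then shows that two differently written documents (attribute order, quote style, whitespace) yield the same abstract tree. ElementTree is not a full Infoset implementation, and the infoset_view() helper is hypothetical and deliberately simplified.

```python
import xml.etree.ElementTree as ET


def infoset_view(elem):
    """Reduce an element to a rough, infoset-like tuple.

    Attribute order and quoting style belong to the syntax, not to the
    abstract model, so attributes are compared as a sorted tuple.
    As a simplification, leading/trailing whitespace in text is trimmed
    and tail text is ignored; a real infoset would keep both.
    """
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        (elem.text or "").strip(),
        tuple(infoset_view(child) for child in elem),
    )


# Two different serializations of the same tree.
a = ET.fromstring('<phone type="office" ext="12"><areacode>800</areacode></phone>')
b = ET.fromstring("<phone ext='12'  type='office'>\n  <areacode>800</areacode>\n</phone>")

print(infoset_view(a) == infoset_view(b))  # True
```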
>
>The third form he brought up, the Post-Schema-Validation Infoset (PSVI),
>is an infoset representation of XML in which each item also carries the
>type information assigned to it by a schema. The idea here is an
>important one, perhaps even crucial in the realm of programmatic
>interfaces, though I think there is a danger in assuming that simply
>because you have an abstract model with intrinsic type associations,
>you have the equivalent of an object that can readily be passed between
>systems. Don brought up a goal that has occasionally been floated: a
>compact, binary version of XML for intersystem communication, in part
>because the cost of parsing on one end and extracting on the other adds
>considerably to the total cost of transactions.
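A toy Python sketch of the PSVI idea follows. The order document and the TOY_SCHEMA mapping are hypothetical stand-ins; real type annotations come from XML Schema validation, which this does not perform. Each leaf element is reported with its name, an assigned type and a typed value.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical stand-in for a schema: element name -> type constructor.
TOY_SCHEMA = {"quantity": int, "price": Decimal, "sku": str}


def typed_items(root):
    """Yield (name, type name, typed value) for each leaf element, in document order."""
    for elem in root.iter():
        if len(elem) == 0:  # leaf element: no child elements
            ctor = TOY_SCHEMA.get(elem.tag, str)  # unknown names stay untyped text
            yield elem.tag, ctor.__name__, ctor((elem.text or "").strip())


doc = ET.fromstring(
    "<order><sku>A-17</sku><quantity>3</quantity><price>19.95</price></order>"
)
for item in typed_items(doc):
    print(item)
```

The typed values are convenient inside this one Python process, but they are still Python objects: handing them to a Java or .NET system would mean serializing them again, which is the danger described above.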
>
>However, the same arguments that applied three years ago, when this
>question first arose, come up now. Within a homogeneous environment,
>passing binary objects is generally not a problem, and passing an
>infoset that has been rendered as a DOM is far more efficient than the
>parse/serialize cycle that is currently needed to pass XML. The
>problem is that the internal binary representation of that infoset IS
>extremely dependent upon the architecture of the host system, and that
>fact will likely not change any time soon.
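One concrete source of that architecture dependence is byte order. The Python sketch below (an illustration, not something from the talk) packs the same 32-bit integer in little-endian and big-endian order; a binary format has to fix one convention, or label which one it uses, before another machine can read it.

```python
import struct

value = 800  # e.g. an area code stored as a 32-bit unsigned integer

little = struct.pack("<I", value)  # little-endian, as on x86 processors
big = struct.pack(">I", value)     # big-endian ("network byte order")

print(little.hex(), big.hex())  # 20030000 00000320

# Same number, two different byte sequences: the reader must know which
# convention the writer used before it can rebuild the value.
assert struct.unpack("<I", little)[0] == struct.unpack(">I", big)[0] == value
```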
>
>On the other hand, it is possible that a binary to binary translation
>layer might actually prove to be an easier sell than the older COM/CORBA
>bridge interfaces that (almost) facilitated intersystem communication.
>With the establishment of a consistent DOM through the W3C, being able
>to exchange a schema-aware infoset between systems has at least a
>chance of succeeding, provided that some effort is made to ensure that
>the bridges are kept open on both ends.
>
>There was a lot more from the talk that I will try to cover in greater
>detail in subsequent columns. While I don't completely agree with every
>aspect of what I'm seeing Microsoft do, I can see valid reasons for most
>of it. Perhaps as a caution, it's worth noting that there are standards
>bodies and then there are standards bodies. The fact that many of the
>application-level protocols are running through OASIS is ultimately a
>good thing, because with an effort as Herculean as this, the more hands
>you can get to push the boulder up the hill, the more likely you'll
>reach the top.
>
>============================================
>Code: Creating named regexes with XSLT2
>============================================
>
>Here's some more exploration with some of the features in XSLT2 and
>XPath2, specifically the Regular Expressions capabilities. For those of
>you who are not familiar with them, regular expressions (or regexes for
>short) use a set of predefined patterns and special characters to
>attempt to match a whole class of potential strings. They have two
>principal purposes: validating that a given string does in fact fit a
>specific profile, and transforming one string into another based upon
>general pattern matching rather than specific character matches. For
>instance, consider phone numbers. Most American phone numbers follow a
>very distinct sequence: three digits giving the area code (or the toll
>free code, in some cases), three digits indicating the exchange, and
>then four digits containing the local code within that exchange. All
>ten digits are required.
>
>The problem is that there are also a number of different ways of
>grouping these numbers, and when someone enters such a number into a
>form, it would be nice to be able to determine whether the number is
>valid in whatever grouping was used. For the phone number with area
>code 800, exchange 555 and local number 1212, the following are all
>valid:
>
>800.555.1212
>800-555-1212
>(800)-555-1212
>(800)555-1212
>(800)555.1212
>
>while
>
>800.5554.1212
>
>is not because the exchange has four digits instead of three.
>
>XPath2 provides a number of string manipulation functions that accept
>regular expressions as arguments, but the two that I wanted to
>concentrate on are the matches() function and the replace() function.
>The matches() function takes the string to test and the regular
>expression to test against, and returns the Boolean value true() if the
>string matches the expression and false() if it does not. The regular
>expression for validating phone numbers can be pretty ugly, but here is
>at least one stab at it:
>
>^\(?(\d{3})\)?\s?\-?\.?\s?(\d{3})\-?\.?(\d{4})$         (1)
>
>Without going into a lot of detail, this basically says:
>
>^             Match from the start of the string
>\(?           Accept an optional opening parenthesis
>(\d{3})       Find a sequence of three digits (\d) and remember them
>\)?           Accept an optional closing parenthesis
>\s?\-?\.?\s?  Accept optional white space, an optional dash, an optional
>              period, and more optional white space
>(\d{3})       Remember the next sequence of three digits
>\-?\.?        Accept an optional dash and/or an optional period
>(\d{4})       Remember the final sequence of four digits
>$             The string must terminate at this point
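To experiment with the pattern outside an XSLT processor, here is a small Python check of regex (1) against the sample numbers above. It is a sketch: Python's re module stands in for XPath's matches(), and the "800-.555.1212" case is an added example, not from the text, showing how permissive the independently optional separators are.

```python
import re

# Regex (1) from the text as a Python raw string. Python's re dialect is
# close to, but not identical to, XPath 2.0's; this pattern uses only
# constructs the two share.
PHONE = r"^\(?(\d{3})\)?\s?\-?\.?\s?(\d{3})\-?\.?(\d{4})$"

samples = [
    "800.555.1212",
    "800-555-1212",
    "(800)-555-1212",
    "(800)555-1212",
    "(800)555.1212",
    "800-.555.1212",  # also accepted: each separator is independently optional
    "800.5554.1212",  # rejected: the exchange has four digits
]

for s in samples:
    # re.search() with an anchored pattern behaves like matches() here.
    print(f"{s:16} {re.search(PHONE, s) is not None}")
```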
>
>The matches() function would take a string (such as a phone number) and
>evaluate it against the above regular expression, as follows:
>
>matches('(800)555-1212','^\(?(\d{3})\)?\s?\-?\.?\s?(\d{3})\-?\.?(\d{4})$')
>
>This would return the Boolean true() because the pattern in regex #1 is
>satisfied.
>
>Similarly, you can use the replace() function to substitute a
>replacement string for whatever part of an input string matches a
>regular expression. The replacement string uses the Perl notation of
>back references: if an expression in the regex is contained within
>parentheses, it is remembered in the order that it was encountered, and
>the back references $1, $2 and so on provide a way to retrieve these
>remembered expressions. For instance, in
>
>replace('(800)555-1212','^\(?(\d{3})\)?\s?\-?\.?\s?(\d{3})\-?\.?(\d{4})$','$1.$2.$3')
>
>the first expression to be matched (the area code) is assigned to back
>reference $1, the second (the exchange) to back reference $2, and the
>third (the local code) to back reference $3. This in turn will
>provide the output:
>
>800.555.1212
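Python's re.sub() performs the same kind of substitution, but writes back references as \1 or \g<1> rather than $1. The sketch below uses a hypothetical helper that converts an XPath-style replacement string and then reproduces the example; it handles only single-digit references and no escaped dollar signs.

```python
import re

PHONE = r"^\(?(\d{3})\)?\s?\-?\.?\s?(\d{3})\-?\.?(\d{4})$"


def xpath_style_replace(text: str, pattern: str, replacement: str) -> str:
    """Rough Python analogue of XPath 2.0 replace() for simple cases.

    Each $n (single digit only) is rewritten to Python's \\g<n> form
    before calling re.sub(). Escaped \\$ and multi-digit references are
    not handled.
    """
    py_replacement = re.sub(r"\$(\d)", r"\\g<\1>", replacement)
    return re.sub(pattern, py_replacement, text)


print(xpath_style_replace("(800)555-1212", PHONE, "$1.$2.$3"))  # 800.555.1212
```

As with XPath's replace(), a string that does not match comes back unchanged.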
>
>Now, I don't know about you, but
>'^\(?(\d{3})\)?\s?\-?\.?\s?(\d{3})\-?\.?(\d{4})$' doesn't exactly stand
>up and scream "phone number" to me. This tends to be the case with many
>regexes: they can be puzzled out with a lot of work, but in general
>they are far from intuitive. Consequently, I got to thinking about
>how I could build a general library of regexes, each of which I could
>then refer to by name. As it turns out, there are two very different
>approaches that you can take, each with its own advantages and
>disadvantages.
>
>The first approach places the regexes into an XML file, with each regex
>being referenceable by name. For instance, the following illustrates
>just such a regular expression library (regexLib.xml):
>
><regularExpressions>
>     <regularExpression id="phone">
>         <pattern>^\(?(\d{3})\)?\s?\-?\.?\s?(\d{3})\-?\.?(\d{4})$</pattern>
>         <replace>($1)$2-$3</replace>
>     </regularExpression>
>     <regularExpression id="zipcode">
>         <pattern>^(\d{5})(-\d{4})?$</pattern>
>         <replace>$1$2</replace>
>     </regularExpression>
></regularExpressions>
>
>This document establishes two regular expressions - one for phones, one
>for zipcodes - along with the standard replacement forms for encoding
>these.
>
>With this approach, I can define two XSLT functions in their own
>namespace (re:), re:isValid() and re:format(). The re:isValid() function
>takes the string to be validated and tests it against the regular
>expression named in the second argument. For instance,
>
>re:isValid('800.555.1212','phone','') => true()
>
>will return the Boolean value true() indicating that it is a valid phone
>number. The third argument is either a relative or an absolute URL to a
>library of regular expressions, and should usually be set to the empty
>string '' to use the default regexLib.xml file.
>
>Meanwhile, the re:format() function takes a valid (but not necessarily
>conformant) input string and converts it into the standard form given by
>the <replace> element:
>
>re:format('800.555.1212','phone','') => '(800)555-1212'
>
>Here is a preliminary regexes.xsl library file, showing how these
>functions are implemented.
>
><xsl:stylesheet version="2.0"
>     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>     xmlns:re="http://www.solvex.com/schemas/regex"
>     exclude-result-prefixes="re"
>     >
>     <xsl:output method="xml" media-type="text/xml" indent="yes"/>
>
>     <!-- Default library, used when the third argument is '' -->
>     <xsl:variable name="regexes" select="document('regexLib.xml')"/>
>
>     <!-- re:isValid(string, regex id, library URL or '') => boolean -->
>     <xsl:function name="re:isValid">
>         <xsl:param name="str"/>
>         <xsl:param name="formatType"/>
>         <xsl:param name="regexLibFile"/>
>         <xsl:variable name="regexLib" select="if ($regexLibFile) then
>document($regexLibFile) else $regexes"/>
>         <xsl:variable name="re"
>select="$regexLib//regularExpression[@id=$formatType]"/>
>         <xsl:variable name="pattern" select="$re/pattern"/>
>         <!-- xsl:sequence is the form in the XSLT 2.0 Recommendation;
>              the 2002 drafts used xsl:result here -->
>         <xsl:sequence select="matches($str,$pattern)"/>
>     </xsl:function>
>
>     <!-- re:format(string, regex id, library URL or '') => the string
>          rewritten with the library's <replace> template, or '' -->
>     <xsl:function name="re:format">
>         <xsl:param name="str"/>
>         <xsl:param name="formatType"/>
>         <xsl:param name="regexLibFile"/>
>         <xsl:variable name="regexLib" select="if ($regexLibFile) then
>document($regexLibFile) else $regexes"/>
>         <xsl:variable name="re"
>select="$regexLib//regularExpression[@id=$formatType]"/>
>         <xsl:variable name="pattern" select="$re/pattern"/>
>         <xsl:variable name="target" select="$re/replace"/>
>         <xsl:sequence select="if (matches($str,$pattern)) then
>replace($str,$pattern,$target) else ''"/>
>     </xsl:function>
>
></xsl:stylesheet>
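For comparison, here is a Python sketch of the same look-up-by-name design. The library XML is embedded as a string so the sketch runs on its own; is_valid() and format_value() are hypothetical counterparts of re:isValid() and re:format(), and the optional library-file argument becomes an optional dictionary.

```python
import re
import xml.etree.ElementTree as ET

# The regexLib.xml content from the text, embedded so the sketch is self-contained.
REGEX_LIB = r"""<regularExpressions>
    <regularExpression id="phone">
        <pattern>^\(?(\d{3})\)?\s?\-?\.?\s?(\d{3})\-?\.?(\d{4})$</pattern>
        <replace>($1)$2-$3</replace>
    </regularExpression>
    <regularExpression id="zipcode">
        <pattern>^(\d{5})(-\d{4})?$</pattern>
        <replace>$1$2</replace>
    </regularExpression>
</regularExpressions>"""


def load_library(xml_text: str) -> dict:
    """Map each regularExpression/@id to its (pattern, replace) pair."""
    root = ET.fromstring(xml_text)
    return {
        r.get("id"): (r.findtext("pattern"), r.findtext("replace"))
        for r in root.iter("regularExpression")
    }


LIBRARY = load_library(REGEX_LIB)


def is_valid(s: str, format_type: str, library: dict = LIBRARY) -> bool:
    """Counterpart of re:isValid(): does s match the named pattern?"""
    pattern, _ = library[format_type]
    return re.search(pattern, s) is not None


def format_value(s: str, format_type: str, library: dict = LIBRARY) -> str:
    """Counterpart of re:format(): reformat s, or return '' if it is invalid."""
    pattern, replacement = library[format_type]
    if not re.search(pattern, s):
        return ""
    # Rewrite XPath-style $n references as \g<n>; since Python 3.5,
    # groups that did not participate in the match substitute as ''.
    return re.sub(pattern, re.sub(r"\$(\d)", r"\\g<\1>", replacement), s)


print(is_valid("800.555.1212", "phone"))      # True
print(format_value("800.555.1212", "phone"))  # (800)555-1212
print(format_value("45221", "zipcode"))       # 45221
```

Keeping the patterns in data rather than in code is what lets an alternate library change the output format without touching the functions.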
>
>Finally, I wanted to include an XSL file that imports these routines
>and uses them in something approaching a real-world setting
>(regexesTest.xsl):
>
><xsl:stylesheet version="2.0"
>     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>     xmlns:re="http://www.solvex.com/schemas/regex"
>     exclude-result-prefixes="re"
>     >
>     <xsl:import href="regexes.xsl"/>
>     <xsl:template match="/">
>
>         <xsl:variable name="phoneNum1" select="'800.555.1212'"/>
>         <xsl:variable name="phoneNum2" select="'800-5554-1212'"/>
>         <xsl:variable name="zipCode" select="'45221'"/>
>         <html>
>             <body>
>                 <h1>re:isValid</h1>
>                 <p>The phone number <xsl:value-of select="$phoneNum1"/>
>is <xsl:value-of select="if (re:isValid($phoneNum1,'phone','')) then
>'valid.' else 'invalid.'"/></p>
>                 <p>The phone number <xsl:value-of select="$phoneNum2"/>
>is <xsl:value-of select="if (re:isValid($phoneNum2,'phone','')) then
>'valid.' else 'invalid.'"/></p>
>                 <p>The zipcode <xsl:value-of select="$zipCode"/> is
><xsl:value-of select="if (re:isValid($zipCode,'zipcode','')) then
>'valid.' else 'invalid.'"/></p>
>                 <h1>re:format</h1>
>                 <p>The properly formatted form of <xsl:value-of
>select="$phoneNum1"/> is <xsl:value-of
>select="re:format($phoneNum1,'phone','')"/>.</p>
>                 <p>The properly formatted form of <xsl:value-of
>select="$zipCode"/> is <xsl:value-of
>select="re:format($zipCode,'zipcode','')"/>.</p>
>                 <p>Here is an example of an alternate regex library
>implementation for <xsl:value-of select="$phoneNum1"/>, returning
><xsl:value-of disable-output-escaping="yes"
>select="re:format($phoneNum1,'phone','regexLibAlt.xml')"/></p>
>                 <h1>re:phone</h1>
>                 <!-- re:phone() is the inline-regex function shown later;
>                      it must be added to regexes.xsl (or to another
>                      imported stylesheet) for this call to compile -->
>                 <p>You could also use the re:phone() function directly,
>returning <xsl:value-of select="re:phone($phoneNum1)"/></p>
>             </body>
>         </html>
>     </xsl:template>
></xsl:stylesheet>
>
>The line
><p>Here is an example of an alternate regex library ...</p>
>
>uses an alternate library for performing regexes, regexLibAlt.xml. The
>new library itself is significant because it illustrates a way that you
>can actually generate XML code using the re:format() function
>(regexLibAlt.xml):
>
><regularExpressions>
>     <regularExpression id="phone">
>         <pattern>^\(?(\d{3})\)?\s?\-?\.?\s?(\d{3})\-?\.?(\d{4})$</pattern>
>         <replace><![CDATA[
>             <phone>
>                 <areacode>$1</areacode>
>                 <exchange>$2</exchange>
>                 <localcode>$3</localcode>
>             </phone>]]></replace>
>     </regularExpression>
>     <regularExpression id="zipcode">
>         <pattern>^(\d{5})(-\d{4})?$</pattern>
>         <replace>$1$2</replace>
>     </regularExpression>
></regularExpressions>
>
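>Here, I've created a CDATA section that contains the XML template into
>which the matched groups are substituted:
>
>         <replace><![CDATA[
>             <phone>
>                 <areacode>$1</areacode>
>                 <exchange>$2</exchange>
>                 <localcode>$3</localcode>
>             </phone>]]></replace>
>
>Because the whole template is used as a replace() replacement string,
>any literal backslash or dollar sign in it would have to be escaped as
>\\ or \$ respectively.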
>
>The $1, $2 and $3 back references work as they did in the previous
>example. Normally, when returned through the <xsl:value-of/> statement,
>the tagged code is "escaped", with "<" and ">" characters converted into
>the &lt; and &gt; sequences. However, if you set the
>disable-output-escaping attribute of the <xsl:value-of/> element to
>"yes", this escaping is disabled, and the markup is written to the
>output as-is. Keep in mind that this only affects serialization: the
>result is XML in the output document, not element nodes that you can
>query in a variable. Thus, you could use regexes in this manner to build
>rich XML output on the fly.
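Since disable-output-escaping only changes how text is serialized, markup produced this way stays a string until something parses it. The Python sketch below shows the two steps separately: substitute the captured groups into the template from regexLibAlt.xml (rewritten with Python's \g<n> back references), then parse the result into real elements. It illustrates the idea rather than translating the XSLT, and it is only safe because the pattern admits nothing but digits, so no "<" or "&" from the input can leak into the markup.

```python
import re
import xml.etree.ElementTree as ET

PHONE = r"^\(?(\d{3})\)?\s?\-?\.?\s?(\d{3})\-?\.?(\d{4})$"

# The regexLibAlt.xml template, without indentation, using \g<n> for $n.
TEMPLATE = (
    "<phone>"
    r"<areacode>\g<1></areacode>"
    r"<exchange>\g<2></exchange>"
    r"<localcode>\g<3></localcode>"
    "</phone>"
)

markup = re.sub(PHONE, TEMPLATE, "800.555.1212")
print(markup)

# Parsing turns the text into element nodes that can actually be queried.
phone = ET.fromstring(markup)
print(phone.findtext("areacode"), phone.findtext("exchange"), phone.findtext("localcode"))
```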
>
>The alternative approach would be to create an XSLT named function for
>each regex and define the code inline:
>
><xsl:function name="re:phone">
>     <xsl:param name="str"/>
>     <xsl:variable name="re"
>select="'^\(?(\d{3})\)?\s?\-?\.?\s?(\d{3})\-?\.?(\d{4})$'"/>
>     <xsl:variable name="replaceStr" select="'($1)$2-$3'"/>
>     <!-- XSLT 2.0 Recommendation form; the 2002 drafts used xsl:result -->
>     <xsl:sequence select="if (matches($str,$re)) then
>replace($str,$re,$replaceStr) else ''"/>
></xsl:function>
>
>This would then be called as
>
>re:phone('888.555.1212') => '(888)555-1212'
>re:phone('888.5554.1212') => ''
>
>Because the effective Boolean value of an empty string is false() in
>XPath, you can use this in an if() expression to handle both valid and
>invalid input:
>
><xsl:variable name="phoneNum" select="re:phone('888.555.1212')"/>
>The phone number is <xsl:value-of select="if ($phoneNum) then $phoneNum
>else 'not properly built.'"/>
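The same inline approach carries over directly to Python. In this sketch, phone() is a hypothetical counterpart of re:phone(): it builds the '(ddd)ddd-dddd' form from the match groups directly instead of going through a replacement template, and it relies on an empty string being falsy, just as the XPath if() expression does.

```python
import re

PHONE = r"^\(?(\d{3})\)?\s?\-?\.?\s?(\d{3})\-?\.?(\d{4})$"


def phone(s: str) -> str:
    """Counterpart of re:phone(): '(ddd)ddd-dddd' for a valid number, else ''."""
    m = re.search(PHONE, s)
    return f"({m[1]}){m[2]}-{m[3]}" if m else ""


for candidate in ("888.555.1212", "888.5554.1212"):
    formatted = phone(candidate)
    # An empty string is falsy, so one test covers both valid and invalid input.
    print("The phone number is", formatted if formatted else "not properly built.")
```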
>
>Just as a side note, if you are not familiar with how to run these
>examples, you need to use the Saxon 7.2 XSLT processor, available from
>SourceForge at http://saxon.sourceforge.net. Extract the saxon7.jar file
>into a working directory in your classpath; then you can invoke these
>routines from the Windows or Unix command line as
>
>currentDir>java -jar saxon7.jar stub.xml regexesTest.xsl
>
>or
>
>currentDir>java -jar saxon7.jar -o outputDoc.htm stub.xml regexesTest.xsl
>
>if you wanted to direct the output to the file outputDoc.htm. Here
>stub.xml can be any well-formed XML document, since the test stylesheet
>never reads its input.
>
>============================================
>Pass the Word
>============================================
>
>I'm heartened and gratified by the number of people who have joined the
>list (60 and counting in two days). I have redirected my current domain,
>http://www.kurtcagle.net, so that it now points to the Yahoo site, where
>you can see source code samples and archived columns for this work. I
>have had a couple of questions as to why I'm using Yahoo Groups to do
>this. At the moment, it's a matter of expediency. My own server is
>sitting in a storage locker in Portland, Oregon while I look for a job,
>and until I land somewhere (and I am available; email me at
>kurt@kurtcagle.net for details) it's just easier to use existing tools.
>Once relocated, I'll probably move this newsletter to its own server, if
>for no other reason than to escape the annoying advertising (and replace
>it with my own annoying advertising).
>
>I'm doing this newsletter as a free service. Please, if you like it, pass
>on the link (http://www.kurtcagle.net) to anyone that you know who might
>want to keep up with what's going on in my own little corner of the XML
>world.
>
>Until next time ...
>
>Kurt Cagle
>
>**********************************************
>Copyright 2002 Cagle Communications
>All Rights Reserved
>**********************************************
>
>
>
>-------------
>Simon St.Laurent - SSL is my TLA
>http://simonstl.com may be my URI
>http://monasticxml.org may be my ascetic URI
>urn:oid:1.3.6.1.4.1.6320 is another possibility altogether
>
>-----------------------------------------------------------------
>The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
>initiative of OASIS <http://www.oasis-open.org>
>
>The list archives are at http://lists.xml.org/archives/xml-dev/
>
>To subscribe or unsubscribe from this list use the subscription
>manager: <http://lists.xml.org/ob/adm.pl>

---------------------------------------------------------------------------
XML Topic Maps: Creating and Using Topic Maps for the Web.
Addison-Wesley, ISBN 0-201-74960-2.

http://www.nexist.org/wiki/User0Blog