Under the Hood

Before writing your own filters, it is important to understand the flow through the plugin. NoWeb source files are fed into a pipeline of predefined filters that translates the source code into an XML representation of the code. For a weave operation, the flow through the plugin is:

Pipeline: Producer, TagSource, TokenizeCode, XRefFilter, RunFilters, OutputWriter

Producer

The Producer takes the NoWeb source and injects it into the pipeline. It expands any include instructions (\includenowebfile{...}) and marks each line with its file name and line number. Using the simple example convertInteger.nw, typical output is as follows (trailing dots mark lines that have been truncated for clarity):

[[[convertInteger.nw]]][[[35]]]<<radix value found so process remainder of string>>=
[[[convertInteger.nw]]][[[36]]]<<extract the prefix radix>>
[[[convertInteger.nw]]][[[37]]]<<set up return value>>
[[[convertInteger.nw]]][[[38]]]<<Set up loop variables>>
[[[convertInteger.nw]]][[[39]]]for ( int i = (int)radixLength+1; i < [inString lengt....
[[[convertInteger.nw]]][[[40]]] <<get next character into a string value>>
[[[convertInteger.nw]]][[[41]]] <<get character range from list of possible digits>>
[[[convertInteger.nw]]][[[42]]] <<add the digit to the return value>>
[[[convertInteger.nw]]][[[43]]]}
[[[convertInteger.nw]]][[[44]]]@
[[[convertInteger.nw]]][[[45]]]
[[[convertInteger.nw]]][[[46]]][[nextChar]] is a convenience variable to make things....
[[[convertInteger.nw]]][[[47]]]
[[[convertInteger.nw]]][[[48]]]<<get next character into a string value>>=
[[[convertInteger.nw]]][[[49]]]s = [NSString stringWithFormat:@"%c", [inString chara....
[[[convertInteger.nw]]][[[50]]]@
[[[convertInteger.nw]]][[[51]]]
[[[convertInteger.nw]]][[[52]]]To pull the correct value from the list of characters.
[[[convertInteger.nw]]][[[53]]]
[[[convertInteger.nw]]][[[54]]]<<Set up loop variables>>=
[[[convertInteger.nw]]][[[55]]]NSRange charRange;
[[[convertInteger.nw]]][[[56]]]@ %def charRange
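The marking step described above can be sketched in a few lines of Java. This is only an illustration of the marker format visible in the listing; the class and method names (LineMarker, mark, markAll) are hypothetical and are not taken from the plugin, and include expansion is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only: prefixes lines with [[[file]]][[[line]]] markers. */
final class LineMarker {

    /** Prefix one source line with its file name and 1-based line number. */
    static String mark(String fileName, int lineNumber, String line) {
        return "[[[" + fileName + "]]][[[" + lineNumber + "]]]" + line;
    }

    /** Mark every line of a file, numbering from 1 (an assumption of this sketch). */
    static List<String> markAll(String fileName, List<String> lines) {
        List<String> marked = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            marked.add(mark(fileName, i + 1, lines.get(i)));
        }
        return marked;
    }

    public static void main(String[] args) {
        // "example.nw" is a made-up file name used only for this demonstration.
        List<String> source = Arrays.asList(
                "<<Set up loop variables>>=",
                "NSRange charRange;",
                "@ %def charRange");
        for (String line : markAll("example.nw", source)) {
            System.out.println(line);
        }
    }
}
```

Because every line carries its own file name and number, later filters can report positions in the original source even after included files have been spliced in.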

TagSource

TagSource takes the source with its associated file and line data and converts it to an initial XML form. As can be seen below, some extra information is added. The documentation and code parts of the source are detected and put into chunks. The file names and line numbers are preserved, while each code chunk carries an extra set of attributes: id, a unique integer identifier for the code chunk (used later in definitions etc.); hash, an identifier made from the file name, chunk name and sequence number (for multiply defined chunks); and name, the chunk name defined between the <<...>> delimiters. Note that all white space is preserved and that CDATA sections are used to pass the lines through verbatim. The above lines would be transformed to

<aw:chunk type="code" file="convertInteger.nw" line="35" id="5" hash="NW48ed0e68-54c9de93-1" name="radix value found so process remainder of string">
  <aw:code file="convertInteger.nw" line="36"><![CDATA[]]>
    <aw:include file="convertInteger.nw" line="36" ref="extract the prefix radix" hash="NW48ed0e68-54c9de93-1"/><![CDATA[]]></aw:code>
  <aw:code file="convertInteger.nw" line="37"><![CDATA[]]>
    <aw:include file="convertInteger.nw" line="37" ref="set up return value" hash="NW48ed0e68-54c9de93-1"/><![CDATA[]]></aw:code>
  <aw:code file="convertInteger.nw" line="38"><![CDATA[]]>
    <aw:include file="convertInteger.nw" line="38" ref="Set up loop variables" hash="NW48ed0e68-54c9de93-1"/><![CDATA[]]></aw:code>
  <aw:code file="convertInteger.nw" line="39">
    <![CDATA[for ( int i = (int)radixLength+1; i < [inString length]; i++ ) {]]>
  </aw:code>
  <aw:code file="convertInteger.nw" line="40"><![CDATA[ ]]>
    <aw:include file="convertInteger.nw" line="40" ref="get next character into a string value" hash="NW48ed0e68-54c9de93-1"/><![CDATA[]]></aw:code>
  <aw:code file="convertInteger.nw" line="41"><![CDATA[ ]]>
    <aw:include file="convertInteger.nw" line="41" ref="get character range from list of possible digits" hash="NW48ed0e68-54c9de93-1"/><![CDATA[]]></aw:code>
  <aw:code file="convertInteger.nw" line="42"><![CDATA[ ]]>
    <aw:include file="convertInteger.nw" line="42" ref="add the digit to the return value" hash="NW48ed0e68-54c9de93-1"/><![CDATA[]]></aw:code>
  <aw:code file="convertInteger.nw" line="43"><![CDATA[}]]></aw:code>
</aw:chunk>
<aw:chunk type="documentation" file="convertInteger.nw" line="45">
  <aw:text line="45"><![CDATA[]]></aw:text>
  <aw:text line="46"><![CDATA[[[nextChar]] is a convenience variable to ....
  <aw:text line="47"><![CDATA[]]></aw:text>
</aw:chunk>
<aw:chunk type="code" file="convertInteger.nw" line="48" id="6" hash="NW48ed0e68-61ec64e7-1" name="get next character into a string value">
  <aw:code file="convertInteger.nw" line="49">
    <![CDATA[s = [NSString stringWithFormat:@"%c", [inString characterAtI ....
</aw:chunk>
<aw:chunk type="documentation" file="convertInteger.nw" line="51">
  <aw:text line="51"><![CDATA[]]></aw:text>
  <aw:text line="52"><![CDATA[To pull the correct value from the list of ....
  <aw:text line="53"><![CDATA[]]></aw:text>
</aw:chunk>
<aw:chunk type="code" file="convertInteger.nw" line="54" id="7" hash="NW48ed0e68-b77bf6c2-1" name="Set up loop variables">
  <aw:code file="convertInteger.nw" line="55">
    <![CDATA[NSRange charRange;]]></aw:code>
</aw:chunk>
<aw:definitions chunk="NW48ed0e68-b77bf6c2-1">
  <aw:definition>charRange</aw:definition>
</aw:definitions>

Note that the XML has been re-indented to make it clearer. In particular the code tags are all on one line in the processed output.
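As a rough illustration of how documentation and code chunks might be detected from the marked lines, the following Java sketch reports where each chunk starts. It assumes that a code chunk opens with a <<name>>= line and closes with a line that is "@" alone or begins with "@ ", which matches the example above; the class name ChunkSplitter and its output format are hypothetical and not part of the plugin.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only: finds chunk boundaries in marked NoWeb lines. */
final class ChunkSplitter {

    // [[[file]]][[[line]]]text, as produced by the marking step.
    private static final Pattern MARKED =
            Pattern.compile("^\\[\\[\\[(.*?)\\]\\]\\]\\[\\[\\[(\\d+)\\]\\]\\](.*)$");

    // A code chunk definition line: <<chunk name>>=
    private static final Pattern CODE_START = Pattern.compile("^<<(.*)>>=\\s*$");

    /** Returns one entry per chunk: "code@LINE NAME" or "documentation@LINE". */
    static List<String> chunkHeaders(List<String> markedLines) {
        List<String> headers = new ArrayList<>();
        boolean inCode = false;
        boolean inDoc = false;
        for (String marked : markedLines) {
            Matcher m = MARKED.matcher(marked);
            if (!m.matches()) {
                throw new IllegalArgumentException("Line is not marked: " + marked);
            }
            String lineNumber = m.group(2);
            String text = m.group(3);
            Matcher start = CODE_START.matcher(text);
            if (start.matches()) {
                // A definition line opens a new code chunk.
                headers.add("code@" + lineNumber + " " + start.group(1));
                inCode = true;
                inDoc = false;
            } else if (inCode && (text.equals("@") || text.startsWith("@ "))) {
                // The terminator closes the code chunk; the next line opens documentation.
                inCode = false;
            } else if (!inCode && !inDoc) {
                headers.add("documentation@" + lineNumber);
                inDoc = true;
            }
            // Any other line is content of the current chunk.
        }
        return headers;
    }
}
```

Applied to the marked lines 35 to 56 shown earlier, this logic would report code chunks starting at lines 35, 48 and 54 and documentation chunks starting at lines 45 and 51, which agrees with the line attributes in the XML above.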

TokenizeCode

TokenizeCode takes this XML output and analyses the code chunks. Code lines are split into tagged structures using token, separator and include tags. For example, source code line 39

<aw:code file="convertInteger.nw" line="39">
  <![CDATA[for ( int i = (int)radixLength+1; i < [inString length]; i++ ) {]]>
</aw:code>

is translated to

<aw:code file="convertInteger.nw" line="39">
  <aw:token><![CDATA[for]]></aw:token>
  <aw:separator><![CDATA[ ( ]]></aw:separator>
  <aw:token><![CDATA[int]]></aw:token>
  <aw:separator><![CDATA[ ]]></aw:separator>
  <aw:token><![CDATA[i]]></aw:token>
  <aw:separator><![CDATA[ = (]]></aw:separator>
  <aw:token><![CDATA[int]]></aw:token>
  <aw:separator><![CDATA[)]]></aw:separator>
  <aw:token><![CDATA[radixLength]]></aw:token>
  <aw:separator><![CDATA[+]]></aw:separator>
  <aw:token><![CDATA[1]]></aw:token>
  <aw:separator><![CDATA[; ]]></aw:separator>
  <aw:token><![CDATA[i]]></aw:token>
  <aw:separator><![CDATA[ < []]></aw:separator>
  <aw:token><![CDATA[inString]]></aw:token>
  <aw:separator><![CDATA[ ]]></aw:separator>
  <aw:token><![CDATA[length]]></aw:token>
  <aw:separator><![CDATA[]; ]]></aw:separator>
  <aw:token><![CDATA[i]]></aw:token>
  <aw:separator><![CDATA[++ ) {]]></aw:separator>
</aw:code>

As can be seen, liberties are taken with the definition of separators and tokens (variables): the keyword for and the literal 1, for example, are both tagged as tokens. However, for the purposes of variable cross-referencing this is quite adequate.
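The split shown for line 39 is consistent with a very simple rule: a run of letters, digits or underscores is a token, and everything between such runs is a separator. The Java sketch below applies that rule. It is an assumption inferred from the example output, not the plugin's actual tokenizer, and the class name SimpleTokenizer is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only: splits a code line into token and separator tags. */
final class SimpleTokenizer {

    // Group 1: a token (letters, digits, underscores). Group 2: a separator (anything else).
    private static final Pattern PIECE =
            Pattern.compile("([A-Za-z0-9_]+)|([^A-Za-z0-9_]+)");

    /**
     * Wraps each piece of the line in aw:token or aw:separator with a CDATA body.
     * Limitation: a line containing the sequence "]]>" would need special handling.
     */
    static String tokenize(String codeLine) {
        StringBuilder xml = new StringBuilder();
        Matcher m = PIECE.matcher(codeLine);
        while (m.find()) {
            String tag = (m.group(1) != null) ? "aw:token" : "aw:separator";
            xml.append('<').append(tag).append("><![CDATA[")
               .append(m.group())
               .append("]]></").append(tag).append('>');
        }
        return xml.toString();
    }

    public static void main(String[] args) {
        System.out.println(tokenize("for ( int i = (int)radixLength+1; i < [inString length]; i++ ) {"));
    }
}
```

Note that such a rule cannot tell keywords, numeric literals and identifiers apart; that distinction is only made later, when tokens are compared against the list of defined variables.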

XRefFilter

XRefFilter builds a cross-reference list of where the chunks are used. Documentation chunks are passed through unchanged, as are code chunks, except that cross references are added for variables (tokens) and chunk usage. For example, the following is added after the chunk named "Set up loop variables", showing where this chunk is used (chunk id 5) and where the defined variable charRange is used (chunk ids 8 and 9):

<aw:definitions chunk="NW48ed0e68-b77bf6c2-1">
  <aw:usedIn>
    <aw:where ref="5"/>
  </aw:usedIn>
</aw:definitions>
<aw:definitions chunk="NW48ed0e68-b77bf6c2-1">
  <aw:define var="charRange">
    <aw:where ref="8"/>
    <aw:where ref="9"/>
  </aw:define>
</aw:definitions>

Note that, whereas the other filters are pure Java manipulations, this filter uses an XSL transformation to do the work. The XRefFilter class is just a support mechanism for the XSL.
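A Java class that merely supports an XSL transformation can be very small. The sketch below uses the standard javax.xml.transform API to apply a stylesheet to an XML string; it shows the general technique only and is not the XRefFilter source. The class name XslSupport and the tiny counting stylesheet in main are hypothetical, and the demo input omits the aw namespace prefix for brevity.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Illustrative sketch only: a thin Java wrapper around an XSL transformation. */
final class XslSupport {

    /** Applies the given stylesheet text to the given XML text and returns the result. */
    static String apply(String xml, String stylesheet) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Demo stylesheet (made up for this sketch): counts the where elements.
        String countWhere =
                "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
              + "<xsl:output method='text'/>"
              + "<xsl:template match='/'><xsl:value-of select='count(//where)'/></xsl:template>"
              + "</xsl:stylesheet>";
        String xml = "<define var='charRange'><where ref='8'/><where ref='9'/></define>";
        System.out.println(apply(xml, countWhere));
    }
}
```

Keeping the cross-referencing logic in the stylesheet means it can be changed without recompiling any Java.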

RunFilters

RunFilters runs a selected set of XSL transforms. If the user does not specify a filters definition file, the default one (filters.xml) is used:

<?xml version="1.0" encoding="UTF-8"?>
<antweave>
  <pipeline name="tangle">
    <filter name="tangle.xsl"/>
  </pipeline>
  <pipeline name="weave">
    <filter name="index.xsl"/>
    <filter name="weave.xsl"/>
  </pipeline>
</antweave>

This defines two pipelines, "tangle" and "weave", which are selected via the action attribute in the Ant task ("tangle" is the default if action is not specified). If the weave action is invoked, the RunFilters class first runs the index.xsl transform, which marks variables within the code sections based on the definitions section. For example, the line 39 output above would be transformed to

<aw:code file="convertInteger.nw" line="39">
  <aw:token>for</aw:token>
  <aw:separator> ( </aw:separator>
  <aw:token>int</aw:token>
  <aw:separator> </aw:separator>
  <aw:token>i</aw:token>
  <aw:separator> = (</aw:separator>
  <aw:token>int</aw:token>
  <aw:separator>)</aw:separator>
  <aw:variable>radixLength</aw:variable>
  <aw:separator>+</aw:separator>
  <aw:token>1</aw:token>
  <aw:separator>; </aw:separator>
  <aw:token>i</aw:token>
  <aw:separator> &lt; [</aw:separator>
  <aw:token>inString</aw:token>
  <aw:separator> </aw:separator>
  <aw:token>length</aw:token>
  <aw:separator>]; </aw:separator>
  <aw:token>i</aw:token>
  <aw:separator>++ ) {</aw:separator>
</aw:code>

showing that radixLength is the only defined variable used in this line. The RunFilters class then runs the weave.xsl transform and passes the result on to the final filter, OutputWriter.
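To make the pipeline selection concrete, the following Java sketch reads a filters definition in the format shown above and returns the stylesheet names for a given action, falling back to "tangle" when no action is given. It uses the standard DOM parser; the class name FilterConfig and the method filtersFor are hypothetical, and how the plugin itself reads the file is not documented here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Illustrative sketch only: selects the filters of one pipeline from a filters definition. */
final class FilterConfig {

    /** Returns the filter names, in document order, of the pipeline matching the action. */
    static List<String> filtersFor(String filtersXml, String action) {
        // "tangle" is the default when no action is specified.
        String wanted = (action == null || action.isEmpty()) ? "tangle" : action;
        List<String> names = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(filtersXml)));
            NodeList pipelines = doc.getElementsByTagName("pipeline");
            for (int i = 0; i < pipelines.getLength(); i++) {
                Element pipeline = (Element) pipelines.item(i);
                if (!wanted.equals(pipeline.getAttribute("name"))) {
                    continue;
                }
                NodeList filters = pipeline.getElementsByTagName("filter");
                for (int j = 0; j < filters.getLength(); j++) {
                    names.add(((Element) filters.item(j)).getAttribute("name"));
                }
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Cannot read filters definition", e);
        }
        return names;
    }
}
```

With the default definition, the weave action yields index.xsl followed by weave.xsl, so the order of the filter elements determines the order in which the transforms run.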

OutputWriter

OutputWriter outputs the data to the appropriate destination file.

Copyright 2015 Hugh Field-Richards. All Rights Reserved.