| Document: FSC-0083
  | Version:  001
  | Date:     17 June 1995
  |
  | Jonathan de Boyne Pollard, FIDONET#2:440/4.0

			A proposed standard for message IDs on FTN systems.

									by

				Jonathan de Boyne Pollard, FIDONET#2:440/4.0

						Version 0.02, Sun 19950507

	This document is (c) Copyright 1995 Jonathan de Boyne Pollard, all
	rights reserved.  Originally written on Tuesday 19950131.

	Permission is hereby granted to copy and use this document without
	modification in any way that you see fit, provided that you do not
	attempt to make money from it, and that you understand that I take no
	responsibility whatsoever for any effect that it may have on your
	machine, data, marital status, or cat.

	Especial permission to freely use and redistribute this document in
	its original form is given to developers of FTN softwares and whatever
	FIDONET Technical Standards bodies may exist from time to time.

	───────────────────────
─── 0.0 Definition of terms ───────────────────────────────────────────────────
	───────────────────────

	This document assumes familiarity with several terms in common use in
	discussion of mail systems, such as `User Agent', `Message Transport
	Agent', and so forth.

	Robot mail programs qualify as UAs, incidentally.

	0.1 Knackered Backward Form
	───────────────────────────

	This specification uses a modified BNF notation for discussion of
	textual representation of message IDs.

	Literal syntax elements (terminal nodes of the grammar) are enclosed
	in single quotes.

		'MSGID:'	'@'    '<'	  '"'

	Non-terminal nodes are enclosed in angle brackets (less than and
	greater than signs).

		<quoted-text>		<hex-text>		<q-p-site-identifier>

	Production rules comprise a non-terminal, followed by productions.
	Alternate productions for the same non-terminal are separated by a
	vertical bar.

		<qtext-char> ::=
				  '"' '"'
				| <any-character-except-quotes-NUL-or-CR>

	Optional sequences within a production are indicated in two ways.
	Square brackets enclose a sequence that may occur exactly once or not
	at all.

		[ '@' <dns-name> ':' ]

	Curly braces enclose a sequence that may be repeated any number of
	times.  A leading numeric prefix (usually 0 or 1) indicates the
	minimum number of repetitions.

		1*{ <hex-character> }

	0.1.1 Some standard production rules
	────────────────────────────────────

		<whitespace-char> ::= <tab> | <space>

		<whitespace> ::= 1*{ <whitespace-char> }

		<hex-character> ::=
			'0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'|
			'A'|'B'|'C'|'D'|'E'|'F'|
			'a'|'b'|'c'|'d'|'e'|'f'

		<upper-hex-char> ::=
			'0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'|'A'|'B'|'C'|'D'|'E'|'F'

		<qtext-char> ::=
				'"' '"'
			  | <any-ASCII-character-except-quotes-NUL-or-CR>

		<quoted-text> ::= '"' 0*{ <qtext-char> } '"'

		<quoted-char> ::=
				<any-ASCII-character-except-quotes-backslash-NUL-or-CR>
			  | '\' <any-ASCII-character-except-NUL-or-CR>

		<quoted-string> ::= '"' 0*{ <quoted-char> } '"'

		<word> ::= 1*{ <any-ASCII-character-above-SPACE-and-below-DEL> }

	Note the difference between the two forms of quoting.  <quoted-text>
	is a string with embedded quotation marks represented by double
	quotation marks (the way that most BASIC languages do).  However,
	<quoted-string> is a string with all quotation marks and backslashes
	(and, indeed, any other character) escaped by the backslash character,
	in the style of the C and C++ languages.
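
	As an illustration only, the two quoting styles can be sketched in
	Python.  The function names are hypothetical and are not part of
	this standard, and the sketch does not check for the NUL and CR
	characters that the grammar excludes.

```python
def as_quoted_text(s: str) -> str:
    """BASIC style <quoted-text>: embedded quotation marks are doubled."""
    return '"' + s.replace('"', '""') + '"'


def as_quoted_string(s: str) -> str:
    """C style <quoted-string>: backslash escapes quotes and backslashes."""
    # Backslashes are escaped first so that the escapes added for
    # quotation marks are not themselves doubled.
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


print(as_quoted_text('say "hi"'))
print(as_quoted_string('say "hi"'))
```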

	─────────────────────────────────────
─── 1.0 Definition and use of message IDs ─────────────────────────────────────
	─────────────────────────────────────

	For the purposes of this document, the network is considered to form a
	vast distributed database of messages, which uses replication and
	store and forward distribution to ensure that all carriers of the
	database are kept up to date.  Every message, whether netmail or
	echomail, carries a primary message ID that uniquely identifies it,
	and zero or more reference message IDs that uniquely identify any
	messages that it refers to.

	A primary message ID is a globally unique key that is used for
	uniquely identifying any single given mail message in the database
	(that is, counting all replicas of a message over all of the network
	as "one").  The reference message IDs are used by user agents to form
	a reply graph, allowing the user to easily navigate the
	messagebase.

	Message transport protocols may require the data in a message ID to be
	encoded so that it may be safely transported.  This standard
	distinguishes between the "underlying" message IDs and the encoded
	forms.  This chapter discusses the underlying message IDs and the
	concepts behind them without reference to a particular encoding, and
	subsequent chapters discuss the various encoded forms.

	1.1 Components of a message ID
	──────────────────────────────

	A message ID comprises two parts, namely a site identifier and a local
	part.  Both of these parts are arbitrary 8-bit binary data, that
	implementations are free to store in any way they choose, but which
	they should never alter.  There are no distinguished characters in
	either the site identifier or local part, especially not terminating
	characters.  So implementations must usually store an additional
	length count for both.
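
	One possible in-memory representation, sketched in Python.  The
	class name MessageID is hypothetical.  Python's bytes type carries
	its own length, so no terminating character is needed and NUL or CR
	may appear in either part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MessageID:
    """Hypothetical container: both parts are opaque 8-bit data."""
    site_id: bytes
    local_part: bytes

    def __post_init__(self):
        # Refuse text strings, so that no character set conversion can
        # silently alter the octets.
        for part in (self.site_id, self.local_part):
            if not isinstance(part, bytes):
                raise TypeError("message ID parts must be bytes")


a = MessageID(b"FIDONET#2:440/4.0", b"1995Jan31.123426.26.1")
b = MessageID(b"fidonet#2:440/4.0", b"1995Jan31.123426.26.1")
# Comparison is octet by octet, so these two IDs are different.
print(a == b)
```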

	The "minimum maximum" lengths for the site ID and local part are 64
	octets each, and conforming implementations may not impose shorter
	maximum length restrictions.  In fact, implementations are encouraged
	to impose no length restrictions on message IDs whatsoever (for
	example, it is not unreasonable to expect site IDs to exceed 256
	octets on occasion).

	1.2 Preservation of uniqueness
	──────────────────────────────

	A site that creates messages (by entering them into the distributed
	database) must also issue message IDs, and must ensure that the global
	uniqueness property of message IDs is preserved.

	A site MUST ensure that it issues unique local parts to individual
	messages.  Two or more sites may not have the same site identifier,
	unless they *all* co-operate to ensure that they do not issue
	duplicate local parts.

	The administrative procedures necessary to obtain a unique site
	identifier are beyond the scope of this document.  Usually site
	identifiers will be FTN 5D addresses, or fully qualified DNS names,
	because administrative procedures for assigning such are already in
	place.  However, they are not restricted to be such.

	The means by which a site invents new local parts is beyond the scope
	of this document.  A discussion of some example options for
	implementors to consider is given in an appendix.

	1.3 Reference message IDs
	─────────────────────────

	Reference message IDs in a message denote messages to which it is
	related, comprising a "local subset" of the overall reply graph (i.e.
	the direct and indirect ancestors of the message), which each message
	carries around with it.

	Carrying around multiple reference message IDs provides overlap,
	allowing for the overall reply graph to be reconstructed even in the
	absence of intermediate messages (if they had expired, or had not yet
	arrived due to propagation lag, for example).

	UAs that conform to this standard MUST ensure that only messages that
	start new threads (i.e. messages entered into the network not in
	response to any existing message) have no reference message IDs.

	All other messages that they create MUST contain at least one
	reference message ID, being that of the message that is being
	responded to.

	[[ Luckily, schemes already in existence mean that in practice
	   non-conforming User Agents will generally preserve this single back
	   link, as well.  ]]

	When responding to a message, user agents must create the reference
	message ID list of the response by taking the list of reference
	message IDs from the original message, and appending the primary
	message ID of the original message to the tail.

	A reference message ID list should not be truncated, unless transport
	or storage limitations are in danger of being exceeded.  In which
	case, message IDs may only be removed from the head of the list.
	Removing from the tail would eliminate links to immediate ancestor
	messages, and removing from the middle would alter the reply graph.
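
	The two list operations described above can be sketched in Python
	as follows.  The function names and the max_count parameter are
	hypothetical; a real implementation would derive its limit from
	whatever transport or storage restriction applies.

```python
def reply_references(original_refs, original_primary):
    """Reference list for a response: the original message's list with
    the original message's primary ID appended to the tail."""
    return list(original_refs) + [original_primary]


def truncate_references(refs, max_count):
    """Shorten a reference list by removing entries from the head only,
    so that links to the nearest ancestors survive."""
    if max_count < 1:
        raise ValueError("at least the immediate ancestor must be kept")
    return list(refs[-max_count:])


refs = reply_references(["id-a", "id-b"], "id-c")
print(refs)
print(truncate_references(refs, 2))
```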

	────────────────────────────────────────────────────────────────────────
─── 2.0 Quoted printable encoding for storing 8-bit data in 7-bit transports ──
	────────────────────────────────────────────────────────────────────────

	To encode the 8-bit data in message IDs for transport by 7-bit
	transport layers, we use a variation on the widely used Quoted
	Printable form [RFC1521] [RFC1522].

	2.1 Grammar of Quoted Printable encoding
	────────────────────────────────────────

	The grammar of the 7-bit encoding of 8-bit data in a quoted printable
	word is as follows.

		<q-p-word> ::=
				<word>
			  | <quoted-text>
			  | [ '=' ] 1*{ <q-p-character> } [ '=' ]

		<q-p-character> ::=
				<any-ASCII-character-bar-ctls-wspace-quote-and-equals>
			  | <q-p-quoted-char>

		<q-p-quoted-char> ::= '=' <upper-hex-char> <upper-hex-char>

	2.2 Conversion from 8-bit to 7-bit
	──────────────────────────────────

	Rule #1 (non-quoted transparent 7-bit): Where the 8-bit data consist
		of nothing but ASCII characters above SPACE and below DEL, they
		may be copied literally to the 7-bit representation.

	Rule #2 (quoted transparent 7-bit):  Where the 8-bit data consist of
		nothing but ASCII characters except CR and NUL, they may be
		converted to the 7-bit representation by enclosing them in quotes,
		and escaping every embedded quotation mark with a second quotation
		mark.

	Rule #3 (8-bit quoted): Where the 8-bit data contain CR or NUL, or any
		non-ASCII characters, they are converted to a 7-bit representation
		in two stages.

		Firstly, all non-ASCII characters, all ASCII control characters,
		SPACE, DEL, '"', and '=', are converted to "quoted" form.  Quoted
		form is an '=' character followed by the hexadecimal value of the
		character represented as two uppercase hexadecimal digits.

		Secondly, the entire string is then enclosed by one leading and
		one trailing '=' character.
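
	A minimal Python sketch of the three rules, for illustration only.
	The name qp_encode is hypothetical.  Always choosing the
	lowest-numbered rule that applies, and treating empty data as the
	empty quoted text, are assumptions of the sketch.  Data that
	already begin and end with an equals sign or a quotation mark are
	not given any special treatment here.

```python
def qp_encode(data: bytes) -> str:
    """Encode 8-bit data as a 7-bit <q-p-word> (illustrative sketch)."""
    # Rule 1: only ASCII characters above SPACE and below DEL.
    if data and all(0x21 <= b <= 0x7E for b in data):
        return data.decode("ascii")
    # Rule 2: only ASCII characters, and neither CR nor NUL.
    if all(b < 0x80 and b not in (0x00, 0x0D) for b in data):
        return '"' + data.decode("ascii").replace('"', '""') + '"'
    # Rule 3: quote non-ASCII, controls, SPACE, DEL, '"' and '=' as
    # '=' plus two uppercase hex digits, then wrap the result in '='.
    out = []
    for b in data:
        if b <= 0x20 or b >= 0x7F or b in (0x22, 0x3D):
            out.append("=%02X" % b)
        else:
            out.append(chr(b))
    return "=" + "".join(out) + "="


print(qp_encode(b"2:440/4.0"))
print(qp_encode(b'say "hi"'))
print(qp_encode(b"caf\xe9 =1"))
```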

	2.3 Conversion from 7-bit to 8-bit
	──────────────────────────────────

	Where the 7-bit field is delimited by equals signs, it is a fair bet
	that it comprises 8-bit data to which Rule 3 has been applied.
	However, it is possible that sites in the 7-bit world may produce data
	with leading and trailing equals signs.

	Reverse of Rule #3 : If, after stripping the leading and trailing '=',
		the remaining text can be converted back using the reverse of Rule
		3, then that 8-bit data is the actual message ID.  Otherwise the
		reverse of Rule 2 should be applied to the original 7-bit data.

	Reverse of Rule #2 : If the 7-bit data are enclosed by quotes the
		reverse of Rule 2 should be applied to remove the enclosing quotes
		and any embedded quotes (8-bit form does not have delimiter
		characters and so does not require quoting).  Otherwise the
		reverse of Rule 1 should be applied.

	Reverse of Rule #1 : The 7-bit data are copied to the 8-bit data.
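
	The fallback order above can be sketched in Python as follows.
	The name qp_decode is hypothetical.  The sketch assumes that the
	field really is 7-bit text, and it does not verify that a quoted
	field contains only properly doubled quotation marks.

```python
import re

# One or more <q-p-character>: printable ASCII other than '"' and '=',
# or '=' followed by exactly two uppercase hexadecimal digits.
_RULE3_BODY = re.compile(r'(?:[!#-<>-~]|=[0-9A-F]{2})+')


def qp_decode(text: str) -> bytes:
    """Decode a 7-bit <q-p-word> to 8-bit data (illustrative sketch)."""
    # Reverse of Rule 3, attempted only on '='-delimited fields.
    if len(text) >= 2 and text[0] == "=" and text[-1] == "=":
        body = text[1:-1]
        if _RULE3_BODY.fullmatch(body):
            out = bytearray()
            i = 0
            while i < len(body):
                if body[i] == "=":
                    out.append(int(body[i + 1:i + 3], 16))
                    i += 3
                else:
                    out.append(ord(body[i]))
                    i += 1
            return bytes(out)
    # Reverse of Rule 2: strip the quotes and undouble embedded ones.
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        return text[1:-1].replace('""', '"').encode("ascii")
    # Reverse of Rule 1: copy unchanged.
    return text.encode("ascii")


print(qp_decode("=caf=E9=20=3D1="))
print(qp_decode("=ab=e9="))
```

	The second call shows the fallback: lowercase hexadecimal is not a
	valid Rule 3 encoding, so the field is copied through unchanged.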

	2.4 Rationale
	─────────────

	The intention is that <q-p-word> tokens will not be parsed as separate
	words by most 7-bit grammars.  The elimination of quotes, whitespace,
	and control characters by Rule 3 is part of achieving this.

	Rules 1 and 2 allow message IDs created by 7-bit standards to enter
	and travel within the 8-bit world, and be restored to their original
	form when they return to the 7-bit world.  Returning 7-bit message IDs
	to their original form means that 7-bit duplicate checking is not
	broken by 8-bit gateways.

	The unfortunate side-effect is that any 8-bit data generated in the
	7-bit world will be returned to the 7-bit world as 7-bit data in Q-P
	encoded form.  However, the original 8-bit data are unlikely to work
	in the 7-bit world in the first place, so this is no great loss.

	Rule 3 is the most general rule of the three.  Rule 3 applies to true
	8-bit message IDs generated in the 8-bit world that use 8-bit
	characters, allowing them to travel across the 7-bit world with a
	reasonable chance of remaining intact.

	The elimination of the equals sign by Rule 3, replacing it with its
	Q-P encoding, ensures that the decoding process can assume that an
	equals sign not followed by two uppercase hex characters is not a
	valid Rule 3 encoding, and so fall back to decoding Rule 2.

	─────────────────────────────────────────────────────────────────────
─── 3.0 Storage of message IDs in type 2.0, 2.0+, and 2.2 message packets ─────
	─────────────────────────────────────────────────────────────────────

	Type 2.0 message packets [FTS0001], type 2.0+ message packets
	[FSC0039], and type 2.2 message packets [FSC0045] are used for message
	transport over much of FIDONET.  They do not have space in their
	message headers available for message IDs (along with a lot of other
	things), therefore message IDs must be transferred to the body of the
	message for transport in these forms, and retrieved from the body of
	the message afterwards.

	The existing "kludge line" mechanisms [FSC0068] are used to do this.

	There are two concerns here.

	Firstly, it is preferable that as much of the reply graph as possible
	is preserved, even in the face of tools that use existing MSGID/REPLY
	schemes [FTS0009].

	Secondly, message IDs are 8 bit data, and must be encoded into a 7-bit
	form that will be reliably transported in the bodies of type 2.0,
	2.0+, and 2.2 message packets.

	3.1 Conversion to and from kludge lines
	───────────────────────────────────────

	The primary message ID of a message is stored to and retrieved from a
	"MSGID:" kludge line.

	All of the reference message IDs of a message are stored, in order
	from first to last, in a single "REFER:" kludge line.  The last
	reference message ID of a message (its immediate ancestor, in other
	words) is stored in a "REPLY:" kludge line.  Note that the information
	in the "REFER:" kludge line is a superset of the information in the
	"REPLY:" kludge line.

	If a message has zero reference message IDs (it is the start of a new
	thread), then the "REFER:" and "REPLY:" kludge lines are omitted.

	If, upon decoding from type 2.0, 2.0+, or 2.2 message transport
	format, a "REFER:" kludge line exists, then its contents are assumed
	to be the complete list of reference message IDs (in encoded form) for
	the message, and the "REPLY:" kludge line is ignored.  Otherwise, the
	content of the "REPLY:" kludge line (if any) is used for the single
	reference message ID of the message.
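
	A rough Python sketch of both directions, operating on message IDs
	that have already been converted to their encoded text form.  The
	function names are hypothetical, and the order in which the three
	kludge lines are emitted is an assumption of the sketch; this
	section does not prescribe one.

```python
SOH = "\x01"


def kludge_lines(primary, references):
    """Build the kludge lines for one message.

    primary and each entry of references are encoded message IDs."""
    lines = [SOH + "MSGID: " + primary]
    if references:
        # REFER carries the whole list, first to last; REPLY repeats
        # the last entry for the benefit of older software.
        lines.append(SOH + "REFER: " + " ".join(references))
        lines.append(SOH + "REPLY: " + references[-1])
    return lines


def select_references(refer, reply):
    """Choose the reference list on decoding.

    refer is the list taken from a REFER line, or None if absent;
    reply is the single ID from a REPLY line, or None if absent."""
    if refer is not None:
        return list(refer)      # REFER wins; REPLY is ignored
    if reply is not None:
        return [reply]
    return []                   # start of a new thread


print(kludge_lines("s c", ["s a", "s b"]))
```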

	3.2 Compatibility with existing MSGID/REPLY schemes
	───────────────────────────────────────────────────

	There are two compatibility considerations.  It is important that
	encoded message IDs be correctly parsed by implementations using older
	less versatile standards.  It is also important that implementations
	expecting older MSGID/REPLY pairs will destroy as little linking
	information as possible.

	3.2.1 Grammar considerations
	────────────────────────────

	There are two valid interpretations of FTS-0009, both of which
	(should) use the following grammar :

		<msgid> ::= <soh> 'MSGID: ' <address-text> <whitespace> <hex-text>
		<reply> ::= <soh> 'REPLY: ' <address-text> <whitespace> <hex-text>

		<soh> ::= ASCII SOH character
		<address-text> ::= <quoted-text> | <word>
		<hex-text> ::= 1*{ <hex-character> }

	The "VFIDO" interpretation assumes that MSGID/REPLY kludges are the
	textual representation of an (address, number) ordered pair.  Systems
	using this interpretation may change the case of <hex-text> or may
	renormalise <quoted-text> if they find it to be a FTN 5D address.

	Message IDs from this standard that are stored in MSGID/REPLY kludges
	will be mangled by software applying the VFIDO interpretation of
	FTS-0009.  Such software is not compatible with this standard.

	The "Mark Kimes" interpretation assumes that MSGID/REPLY kludges are
	text separated by whitespace, and preserves the contents of
	<quoted-text> and <hex-text> without change.

	The encoding scheme outlined in section 2.2 produces two whitespace
	separated text fields.  So software applying the "Mark Kimes"
	interpretation of FTS-0009 will not mangle the encoded message IDs.

	In many cases, softwares using the "Mark Kimes" interpretation will in
	fact parse <hex-text> as

		<hex-text> ::= <word>

	As long as software applying the "Mark Kimes" interpretation of
	FTS-0009 is not written to truncate either field, or complain about a
	non-numeric <hex-text> portion, it is compatible with this standard.

	3.2.2 Reply linking
	───────────────────

	FTS-0009 implementations will generate MSGID kludges, transfer the
	content (Mark Kimes interpretation) of the MSGID kludge data of an
	original message into the REPLY data of a response message, and will
	not generate a REFER kludge.

	So reply linking will be preserved, but reference information beyond
	the immediate ancestor of a message will be lost.

	3.3 Quoted printable encoding
	─────────────────────────────

	The 8-bit data in message IDs is encoded into 7-bit MSGID/REPLY data
	for transport in type 2.0, 2.0+, and 2.2 message packets by using the
	quoted printable encoding outlined in chapter 2, along with the
	following grammar.

		<msgid> ::= <soh> 'MSGID: ' <7-bit-encoding>
		<reply> ::= <soh> 'REPLY: ' <7-bit-encoding>
		<refer> ::= <soh> 'REFER: '
						<7-bit-encoding> 0*{ <whitespace> <7-bit-encoding> }

		<7-bit-encoding> ::= <q-p-site-ID> <whitespace> <q-p-local-part>

		<q-p-site-ID> ::= <q-p-word>
		<q-p-local-part> ::= <q-p-word>
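
	By way of example, the content of a kludge line (the text after
	'REFER: ', say) could be split into encoded site ID and local part
	pairs along the following lines.  The function name is
	hypothetical, and raising an error on an odd number of fields is a
	choice made for the sketch rather than a requirement of this
	standard.

```python
import re

# A <q-p-word> is either quoted text (which may contain whitespace and
# doubled quotation marks) or a run of non-whitespace characters.
_QP_WORD = re.compile(r'"(?:[^"]|"")*"|\S+')


def split_message_ids(content):
    """Split kludge content into (site, local) pairs of encoded words."""
    words = _QP_WORD.findall(content)
    if len(words) % 2:
        raise ValueError("odd number of fields in kludge content")
    return list(zip(words[0::2], words[1::2]))


print(split_message_ids('2:440/4.0 1995Jan31.1 "my site" =AB=20='))
```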

	Applying Rule 1 of Q-P encoding to local parts is safe as long as
	<hex-text> (from the FTS-0009 grammar) is in actuality treated as
	<word> by most implementations, as outlined in the compatibility
	notes.

	Rule 2 should not be applied to local parts, because the grammar of
	FTS-0009 does not allow for quoted text in the <hex-text> portion.

	The restrictions in Rule 3 have deliberate effect here.  FTS-0009
	sites will rarely produce data with leading and trailing equals signs,
	so reversing Rule 3 will be unlikely to be subject to spurious data.
	In theory, relaxing Rule 3 reversal to include decoding lowercase
	hexadecimal as well as uppercase hexadecimal would mean that sites
	that convert the case of MSGID/REPLY (as part of the "VFIDO"
	interpretation) would not break Q-P encoding.

	However, the "VFIDO" interpretation will usually do far more damage
	than simple case conversion, which will be impossible to restore.
	Rather than attempt the reverse conversion (which could have the
	undesirable effect of causing different messages to end up with the
	same 8-bit message ID if the local part were truncated to eight
	characters in the 7-bit world), any "VFIDO" mangling that occurs will
	prevent Q-P decoding from succeeding.

	This means that 8-bit message IDs that look like incomplete or damaged
	Q-P encodings are not gateway problems, but are more likely to be the
	result of a site using the "VFIDO" interpretation in the 7-bit world.

	──────────────────────────────────────────────────────
─── 4.0 Storage of message IDs in type 2.3 message packets ────────────────────
	──────────────────────────────────────────────────────

	The storage format of type 2.3 messages (so-called "extensible type 2"
	[TYPE2EXT]) provides space in the message headers for both a primary
	message ID and an arbitrary list of reference message IDs.

	All message IDs are stored as 8-bit binary strings, using length
	counts rather than delimiters.  Therefore message IDs can be stored
	directly in type 2.3 messages.

	──────────────────────────────────────────────────────
─── 5.0 Storage of message IDs in type 3.x message packets ────────────────────
	──────────────────────────────────────────────────────

	There is such a wide variety of type 3 message formats that this
	standard doesn't hope to cover them all.

	For those with binary "chunks", chunk types 'PMID' (primary message
	ID) and 'RFER' (reference message IDs) are expected to have the
	following form :

	 ┌─────────────────────────────────────────────────────┐
	 │ Length of site identifier                    WORD32 │
	 ├─────────────────────────────────────────────────────┤
	 │ Site identifier                               ...   │
	 ├─────────────────────────────────────────────────────┤
	 │ Length of local part                         WORD32 │
	 ├─────────────────────────────────────────────────────┤
	 │ Local part                                    ...   │
	 └─────────────────────────────────────────────────────┘

	Those schemes that use text format headers and require field
	delimiters may care to use the Q-P encoding outlined in chapter 2.
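
	The binary chunk layout described above might be read and written
	as in the following Python sketch.  The byte order of WORD32 is
	not stated in this document; little-endian is ASSUMED here purely
	for illustration, and the function names are hypothetical.

```python
import struct

# ASSUMPTION: WORD32 is an unsigned 32-bit little-endian integer.
_WORD32 = "<I"


def pack_message_id(site_id: bytes, local_part: bytes) -> bytes:
    """Serialise one message ID as two length-counted octet strings."""
    return (struct.pack(_WORD32, len(site_id)) + site_id +
            struct.pack(_WORD32, len(local_part)) + local_part)


def unpack_message_id(buf: bytes, offset: int = 0):
    """Return (site_id, local_part, offset just past the record)."""
    (n,) = struct.unpack_from(_WORD32, buf, offset)
    offset += 4
    site_id = buf[offset:offset + n]
    offset += n
    (m,) = struct.unpack_from(_WORD32, buf, offset)
    offset += 4
    local_part = buf[offset:offset + m]
    offset += m
    return site_id, local_part, offset


blob = pack_message_id(b"FIDONET#2:440/4.0", b"\x00\xff")
print(unpack_message_id(blob))
```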

	─────────────────────────────────────────────────────────
─── 6.0 Storage of message IDs in RFC822 and RFC1036 messages ─────────────────
	─────────────────────────────────────────────────────────

	The grammar of "Internet" messages is defined by the standards for
	ARPA text messages [RFC0822] and for Usenet news messages [RFC1036].

	6.1 Restrictions on interconversion
	───────────────────────────────────

	Interconversion between a FIDO message ID and an RFC822 Message-ID is
	restricted by several factors.  The major factor is that RFC0822
	actually places greater restrictions upon Message-IDs than this
	standard does upon FIDO message IDs (in part because this standard is
	designed to also be able to handle X.400 message identifiers and
	others transparently as well).  It mandates that the <address> portion
	of a Message-ID be a valid DNS name.

	A secondary factor is reversibility, in that many gateways exist
	between FTN and RFC822, and so message IDs that cross the boundary
	more than once will retain as much of their original ID information as
	possible.  There is more information contained within a FIDO message
	ID than in an RFC822 Message-ID.  In particular, the <address>
	portions of RFC822 Message-IDs are not case sensitive, whereas the
	site ID of a FIDO message ID is treated as 8-bit data for the purposes
	of comparison.

	These are handled by restricting the allowable conversions that a
	conformant gateway may use on a message ID, by ensuring that all of
	the FIDO information is not lost when converted to the (narrower
	bandwidth) RFC822 Message-ID format, and by allowing gateway softwares
	to infer a meaning from the site identifier portion of a message ID.

	This is the *only* part of this standard where it is allowed for
	softwares to place a meaning on the site identifier of a message ID.

	6.1 Converting to RFC822 form
	─────────────────────────────

	6.1.1 Site identifier recognition
	─────────────────────────────────

	Gateway softwares are allowed to examine a site identifier of a
	message ID and determine whether it is in a format that they recognise
	or not.  This standard specifies what gateway softwares should do when
	they encounter a site identifier that is a recognisable DNS name or
	one that is recognisable FIDO 5D address, and what form the DNS name
	for RFC822 must take.

	Site identifiers that are not FIDO 5D addresses are really beyond the
	scope of FIDONET documentation.  If an implementation recognises
	another form of site identifier (such as X.400 O/R addresses) then it
	is free to translate that site identifier to and from DNS form, as
	long as it knows how (there are RFCs on how to perform X.400
	conversion).

	This message ID standard imposes no restrictions on site identifiers,
	allowing any scheme to be administered on FIDONET.  It is therefore up
	to the site identification schemes themselves to provide their own
	mappings to and from DNS names.

	Gateways are free to drop messages with message IDs that they do not
	understand how to convert.  Both the FIDONET and RFC worlds depend
	heavily upon message IDs for detecting duplicate messages, and so it
	is better that a gateway should NOT distribute messages with message
	ID formats that it doesn't understand how to convert to RFC822 form,
	rather than that it does so incorrectly.

	6.1.1.1 Site identifiers that are DNS names
	───────────────────────────────────────────

	If the site identifier of a message ID can be parsed as a legal DNS
	name according to the grammar of RFC822 then, even if it cannot be
	resolved to an IP address or MX record, it must be used as the domain
	name of the RFC message ID, and the local part must be passed through
	unchanged.

	This allows for RFC message IDs to enter and leave 8-bit FIDONET
	without change, even via gateways that have no knowledge of or
	connectivity to the originating RFC host.

	6.1.1.2 Site identifiers that are FIDO 5D addresses
	───────────────────────────────────────────────────

	The conversion process for message IDs where the site identifier can
	be parsed as a FIDO 5D address in the forms DOMAIN#Z:N/N.P or
	Z:N/N.P@DOMAIN depends on the "domain" (in the FIDO sense of the
	word) of the address.

	6.1.1.2.1 Site identifiers that are 5D addresses in FIDONET
	───────────────────────────────────────────────────────────

	If the site identifier of a message ID is parseable as a FIDO 5D
	address of the form Z:N/N.P@FIDONET or FIDONET#Z:N/N.P (i.e. in the
	FIDONET domain itself), then the DNS name used for the RFC message ID
	must be the DNS equivalent of that address.

	This is because MX records exist in the DNS for all of the zone:net
	pairs for 5D addresses in the FIDONET "domain", in the form

		p#.f#.n#.z#.fidonet.org

	where # is a number without leading zeroes giving the appropriate
	portion of the 5D address.  Therefore this is the conversion that must
	be used.
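
	For illustration, the conversion could look like this in Python.
	The function name is hypothetical.  Matching the domain name
	"FIDONET" without regard to case, and requiring the point number to
	be present, are assumptions of the sketch.

```python
import re

# Accepts DOMAIN#Z:N/N.P or Z:N/N.P@DOMAIN.
_FIDO_5D = re.compile(
    r"(?:(?P<d1>[A-Za-z0-9]+)#)?"
    r"(?P<z>[0-9]+):(?P<n>[0-9]+)/(?P<f>[0-9]+)\.(?P<p>[0-9]+)"
    r"(?:@(?P<d2>[A-Za-z0-9]+))?")


def fidonet_dns_name(site_id: bytes):
    """Return the DNS name for a FIDONET 5D site ID, or None if the
    site ID is not recognisably a 5D address in the FIDONET domain."""
    try:
        text = site_id.decode("ascii")
    except UnicodeDecodeError:
        return None
    m = _FIDO_5D.fullmatch(text)
    if m is None:
        return None
    d1, d2 = m.group("d1"), m.group("d2")
    if (d1 is None) == (d2 is None):
        return None             # need exactly one domain part
    if (d1 or d2).upper() != "FIDONET":   # ASSUMPTION: case ignored
        return None
    # int() discards any leading zeroes, as the DNS form requires.
    return "p%d.f%d.n%d.z%d.fidonet.org" % (
        int(m.group("p")), int(m.group("f")),
        int(m.group("n")), int(m.group("z")))


print(fidonet_dns_name(b"FIDONET#2:440/4.0"))
```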

	6.1.1.2.2 Site identifiers that are 5D addresses outside of FIDONET
	───────────────────────────────────────────────────────────────────

	Most other "domains" (in the FIDO sense of the word), are free to
	choose their own DNS domain name, but have not yet done so.

	Therefore, constructs such as p3.f0.n444.z81.os2net.ftn (which several
	people have INCORRECTLY inferred from other FTS documentation) are NOT
	ALLOWED as the DNS name in an RFC Message-ID.  .ftn is NOT a valid
	top-level DNS domain, for a start, and there is no guarantee that
	OS2NET would adopt that DNS name, either.

	(p#.f#.n#.z#.os2net.fidonet.org anyone ?)

	6.1.1.2.3 Conversion of local parts
	───────────────────────────────────

	Where a gateway has recognised a site identifier to represent a FIDO
	5D address that it knows the DNS name for, the local part must then be
	encoded.

	According to the grammar in RFC822, any ASCII character (from NUL to
	DEL) is legal in the local part of an RFC822 Message-ID, because
	<quoted-pair> (q.v.) allows any special characters to be escaped.

	Since RFC822 transport is merely 7-bit, just as type 2.0, 2.0+, and
	2.2 message packets are, we use the quoted-printable scheme given in
	chapter 2.

	However,

	6.1.1.3 Site identifiers that are not recognisable 5D addresses
	───────────────────────────────────────────────────────────────

	No implementation may extend the FIDO 5D address to DNS name
	conversions for site IDs that are given above.  If the message ID is
	"almost, but not quite" a FIDO 5D address, then the message should for
	preference be discarded at the gateway rather than being passed
	through.

	Message IDs with arbitrary site identifiers are perfectly acceptable to
	this standard, since it ascribes no meaning to site identifiers within
	FIDONET.  However, RFC822 and the existing RFC domain name system
	can only handle a restricted subset of the whole range of FIDO 5D
	addresses.

	6.1.1.4 Other site identifiers
	──────────────────────────────

	As mentioned before, gateways are allowed to support other site
	identification schemes that are not FIDO 5D addresses, and convert
	site identifiers in those forms to DNS names as they please.

	It should be borne in mind when designing such conversion schemes that
	the domain part of an RFC 822 message ID can only contain ASCII
	characters that are not control characters, whitespace, or special
	delimiter characters, because of the definition of <atom> in that
	standard (q.v.).  The quoted printable encoding outlined in chapter 2
	of this document is probably not sufficient for handling full 8-bit
	site identifier schemes, in which case the scheme in RFC1522 should be
	investigated.

	6.1.2 Preserving information
	────────────────────────────

	Although this standard recognises two forms for a FIDO 5D address,
	there is only one valid form for that address in the DNS.  For reverse
	conversions to succeed, when an RFC message re-enters 8-bit FIDONET
	(possibly via another gateway), the *exact form* of the original site
	identifier must be reconstructed, otherwise FIDO softwares will treat
	the two message IDs as different.

	Although other schemes exist, which encode the 5D address in the local
	part, and use a "generic" domain name of "fidonet.org" (which is not a
	valid host name), it is preferred that the semantics of a message ID
	("WXYZ local part generated at ABCDE site") be preserved, especially
	as FIDONET sites are visible to the RFC world via the DNS anyway.

	It is therefore suggested that the original FIDONET site identifier
	(since it will be 7-bit text) be encoded as a <comment> token
	immediately following the relevant message ID, using quoting to escape
	any embedded punctuation (q.v. the grammar in RFC 822).
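
	A small Python sketch of that suggestion.  The function names are
	hypothetical, and the sketch assumes that the local part and DNS
	name passed in are already in a form that RFC 822 accepts.  Within
	an RFC 822 comment, parentheses and backslashes are escaped with a
	backslash.

```python
def rfc822_comment(text: str) -> str:
    """Wrap text as an RFC 822 comment, escaping '\\', '(' and ')'."""
    escaped = (text.replace("\\", "\\\\")
                   .replace("(", "\\(")
                   .replace(")", "\\)"))
    return "(" + escaped + ")"


def message_id_with_origin(local: str, dns_name: str, site_id: str) -> str:
    """Message-ID followed by the original site identifier as a comment."""
    return "<%s@%s> %s" % (local, dns_name, rfc822_comment(site_id))


print(message_id_with_origin("1995Jan31.123426.26.1",
                             "p0.f4.n440.z2.fidonet.org",
                             "2:440/4.0@FIDONET"))
```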

	6.2 Converting from RFC822 form
	───────────────────────────────

	When converting from RFC822 form back to 8-bit FIDONET message IDs,
	gateways should determine whether the address portion of the
	Message-ID is a hostname under the fidonet.org domain.

	If it is, a comment token should be scanned for to find the original
	form of the 5D address, and the site identifier should be
	reconstructed from it if found, or from the given DNS name in the form
	DOMAIN#Z:N/N.P if no comment token were present.  The inverse of the
	quoted printable encoding outlined in chapter 2 should then be applied
	to the local part.

	Otherwise, the 7-bit RFC822 Message-ID should be stored in the 8-bit
	FIDONET message ID without change.
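
	The reverse direction might be sketched as below.  The function
	name is hypothetical.  The sketch recognises only names of the
	exact form p#.f#.n#.z#.fidonet.org, which is narrower than "any
	hostname under fidonet.org", and it expects the caller to have
	already extracted and unescaped the comment token, if there was
	one.

```python
import re

_FIDO_DNS = re.compile(
    r"p([0-9]+)\.f([0-9]+)\.n([0-9]+)\.z([0-9]+)\.fidonet\.org",
    re.IGNORECASE)


def site_id_from_rfc(dns_name: str, comment=None):
    """Rebuild a FIDONET site identifier from an RFC822 Message-ID.

    Returns None when the name is not a FIDONET host name, in which
    case the Message-ID should be stored without change."""
    m = _FIDO_DNS.fullmatch(dns_name)
    if m is None:
        return None
    if comment is not None:
        # The comment holds the exact original form of the address.
        return comment.encode("ascii")
    point, node, net, zone = (int(g) for g in m.groups())
    return ("FIDONET#%d:%d/%d.%d" % (zone, net, node, point)).encode("ascii")


print(site_id_from_rfc("p0.f4.n440.z2.fidonet.org"))
```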

	6.3 Reply linking
	─────────────────

	According to RFC1036, message IDs occur in the Message-ID:  and in the
	References:  header for news (echomail).  Although RFC822 specifies an
	In-Reply-To:  header for mail (netmail), it makes it difficult to use,
	because it need not contain a message ID.

	The model for message identification used by RFC1036 closely matches
	the model outlined in this standard (it is probable that there is only
	one way to skin this particular cat).  There is thus a direct mapping
	between the primary message ID defined by this standard and the
	RFC1036 Message-ID:  header, and also between the reference message
	IDs defined by this standard and the RFC1036 References:  header.

	This means that in normal use the reference message ID list will be
	properly maintained by Usenet softwares.

	───────────────────────────────────────────────
─── A.0 Discussion on generating unique local parts ───────────────────────────
	───────────────────────────────────────────────

	How any given site generates unique local parts is up to it.  So this
	appendix should only be taken as a guideline.

	On sites where there is only one piece of software assigning message
	IDs (e.g.  there is only one UA, or the MTA itself assigns message
	IDs), then a simple "take a ticket" scheme could work.  Multiple
	instances of that piece of software running simultaneously would need
	to arbitrate access to that "ticket dispenser" amongst themselves.

	A discussion of `sequencers' (which is the proper name for this idea)
	and how atomic operations on them can be implemented, can be found in
	any good computer science textbook on concurrent systems.

	Unfortunately, in today's heterogeneous world, it is difficult to the
	point of impossibility to get every piece of software to agree to use
	one single central sequencer.

	It is obvious that using just the date/time for a message ID is
	insufficient on multitasking systems, or even on single-tasking
	systems that can generate multiple messages per clock tick.

	What is less obvious is that it is not a good idea to use the name of
	the software generating the message ID and a sequencer maintained by
	that software as the unique local part.  The problem here is that it
	is not guaranteed that different softwares will use different names
	(especially if they are called "Message Editor" (-:), so it is
	possible that different softwares could generate duplicate local
	parts.

	Some form of "product ID code" would of course rectify this, but given
	the amount of software in use and under development these days, a
	centrally administered product ID database hasn't been a viable option
	for decades now.

	There are, of course, simpler schemes that can guarantee unique local
	parts.  They rely only on features that are guaranteed to be unique to
	every individual running application, and do not require different
	applications to co-operate in using the same central facilities, such
	as a site-wide sequencer.

	One commonly used scheme is to use a combination of the current date
	and time and the process and thread IDs of the software creating the
	message ID.

	e.g.  1995Jan31.123426.26.1
	  or  1995013112343600260001

	This doesn't have to be human-readable calendar time, of course.  It
	could equally well be the POSIX 1003.1 time (seconds since The Epoch),
	or the Julian date plus the time of day.
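
	A minimal sketch of this scheme follows, producing either a calendar
	timestamp or seconds since The Epoch.  The function name local_part
	and its parameters are hypothetical.  The month is written
	numerically so that the result does not depend on the locale.

```python
import os
import threading
import time

def local_part(use_epoch=False, now=None, pid=None, tid=None):
    """Build a <time,process,thread> local part.

    The optional arguments exist only so that fixed values can be
    supplied for demonstration; normally they are left at None.
    """
    now = time.time() if now is None else now
    pid = os.getpid() if pid is None else pid
    tid = threading.get_ident() if tid is None else tid
    if use_epoch:
        stamp = str(int(now))  # seconds since The Epoch
    else:
        stamp = time.strftime("%Y%m%d.%H%M%S", time.gmtime(now))
    return f"{stamp}.{pid}.{tid}"

if __name__ == "__main__":
    print(local_part())
    print(local_part(use_epoch=True))
```

	Two calls made by the same thread within the same second return the
	same string, which is the weakness that a sequence number addresses.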

	If the clock's resolution is too coarse, a sequence number (which can
	be maintained individually by each process) can be appended to
	distinguish message IDs generated within the same clock tick.

	On just about every operating system in the world, including
	multi-user ones, the <time,process,thread,seq> 4-tuple will be unique
	on one machine *forever* (or until the clock wraps around, at least).

	e.g.  1995Jan31.123426.26.1.2
	  or  19950131123436002600010003
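
	The 4-tuple could be assembled along the following lines.  This is
	only a sketch: the name unique_local_part is hypothetical, and the
	counter is private to each process, as the text above allows.

```python
import itertools
import os
import threading
import time

_seq = itertools.count(1)     # per-process counter, not shared between processes
_seq_lock = threading.Lock()  # serialises access to the counter across threads

def unique_local_part():
    """Return a <time,process,thread,seq> local part as dotted decimal fields."""
    with _seq_lock:
        seq = next(_seq)
    stamp = time.strftime("%Y%m%d.%H%M%S", time.gmtime())
    return f"{stamp}.{os.getpid()}.{threading.get_ident()}.{seq}"

if __name__ == "__main__":
    # Many IDs generated well within one second still differ in the last field.
    ids = [unique_local_part() for _ in range(5)]
    print("\n".join(ids))
```

	Because the counter never repeats during the life of a process, the
	result stays unique even when the clock does not advance between
	calls.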

	On multiple machine sites, where all machines share the one site
	identifier, the above scheme can be extended to include the "hidden"
	local machine name, which is assumed to be available (in some
	fashion) to the softwares generating the message IDs.

	This yields a unique <machine,time,process,thread,seq> 5-tuple.

	e.g.  utopium.1995Jan31.123848.26.1.4
	  or  utopium.19950131123907002600010005

	Again, the "intra-site" machine name can be anything, from the node
	name returned by uname() (for UNIX people) to the NETBIOS machine
	name (for PC based LAN systems).
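
	As a sketch of the 5-tuple, the machine name can simply be prefixed
	to a local part produced by any of the schemes above.  The function
	name with_machine is hypothetical.  Python's platform.node() is used
	here as one possible source of the machine name, and the sketch
	refuses to continue if no name can be determined.

```python
import platform

def with_machine(local_part, machine=None):
    """Prefix a local part with the intra-site machine name."""
    machine = platform.node() if machine is None else machine
    if not machine:
        # Without a machine name the result would not be unique site-wide.
        raise ValueError("no intra-site machine name available")
    return f"{machine}.{local_part}"

if __name__ == "__main__":
    print(with_machine("19950131.123848.26.1.4", machine="utopium"))
```

	The only requirement on the name is that no two machines sharing the
	site identifier report the same one.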

   ─────────────────────────
─── Bibliography and Author ───────────────────────────────────────────────────
   ─────────────────────────

	[FTS0001] A Basic FIDONET Technical Standard, version 15.  Randy Bush,
			  Pacific Systems Group.  FIDONET#1:105/6.0.  30th August
			  1990.

			  ( Defines the type 2.0 packet message transport format.  )

	[FTS0009] A standard for message identifiers and reply chain linkage,
			  version 1.  Jim Nutt.  FIDONET#1:114/30.0.  17th December
			  1991.

			  ( Defines the MSGID/REPLY kludges.  )

	[FSC0034] Gateways to and from FIDONET.  Technical, administrative,
			  and policy considerations.  Randy Bush, Pacific Systems
			  Group.  FIDONET#1:105/6.0.  30th August 1990.

			  ( Discussion on features that should be preserved across
				gateways, and on good gateway behaviour in general.  )

	[FSC0039] A type 2 packet extension proposal, version 4.  Mark A.
			  Howard.  FIDONET#1:260/340.  29th September 1990.

			  ( Defines the type 2.0+ packet message transport format.  )

	[FSC0045] A proposal for a new packet format, version 1.  Thom
			  Henderson.  FIDONET#1:107/542.1.  17th April 1990.

			  ( Defines the type 2.2 packet message transport format.  )

	[FSC0068] A proposed replacement for FTS-0004, version 1.  Mark Kimes.
			  FIDONET#1:380/16.0.  13th December 1992.

			  ( Defines kludge lines.  )

	[RFC0822] Standard for the format of ARPA Internet text messages.
			  David Crocker, University of Delaware.  13th August 1982.

			  ( Defines the grammar and semantics of RFC messages.  )

	[RFC1036] Standard for the interchange of USENET messages.  M. Horton,
			  AT&T Bell Laboratories; and R. Adams, Center for Seismic
			  Studies.  December 1987.

			  ( Defines changes to the grammar and semantics of RFC822
				that are required for news instead of mail, including
				reply linking.  )

	[RFC1521] MIME (Multipurpose Internet Mail Extensions) Part One:
			  Mechanisms for specifying and describing the format of
			  Internet message bodies.  N. Borenstein, Bellcore; and N.
			  Freed, Innosoft.  September 1993.

			  ( Defines Quoted Printable encoding of text.  )

	[RFC1522] MIME (Multipurpose Internet Mail Extensions) Part Two:
			  Message header extensions for non ASCII text.  K. Moore,
			  University of Tennessee.  September 1993.

			  ( Defines how to use Q-P encoding in message headers.  )

	[TYPE2EXT] An extension to type 2.0, 2.0+, and 2.2 message transport
			   formats to eliminate most kludge lines from the message
			   body.  Jonathan de Boyne Pollard.  FIDONET#2:440/4.0.
			   [ Not yet released.  ]

	───────────
	Jonathan de Boyne Pollard
	FIDONET#2:440/4.0