File README, Last updated 5/2/97

This is a snapshot directory containing the up-to-date
version of the MARC converter, with related table files.

FILES IN THE DIRECTORY (WHOSE USE IS NOT OBVIOUS):

recordTest.in    Contain a set of input records, each terminated by
(recordTest.<#>) octal 35 (end of record), and the corresponding
recordTest.out   converted records.  Note that input to the
                 program may also be given using the functions
                 insertF() and insertSF(), either in the code
                 before compilation or through the debugger
                 when stopped at a special place.

lineFormat.out  Contain a "line format" version of the input
lineFormat.in   and output records, displayed field by field,
                subfield by subfield.  The
                lineFormat.* files are generated only if the
                converter is compiled using the -DMARC_DEBUG
                switch.  In addition, the -DMARC_NO_SORT
                switch may be used to suppress sorting of the
                output records.  In this way corresponding
                FROM and TO fields are shown in the same order,
                making comparison easier.

*.chk           Copies of the lineFormat.* files made with the
                current setup of the program.  Used for testing
                by diff'ing.

fromToFormats.txt
               Used in test mode, i.e. when convertRecordStream()
               is run locally.  Names the conversion set that you
               wish to test.  Contains a single pair of formats.


convsets.txt   The conversion sets to be read.  Should conform
               to the .vdt and .ctl files that are available,
               e.g. an entry "{UNIMARC} {USMARC}" means that
               (at least) the files uniusm.vdt and uniusm.ctl
               are available in the path.

format.test*.int
               Files containing large numbers of records for
               testing.
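
To illustrate the record delimiter used in recordTest.in, here is a
minimal sketch in C that walks a buffer of records terminated by
octal 35.  It is not taken from the converter sources; the names
next_record_length() and count_records() are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define RECORD_END '\035'   /* octal 35: end-of-record character */

/*
 * Hypothetical helper, not part of the converter: returns the length of
 * the first record in buf (including its terminator), or 0 if buf holds
 * no complete record.
 */
size_t next_record_length(const char *buf, size_t len)
{
    const char *end = memchr(buf, RECORD_END, len);
    return end ? (size_t)(end - buf) + 1 : 0;
}

/* Counts the complete records in buf, stepping from terminator to terminator. */
size_t count_records(const char *buf, size_t len)
{
    size_t count = 0, n;

    while ((n = next_record_length(buf, len)) > 0) {
        buf += n;
        len -= n;
        count++;
    }
    return count;
}
```

Trailing data without a terminator is not counted as a record.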


The following files are table files, which may or may not be 
up to date. When downloading new tables, those files are overwritten. 

fffttt.ctl      Conversion table files for control fields and
fffttt.vdt      variable data fields respectively, where fff
                are the first three letters of the
                from-format name and ttt the first three
                letters of the to-format name.

/*SPECIAL CASES*/


fffttt.dcs     Decision table files for content-dependent
               conversion.

fffttt.cnd     Conditional conversion tables, taking care of
               conditions that need to be met for a
               conversion to take place.

fffttt.ind     Indicator tables: directions for setting
               indicators in nontrivial cases.

usmarc.isb     A table used to apply/remove ISBD punctuation
               to/from USMARC records.
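
The fffttt naming scheme for table files can be sketched in C as
follows.  This is an illustration only: table_file_name() is a
hypothetical helper, and the lowercasing is an assumption based on
the "{UNIMARC} {USMARC}" -> uniusm.* example above.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: builds a table file name such as "uniusm.vdt" from
 * the from-format name, the to-format name and an extension.
 * Returns 0 on success, -1 if a name is too short or out is too small.
 */
int table_file_name(const char *from, const char *to, const char *ext,
                    char *out, size_t outsize)
{
    char stem[7];
    int i, n;

    if (strlen(from) < 3 || strlen(to) < 3)
        return -1;
    for (i = 0; i < 3; i++) {
        /* Assumption: file names use the lowercased first three letters. */
        stem[i]     = (char)tolower((unsigned char)from[i]);
        stem[i + 3] = (char)tolower((unsigned char)to[i]);
    }
    stem[6] = '\0';
    n = snprintf(out, outsize, "%s.%s", stem, ext);
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}
```

Called with "NORMARC", "USMARC" and "dcs", the helper produces
norusm.dcs.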


Other files with strange names:
Output record files which I sometimes used, and therefore have not
yet had the mental strength to delete.  Don't lose hope, though.

COMMENTS:
1. Compilation options (Makefile)
-DMARC_MULTI_TABLES This option is temporary, and should always be
	            on! Enables loading of more than one conversion set
	            into memory.

-DMARC_DEBUG        This command line option in the makefile
                    enables the line format files and some of
                    the stdout error messages.
	       	
-DMARC_NO_SORT 	    Suppresses the sorting of the output
	       	    records. It is meant to allow input fields
	       	    and output fields to be viewed side by side
	       	    (lineFormat.in/out files). Since the
	       	    resulting record is not sorted, and hence
	       	    wrong, this facility is only enabled when
	       	    the MARC_DEBUG option is active.

-DMARC_REDIRECT     Allows the program to be run using
                    input/output redirection instead of
                    record file names.



-DMARC_NO_CONVERT   Highly unofficial!  Allows viewing of long
                    record streams.  They are written into the
                    file "lineFormat.in".  Only active in
                    debug mode.

-DMARC_NO_ISBD      For switching off the ISBD
		    punctuation. Please look at the comment
		    about ISBD punctuation below.

-DMARC_999_NON_STANDARD_FIELDS
                    Enables the creation of
                    999 fields for input fields that
                    are not part of the national
                    MARC format, and are therefore
                    not in the table.  Normally, these
                    fields will be ignored by the program.

-DAPPEND_TO_SUBFIELD
		   Avoid repeating subfields in an output record field.
                   Append the data of the candidate repeated subfield
                   to the already existing one.

When changing the compilation options, do not forget
to "touch *.c", so that make recompiles every source file.
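
The rule that MARC_NO_SORT only takes effect together with MARC_DEBUG
could be expressed with the preprocessor roughly as below.  This is a
sketch only; the real sources may organise the checks differently, and
SORT_OUTPUT_RECORDS and sorting_enabled() are hypothetical names.

```c
/*
 * MARC_DEBUG and MARC_NO_SORT are the switches described above.
 * SORT_OUTPUT_RECORDS and sorting_enabled() are made up for this sketch.
 */
#if defined(MARC_DEBUG) && defined(MARC_NO_SORT)
#define SORT_OUTPUT_RECORDS 0   /* debug build with sorting suppressed */
#else
#define SORT_OUTPUT_RECORDS 1   /* MARC_NO_SORT is ignored without MARC_DEBUG */
#endif

/* Returns 1 if this build sorts output records, 0 otherwise. */
int sorting_enabled(void)
{
    return SORT_OUTPUT_RECORDS;
}
```

Because the decision is made at compile time, a changed switch has no
effect on object files that were built before the change.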


2. If you want to change the tables you test with, you have to
update the files convsets.txt and fromToFormats.txt (see above).

3. To those of you who try to read the code: Please notice that
comments with the format /**comment**/ are my own directives to
myself for necessary and possible changes, and not the story of
what really happens in the code...  Besides, for the time being,
and also in the near future, the code will be polluted with
comments that may mean little to you.  Sorry.

4. ISBD punctuation for USMARC output is now (at least partly)
supported.  This does not include punctuation at the end of the
field, as it is assumed that this is taken care of by the
systems themselves.

Since we assume that the records of the formats currently
converted into USMARC follow the AACR2 rules, the USMARC leader
position 18 is set to 'a'.

If, however, the results of the punctuation are very
unsatisfactory, the function may be switched off using the
-DMARC_NO_ISBD option prior to compilation.  In this case,
leader position 18 is set to ' ' (blank space), and the 040
field is generated with the subfield $e, containing the word
AACR2 (040 $eAACR2).
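
The leader handling described above can be sketched as follows.
set_cataloging_form() and its isbd_applied parameter are hypothetical
and only illustrate the two cases; the generation of the 040 field is
not shown.

```c
#define LEADER_LENGTH 24            /* a MARC leader is 24 characters long */
#define LEADER_CATALOGING_FORM 18   /* leader position 18, counted from 0 */

/*
 * Hypothetical helper: sets leader position 18 of a USMARC output record.
 * isbd_applied is non-zero when the converter added ISBD punctuation,
 * and zero when it was built with -DMARC_NO_ISBD.
 */
void set_cataloging_form(char leader[LEADER_LENGTH], int isbd_applied)
{
    leader[LEADER_CATALOGING_FORM] = isbd_applied ? 'a' : ' ';
}
```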

5. Reading tables and freeing table memory.

If the MARC converter operates with a number of lookup table
sets, it may sometimes be of interest to free one or more of
these tables by format names, and reload them afterwards.
(A conversion set can take up 300 KB of memory.)

The function
	readConversionSet(char *ffname, char *ttname)
can be used to read a table dynamically by format name.
If the global lookup array is used, the call is
	lt[lt_ArrayLength]=readConversionSet(ffname,ttname);
so that the table is read into the next free position of
the global array lt.

The function
	freeConversionSet(char *ffname, char *ttname)
locates the conversion set identified by the two format names,
frees its memory and rearranges the global array.

The function
        TABFREE()
frees all the conversion sets.
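
The bookkeeping behind loading and freeing sets in a global array can
be pictured with the sketch below.  It does not use the converter's
own types: struct demo_set, demo_lt, demo_lt_length, demo_load_set()
and demo_free_set() are all hypothetical stand-ins, and no table files
are actually read.

```c
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_SETS 30   /* the changelog mentions up to 30 sets in memory */

/* Hypothetical stand-in for a loaded conversion set. */
struct demo_set {
    char from[16];
    char to[16];
};

struct demo_set *demo_lt[DEMO_MAX_SETS];   /* plays the role of lt */
int demo_lt_length = 0;                    /* plays the role of lt_ArrayLength */

/* Puts a (dummy) set into the next free slot; returns 0 on success. */
int demo_load_set(const char *from, const char *to)
{
    struct demo_set *s;

    if (demo_lt_length >= DEMO_MAX_SETS)
        return -1;
    s = calloc(1, sizeof *s);
    if (s == NULL)
        return -1;
    /* calloc zero-fills, so the copies below stay NUL-terminated. */
    strncpy(s->from, from, sizeof s->from - 1);
    strncpy(s->to, to, sizeof s->to - 1);
    demo_lt[demo_lt_length++] = s;
    return 0;
}

/* Frees the set for from/to and closes the gap; returns 0 if it was found. */
int demo_free_set(const char *from, const char *to)
{
    int i;

    for (i = 0; i < demo_lt_length; i++) {
        if (strcmp(demo_lt[i]->from, from) == 0 &&
            strcmp(demo_lt[i]->to, to) == 0) {
            free(demo_lt[i]);
            /* Shift the remaining pointers down so the array stays dense. */
            memmove(&demo_lt[i], &demo_lt[i + 1],
                    (size_t)(demo_lt_length - i - 1) * sizeof demo_lt[0]);
            demo_lt[--demo_lt_length] = NULL;
            return 0;
        }
    }
    return -1;
}
```

Keeping the array dense means the next free position is always the
one at index demo_lt_length.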

6. Line format file

Since version 2 of MARCconv, there is an option of generating
a line format string, including text explanations of the meaning
of each field tag.  This is implemented by means of an array of
char * pointers that holds the line versions of the records
belonging to a single record stream.

Important flags associated with this function:
readTexts, declared in datastruct.h and activated in the
main() routine of the test version (MARCconv.c).

7. Mediated conversion: I guess the reader is aware of the
general loss of data that occurs when converting through a
third format.  I just want to point out some things which
are special to the way MARCconv operates:
  1. "999" (private) fields: 999
        $A from-format name
        $B copy of the from field
        $C a comment explaining why the field was generated.
           When only some of the subfields in the from field
           have no conversion (although they occur in the
           table as from subfields), see 2a below.
        $D subfield $a of the from field, to facilitate
           recognition of partly 999ed records.

  2. Creation of 999 fields
  2a. When some subfields in a field have no conversion for the
     to format, a "999" will be created for that part of the
     field, with a subfield "B" that cites the missing
     subfields.  When the to_record is then reconverted, the
     999 fields, which are directly converted, may come out
     of context in the 3rd record.  For this
     reason, the comment subfield ("C") contains the "to
     field" that will correspond to the "from field" of
     the 2nd record.  This will supply the context.  Subfield
     $D will cite subfield $a of the from field (1st
     record), so that ambiguity is prevented if field
     tags repeat themselves in the 1st/2nd record.
	
  2b. A 999 field will be created when an entire field is
     defined for the FROM format, but not the TO format.

  2c. A 999 field will normally NOT be generated for fields that
     are not defined in the from format.  The compilation switch
     MARC_999_NON_STANDARD_FIELDS enables the 999ing of such
     fields as well.
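
The layout of a 999 field can be illustrated with the small sketch
below.  format_999() is hypothetical, and '$' is used as a visible
stand-in for the subfield delimiter; the converter itself builds
records through its own data structures.

```c
#include <stdio.h>

/*
 * Hypothetical helper: formats the four subfields of a 999 field as text.
 *   $A from-format name        $B copy of the from field
 *   $C explanatory comment     $D subfield $a of the from field
 * Returns 0 on success, -1 if out is too small.
 */
int format_999(const char *from_format, const char *from_field,
               const char *comment, const char *from_sub_a,
               char *out, size_t outsize)
{
    int n = snprintf(out, outsize, "999 $A%s$B%s$C%s$D%s",
                     from_format, from_field, comment, from_sub_a);
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}
```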


  3. Loss of data due to differences in granularity between
     formats.
      Some pieces of information occupying several subfields
      in one format may occupy a single subfield in another
      format.  Conversion from a less granular into a more
      granular field is not solvable if no special characters
      are used to separate the items.  An example is some of
      the 5xx note fields in NORMARC, for which the USMARC
      counterparts are more granular.

      The temporary solution for those was to put the
      information in the first of the subfields the data
      should be distributed amongst (i.e. the place where
      the first bit of data should go).


PROGRESS: Here, the changes since the last version are
listed.

Current version: 2.3
Changes since last version:


19/12/96 - Version 2.2
Adding a facility for handling conversion
of linking fields into UNIMARC mini records, enhancing 999
functionality and countless other corrections.


28/10/96 - Version 2.1
Changes since last version:

1. Creating a lineformat of the record to be displayed by the
Z39.50 client.

12/10/96 - Version 2.0

Changes since last version:

1. Enhancing the split sub field function

2. Support for having several (up to 30) conversion sets
   in memory at the same time.

27/9/96 - Version 1.7
1. Support for two new types of tables:
   conditional conversion and indicator conversion
   tables

2. Enhancement of the conversion code repertoire
   for subfields.

3. Many other little changes, among which redistributing
   program files.


16/8/96 - Version 1.6
1. Correction in the treatment of indicators.
Addressing the case where indicator 1 converts to
indicator 2 and vice versa.


12/7/96 - Version 1.5
1. Redistribution of the functions in the program files:
   record_abs.c took over some of the functions that had
   to do with record abstraction.

2. An implementation of split sub field. It only handles
   cases where
   - only 2 to_sub_fields are involved
   - the split is done on an "isbd character"
     (e.g. ,;:).
   The rest of the cases are submitted to directConvert()
   that copies the entire sub field data into the first
   to_sub_field.
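
The limited two-way split described in item 2 can be pictured roughly
as follows.  This is a sketch under assumptions, not the converter's
code: split_sub_field() is a hypothetical name, and dropping the
separator and the blanks after it is an assumption.

```c
#include <string.h>

/*
 * Hypothetical sketch of a two-way split.  Copies the text before the
 * first separator into first and the text after it into second (blanks
 * directly after the separator are skipped).  If no separator occurs,
 * everything goes into first and second is left empty.
 * Returns 1 if a split took place, 0 otherwise.
 * Both output buffers must hold at least strlen(data) + 1 characters.
 */
int split_sub_field(const char *data, const char *separators,
                    char *first, char *second)
{
    size_t cut = strcspn(data, separators);

    if (data[cut] == '\0') {        /* no separator: copy everything */
        strcpy(first, data);
        second[0] = '\0';
        return 0;
    }
    memcpy(first, data, cut);
    first[cut] = '\0';
    data += cut + 1;                /* step over the separator */
    data += strspn(data, " ");      /* and any blanks that follow it */
    strcpy(second, data);
    return 1;
}
```

The no-separator branch mirrors the fallback in which the entire
sub field data ends up in the first to_sub_field.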


10/6/96

1. Removal of ISBD punctuation when converting applicable records
   from USMARC is supported.

Not yet implemented:
- Generating of indicators, where the indicator value must be generated via
  information other than the value of the from-indicators.

- Coded data fields. Needed for converting from UKMARC and from UNIMARC.

- split sub field. For this feature no trivial general solution exists.
  For the time being the sub field is directly converted into the first
  of the "to sub fields". The extent of implementation will be decided
  on after some testing has taken place.


24/5/96

1. ISBD punctuation for USMARC output is now (at least partly)
supported. See comments above.

2. Use of global variables is eliminated, which hopefully brings the
program closer to supporting concurrent conversion. This has yet to be
tested.

Not yet implemented:
- Generating of indicators, where the indicator value must be generated via
  information other than the value of the from-indicators.

- Coded data fields. Needed for converting from UKMARC and from UNIMARC.

- split sub field. For this feature no trivial general solution exists.
  For the time being the sub field is directly converted into the first
  of the "to sub fields". The extent of implementation will be decided
  on after some testing has taken place.

- Removal of ISBD punctuation when converting applicable records from USMARC.

BUGS:
Quite a few. The most important known ones:

- On HP-UX 9.01 the program gives a memory fault when compiled
with gcc -O (optimization).

Important features that still need improvement.
12/4/96
Just some bugs in the output routine

9/4/96:
1. Decision tables are supported.  A decision table is supplied
for NORMARC to USMARC (norusm.dcs).
-	A decision table is a mechanism for content-dependent conversion.
2. Indicators are now converted as well (only when directly
convertible from the indicators in the from-fields).

27/3/96:

Changes since last version:
1. Treatment of control fields improved
2. "Line format" files are output with both the source records
and the converted records.
3. Leader updated.
4. Merging of fields corrected


Bugs: Still many, the most important of which are:
1. Incorrect treatment of non-repeatable sub fields (still treated
as if they were fully repeatable...).
2. Not all conversion functions for variable data fields are supported.
3. Variable naming is somewhat inconsistent.
4. Information in subfields gets overwritten by subsequent information.

25/3/96:

An "old" version with 1 set of hard coded tables.

Bugs: Many




Changes yet to be made with high priority

*Enhancing the split sub field function
	This function is still a bit limited, as in
	most cases it only splits into two sub fields.
	
	To treat splitting into more sub fields,
	the function has to be changed to use POSIX
	regular expressions with substring matching.

*Additional error checks in certain positions.	

*Treating MAB
	Since MAB does not use sub fields -
	some changes in the inputting and outputting
	of records are needed.

*Program now treats 1 or 2 indicators in a record.
	Make count of indicators general - driven by
		the value of "number of indicators".
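
As a pointer for the split sub field item above, this is roughly what
substring matching with POSIX regular expressions looks like.  The
sketch is not part of the converter: regex_split() is a hypothetical
name, and the limit of nine groups is arbitrary.

```c
#include <regex.h>
#include <string.h>

/*
 * Hypothetical sketch: splits data into up to max_parts pieces using the
 * parenthesised groups of a POSIX extended regular expression.  Each
 * parts[i] must have room for strlen(data) + 1 characters.
 * Returns the number of groups copied, or -1 if the pattern does not
 * compile or does not match.
 */
int regex_split(const char *pattern, const char *data,
                char **parts, int max_parts)
{
    regex_t re;
    regmatch_t m[10];
    int i, count = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, data, 10, m, 0) != 0) {
        regfree(&re);
        return -1;
    }
    /* m[0] is the whole match; the groups start at m[1]. */
    for (i = 1; i <= (int)re.re_nsub && i < 10 && count < max_parts; i++) {
        if (m[i].rm_so == -1) {
            parts[count][0] = '\0';     /* group did not take part */
        } else {
            size_t len = (size_t)(m[i].rm_eo - m[i].rm_so);
            memcpy(parts[count], data + m[i].rm_so, len);
            parts[count][len] = '\0';
        }
        count++;
    }
    regfree(&re);
    return count;
}
```

With one group per target sub field, a single pattern can describe a
split into more than two parts.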

*******************************************************
*                Michael Preminger                    *
*Forsker                            Research Scientist*
*                                                     *
*Høgskolen i Oslo                Oslo College         *
*              Telefon:+47-22452778                   *
*              Fax:    +47-22452605                   *
*    email:Michael.Preminger@jbi.HiO.no               *
*                                                     *
*******************************************************