File README, Last updated 5/2/97

This is a snapshot directory containing the up-to-date version of the
MARC converter, with related table files.

FILES IN THE DIRECTORY (WHOSE USE IS NOT OBVIOUS):

recordTest.in       Contain a set of input records, delimited by
(recordTest.<#>)    35(oct) (end of record), and the respective
recordTest.out      converted records. Note that input to the
                    program may also be given using the functions
                    insertF(), insertSF(), either in the code before
                    compilation, or through the debugger when stopped
                    at a special place.

lineFormat.out      Contain a "line format" version that displays
lineFormat.in       the input and output records in turn, field by
                    field, subfield by subfield. The lineFormat.*
                    files are generated only if the converter is
                    compiled using the -DMARC_DEBUG switch.
                    In addition, the -DMARC_NO_SORT switch may be used
                    to suppress sorting of the output records. In this
                    way corresponding FROM and TO fields are shown in
                    the same order, making comparison easier.

*.chk               Copies of the lineFormat.* files with the current
                    setup of the program. For testing by diff'ing.

fromToFormats.txt   For test mode, i.e. when convertRecordStream() is
                    run locally. The set that you wish to test.
                    Contains a single pair of formats.

convsets.txt        The conversion sets to be read. Should conform to
                    the .vdt and .ctl files that are available.
                    E.g. an entry "{UNIMARC} {USMARC}" means that
                    (at least) the files uniusm.vdt and uniusm.ctl
                    are available in the path.

format.test*.int    Files including large numbers of records for
                    testing.

The following files are table files, which may or may not be up to
date. When downloading new tables, those files are overwritten.

fffttt.ctl          Conversion table files for control fields and
fffttt.vdt          variable data fields respectively, where fff are
                    the first three letters of the from-format name,
                    and ttt are the first three letters of the
                    to-format name.

/*SPECIAL CASES*/

fffttt.dcs          Decision table files for content-dependent
                    conversion.
fffttt.cnd          Conditional conversion tables, taking care of
                    conditions that need to be met for a conversion
                    to take place.

fffttt.ind          Indicator tables: directions for setting
                    indicators in nontrivial cases.

usmarc.isb          A table used to apply/remove ISBD punctuation
                    to/from USMARC records.

Other files with
strange names:      Output record files which I sometimes used, and
                    therefore have not yet had the mental strength to
                    delete. Don't lose hope though.

COMMENTS:

1. Compilation options (Makefile)

   -DMARC_MULTI_TABLES
        This option is temporary, and should always be on!
        Enables loading of more than one conversion set into memory.

   -DMARC_DEBUG
        This command line option in the makefile enables the line
        format files, and some of the stdout error messages.

   -DMARC_NO_SORT
        Suppresses the sorting of the output records. It is meant to
        allow input fields and output fields to be viewed side by side
        (lineFormat.in/out files). Since the resulting record is not
        sorted, and hence wrong, this facility is only enabled when
        the MARC_DEBUG option is active.

   -DMARC_REDIRECT
        Allows the program to be run using input/output redirection
        instead of using record file names.

   -DMARC_NO_CONVERT
        Highly unofficial! Allows viewing of long record streams.
        They are written into the file "lineFormat.in".
        Only active in the debug mode.

   -DMARC_NO_ISBD
        For switching off the ISBD punctuation. Please look at the
        comment about ISBD punctuation below.

   -DMARC_999_NON_STANDARD_FIELDS
        For enabling the creation of 999 fields for input fields that
        are not a part of the national MARC format, and are therefore
        not in the table. Normally, these fields will be ignored by
        the program.

   -DAPPEND_TO_SUBFIELD
        Avoid repeating subfields in an output record field. Append
        the data of the candidate repeated subfield to the already
        existing one.

   When changing the compilation options, do not forget to
   "touch *.c".

2. If you want to change the tables you test with, you have to update
   the files convsets.txt and fromToFormats.txt (see above).
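
   As an illustration of the table file naming convention described
   in the file list (first three letters of the from-format name,
   then first three letters of the to-format name, then the
   extension), here is a minimal sketch. The helper tableFileName()
   is hypothetical and is not part of MARCconv. Lowercasing is an
   assumption inferred from the examples uniusm.vdt and norusm.dcs.
   snprintf() requires a C99 library.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of MARCconv): builds a table file name
 * such as "uniusm.vdt" from a from-format name, a to-format name and an
 * extension. Returns 0 on success, -1 if a format name is shorter than
 * three characters or the result does not fit into buf. */
int tableFileName(const char *ffname, const char *ttname,
                  const char *ext, char *buf, size_t buflen)
{
    char stem[7];
    size_t i;
    int n;

    assert(ffname != NULL && ttname != NULL && ext != NULL && buf != NULL);

    if (strlen(ffname) < 3 || strlen(ttname) < 3)
        return -1;

    /* fff = first three letters of the from-format name,
     * ttt = first three letters of the to-format name.
     * Lowercasing is an assumption based on the README examples. */
    for (i = 0; i < 3; i++) {
        stem[i]     = (char)tolower((unsigned char)ffname[i]);
        stem[i + 3] = (char)tolower((unsigned char)ttname[i]);
    }
    stem[6] = '\0';

    n = snprintf(buf, buflen, "%s.%s", stem, ext);
    if (n < 0 || (size_t)n >= buflen)
        return -1;      /* output error or truncated result */
    return 0;
}
```

   Under these assumptions, the convsets.txt entry "{UNIMARC} {USMARC}"
   corresponds to tableFileName("UNIMARC", "USMARC", "vdt", ...) and
   tableFileName("UNIMARC", "USMARC", "ctl", ...), i.e. the files
   uniusm.vdt and uniusm.ctl that must be available in the path.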

3. To those of you who try to read the code:
   Please notice that comments with the format /**comment**/ are my
   own directives to myself for necessary and possible changes, and
   not the story of what really happens in the code...
   Besides, for the time being, and also in the near future, the code
   will be polluted with comments that may mean little to you. Sorry.

4. ISBD punctuation for USMARC output is now (at least partly)
   supported. This does not include punctuation at the end of the
   field, as it is assumed that this is taken care of by the systems
   themselves. Since we assume that the records of the formats
   currently converted into USMARC follow the AACR2 rules, the USMARC
   leader position 18 is set to 'a'.
   If, however, the results of the punctuation are very
   unsatisfactory, the function may be switched off using the
   -DMARC_NO_ISBD option prior to compilation. In this case, the
   leader position 18 is set to ' ' (blank space), and the 040 field
   is generated with the subfield $e, containing the word AACR2
   (040 $eAACR2).

5. Reading tables and freeing table memory.
   If the MARC converter operates with a number of lookup table sets,
   it may sometimes be of interest to free one or more of these
   tables by format names, and reload them afterwards. (A conversion
   set can take up 300kb of memory.)

   The function readConversionSet(char *ffname, char *ttname)
   can be used to read a table dynamically by format names. If the
   global lookup array is used, then the call is

        lt[lt_ArrayLength]=readConversionSet(ffname,ttname);

   so that the table is read into the next free position of the
   global array lt.

   The function freeConversionSet(char *ffname, char *ttname)
   frees the conversion set identified by the two format names,
   releasing its memory, and rearranges the global array.

   The function TABFREE() frees all the conversion sets.

6. Line Format file
   Since version 2 of MARCconv, there is an option of generating a
   line format string, including text explanations of the field tag
   meaning.
   This is implemented by means of an array of char * pointers that
   holds the line versions of the records belonging to a single
   record stream.
   Important flags that are associated with this function:

        readTexts   declared in datastruct.h, activated in the main()
                    routine of the test version (MARCconv.c)

7. Mediated conversion:
   I guess the reader is aware of the general loss of data that
   occurs while converting through a third format. I just want to
   point out some things which are special to the way MARCconv
   operates:

   1. "999" (private) fields:

        999 $A  from format name
            $B  copy of the from field
            $C  a comment explaining the cause of generation of
                these fields. If only some of the subfields in the
                from_field have no conversion (but they occur in the
                table as from sub fields),
            $D  the subfield $a of the from field, to facilitate
                recognition of partly 999ed records.

   2. Creation of 999 fields

      2a. When some subfields in a field have no conversion for the
          to format, a "999" will be created for that part of the
          field, with a subfield "B" that cites the missing
          subfields. When the to_record is then reconverted, the 999
          fields, which are directly converted, may come out of
          context into the 3rd record. For this reason, the comment
          subfield ("C") contains the "to field" that will correspond
          to the "from field" of the 2nd record. This will supply the
          context. Field $D will cite the $a field of the from field
          (1st record), so that ambiguity is prevented if field tags
          repeat themselves in the 1st/2nd record.

      2b. A 999 field will be created when an entire field is defined
          for the FROM format, but not for the TO format.

      2c. A 999 field will normally NOT be generated for fields that
          are not defined in the from format. A compilation switch
          MARC_999_NON_STANDARD_FIELD enables the 999ing of such
          fields as well.

   3. Loss of data due to differences in granularity between formats.
      Some pieces of information occupying several sub fields in one
      format may occupy a single sub field in another format.
      Conversion from a less granular into a more granular field is
      not solvable if no special characters are used to separate the
      items. An example is some of the note fields 5xx in NORMARC,
      for which the USMARC counterparts are more granular. The
      temporary solution for those was to put the information in the
      first of the subfields the data should be distributed amongst
      (i.e. the place where the first bit of data should go).

PROGRESS:

Here, the changes since the last version are listed.

Current version: 2.3
Changes since last version:

19/12/96 - Version 2.2
   Adding a facility for handling conversion of linking fields into
   UNIMARC mini records, enhancing 999 functionality and countless
   other corrections.

28/10/96 - Version 2.1
Changes since last version:
   1. Creating a line format of the record to be displayed by the
      Z39.50 client.

12/10/96 - Version 2.0
Changes since last version:
   1. Enhancing the split sub field function.
   2. Support for having several (up to 30) conversion sets in memory
      at the same time.

27/9/96 - Version 1.7
   1. Support for two new types of tables: conditional conversion
      and indicator conversion tables.
   2. Enhancement of the conversion code repertoire for subfields.
   3. Many other little changes, among which redistributing program
      files.

16/8/96 - Version 1.6
   1. Correction in the treatment of indicators. Addressing the case
      where indicator 1 converts to indicator 2 and vice versa.

12/7/96 - Version 1.5
   1. Redistribution of the functions in the program files.
      record_abs.c took over some of the functions that had to do
      with record abstraction.
   2. An implementation of split sub field. It only handles cases
      where
      - only 2 to_sub_fields are involved
      - the split is done on an "isbd character" (e.g. ,;:).
      The rest of the cases are submitted to directConvert(), which
      copies the entire sub field data into the first to_sub_field.

10/6/96
   1. Removal of ISBD punctuation when converting applicable records
      from USMARC is supported.
   Not yet implemented:
   - Generating of indicators, where the indicator value must be
     generated via information other than the value of the
     from-indicators.
   - Coded data fields. Needed for converting from UKMARC and from
     UNIMARC.
   - Split sub field. For this feature no trivial general solution
     exists. For the time being the sub field is directly converted
     into the first of the "to sub fields". The extent of
     implementation will be decided on after some testing has taken
     place.

24/5/96
   1. ISBD punctuation for USMARC output is now (at least partly)
      supported. See comments above.
   2. Use of global variables is eliminated, which hopefully brings
      the program closer to supporting concurrent conversion. This
      still has to be tested.

   Not yet implemented:
   - Generating of indicators, where the indicator value must be
     generated via information other than the value of the
     from-indicators.
   - Coded data fields. Needed for converting from UKMARC and from
     UNIMARC.
   - Split sub field. For this feature no trivial general solution
     exists. For the time being the sub field is directly converted
     into the first of the "to sub fields". The extent of
     implementation will be decided on after some testing has taken
     place.
   - Removal of ISBD punctuation when converting applicable records
     from USMARC.

   BUGS:
   Quite a few. The most important known ones:
   - On HPUX 9.01 the program gives a memory fault when compiled
     with gcc -O (optimization).

   Important features that yet need improvement.

12/4/96
   Just some bugs in the output routine.

9/4/96:
   1. Decision tables are supported. A decision table is supplied
      for NORMARC to USMARC (norusm.dcs).
      - A decision table is a mechanism for content-dependent
        conversion.
   2. Indicators are now converted as well (only when directly
      convertible from the indicators in the from-fields).

27/3/96:
Changes since last version:
   1. Treatment of control fields improved.
   2. "Line format" files are output with both the source records
      and the converted records.
   3. Leader updated.
   4. Merging of fields corrected.

   Bugs: Still many, the most important of which are:
   1. Incorrect treatment of non-repeatable sub fields (still
      treated as if they were fully repeatable...).
   2. Not all conversion functions for variable data fields are
      supported.
   3. Variable naming is somewhat inconsistent.
   4. Information in subfields gets overwritten by subsequent
      information.

25/3/96:
   An "old" version with 1 set of hard coded tables.
   Bugs: Many

Changes yet to be made with high priority

* Enhancing the split sub field function
  This function is still somewhat limited, as in most cases it only
  splits into two sub fields. To treat splitting into more sub
  fields, the function has to be changed to use Posix regular
  expressions with sub string matching.

* Additional error checks in certain positions.

* Treating MAB
  Since MAB does not use sub fields, some changes in the inputting
  and outputting of records are needed.

* Program now treats 1 or 2 indicators in a record. Make the count
  of indicators general, driven by the value of "number of
  indicators".

*******************************************************
*                  Michael Preminger                  *
*Forsker                            Research Scientist*
*                                                     *
*Høgskolen i Oslo                         Oslo College*
*                Telefon:+47-22452778                 *
*                Fax:    +47-22452605                 *
*         email:Michael.Preminger@jbi.HiO.no          *
*                                                     *
*******************************************************