
I just have to share this voyage of discovery, because I have wallowed in the doldrums of despair and defeat the last couple of days, only finding the way this morning, in 15 minutes, after sleeping on it. Isn't that always the way?

My Scylla and Charybdis were a client's oral history 'master' and 'tracks' textbases. The master record becomes the primary document in Solr, while the tracks atomically update that document. We've done this before: each track contributes an audio file to the document's list of media. No problem, it's easy to append something new to a primary document.
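
In Solr's update XML, that kind of append is just an update="add" on a multivalued field. Something along these lines, with the ID and field names invented for illustration rather than taken from the client's schema:

<add>
  <doc>
    <!-- the track targets its parent oral history document -->
    <field name="id">oral-history-042</field>
    <!-- append this track's audio file to the document's list of media -->
    <field name="media" update="add">oral-history-042-track-03.mp3</field>
  </doc>
</add>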

However, each track also has its own subjects, names and places, depending on the contents of the audio track. These also need to be appended to the primary document. Easy, right? Well, no. It is easy to blindly append something, but you start getting repeats in the primary document. For instance, if the name 'Blackbeard' is in the metadata for 8 out of 10 tracks, the primary document ends up with name=Blackbeard,Blackbeard,Blackbeard,Blackbeard,Blackbeard,Blackbeard,Blackbeard,Blackbeard. You get the picture.

Okay, so let's look in the existing primary record to see if Blackbeard already... oh, wait. You can't get at the existing values while doing an atomic update. Hm.

Ah, we can 'remove' values matching Blackbeard, then 'add' Blackbeard. That should work. And it does. But what about multiple entries coming out of Inmagic like 'Blackbeard|Kidd, William'? Dang it: that string doesn't match anything, so neither name gets removed, and we're back to multiples of each name. We'll need to script a split on the pipe before remove/add.
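
In raw Solr update XML, the remove-then-add trick looks roughly like this (values invented for illustration): strip any existing copy of the value from the multivalued field, then append it back once.

<add>
  <doc>
    <field name="id">oral-history-042</field>
    <!-- remove any existing 'Blackbeard' so duplicates never accumulate -->
    <field name="name" update="remove">Blackbeard</field>
    <!-- then add it back exactly once -->
    <field name="name" update="add">Blackbeard</field>
  </doc>
</add>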

Split happening: great, great. Now 'Blackbeard' and 'Kidd, William' are going in nicely without duplication. Oh. But wait, what about when multiple textbase fields map to the same Solr field? For example, HistoricNeighbourhood and PlanningArea => place?
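
To make the problem concrete: after the XSLT runs, a single track can hand the entity two entries for the same destination column, each of which may itself contain a pipe-delimited list. The values here are invented, but the shape is the point:

<!-- HistoricNeighbourhood and PlanningArea both feed place_ignored -->
<field name="place_ignored">Port Royal|Tortuga</field>
<field name="place_ignored">Nassau</field>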

And here the tempest begins. It's relatively simple to deal with multiple mappings, or with multiple Inmagic entries. But not both. The reason is that the object representing all the possible values is now a Java ArrayList, which doesn't translate cleanly to any JavaScript type. You can't treat it like an array and deal with the values separately, nor can you treat it like a string and split it to create an array. You can't enumerate it, you can't cast it; it's a black box, elusive beyond imagining.

Everything I tried, failed. It was dismal. It was all the more maddening because it seemed like it should have been such a simple thing. "Appearances can be deceiving!" shouted the universe, putting its boot-heel to my backside again and again.

Finally this morning, a combination of transformers (including regex) saved my bacon and I am eating the bacon and now I want to lie down for a while, under a blanket made of bacon.

The Technical

I'm using a RegexTransformer to do the splits, THEN a script transformer to remove-and-append.

In Solr DataImportHandler config XML:

 

<entity 
    name="atomic-xml"
    processor="XPathEntityProcessor"
    datasource="atomic"
    stream="true"
    transformer="RegexTransformer,script:atomicTransform"
    useSolrAddSchema="true"
    url="${atomic.fileAbsolutePath}"
    xsl="xslt/dih.xsl"
>
    <!--
        Sequential order of transformers important: regex split, THEN script transform.
        Handles multiple entries plus multiple mappings. E.g.
        <field name="name_ignored">Kyd, William|Teach, Edward</field>
        <field name="name_ignored">Rackham, John</field>
    -->
    <field column="name_ignored" sourceColName="name_ignored" splitBy="\|" />
    <field column="place_ignored" sourceColName="place_ignored" splitBy="\|" />
    <field column="topic_ignored" sourceColName="topic_ignored" splitBy="\|" />

</entity>

 

In Solr DIH script transformer:

 

var atomic = {};

// Build the atomic-update instruction for one Solr field.
// After the RegexTransformer has run, row.get() may return either a single
// string or a java.util.ArrayList of strings; either way, handing the same
// value(s) to both 'remove' and 'add' makes Solr strip any existing copies
// from the primary document and then append them once, so repeats never
// accumulate across tracks.
atomic.appendTo = function (field, row) {

    var val = row.get(field + '_ignored');
    if (val === null) return;        // this track has no values for the field

    var hash = new java.util.HashMap();
    hash.put('remove', val);         // remove any existing occurrence(s)
    hash.put('add', val);            // then add the value(s) back once
    row.put(field, hash);            // e.g. name = {remove: ..., add: ...}

};

var atomicTransform = function (row) {
    atomic.appendTo('name', row);
    atomic.appendTo('topic', row);
    atomic.appendTo('place', row);
    return row;
};
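
One footnote for anyone wiring this up themselves: in DIH, script transformer functions like these normally sit inside a <script> element at the top of the data-config, alongside the entity shown earlier. A minimal sketch of the layout (the dataSource and document details are assumed, not copied from the real config):

<dataConfig>
  <script><![CDATA[
    // the atomic / atomicTransform functions above go here
  ]]></script>
  <document>
    <!-- the atomic-xml entity from the config above -->
  </document>
</dataConfig>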

 

If you search the Halifax, Nova Scotia public library catalogue for “physiotherapy”, the first record to appear is for an educational pamphlet on “Physiotherapy services in Nova Scotia”  with a link to view it online as a PDF.  Subsequent records in the search results are also patient education pamphlets covering such topics as a guide to going home after surgery, ankle injuries and shoulder-strengthening exercises.


The Health Sciences Library of Capital Health has recently partnered with Halifax Public Libraries to add hundreds of these hospital-produced patient education pamphlet records to the public library’s catalogue. The goal is to make locally produced current information about health promotion, medical conditions, diagnostic tests, and surgical procedures more accessible to the public. These materials are also freely available and searchable from the website of the Health Sciences Library of Capital Health.

The hospital uses Inmagic DB/TextWorks to maintain the pamphlet database in a non-MARC format. Lara Killian from the Health Sciences Library described the project at the recent CHLA conference in Montreal. Records are exported into MARC format from DB/TextWorks using a map created with the MARC Transformer available from Inmagic. These records are then massaged with the free MARCEdit software to create a file suitable for loading into the MARC-based AquaBrowser discovery software used by the public library. There were some challenges with the MARC formatting, such as the display of French diacritical marks. At Halifax Public Libraries, Dave MacNeil worked with AquaBrowser to tweak the formatting of the search result display so that when these pamphlets show up, the direct link to the free PDF is easily identifiable.

This new initiative launched in June 2014, with the goal of increasing visibility and usage of the pamphlets by adding this new public access point.

If you need help with a similar project, please contact us for assistance.

In the first post of this series we wrote about cleaning up the files associated with DB/TextWorks and in the second we covered rationalizing your textbase elements.  In this post we’ll discuss some steps you can take to protect and maintain your textbases in good health.

Usually Inmagic DB/TextWorks textbases can function for many years without any intervention or problems. However, if you ever see a “Stop: textbase is in an inconsistent state….” message, please do NOT keep working in it! We have had clients tell us that they just ignore that message, not realizing that the textbase might be corrupt. Frequently this message is simply caused by a temporary loss of network connectivity while a record is being edited, and can be fixed very quickly.

We recommend running Check Textbase every so often, from Manage Textbases on a menu screen (i.e. without a textbase open). This will detect and repair problems in the textbase and your user file. The process generally takes just a few minutes for most textbases, but can take a while for very large ones. We suggest specifying Options to Repair Structural Problems and Rebuild 10 or more Damaged Indexes (depending on textbase size). If any problems are found they will be listed in the .chk file with a recommendation for action. Running Check Textbase in this manner will clear the inconsistent state message if it was just caused by a network glitch.

As part of your regular maintenance we also recommend confirming that you have a backup routine for your textbases. We have heard some horror stories over the years. Two clients had fires, and two had floods in their buildings; one of these had no offsite backup and lost several years’ work. Another client had all their textbases deleted by an overzealous IT guy who didn’t know what they were and figured they weren’t important, and yet another hit batch delete instead of batch modify! If you are a smaller organization without any IT support, you can always simply make a backup by copying your textbases to a USB stick and taking it home with you.

The above information applies to the non-SQL version of DB/TextWorks. Clients with DB/Text for SQL versions should ensure their IT staff are aware of the recommendations in the Administrators Guide available from the Inmagic extranet.

For more information, check out the Help file built in to DB/TextWorks, or the printable PDF for version 13. If you run Check Textbase and need help implementing the recommendations, please contact Inmagic Support if you have a maintenance contract, or we can help you on a consulting basis.

In Part 1 of this series of blog posts on spring cleaning your databases, we wrote about the various files created by DB/TextWorks and what was safe to delete.

Now that you have successfully cleaned up the various folders with your textbases, it’s time to turn your attention to the textbase elements, i.e. the query screens, forms and saved sets within your textbases. 

Hopefully you have these! We hate to find that clients are using the default basic query screen and basic forms when it is possible to create your own very easily. We have watched in horror as clients scroll down long edit screens to add information to a new field they just created, which of course appears at the end of their data structure. We recommend designing query screens and forms with fields placed side by side, grouped under logical headings, so that everything can be viewed at once without any need for scrolling. Additional text boxes can easily be added to all screens to provide helpful search or data entry hints. So no excuses – try designing some forms – it’s not hard!

On the other end of the spectrum are clients who have created so many forms that it’s not obvious which ones are in common use. So they may have Report-Test or Label3 or QBE_Susan, etc. Regular users of the textbase may know which ones are appropriate, but think about our succession planning motive – how can you make it obvious to a new user which ones they should use?

You can see a listing of all the query screens, report forms and saved sets for a textbase under Maintain > Manage Textbase Elements (or Display > Textbase Information to view a printable list). This list may show more forms than clicking the Select Form icon does, as some may be for printing or web use only. Most will say (public) after the name – any that do not are visible to you only, and are stored in your personal user file (see Part 1 of this spring cleanup series).

Caution: if you are using WebPublisher PRO you will want to make sure you know which forms are being used in your web interface before doing any deleting or renaming. If you are using menu screens or script buttons in your textbase, these too may be set to use specifically named forms. It is probably safe to delete ones with names such as test, report1, etc., but if in doubt, rather than actually deleting forms, we suggest simply renaming them. They can then be renamed back if it is found they are still in use. Under Manage Textbase Elements there is a Rename option. We recommend keeping the same name prefaced with an x. This means they will drop to the bottom of the list and it is clear that they are pending deletion at some point. You can also create a backup of all your forms first by selecting all of them (Shift-click) and choosing Export to create an .xpf file.

It is good practice to note additional information in the Description line when you save a form, such as how it is sorted, or whether it is designed for a specific label size or a particular function. This can be invaluable when trying to ascertain years later why a form was created. We also recommend naming your forms consistently, starting with an indication of how they are used, e.g. print-only forms prefaced with Print.

For more information, check out the Help file built in to DB/TextWorks, or the printable PDF for version 13. If you don’t feel comfortable doing this cleanup yourself, or would like assistance designing forms or query screens, contact us and we can help you on a consulting basis.

Our next post in this series will cover maintaining your textbases in good health.

What might happen if you win the lottery?  Will your successor be able to understand how to use your Inmagic DB/TextWorks textbases?  In this series of posts we’ll help you rationalize your files, textbases and forms plus provide suggestions for regular maintenance of your textbases.

In this Part 1 of the series we discuss how to clean up folders and directories which may have become cluttered with multiple copies and backups of textbases and related files. Here, then, are some tips to help you figure out what is safe to delete.

What are all those files and what do they do?

Each DB/TextWorks database consists of a number of files; how many depends on whether you have the version for a non-SQL or SQL platform. The SQL version (file extensions shown in parentheses below) uses Microsoft SQL Server as the data store for the actual records.

Do NOT delete any of the following:

.acf  (.cac)   Access Control File – controls simultaneous access to the textbase
.btx           Term and Word indexes
.dbo           Directory to the records in the .dbr file
.dbr           Contains the records
.dbs  (.cbs)   Textbase structure file with field definitions
.ixl           Indexed list file with any validation and substitution lists
.log  (.log)   Log file of any changes to records or the textbase structure
.occ           Lists of records indexed in the .btx file
.sdo           Directory to any records with deferred updates
.tba  (.cba)   Primary textbase definition file plus elements such as forms and query screens
.tbm           Menu screen files

On a network install, you may also have .slt files, which show who has a textbase open. If you have thesauri, there will be .tml files, which prevent more than one person at a time modifying thesaurus records. You may also have an .ini file for some applications.

What can I get rid of?

Generally the following are temporary working files created as you perform various functions:

.chk           Report created after running Check Textbase
.dmp           Exported records
.x01 etc.      Exception files from imports
.tbb  (.cbb)   Exported textbase structure definition
.xpf or .xpq   Exported forms and query screens

These can usually be safely deleted unless there is a need to keep backups of the records or forms at some point in time.  If so, we suggest moving these files to a specific backup folder named appropriately to indicate the date and purpose.

How can I tell if it’s an old or defunct textbase?

We suggest doing a search across your network files for all *.tba or *.cba (SQL version) files.  You can use the Search or Find tool in Windows Explorer for this. This can have surprising results if you’ve had DB/TextWorks for many years!  It’s easy to create a new textbase or make a copy of an existing one to test out a new idea, but all too often these tests are never deleted.  Usually once you open these textbases you can search and see if there are only a few records. If there is no automatic date created field, we suggest looking at the log file to determine how long ago data was last added or modified to help determine current usage. For clients with multiple users and multiple textbases, we have a sample database inventory textbase to help you document this cleanup process. Contact us if you are interested in obtaining a copy - it’s free for existing clients.

What about all these .tbu, .tbs and .idi files?

These are all User Files and are specific to each person who is using each textbase in DB/TextWorks.  The .tbu contains “private” textbase elements such as forms and saved sets.   The .tbs file stores scripting information and the .idi file stores your last used settings, such as the window size and position, and your most recent batch modification or import settings.
Ideally these should be stored in a personal User Files folder on the network for each user so that there are no conflicts and to ensure that they are backed up.  You can also store them on your PC workstation if it’s backed up. However if you want to keep these settings you’ll need to remember to copy those files over if you get a new PC.

You can easily move these personal user files to a more appropriate location under Tools > Options. We highly recommend checking where they are now located for each active user and rationalizing these settings. You can then safely delete any remaining .tbu, .tbs and .idi files if they are currently cluttering up your textbase folders.

For more information on any of these files, check out the Help built into DB/TextWorks or the printable PDF for version 13. If you don’t feel comfortable doing this cleanup yourself, contact us and we can help you on a consulting basis.

Next in this series: Spring Cleanup Part 2 – Rationalizing textbase elements
