Copac Beta Interface

We’ve just released the beta test version of a new Copac interface and I thought I’d write a few notes about it and how we’ve created it.

Some of the more significant changes to the search result page (or “brief display” as we call it) are:

  • There are now links to the library holdings information pages directly from the brief display. You no longer have to go via the “full record” page to get to the holdings information.
  • You can see a more complete view of a record by clicking on the magnifying glass icon at the end of the title. This enables you to quickly view a more detailed record without having to leave the brief display.
  • You can quickly edit your query terms using the search forms at the top of the page.
  • To further refine your search you can add keywords to the query by typing them into the “Search within results” box.
  • You can change the number of records displayed in the result page.

The pages have been designed using Responsive Web Design techniques — which is jargon meaning that the HTML5 and CSS are written so that the web page rearranges itself to suit the size of your screen. The new interface should work whether you are using a desktop with a cinema display, a tablet computer or a mobile phone. Users of those three display types will see different arrangements of the screen elements, and some elements may be missing altogether on the smaller displays. If you use a tablet computer or smartphone, please give beta a try on them and let us know what you think.

The CGI script that creates the web pages is a C++ application which outputs some fairly simple, custom XML. The XML is fed through an XSLT stylesheet to produce the HTML (and also the various record export formats). Opinion on the web seems divided on whether or not this is a good idea; the most valid complaint seems to be that it is slow. It seems fast enough to us, and the beta way of doing things is actually an improvement, as there is now just one XSLT stylesheet used in creating the display, whereas our old way of doing things ran multiple XSLT stylesheets multiple times for each web page. All of which probably just goes to show that the most significant eater of time is searching the database rather than creating the HTML.
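
If you're curious what that transform step looks like, here is a minimal sketch of the same idea using the browser's XSLTProcessor API. Our real pipeline runs the XSLT server-side from the C++ application, so treat the function and the stylesheet path below as illustrative stand-ins rather than our actual code.

```typescript
// A minimal sketch of the XML -> HTML transform step, assuming a browser
// context. Our real pipeline does this server-side in C++; the stylesheet
// path "/xsl/brief-display.xsl" is purely illustrative.
async function renderBriefDisplay(xmlUrl: string): Promise<DocumentFragment> {
  const [xmlText, xslText] = await Promise.all([
    fetch(xmlUrl).then(r => r.text()),
    fetch("/xsl/brief-display.xsl").then(r => r.text()),
  ]);

  const parser = new DOMParser();
  const xmlDoc = parser.parseFromString(xmlText, "application/xml");
  const xslDoc = parser.parseFromString(xslText, "application/xml");

  // Just one stylesheet, applied once per page, which is the principle
  // the beta interface follows.
  const processor = new XSLTProcessor();
  processor.importStylesheet(xslDoc);
  return processor.transformToFragment(xmlDoc, document);
}
```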

Auto-complete considered harmful?

Behind the scenes we’ve been creating new versions of Copac that use relational database technology (the current version of Copac doesn’t use a relational database). It’s a big change which has kept me busy for a long time now. One of the things we thought it would be nice to do with all this structured data is to have fields on our web search forms offer suggestions (or auto-complete) as the user types.

It turned out that implementing auto-complete was very easy thanks to jQuery UI. Below is a screen shot (from my test interface) showing the suggestions that auto-complete offers after typing “sha” in the author field.
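
For anyone wanting to try this themselves, the wiring really is minimal. The snippet below is a sketch of how the jQuery UI autocomplete widget gets hooked onto the author field; the element id and the suggestion URL are made-up names, not our real ones.

```typescript
// A minimal sketch of hooking jQuery UI's autocomplete widget onto the
// author field. "#author" and "/cgi/author-suggest" are illustrative names.
// Assumes jQuery and jQuery UI are already loaded on the page.
declare const $: any;

$(function () {
  $("#author").autocomplete({
    source: "/cgi/author-suggest", // returns a JSON array of suggestions
    minLength: 3,                  // wait for e.g. "sha" before suggesting
    delay: 300,                    // don't hammer the server on every keypress
  });
});
```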

The suggestions are ordered by how frequently the name appears in the database. So in the screen shot above, “Shakespeare, William, 1564-1616” is the most frequently occurring name starting with the letters “sha” in my test database.
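
The ranking itself is nothing exotic: match the prefix, then sort by how often each form of the name occurs. Here is a toy version that works on an in-memory list rather than the real database:

```typescript
// A toy version of the ranking, assuming we already have name/frequency
// pairs in memory rather than in the database.
interface NameCount {
  name: string;   // e.g. "Shakespeare, William, 1564-1616"
  count: number;  // how many records carry this form of the name
}

function suggestAuthors(prefix: string, names: NameCount[], limit = 10): string[] {
  const p = prefix.toLowerCase();
  return names
    .filter(n => n.name.toLowerCase().startsWith(p))
    .sort((a, b) => b.count - a.count)   // most frequent form first
    .slice(0, limit)
    .map(n => n.name);
}
```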

(By the way, these example screen shots are from a test database of about 5 million records selected in a very un-random way from seven of our contributing libraries.)

Having done the Author auto-complete I started thinking about how we would present suggestions for a Title auto-complete popup. It didn’t seem useful to present the user with an alphabetical list of titles, neither did it seem much more useful to present the most commonly occurring titles. I thought we could relatively easily log which records users view and then present the suggestions ranked according to how often a title has been viewed.

Then I thought that if a user has already selected an author from the Author auto-complete suggestions, it only makes sense to suggest titles that are by the selected author. For example, a user has selected Shakespeare from the author auto-complete suggestions. They then type “lo” in the title field. It would be pointless and counter-intuitive to list “Lord of the Rings” in the title suggestions; what we should show is “Love’s Labour’s Lost”. But then, by the time you’ve created that list of suggestions for the user, you’ve pretty much done their search for them already. So why not just show them the search results straight away? Google are doing this now with their Instant search results. Well, as hip and sexy as that sounds, I don’t think we can go there. For a start, I don’t think we have the compute horsepower to make it as instant as Google do, and there are fundamental data problems which make it very hard for us to do well.

So, going back to the Author auto-suggestions, let’s look at what happens when I type “tol” in the author field:

Again, the author suggestions look very nice, but unfortunately the list contains Leo Tolstoy twice: at the top of the list as “Tolstoy, Leo, graf, 1828-1910” and at the bottom of the list as “Tolstoy, Leo”. That’s because there’s no consistent Authority Control across our ~60 contributing libraries (and then there are all the typos to consider).

There are two ways we can turn a user selection from an auto-complete list into a search.

  1. We can turn the author name into a keyword search.
  2. Each of those names in the list has a unique database ID and we can search for records that have that author-ID.

If we do 2.) then selecting one form of the name Leo Tolstoy will only find records with that exact form and won’t find records that have the second (or third or fourth) form of the name. This will give the search a lot of precision, but the recall is likely to be terrible.

If we do 1.) then the top-ranking “Tolstoy, Leo, graf, 1828-1910” will only find a subset of our Tolstoy records. As there is a substantial set of records that don’t include “graf, 1828-1910”, a keyword search including those terms will miss those records entirely. If the user selected “Tolstoy, Leo” from the list they would likely find all the Leo Tolstoy records in the database (except those catalogued as “Tolstoy, L.” and those records with typos). The user may well wonder why the name variant that finds the most records is listed 10th, while the name listed first finds only a subset.
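
To make the trade-off concrete, here is a hedged sketch of the two approaches. The field names and query syntax are invented for illustration; only the shape of the precision/recall trade-off is the real thing.

```typescript
// Illustrative only: the index names ("author", "author_id") and the query
// syntax are invented for this sketch, not our real query language.
interface Suggestion {
  id: number;       // unique database ID for this exact form of the name
  heading: string;  // e.g. "Tolstoy, Leo, graf, 1828-1910"
}

// Option 1: turn the heading into a keyword search.
// Short forms of a name give good recall; long forms with dates miss records.
function keywordQuery(s: Suggestion): string {
  const words = s.heading.replace(/[,.]/g, " ").split(/\s+/).filter(Boolean);
  return words.map(w => `author=${w}`).join(" AND ");
}

// Option 2: search on the author ID behind the suggestion.
// Very precise, but only matches records carrying this exact form of the name.
function idQuery(s: Suggestion): string {
  return `author_id=${s.id}`;
}
```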

Maybe we could get around these problems by only using the MARC $a subfield from the 100 and 700 tags. (The examples above are using 100 $a$b$c$d.) Doing that would remove all the additions to names such as “Sir” and the dates. That would probably be okay for authors with distinctive names, but could merge lots of authors with common names. It would reduce search precision and increase recall.
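
For anyone unfamiliar with MARC, the sketch below shows what “only using $a” would mean in practice. The record structure is deliberately simplified; real MARC handling is rather more fiddly.

```typescript
// Simplified model of a MARC name field: a tag plus subfield code/value pairs.
// This just illustrates dropping $b$c$d and keeping $a.
interface MarcField {
  tag: string;                    // "100" or "700" for names
  subfields: [string, string][];  // e.g. [["a", "Tolstoy, Leo,"], ["c", "graf,"], ["d", "1828-1910"]]
}

function nameForIndexing(field: MarcField, subfieldAOnly: boolean): string {
  const wanted = subfieldAOnly ? ["a"] : ["a", "b", "c", "d"];
  return field.subfields
    .filter(([code]) => wanted.includes(code))
    .map(([, value]) => value)
    .join(" ")
    .trim();
}
```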

So far I’ve only considered auto-complete on author and title fields. The Copac search forms have many fields and I’m not sure we have the facilities or compute power to inter-relate all the auto-complete suggestions so that the user only sees suggestions that make sense according to the fields the user has already filled in.

If we could inter-relate all the fields on our search forms we would probably know the search result before the user hit the search button. So what would be the point of having a search button anyway? That brings us back to the Google Instant search type of interface.

What should we do?

  • We could just not bother trying to inter-relate the auto-complete suggestions and let users select mutually incompatible suggestions. (Which seems rather unhelpful.)
  • We could not do auto-complete at all. (Again, this seems unhelpful at first sight, but may be better, as auto-complete tends to increase search precision, which may not be useful against a database containing very variable quality data.)
  • We could have just a single field on our search form. (Much easier to program, but not what our users tell us they want.)
  • Just offer auto-complete on two or three fields and inter-relate them. (To make this work I think we’d have to make the suggestions as imprecise as we can without them being a waste of space.)

I hope the above ramblings make some sense. If anyone has thoughts on this issue we’d like to hear your views.

Yesterday’s loss of service

I thought I’d write a note about why we lost the Copac service for a couple of hours yesterday.

The short of it is that our database software hung when it tried to read a corrupted file in which it keeps track of sessions. The result was that everyone’s search process hung, and so frustrated users kept re-trying their searches, which created more hung sessions, until the system was full of hung processes and had no CPU or memory left. Once we had deleted the corrupted file, everything was okay.

The long version goes something like this… From what I remember, things started going pear-shaped a little before noon when the machine running the service started becoming unresponsive. A quick look at the output of top showed we had far more search sessions running than normal and that the system was almost out of swap space.

It wasn’t clear why this was happening, and because the system was running out of swap it was very difficult to diagnose the problem. It was difficult to run programs from the command line as, more often than not, they immediately died with the message “out of memory.” I did manage to shut down the web server in an effort to lighten the load and stop more search sessions being created. It was proving almost impossible to kill off the existing search sessions. In Unix a “kill -9” on a process should immediately stop the process and release its memory back to the system. But yesterday a “kill -9” was having no effect on some processes, and those that we did manage to kill were being listed as “defunct” and still seemed to be holding onto memory. In the end we thought it would be best to reboot the system and hope that it would solve whatever the problem was.

It took ages for the system to shut itself down – presumably because the shutdown procedures weren’t working with no memory to work in. Anyway, it did finally reboot and within minutes of the system coming up it became overloaded with search sessions and ran out of memory again.

We immediately shut down the web server again. However, search sessions were still being created by people using Z39.50, and so we had to edit the system configuration files to stop inetd spawning more Z39.50 search sessions. Editing inetd.conf didn’t prove to be the trivial task it should have been, but we did get it done eventually. We then tried killing off the 500 or so search sessions that were hogging the system — and that proved difficult too. Many of the processes refused to die. So, after sitting staring at the screen for about 15 minutes, unable to run programs because there was no memory, and wondering what on earth to do now, the system recovered itself. The killed-off processes did finally die, memory was released and we could do stuff again!

A bit of investigation showed that the search processes weren’t getting very far into their initialisation procedure before hanging or going into an infinite loop. I used the Solaris truss program to see what files the search process was reading and what system calls it was making. Truss showed that the process was going off into cloud cuckoo land just after reading a file the database software uses to track sessions. So I deleted that file and everything started working again! The file was re-created the next time a search process ran — presumably the original had become corrupted.

Issues searching other library catalogues

Some of you may have noticed that there is now a facility on the Copac search forms to search your local library catalogue as well as Copac. You’ll only see this option if you have logged into Copac and are from a supported library.

The searching of the local library catalogues and Copac is performed using the Z39.50 search protocol. Due to differences in local configurations, the queries we send to Copac and the various library catalogues have to be constructed very differently.

When we built the Copac Z39.50 server, we tried to make it flexible in the type of query it would accept within the limitations imposed upon us by the database software we use. Our database software was made for keyword searching of full text resources. As such it is good at adjacency searches, but you can’t tell it you want to search for a word at the start of a field.

Systems built around relational databases tend to be the complete opposite in functionality. They often aren’t good at keyword searching, but find it very easy to find words at the start of a field.

The result is that we make our default search a keyword search, while some other systems default to searching for query terms at the start of a field. Hence, if we send the exact same search to Copac and a library catalogue, we can get very different results from the two systems. To try and get a consistent result we have to tweak the query sent to the library so that it performs a search as near as possible to that performed by Copac. Working out how to tweak (or transform or mangle) the queries is a black art and we are still experimenting.
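
In practice the tweaking boils down to keeping a small per-target profile describing how each catalogue likes its queries. The sketch below is illustrative only; the profile fields and values are invented, and the real differences are expressed through Z39.50 query attributes rather than strings.

```typescript
// Illustrative only: the profile fields and values are invented for this
// sketch. In reality the differences are expressed as Z39.50 query
// attributes, and each target's profile is worked out by experiment.
interface TargetProfile {
  name: string;
  termPosition: "anywhere" | "startOfField"; // how the target matches terms in a field
  rightTruncate: boolean;                    // whether the target wants truncated terms
}

// Shape one user query into something a particular target can cope with.
function shapeQuery(terms: string[], profile: TargetProfile): string {
  const shaped = terms.map(t => (profile.rightTruncate ? t + "*" : t));
  // For a "startOfField" target the real code would also set the appropriate
  // position attribute on the Z39.50 query before sending it off.
  return shaped.join(" ");
}
```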

Stop word lists are also an issue. Some library systems like to fail your search if you search for a stop word. Better systems just ignore stop words in queries and perform the search using the remaining terms. The effect is that searching for “Pride and prejudice” fails on some systems because “and” is a stop word. To get around this we have to remove stop words from queries. But first we need to know what the stop words are.
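
Stripping the stop words is the easy bit, as the sketch below shows; discovering each target’s stop word list is the hard bit.

```typescript
// A minimal sketch of stripping a target's stop words from a query.
// The stop word list itself differs per system and often has to be
// discovered by trial and error.
function removeStopWords(terms: string[], stopWords: Set<string>): string[] {
  const kept = terms.filter(t => !stopWords.has(t.toLowerCase()));
  // If everything was a stop word, keep the original terms rather than
  // sending an empty (and certainly failing) query.
  return kept.length > 0 ? kept : terms;
}

// e.g. removeStopWords(["Pride", "and", "prejudice"], new Set(["and", "the"]))
//      -> ["Pride", "prejudice"]
```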

The result is that the search of other library systems is not yet as good as it could be, though it will get better over time as we discover what works best with the various library systems that are out there.

Logging in to Copac: some tips

Now that you have the option to log in to Copac to use the personalisation features, here are some tips to make logging in as easy as possible.

TypeKey/TypePad: if you have a TypeKey or TypePad account, and were wondering where your login option was, worry no longer! From the drop-down list of organisations on the login page, you need to choose ‘JISC project: SDSS (TypeKey Bridge)’. It’s not immediately obvious, but it is the correct login option for any TypeKey users.

Navigating the list:  the list of organisations is very long, and weighted heavily towards ‘U’.  To navigate it more easily, you can jump straight to any letter by typing it on your keyboard.  You may find it even easier to enter a keyword search in the search box.  This will work for partial words as well – entering ‘bris’ will give you the options of the City of Bristol College and the University of Bristol.

Remembering your selection:  once you have found your organisation, there are options to have your selection remembered, either for that session (the default) or for a week.  You can also choose ‘do not remember’, which is especially useful if you are on a public computer.

Please contact us if you experience any problems with logging in to Copac.

New Copac interface

It’s finally here!  After months of very hard work from the Copac team, and lots of really useful input from users on the Beta trials, the new Copac interface is now live.

We have streamlined the Copac interface, and you can still search and export records without logging in to Copac. This is ideal if you want to do a quick search and don’t need any of the additional functionality. Users who choose not to log in will still be able to use the new functionality of exporting records directly to EndNote and Zotero, and will see book and journal tables of contents, where available.

You now also have the option to log in to Copac. This is not compulsory, and you only need to log in if you want to take advantage of the full range of new personalisation features. These have been developed to help you to get the most out of Copac, and to assist in your workflows.

‘Search History’ records all of your searches, and includes a date/time stamp.  This allows you to keep track of your searches, and to easily re-run any search with a single click.

‘My References’ allows you to manage your marked records, and create an annotated online bibliography.

You can annotate and tag all of your searches and references.  There is no limit to how you can use this functionality:  see my post from March for some suggestions about how you might use tags and annotations.  We would love to hear how you are using them – please get in touch if you would like to share your experiences and ideas.

Users from some institutions will now have the option to see their local catalogue results appearing alongside the Copac results.  We are harvesting information from the institutions’ Z39.50 servers, and using this to create a merged results set.  If you are interested in your institution being a part of this, please get in touch.

Some people have expressed concern that the need to log in means that Copac is going to be restricted to members of UK academic institutions only. This is not the case. We are committed to keeping Copac freely accessible to all. Login is required for the new features to function: we need to be able to uniquely identify you in order to record your search history and references, and we need to know which (if any) institution you are from to show you local results. We have tried to make logging in as easy as possible. For members of UK academic institutions, this means that you can use your institution’s central username/password, or your ATHENS details. For users who aren’t members of a UK academic institution, you can create a login with one of two identity providers: ProtectNetwork or TypePad. These providers enable you to create a secure identity, which you can use to manage access to many internet sites.

We are very grateful to everyone who has taken the time to give us feedback on the recent Beta trials.  But we can never get enough feedback!  We’d love to hear what you think about the new Copac interface:  you can email us; speak to us on twitter; or leave comments here.

Copac Beta can search your library too

One of the new features we are trialling in the new Copac Beta is the searching of your local Institution’s library catalogue alongside Copac. To do this we need to know which Institution you are from and whether or not your Institutional library catalogue can be searched with the Z39.50 protocol.

To identify where you are from, we are using information given to us during the login process. When you log in, your Institution gives us various pieces of information about you, including something called a scoped affiliation. For someone logging in from, say, the University of Manchester, the scoped affiliation might be something like “student@manchester.ac.uk”.

Once we know where you are from, we search a database of Institutional Z39.50 servers to see if your Institution’s library is searchable. If it is, we can present the extra options on the search forms and, indeed, fire off queries to your library catalogue.
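
Roughly speaking, the lookup works like the sketch below. The attribute handling is simplified, and the lookup table stands in for our real database of Institutional Z39.50 servers.

```typescript
// Simplified sketch: extract the institution from a scoped affiliation such
// as "student@manchester.ac.uk" and look up a Z39.50 target for it. The
// table below stands in for our real database of harvested server records;
// the example connection details are invented.
interface Z3950Target {
  host: string;
  port: number;
  databaseName: string;
}

const targetsByScope: Record<string, Z3950Target> = {
  // Example entry only; real connection details come from harvested records.
  "manchester.ac.uk": { host: "library.example.ac.uk", port: 210, databaseName: "MAIN" },
};

function targetForUser(scopedAffiliation: string): Z3950Target | undefined {
  const scope = scopedAffiliation.split("@")[1]?.toLowerCase();
  return scope ? targetsByScope[scope] : undefined;
}
```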

Our database of Z39.50 servers is created from records harvested from the IESR. So, if you’d like your Institution’s catalogue available through Copac, make sure it is included in the IESR by talking to the nice people there.

Many thanks to everyone who tried the Beta interface early on and discovered that this feature mostly wasn’t working. You enabled us to identify some bugs and get the service working.

Beta login issues

Users from some Institutions had been unable to log in to Copac Beta. Thanks to help from colleagues, we think we have now resolved the issue, which was related to an exchange of security certificates between servers. The result was that a handful of Institutions were not trusting us and so were not releasing the anonymised username that we require. This seems to be fixed now and we’ve noticed that users from those Institutions can now log in.

So, if you tried to login to Copac Beta and received a “Login failed” message, please try again. And please let us know if you still can’t get access.

Copac Beta tweaks

[Jargon alert] I just noticed that the Shibboleth TargetedIds (read anonymised usernames) we are seeing when people log into Copac Beta are much longer than I expected. Some of them may have become truncated when saved in our user database. So, I’ve just increased the field size in the database. This may mean that some people will have lost their search history and references. Sorry about that. But a Beta test version is all about finding out about such niggles.

Thanks for persevering everyone.

[Added 30/3/2009] Tagging in ‘My references’ currently broken. We’re looking into it and hope to have it fixed soon.

[31/3/2009] Tagging issues now look to be fixed. However, when I was looking through the logs I spotted another problem which may reveal the reason some people are having difficulties searching Beta. Investigations are in progress.

[1/4/2009] It looks like some people are unable to gain access to Copac Beta because their Identity Provider isn’t providing us with an anonymised userid, or to use the jargon, a TargetedID. We do need this so that we know which Search History and References are yours.

There’s not a lot we can do about this. It is up to your Institution to release the TargetedID to us. However, if you are getting a “Login Failed” message please contact us, telling us which Institution you are from and we’ll try hassling your system admins.

New features of the Copac Beta Interface

With the new Copac interface, we wanted to make the Search History and Marked List (now re-named My References) more useful. Previously, these features were session based — that is, if you re-started your web browser, your search history and saved records were lost. For us to be able to retain that data over multiple sessions, we need to know who our users are. Hence, for Copac Beta we are forcing you to log in.

The advantage of logging in is that you can use Copac Beta from multiple machines at different times and still have access to the searches and references you saved yesterday or last week – or even last year.  Unfortunately, log-in is currently restricted to members of UK Access Federation institutions (most UK HE and FE institutions, and some companies), but don’t worry – there will always be a free version of Copac open to everyone, and we will be widening the log-in scope in the future.

You can tag your searches and references and use a tag cloud to see those items tagged with a particular tag. We are automatically tagging your saved searches and references with your search terms, and you can remove these automatic tags and add your own. These tags are then added to your tag cloud, so that you can easily navigate your saved records through tags which are meaningful to you. Why would you want to delete the automatically generated tags? Well, records are tagged with all of your search terms, so if you limit your search to ‘journals and other periodicals’, the tags for records from that set will include ‘journals’, ‘other’ and ‘periodicals’. If you find these confusing, you can just delete them, and keep only the tags that have meaning for you.

You can also add notes to any of your references – perhaps to remind yourself that you have ordered the item through inter-library loan, and when you should go and pick it up, or perhaps to make comments about how useful you found the item.  This ‘My References’ section was developed as part of the JISC-funded project Discovery to Delivery at Edina and Mimas (D2D@E&M) as a Reusable Marked List workpackage.

You can also edit the bibliographic details of the item.  These edited details are only visible to you, so you don’t have to worry about making any changes.  You could use this to correct a typo or misspelling in the record, or add details that are not visible in the short record display, such as information about illustrations or pagination.

The search history feature allows you to re-run any previous search with a single click, from any screen. This could be especially useful for anyone who is doing demos, as not only do you know that the search will return results, but it saves you from the jelly fingers that haunt even the most proficient of typists when in front of an audience. The date and time of previous searches are recorded, so that you can see what you have searched for and when. This could be useful for tracking the progress of a project over time, or showing at a glance what effect refining a search has on the number of results.

Many journal records now contain the latest table of contents. Clicking on an article title will take you through to the Zetoc record for that article, and from there you can use the OpenURL resolver to link directly to the full text (if your institution has access), or order the article through your institution or directly from the British Library. The table of contents allows you to get an idea of the scope of the journal, and whether it will be of interest to you, without going to another website. This makes it easier to avoid wasted travel or unnecessary inter-library loan requests.

We’d love to know what you think of these new features – and any suggestions you might have for new ones!  Once you’ve used the new features, please fill-in our questionnaire, to help us learn what we’re doing right, and what you’d like to see changed.  As thanks for your feedback, there’s a £35 Amazon voucher up for grabs for one lucky respondent.  The survey has 9 questions, and shouldn’t take more than 10 minutes of your time.  Of course, you can always give us additional feedback through the comments on this blog, by emailing copac@mimas.ac.uk, by phone or post, or Twitter.  But we’d really like you to do the survey as well 🙂