Thursday, August 13, 2009

Records for All - Balancing museum records management and archival responsibilities

Francine Snyder, Manager of Library and Archives, Solomon R. Guggenheim Museum

20,000+ cubic feet of “archives” records. The reason it’s in quotes: When I interviewed for the library position, we talked about archives, but not about records management. I came from orgs where these things were managed by different departments. But later I realized that at the Guggenheim, “archives” meant everything sent to offsite storage.

Is the difference between RM and archives important?

On the screen, a letter from Frank Lloyd Wright to the trustees. On the other side, a box of timesheets. There is a clear difference between these two things. When you’re trained to manage records that are meant to last forever, how do you also manage records with a fixed lifespan?

I began having conversation with staff about their records. They don’t care about this difference. To them, a box is a box is a box. They just want an easy solution for storing and retrieving records.

Thinking about the box is a box is a box premise, we looked at integrating submission process. A key aspect of this was our records schedule. Ours is very basic. We have a general schedule. But when we dug down about departmental schedules, we talked about types of record groups created, and existing departmental guidelines to help figure out what to submit and what not to submit. Using the schedule, we have the same form, the same policies and procedures. This lets us treat all departments as equal with equally important records.

Of course, recognizing differences is important. Once we receive records, there are differences. We verify that new materials actually match paperwork, and then number and enter into Access DB. Traditional treatment.

Likewise for access policies, nonrestricted records are open, whereas restricted are only open to creators.

This is a very simple concept – and many institutions jointly managing these things are probably already doing this.

Win, win, win? Biggest challenge is time management. It takes time to do this, and the archivists feel like they would rather be processing collections. On the same thread, it’s easy to spend too much time on records going into RM program and overprocess them.

But this has let us create a unified process. And archivists are always talking about outreach and self promotion. When you work with depts. to create a records schedule, you are sitting down with people who are interested and invested in the process. It is also a great time to start talking about the archives and what it is. You even sit down with departments that might seldom use the archives, but it is an opportunity for them to understand it. Legal department is thrilled to have records schedules in place.

The takeaway: need to define key personnel who need to understand the difference between RM and archives programs. Then create detailed policies to assist key personnel in managing these differences. Goal is to push out a unified policy to the remainder of staff.

Electronic records management training at the Virginia MFA

Courtney Yevich
Virginia Museum of Fine Arts
Archivist and Asst. Librarian.

Electronic records management.

I’m a big believer in collaborative appraisal. People know their records better than I do. As a state agency, we’re mandated to have an effective RM program – which is a blessing for me. When it comes to transferring to archives, I encourage staff to do sorting, filing, etc. themselves.

I adapted the state RM form to make it more user friendly with less archives speak.

I encourage all staff to understand records schedules, fill out their own paperwork for destruction (although I check!) etc. Once staff fill out paperwork themselves, they are more proactive and less resistant to destroying records in a timely manner.

In 2006, our governor passed a number of RM initiatives. That was a perfect opportunity to begin teaching RM to staff in a more concerted way – including electronic records management. Risk Management made the decision to make training mandatory.

Spent days in my office worrying about how NARA was spending millions on electronic RM and they wanted me to do it on the cheap.

Got over that.

Challenges: Part time, lone arranger archivist, no state support for training, and no funding.

Key part – they have to start managing their own electronic records – I am there to help them, but it’s their responsibility. (Also tell them that they could go to jail for 5 years for destroying public records – but that’s just to get their attention.)

The truth is that what I’m teaching isn’t that complex. I am not a computer expert, and have no formal IT expertise. My RM situation was prompted by our own needs.

Skills taught: records legislation, employee responsibilities, identifying records, retention schedules, destruction of records, electronic recordkeeping practices, and e-mail management, including mailbox organization and archiving.

Most unexpected outcome is that I am now being pulled into high level discussions about records management at the agency level. That’s a really positive step. Partly because I continued to horn in on things at that level, but also because people at that level valued the expertise in this area.

Recently launched new intranet. This is good, but still requires a lot of decisions from humans. Staff will use the skills learned to help make these decisions.

The 100 year old active record – challenges in museum records management

Jane Callahan, Arch and RM at Harvard Art Museum.

Vital records that are aging yet still used on a regular basis.

HUAM is actually 3 museums together at Harvard. Harvard has a central records schedule – there is a special supplement for museums. It recommends long term retention for many things. What happens when these aging, permanently active, scattered records are more than 100 years old?

The majority of these records are in the charge of curators and collections managers. 3 ring binders, index cards, accession records, ledgers, deeds of gift, correspondence with donors, treatment reports in conservation. We started acquiring objects in 1895, and the records go back to then. They are used regularly by a number of people.

Condition reports are done for each item received or sent in and out of the museum. These are freely available to users in a number of depts., including archives. I use them for reference requests. For example, an institution may see loan stickers and want to know why a work was loaned to us in the past. The value of these records and the need to retain them permanently is undisputed, but the issue is storage and use.

At what point do these records turn into objects with their own intrinsic value? The most obvious solution would be to create use copies. If we digitized them or entered them into the existing collections mgmt system, that would work. So far only a small portion have been digitized or Xeroxed. But this has been inconsistent, and has not stopped use – originals are still used for convenience.

Only acquisitions data is entered into system TMS – so there is much not captured. Visual aspects (even whose handwriting) are often key aspects.

Photocopies are cheaper – but high quality color would be necessary for some things, e.g. drawings by conservation staff. And this doesn’t answer the issue of distant and simultaneous access.

Ideally we would digitize, prioritizing the most important. These could go into a digital repository and be centrally managed there, then linked to the collections mgmt system. Originals would go to archives and be rehoused. But time and money are not the only obstacles…

Collections management staff are leery of putting sensitive info like donor info into repository because of confidentiality concerns. This should be possible, but people distrust.

There are also obstacles in reformatting due to nature of records. First starting to learn as much as we can about these permanently active records. There is much to do in terms of records surveys.

Our recent move to a temporary location has helped. While doing records surveys formally and informally, we are learning more and are helping keep track of “rogue” reformatting efforts. (Without proper guidance, these projects are often ineffective. For example, a department photocopied reports, but sent copies to archives and kept originals because “copies not good enough to read.” And they were planning to scan again.)

Ikon is Harvard’s preferred vendor for scanning. If someone contacts them, they contact archivist before proceeding. Harvard also has internal imaging staff.

We are educating staff about Ikon and DRS as we go. Unfortunately most projects are on hold until the museum move is complete. We are also negotiating with central records management to make sure our needs are addressed in the general records schedule.

As the records schedule now exists, the archives is unlikely to ever receive a number of records. But we need to think about when the records become artifacts in their own right. We are considering a 50 year cutoff. If we can get this into the central records schedule, that will help show support from the university as a whole. We will also pursue reformatting projects, etc.

Wednesday, August 12, 2009

Sibyl Schaefer on generation of EAD and MARC from shared database at NYU

Going to talk about problem we recognize at NYU and are working to solve. We have ILS system where users can find library materials. However, they don’t usually find finding aids, because not all special collections have MARC records associated with them. We have an entirely different finding aid search system based on Apache Solr. If you go to the lib homepage, you have to specifically select special collections to search them.

We have three special collections bodies that contribute to this one EAD search tool. (Based on Archivists’ Toolkit to generate data.) Load from there into NYU publishing system, gets spit back out in preview, Solr indexes it, then it goes online.

If they go to the BobCat ILS, they will not get these. The only definite reason the collections have MARC records is if they have to barcode a box to send to offsite storage. Otherwise not required.

In addition – lots of duplication of work. The MARCXML record that comes out of AT is not being used. The EAD finding aid is being handed off to catalogers to generate new MARC from scratch. [DOH!]

Another issue (problem/opportunity) – We just signed onto Ex Libris’ new discovery tool Primo, which lets you pipe information from different areas. How to get more info from databases into library catalogs to enable more power from one search.

AT has been adopted for use within all NYU libraries within the last year, so all EAD is now being generated from the toolkit. Have three different instances going, because there are various institutional/political reasons.

To address these problems, we set up the AT working group. Consists of me (AT Specialist), the digital library person, the AT programmer (since development is housed at NYU), 3 different catalogers (head of cataloging, plus two others who deal with special collections), tech services lib, electronic resources lib, and point people from three special collections.

Had kickoff meeting in May/June. We came up with a vision of AT generating both EAD and MARC at once. Both of these enter their respective systems, then eventually hit Primo and get deduped so that both the MARC and the EAD don’t show up in the end result search.
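To make the "one source, two outputs" vision concrete, here is a minimal sketch, not AT's export code. It assumes a single collection-level record held as a plain dict (the `record` keys and the sample values are made up) and writes a stripped-down EAD fragment and a stripped-down MARCXML fragment from it, without namespaces or the many other required fields.

```python
import xml.etree.ElementTree as ET

def to_ead(record):
    """Build a bare-bones EAD <archdesc> fragment from the shared record."""
    archdesc = ET.Element("archdesc", level="collection")
    did = ET.SubElement(archdesc, "did")
    ET.SubElement(did, "unittitle").text = record["title"]
    ET.SubElement(did, "unitid").text = record["identifier"]
    return ET.tostring(archdesc, encoding="unicode")

def to_marcxml(record):
    """Build a bare-bones MARCXML <record> with only a 245 title field."""
    rec = ET.Element("record")
    field = ET.SubElement(rec, "datafield", tag="245", ind1="0", ind2="0")
    ET.SubElement(field, "subfield", code="a").text = record["title"]
    return ET.tostring(rec, encoding="unicode")

# Hypothetical collection-level record; both outputs come from the same data.
record = {"title": "Example Family Papers", "identifier": "EX.001"}
ead_fragment = to_ead(record)
marc_fragment = to_marcxml(record)
```

Because both fragments are derived from the same title string, a correction made once in the source shows up in both, which is the point of the shared database.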

Working through challenges on this now. First, legacy data needs cleanup. Once you put your data into a specific format, you realize that junk goes in, then junk comes out. If you let junky data stay there, it kind of perpetuates. You have an archival collection linking to something that isn’t the authorized form of the name, then it gets spit out again in finding aids and ends up in other places. So we’ve done training classes for grad student assistants to help with searching for the authorized form of names, how to enter them into AT, how to clean up existing names, etc.

Starting to tackle subjects now. This is tricky because the meaning of the different parts of a heading (i.e., the MARC subfields) is not preserved in EAD headings. So you end up with someone having to revisit the heading to get it into MARC. We’re working on how to handle this. One idea is to force the dashes in the headings to serve as some sort of delimiter. We have a vendor who is going to be cleaning up authority records, so we’re hoping they might be able to indicate subfields. The other option is to implement improved subject heading handling in AT, which will help put semantic meaning of terms into AT itself.
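A rough sketch of the "dashes as delimiter" idea, assuming headings are stored as a single string with parts separated by a double dash. This is not something NYU or AT is known to do; the function name and the date pattern are made up. It also shows the limit of the approach: the dashes tell you where the parts are, not whether a subdivision is topical, geographic, or form, so everything that is not obviously a date falls back to a placeholder.

```python
import re

# Assumption: subdivisions are separated by a double dash ("Topic -- Place -- Dates").
DELIMITER = re.compile(r"\s*--\s*")
# Assumption: a part like "1945-1989" or "1945-" is a chronological subdivision.
CHRONOLOGICAL = re.compile(r"^\d{3,4}(\s*-\s*(\d{3,4})?)?$")

def split_heading(heading):
    """Split a dashed heading into (subfield_code, text) pairs."""
    parts = [p for p in DELIMITER.split(heading.strip()) if p]
    subfields = []
    for position, part in enumerate(parts):
        if position == 0:
            code = "a"  # main heading
        elif CHRONOLOGICAL.match(part):
            code = "y"  # chronological subdivision
        else:
            code = "x"  # placeholder: may really be $x, $z, or $v
        subfields.append((code, part))
    return subfields

# Made-up heading for illustration.
example = split_heading("Labor unions -- New York (State) -- 1945-1989")
```

In the example, "New York (State)" comes out as `x` even though a cataloger would code it as a geographic subdivision, which is why someone still has to revisit the heading unless the subfield is recorded at the source.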

The next problem is that MARCXML exports include funky punctuation. So we’re looking at changing some of the AT export code to handle this. But we need to ensure that we don’t break EAD display when we do this.

Also, the location of information. Currently, if you have to encode barcodes at the box level, you have to include it for every single folder. There is a plugin being implemented at Yale to solve that, and make location info go into a standard location.

And there is a problem because Primo uses different fields to dedupe, and right now the titles aren’t matching up.
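The talk did not say how Primo builds its match keys, so the following is only a generic illustration of why titles can fail to match and one common way to compare them: reduce each title to a normalized key (case, punctuation, and spacing removed) before comparing. The function names and sample titles are hypothetical.

```python
import re

def title_key(title):
    """Normalize a title: lowercase, punctuation to spaces, collapse whitespace."""
    lowered = title.lower()
    no_punct = re.sub(r"[^\w\s]", " ", lowered)
    return re.sub(r"\s+", " ", no_punct).strip()

def matching_titles(marc_titles, ead_titles):
    """Return the MARC titles whose normalized key also appears among the EAD titles."""
    ead_keys = {title_key(t) for t in ead_titles}
    return [t for t in marc_titles if title_key(t) in ead_keys]

# Made-up titles: same collection, different punctuation and capitalization.
marc_titles = ["Example Family papers, 1900-1950.", "Another collection"]
ead_titles = ["Example Family Papers  1900-1950"]
matches = matching_titles(marc_titles, ead_titles)
```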

Q: Why won’t XML go into Aleph? A: It does work via conversion with MarcEdit, but the problem is there is no connection between the authority databases, so if a cataloger tries to fix a problem in the ILS, it doesn’t flow back. And right now they don’t have access to AT.

Q: Is this just the collection level, or lower levels? A: Right now it is only collection level, with a link to the finding aid from the ILS MARC record. Both go into Primo, which should search both. One question we have is that you can’t really offer that level of detailed searching without showing them where the terms are in the actual finding aid. We have this worked out for the EAD search, but now have to look at it at a larger level.

Q: Is this public yet? A: No, not yet. Still doing cleanup and planning.

Q: You said you dedupe records to prevent retrieval of two records for the same thing. So which do you show them? A: Probably the MARC record with a link to the full finding aid. But that brings up the problem I just discussed of how you show where their search term is in the finding aid.

Q: So why wasn’t the MARCXML being used? A: That’s kind of the question that precipitated this working group.

Q: We have a very similar process at Duke for combining EAD and MARC using Endeca. Going to launch soon. We display the full container list in a separate tab, and provide highlighting there. It’s a bit clunky for really large finding aids, but…

Q: What is the difference between Primo and Aleph? A: Primo sits on top of Aleph.

FAÇADE: Future-proofing Architectural Computer-Aided Design

Presented by Tom Rosko, MIT, Head of Institutional Archives and Special Collections, at a meeting of the SAA Architectural Records Roundtable.

At MIT we have the Stata Center, as well as the Media Lab, architecture school, etc. Applied for an IMLS grant a few years ago on how to handle these new digital architectural records.

Current arch. data is being lost, particularly 3D data.

Staff includes head of arch library, several other people. I was not formally part, was brought in to consult.

Challenge: to develop long term archival strategy for digital archival records, particularly 3D ones. Also to develop strategies for using DSpace to do this. And ways to capture and present data.

The use of 3D CAD has made things increasingly complex in the architectural world. The BIM (Building information modeling) concept has also increased complexity – increases interrelationships between different types and formats of data. And there are not a lot of standards for how these things interoperate.

As project progressed, realized how interrelated data is, and how just preserving the 3D models alone may not be enough.

Architectural firms tend to think more about getting the project done than about long term archiving and reuse. Q: What about the as-builts and other deliverables to clients? Discussion: This is somewhat inconsistent. Contracts are starting to spell this out, including what kind of digital files to deliver. Some also require hard copies, with the idea of scanning those if the digital files become unreadable.

Tom: Firms are starting to recognize this issue.

The project also looked at potential audiences in addition to Practice (architects, designers, engineers): for example, researchers, historians, scholars, instructors, students, and the general public.

Developed use cases for what uses we thought each type of user might make of the materials. Created advisory board and consulted other audiences.

Content for this project: 3 data sets. Frank Gehry, Stata Center at MIT (2004, CATIA); Moshe Safdie, US Institute of Peace (2009, Revit); and Thom Mayne, Caltrans (2004, MicroStation).

These provided 100+ file formats, tens of thousands of gigabytes, almost no metadata, etc. The dataset was massive for each of the projects. They were complete project files, not just the final documents or end products. And some audiences want that stuff, so that played into the mindset of how to develop this.

Geometry – different ways of storing data. For example mesh versus arcs and curves. Parametric allows users to refer to features rather than underlying geometry.

The different software used varied in how they modeled and the geometry methods used.

Looked at open standards for model and geometry info (STEP, IFC, IGES, VRML, STL), as well as for display formats, including 3D PDF.

Various industry exchange data formats.

If CAD software only exports “inert geometry”, it doesn’t truly represent the complexity of the underlying 3D model. Does that matter? To the targeted audiences, it didn’t seem to matter that much. But it may matter to us.

How to manage all this data. Intellectually, went with the BIM idea – that unless you incorporate the relationships between different types of info, you will lose information. So used RDF XML ontology to model relationships. Developed Project Information Model, which links together all types of info in a relationship map. (see slide 35)
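As a rough illustration of the relationship-map idea (not the project's actual ontology, which I did not capture), relationships can be held as subject/predicate/object triples and queried from either end. All identifiers and predicate names below are made up for the example.

```python
# Hypothetical triples: (subject, predicate, object). None of these names
# come from the FAÇADE ontology; they only illustrate the idea.
TRIPLES = [
    ("file:model-001.CATPart", "documents", "zone:roof"),
    ("file:sheet-A101.pdf", "derivedFrom", "file:model-001.CATPart"),
    ("file:memo-017.doc", "discusses", "zone:roof"),
]

def related(triples, node):
    """Return every (predicate, other_node, direction) touching `node`."""
    links = []
    for subject, predicate, obj in triples:
        if subject == node:
            links.append((predicate, obj, "outgoing"))
        if obj == node:
            links.append((predicate, subject, "incoming"))
    return links

# Everything that points at the roof, regardless of file type.
roof_links = related(TRIPLES, "zone:roof")
```

The query pulls in both the 3D model and the memo, which is the kind of cross-format connection that would be lost if only the models were preserved.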

Slide 36 – properties on objects. Every file gets five properties: Project Phase, Building Zone/System, Architectural Discipline, Document Type, File Format
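The five properties from the slide can be pictured as one small record per file. This is only a sketch: the field names follow the slide, but the class, the sample values, and the counting helper are assumptions for illustration, not the project's data model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FileTags:
    """The five basic properties listed on the slide, one value each."""
    project_phase: str
    building_zone_system: str
    architectural_discipline: str
    document_type: str
    file_format: str

def facet_counts(tagged_files, facet):
    """Count files per value of one property, e.g. for a faceted browse."""
    counts = {}
    for tags in tagged_files.values():
        value = getattr(tags, facet)
        counts[value] = counts.get(value, 0) + 1
    return counts

# Hypothetical files and tag values.
tagged_files = {
    "roof-model.CATPart": FileTags("Design development", "Roof", "Architecture", "3D model", "CATIA"),
    "roof-sheet.dwg": FileTags("Construction documents", "Roof", "Structural", "2D drawing", "DWG"),
    "lobby-sheet.dwg": FileTags("Construction documents", "Lobby", "Architecture", "2D drawing", "DWG"),
}
by_format = facet_counts(tagged_files, "file_format")
```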

These were basic tags. More important, specially curated documents would get more tags: 3D models and 2D drawing sets, client presentations, etc.

Developed concordance of information. What formats existed, how many of each, and an initial appraisal attempt. (slide 39)
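A first pass at "what formats exist and how many of each" can be approximated by counting file extensions. This is a simplification I am assuming for illustration; extensions are not reliable format identification, and the project used a format registry for tracking formats. The paths are made up.

```python
from collections import Counter
from pathlib import PurePosixPath

def format_concordance(paths):
    """Count files per lowercased extension as a rough format inventory."""
    counts = Counter()
    for path in paths:
        suffix = PurePosixPath(path).suffix.lower()
        counts[suffix or "(no extension)"] += 1
    return counts

# Hypothetical project file listing.
paths = [
    "models/roof.CATPart",
    "drawings/A101.dwg",
    "drawings/A102.DWG",
    "admin/notes",
]
concordance = format_concordance(paths)
```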

Slide 40, Curators Workbench – a custom tool that allowed someone to go in and view some of the material, make decisions, and add additional appraisal info. Had grad students from school of architecture working on this. Students also converted file formats where needed, and library staff helped with some of the metadata.

DSpace used for preservation, dissemination, and access control. Also used a FAÇADE UI external to DSpace, and bulk ingest tools (Curator’s Workbench, DSpace packager importer). Format registry used for tracking file formats.

Good diagrams on slides 42ish re: data processes.


For presentation used SIMILE Exhibit and Timeline tools, and Longwell RDF-based faceted browser. Presentation shows screenshot.

[Outcomes. Demonstrates use of open source software to solve this sort of problem. The ontology developed for this project may be applicable to other environments.]

Challenges from an archivist’s view: When this was just the 3D files, we thought about what the rights to the material are (intellectual property). Not just use of the material, but sharing it with the public. When we go beyond drawings to all documentation – there is correspondence in there that may have its own IP issues.

Then there’s the display end - what will work, and how will it work.

And a lot of the other issues we’re routinely dealing with on digital files.

Grant finishing up in Sept, so winding down now. Looking at what to do for follow-up now --- including grant proposal for 3D CAD, and setting up another pilot instance this fall.

Opportunities: tools like curator’s workbench can help us address large “data” files. And interface developments (how to view and search materials) also applicable.

The major hurdles are scalability and IP. How big can it go and how can you sustain it, and what can you show to others – how much will we have to restrict because we just don’t have the answers.

Several questions relating to security implications of having this much construction data searchable in digital repository. Not really addressed in this project.

Q: Seems antithetical to MPLP because you are creating massive amounts of new metadata. A: [Can automate a lot, but there is still a lot of work. ]

Q: Would a better solution be an active records management system, so that the complexity can be handled more on the front end?

Q: Is there any data not currently being supplied that should be folded into national CAD standards? This would be a good thing to think about – the AIA is involved in this.

Q: There is similar work at PDES, Inc., which oversees the STEP standard, on long-term retention of CAD, CAM, CAE.
