Thursday, August 13, 2009

Records for All - Balancing museum records management and archival responsibilities

Francine Snyder, Manager of Library and Archives, Solomon R. Guggenheim Museum

20,000+ cubic feet of “archives” records. The reason it’s in quotes: when I interviewed for the library position, we talked about archives, but not about records management. I came from organizations where these things were managed by different departments. But later I realized that at the Guggenheim, “archives” meant everything sent to offsite storage.

Is the difference between RM and archives important?

On the screen, a letter from Frank Lloyd Wright to the trustees. On the other side, a box of timesheets. There is a clear difference between these two things. When you’re trained to manage records that are meant to last forever, how do you also manage records with a fixed lifespan?

I began having conversations with staff about their records. They don’t care about this difference. To them, a box is a box is a box. They just want an easy solution for storing and retrieving records.

Thinking about the box is a box is a box premise, we looked at integrating the submission process. A key aspect of this was our records schedule. Ours is very basic. We have a general schedule. But when we dug down into departmental schedules, we talked about the types of record groups created, and existing departmental guidelines, to help figure out what to submit and what not to submit. Using the schedule, we have the same form, the same policies and procedures. This lets us treat all departments as equal with equally important records.
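
A minimal sketch of the idea that one schedule can drive one submission form for both temporary and permanent records. The record types, retention periods, and function names are hypothetical, made up for illustration; they are not the Guggenheim's actual schedule.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ScheduleEntry:
    record_type: str
    retention_years: Optional[int]  # None means permanent (archival)


# Hypothetical entries for illustration only; not a real schedule.
SCHEDULE = {
    "timesheets": ScheduleEntry("timesheets", 7),
    "trustee correspondence": ScheduleEntry("trustee correspondence", None),
}


def disposition(record_type: str, closed_on: date) -> str:
    """Return what a single submission form could say about a box."""
    entry = SCHEDULE.get(record_type)
    if entry is None:
        return "not on schedule: ask the archivist"
    if entry.retention_years is None:
        return "permanent: transfer to archives"
    destroy_year = closed_on.year + entry.retention_years
    return f"temporary: eligible for destruction in {destroy_year}"
```

Staff fill in the same two fields for every box; only the answer differs, which is where the RM versus archives distinction stays out of their way.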

Of course, recognizing differences is important. Once we receive records, there are differences. We verify that new materials actually match the paperwork, then number them and enter them into an Access DB. Traditional treatment.

Likewise for access policies, nonrestricted records are open, whereas restricted are only open to creators.

This is a very simple concept – and many institutions jointly managing these things are probably already doing this.

Win, win, win? Biggest challenge is time management. It takes time to do this, and the archivists feel like they would rather be processing collections. On the same thread, it’s easy to spend too much time on records going into RM program and overprocess them.

But this has let us create a unified process. And archivists are always talking about outreach and self promotion. When you work with departments to create a records schedule, you are sitting down with people who are interested and invested in the process. It’s also a great time to start talking about the archives and what it is. You even sit down with departments that might seldom use the archives, but it is an opportunity for them to understand it. Legal department is thrilled to have records schedules in place.

The take away: need to define key personnel that need to understand the difference between RM and archives programs. Then create detailed policies to assist key personnel in managing these differences. Goal is to push out a unified policy to the remainder of staff.

Electronic records management training at the Virginia MFA

Courtney Yevich
Virginia Museum of Fine Arts
Archivist and Asst. Librarian.

Electronic records management.

I’m a big believer in collaborative appraisal. People know their records better than I do. As a state agency, we’re mandated to have an effective RM program – which is a blessing for me. When it comes to transferring to archives, I encourage staff to do sorting, filing, etc. themselves.

I adapted the state RM form to make it more user friendly with less archives speak.

I encourage all staff to understand records schedules, fill out their own paperwork for destruction (although I check!) etc. Once staff fill out paperwork themselves, they are more proactive and less resistant to destroying records in a timely manner.

In 2006, our governor passed a number of RM initiatives. That was a perfect opportunity to begin teaching RM to staff in a more concerted way – including electronic records management. It was a Risk Management decision to make training mandatory.

Spent days in my office worrying about how NARA was spending millions on electronic RM and they wanted me to do it on the cheap.

Got over that.

Challenges: Part time, lone arranger archivist, no state support for training, and no funding.

Key part – they have to start managing their own electronic records – I am there to help them, but it’s their responsibility. (Also tell them that they could go to jail for 5 years for destroying public records – but that’s just to get their attention.)

The truth is that what I’m teaching isn’t that complex. I am not a computer expert, and have no formal IT expertise. My RM situation was prompted by our own needs.

Skills taught: records legislation, employee responsibilities, identifying records, retention schedules, destruction of records, electronic recordkeeping practices, and e-mail management, including mailbox organization and archiving.

Most unexpected outcome is that I am now being pulled into high level discussions about records management at the agency level. That’s a really positive step. Partly because I continued to horn in on things at that level, but also because people at that level valued the expertise in this area.

Recently launched new intranet. This is good, but still requires a lot of decisions from humans. Staff will use the skills learned to help make these decisions.

The 100 year old active record – challenges in museum records management

Jane Callahan, Archives and RM, Harvard Art Museum.

Vital records that are aging yet still used on a regular basis.

HUAM is actually 3 museums together at Harvard. Harvard has a central records schedule – there is a special supplement for museums. It recommends long term retention for many things. What happens when these aging, permanently active, scattered records are more than 100 years old?

The majority of these records are in the charge of curators and collections managers. 3 ring binders, index cards, accession records, ledgers, deeds of gift, correspondence with donors, treatment reports in conservation. We started acquiring objects in 1895, and the records go back to then. They are used regularly by a number of people.

Condition reports are done for each item received or sent in and out of the museum. These are freely available to users in a number of departments, including archives. I use them for reference requests. For example, an institution may see loan stickers and want to know why a work was loaned to us in the past. The value of these records and the need to retain them permanently is undisputed, but the issue is storage and use.

At what point do these records turn into objects with their own intrinsic value? The most obvious solution would be to create use copies. If we digitized them or entered them into the existing collections mgmt system, that would work. Now only a small portion have been digitized or Xeroxed. But this is inconsistent, and has not stopped use – originals are still used for convenience.

Only acquisitions data is entered into system TMS – so there is much not captured. Visual aspects (even whose handwriting) are often key aspects.

Photocopies are cheaper – but high quality color would be necessary for some things, e.g. drawings by conservation staff. And this doesn’t answer the issue of distant and simultaneous access.

Ideally we would digitize, prioritizing the most important. These could go into a digital repository and be centrally managed there, then linked to the collections mgmt system. Originals would go to archives and be rehoused. But time and money are not the only obstacles…

Collections management staff are leery of putting sensitive info like donor info into the repository because of confidentiality concerns. This should be possible, but people distrust it.

There are also obstacles in reformatting due to nature of records. First starting to learn as much as we can about these permanently active records. There is much to do in terms of records surveys.

Our recent move to a temporary location has helped. While doing records surveys formally and informally, we are learning more and are helping keep track of “rogue” reformatting efforts. (Without proper guidance, these projects are often ineffective. For example, a department photocopied reports, but sent copies to archives and kept originals because “copies not good enough to read.” And they were planning to scan again.)

Ikon is Harvard’s preferred vendor for scanning. If someone contacts them, they contact archivist before proceeding. Harvard also has internal imaging staff.

We are educating staff about Ikon and DRS as we go. Unfortunately most projects are on hold until the museum move is complete. We are also negotiating with central records management to make sure our needs are addressed in the general records schedule.

As the records schedule now exists, the archives is unlikely to ever receive a number of records. But we need to think about when the records become artifacts in their own right. We are considering a 50 year cutoff. If we can get this into the central records schedule, that will help show support from the university as a whole. We will also pursue reformatting projects, etc.

Wednesday, August 12, 2009

Sibyl Schaefer on generation of EAD and MARC from shared database at NYU

Going to talk about problem we recognize at NYU and are working to solve. We have ILS system where users can find library materials. However, they don’t usually find finding aids, because not all special collections have MARC records associated with them. We have an entirely different finding aid search system based on Apache Solr. If you go to the lib homepage, you have to specifically select special collections to search them.

We have three special collections bodies that contribute to this one EAD search tool. (Based on Archivist’s Toolkit to generate data.) Load from there into NYU publishing system, gets spit back out in preview, solr indexes it, then it goes online.

If they go to the BobCat ILS, they will not get these. The only definite reason the collections have MARC records is if they have to barcode a box to send to offsite storage. Otherwise not required.

In addition – lots of duplication of work. The MARCXML record that comes out of AT is not being used. The EAD finding aid is being handed off to catalogers to generate new MARC from scratch. [DOH!]

Another issue (problem/opportunity) – We just signed onto Ex Libris’ new discovery tool Primo, which lets you pipe information from different areas. How to get more info from databases into library catalogs to enable more power from one search.

AT has been adopted for use within all NYU libraries within the last year, so all EAD is now being generated from the toolkit. Have three different instances going, for various institutional/political reasons.

To address these problems, we set up the AT working group. Consists of me (AT Specialist), the digital library person, the AT programmer (since development is housed at NYU), 3 different catalogers (head of cataloging, plus two others who deal with special collections), tech services lib, electronic resources lib, and point people from three special collections.

Had kickoff meeting in May/June. We came up with vision of AT generating both EAD and MARC at once. Both of these enter respective systems, then eventually hit Primo and get deduped so that both the MARC and the EAD don’t show up in the end result search.

Working through challenges on this now. First, legacy data needs cleanup. Once you put your data into a specific format, you realize that junk goes in, then junk comes out. If you let junky data stay there, it kind of perpetuates. You have an archival collection linking to something that isn’t the authorized form of the name, then it gets spit out again in finding aids and ends up in other places. So we’ve done training classes for grad student assistants to help with searching for the authorized form of names, how to enter them into AT, how to clean up existing names, etc.

Starting to tackle subjects now. This is tricky because the meaning of the different parts of a heading (i.e. MARC subfields) is not preserved in EAD headings. So you end up with someone having to revisit the heading to get it into MARC. We’re working on how to handle this. One idea is to force the dashes in the headings to serve as some sort of delimiter. We have a vendor who is going to be cleaning up authority records, so we’re hoping they might be able to indicate subfields. The other option is to implement improved subject heading handling in AT, which will help put semantic meaning of terms into AT itself.
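
A rough sketch of the "dashes as delimiter" idea, to show why it only gets you partway. It assumes headings are stored as one string with "--" between parts; the function name and the sample heading are hypothetical, and this is not AT's export code. The first part can safely be treated as the main heading ($a), but the dashes alone cannot tell whether a later part is a general ($x), chronological ($y), geographic ($z), or form ($v) subdivision, so those are left marked for a person or vendor to resolve.

```python
from typing import List, Tuple


def split_heading(heading: str) -> List[Tuple[str, str]]:
    """Split a dash-delimited subject heading into (subfield, term) pairs.

    The first term is treated as the main heading ($a). Later terms get a
    placeholder code '?' because the dashes alone do not say what kind of
    subdivision each one is.
    """
    terms = [part.strip() for part in heading.split("--")]
    terms = [term for term in terms if term]  # drop empty pieces
    pairs = []
    for position, term in enumerate(terms):
        code = "a" if position == 0 else "?"
        pairs.append((code, term))
    return pairs
```

For a made-up heading such as "Art museums--New York (State)--History", this yields one $a and two parts still waiting for a subfield code, which is exactly the revisit step described above.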

The next problem is that MARCXML exports include funky punctuation. So we’re looking at changing some of the AT export code to handle this. But we need to ensure that we don’t break EAD display when we do this.

Also, the location of information. Currently, if you have to encode barcodes at the box level, you have to include it for every single folder. There is a plugin being implemented at Yale to solve that, and make location info go into a standard location.

And there is a problem because Primo uses different fields to dedupe, but right now titles aren’t matching up.
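
One way to check how far apart the titles really are is to compare them after stripping case, accents, and punctuation. This is only a diagnostic sketch with a hypothetical helper name; it is not how Primo builds its dedupe keys, and the sample titles in the note below are invented.

```python
import re
import unicodedata


def title_key(title: str) -> str:
    """Build a crude comparison key: fold case, strip accents and punctuation."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    words = re.findall(r"[a-z0-9]+", no_accents.lower())
    return " ".join(words)
```

If the MARC title "Jane Doe Papers, 1900-1950." and the EAD title "Jane Doe papers, 1900-1950" produce the same key, the mismatch is only punctuation and case; if the keys still differ, the titles themselves need cleanup.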

Q: Why won’t XML go into Aleph? A: It does work via conversion with MarcEdit, but the problem is there is no connection between the authority databases, so if a cataloger tries to fix a problem in the ILS, it doesn’t flow back. And right now they don’t have access to AT.

Q: Is this just the collection level, or lower levels? A: Right now it is only collection level, with a link to the finding aid from the ILS MARC record. Both go into Primo, which should search both. One question we have is that you can’t really offer that level of detailed searching without showing them where the terms are in the actual finding aid. We have this worked out for the EAD search, but now have to look at it at a larger level.

Q: Is this public yet? A: No, not yet. Still doing cleanup and planning.

Q: You said you dedupe records to prevent retrieval of two records for the same thing. So which do you show them? A: Probably the MARC record with a link to the full finding aid. But that brings up the problem I just discussed of how you show where their search term appears in the finding aid.

Q: So why wasn’t the MARCXML being used? A: That’s kind of the question that precipitated this working group.

Q: We have a very similar process at Duke for combining EAD and MARC using Endeca. Going to launch soon. We display the full container list in a separate tab, and provide highlighting there. It’s a bit clunky for really large finding aids, but…

Q: What is the difference between Primo and Aleph? A: Primo sits on top of Aleph.

FAÇADE: Future-proofing Architectural Computer-Aided Design

Presented by Tom Rosko, MIT, Head of Institutional Archives and Special Collections, at meeting of SAA Architectural Records Roundtable.

At MIT we have the Stata Center, as well as the Media Lab, architecture school, etc. Applied for IMLS grant a few years ago on how to handle these new digital architectural records.

Current architectural data is being lost, particularly 3D data.

Staff includes head of arch library, several other people. I was not formally part, was brought in to consult.

Challenge: to develop long term archival strategy for digital archival records, particularly 3D ones. Also to develop strategies for using DSpace to do this. And ways to capture and present data.

The use of 3D CAD has made things increasingly complex in the architectural world. The BIM (Building information modeling) concept has also increased complexity – increases interrelationships between different types and formats of data. And there are not a lot of standards for how these things interoperate.

As project progressed, realized how interrelated data is, and how just preserving the 3D models alone may not be enough.

Architectural firms tend to think more about getting the project done than about long term archiving and reuse. Q: What about the as-builts and other deliverables to clients? Discussion: Contracts are starting to spell this out, with what kind of digital files to deliver. But this is inconsistent. Some also require hard copies, with the idea of scanning them if the digital files become unreadable.

Tom: Firms are starting to recognize this issue.

The project also looked at potential audiences in addition to Practice (architects, designers, engineers): for example, researchers, historians, scholars, instructors, students, and the general public.

Developed use cases for what uses we thought each type of user might make of the materials. Created advisory board and consulted other audiences.

Content for this project: 3 data sets. Frank Gehry, Stata Center at MIT (2004, CATIA), Moshe Safdie, US Institute of Peace (2009, Revit) and Thom Mayne, Caltrans (2004, Microstation)

These provided 100+ file formats, tens of thousands of gigabytes, almost no metadata, etc. The dataset was massive for each of the projects. They were complete project files, not just the final documents or end products. And some audiences want that stuff, so that played into the mindset of how to develop this.

Geometry – different ways of storing data. For example mesh versus arcs and curves. Parametric allows users to refer to features rather than underlying geometry.

The different software used varied in how they modeled and the geometry methods used.

Looked at open standards for model and geometry info (STEP, IFC, IGES, VRML, STL), as well as for display formats, including 3D PDF.

Various industry exchange data formats.

If CAD software only exports “inert geometry”, it doesn’t truly represent the complexity of the underlying 3d model. Does that matter? To the targeted audiences, it didn’t seem to matter that much. But may matter to us.

How to manage all this data. Intellectually, went with the BIM idea – that unless you incorporate the relationships between different types of info, you will lose information. So used RDF XML ontology to model relationships. Developed Project Information Model, which links together all types of info in a relationship map. (see slide 35)

Slide 36 – properties on objects. Every file gets five properties: Project Phase, Building Zone/System, Architectural Discipline, Document Type, File Format

These were basic tags. More important, specially curated documents would get more tags: 3D models and 2D drawing sets, client presentations, etc.
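
A small sketch of how the five per-file properties could be held and flattened into subject/predicate/object statements of the kind an RDF model works with. The class, field, and function names are hypothetical, and plain tuples stand in for real RDF; this is not the FAÇADE ontology or its code.

```python
from dataclasses import dataclass, fields
from typing import List, Tuple


@dataclass(frozen=True)
class FileTags:
    # The five basic properties listed on slide 36 (field names are assumed).
    project_phase: str
    building_zone_system: str
    architectural_discipline: str
    document_type: str
    file_format: str


def to_triples(file_id: str, tags: FileTags) -> List[Tuple[str, str, str]]:
    """Flatten the tags into (subject, predicate, object) triples."""
    return [(file_id, f.name, getattr(tags, f.name)) for f in fields(tags)]
```

Curated documents that need more tags would simply contribute more triples for the same file identifier, which is what makes a relationship map easier to grow than a fixed table.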

Developed concordance of information. What formats existed, how many of each, and an initial appraisal attempt. (slide 39)
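
A first pass at this kind of concordance can be as simple as counting file extensions across a project directory listing. The sketch below assumes the extension is a usable stand-in for the format, which will not always hold for CAD data; the function name is hypothetical and this is not the project's tool.

```python
from collections import Counter
from pathlib import PurePath
from typing import Iterable


def format_concordance(paths: Iterable[str]) -> Counter:
    """Count files per extension (lowercased); files without one go under ''."""
    return Counter(PurePath(p).suffix.lower() for p in paths)
```

Calling `format_concordance(listing).most_common()` on a list of paths gives formats ordered by how many files use them, a starting point for deciding which formats to appraise first.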

Slide 40, Curators Workbench – a custom tool that allowed someone to go in and view some of the material, make decisions, and add additional appraisal info. Had grad students from school of architecture working on this. Students also converted file formats where needed, and library staff helped with some of the metadata.

DSpace used for preservation, dissemination, and access control. Also used FAÇADE UI external to DSpace, bulk ingest tools (Curator’s Workbench, DSpace packager importer). Format registry used for tracking file formats.

Good diagrams on slides 42ish re: data processes.


For presentation used SIMILE Exhibit and Timeline tools, and Longwell RDF-based faceted browser. Presentation shows screenshot.

[Outcomes. Demonstrates use of open source software to solve this sort of problem. The ontology developed for this project may be applicable to other environments.]

Challenges from an archivist’s view: When this was just the 3D files, we thought about what are the rights to the material (intellectual property). Not just use of material, but sharing it with the public. When we go beyond drawings to all documentation – there is correspondence in there that may have its own IP issues.

Then there’s the display end - what will work, and how will it work.

And a lot of the other issues we’re routinely dealing with on digital files.

Grant finishing up in Sept, so winding down now. Looking at what to do for follow-up, including a grant proposal for 3D CAD, and setting up another pilot instance this fall.

Opportunities: tools like curator’s workbench can help us address large “data” files. And interface developments (how to view and search materials) also applicable.

The major hurdles are scalability and IP. How big can it go and how can you sustain it, and what can you show to others – how much will we have to restrict because we just don’t have the answers.

Several questions relating to security implications of having this much construction data searchable in digital repository. Not really addressed in this project.

Q: Seems antithetical to MPLP because you are creating massive amounts of new metadata. A: [Can automate a lot, but there is still a lot of work. ]

Q: Would a better solution be an active records management system, so that the complexity can be handled more on the front end?

Q: Is there any data not currently being supplied that should be folded into national CAD standards? This would be a good thing to think about – the AIA is involved in this.

Q: There is similar work in “pedesink” (sp) that oversees STEP standard for longterm retention of CAD, CAM, CAE.

Sunday, November 16, 2008

Opening IT Up: Using Open Source Software

Carla Schroer, Cultural Heritage Imaging, (A California Nonprofit Corp.) 20 years of open-source experience. Carla at c-h-i dot org, http://www.c-h-i.org

About me: 20 years in Silicon Valley, 13 at Sun., etc.

Open Source is a licensing model for software. It tells you nothing about the actual software, how it was developed, whether support is available, etc.

Open source licenses have generally been approved by OSI (Open Source Initiative). Requires free distribution, available source code, and that people are allowed to modify source code. There are several major types of license:

Permissive Licenses (BSD,MIT,Apache)
Copyleft (Mozilla, GNU LGPL, European Union Public License (EUPL))

Strong copyleft licenses - GNU General Public License (GPL).

What distinguishes them? What you get to do with the code downstream. Under a permissive license, I can take the code and relicense it under another license, etc.

Under a copyleft license, by contrast, I have to release any modifications under the same license. Can't go closed/into commercial thing. There are two flavors. In lightweight, it's file based. So I could fix bugs in one file and release it. (But could release a plugin under any license I chose.)

Strong copyleft licenses mean that it is viral. Anything that touches licensed code required to have same license. Strong copyleft licenses are project based and can affect code beyond the original licensed code.

Update: Carla wrote me a note clarifying her point on this:

There is one thing in your description of the licenses part that isn't quite right, and makes a difference to me. This is in the area of strong copyleft licenses. You say:

Strong copyleft licenses mean that it is viral. Anything that touches licensed code required to have same license.

While I think a lot of people believe this to be true, I strive really hard not to go that far in talking about the copyleft effect of the GPL licenses. My slide, and I hope my presentation, said that under some circumstances the copyleft effect can go beyond the original files. I didn't have time to get into the vagaries of this in a short overview, and I recommended some resources for folks that wanted to understand the boundaries a bit further. I think the GPL licenses are really valuable for some situations, and that folks are unduly scared away from it due to fears about the extent of the copyleft effect. It is true that it can affect files that are not part of the original GPL code, but I wouldn't go so far as to say that it affects everything that touches it. (the license talks about "linking with" and also the act of distributing code together can trigger the effect). There are different legal interpretations on what circumstances trigger the requirement, though some things are quite clear.


Permissive licenses allow much use with few restrictions; copyleft ensures that work based on the licensed code remains open.

These are not all compatible -- you can't combine some of them. Software Freedom Law Center writes a lot about this. There are very few legal precedents, so some is still up in the air.

Choosing software: You should ask the same questions you would ask for Open Source as for non-open source. An important thing is what is the cost of switching if it doesn't work out. Also what's the TCO? The cost of the license is one factor. But you still have all the other costs -- training, sustaining, maintenance, etc.

Probably the most controversial thing I'll say this AM. I believe that open source can help mitigate concerns about adopting new file formats and standards. The risk of adopting a new technique is mitigated because the code is open source -- because it's open, it's more likely to be available and usable in future.

CHI Ongoing Collaborations (include Worcester Art Museum, etc.) I'm going to talk about a specific project where we used a leadership grant from IMLS and built a software tool. Worked with a team from UC Santa Cruz. Their standard license for having grad students work requires the U. to have all IP rights. Took some serious negotiation. By contrast, the Italian National Research Council probably wouldn't have been involved unless it was an open source model. An important thing is that everyone knows what the terms will be from the beginning.

Another issue is copyright ownership. So we ended up creating a joint copyright with the people who write the code, which gives us the rights to license and still lets them do stuff with it.

Open Source can be a tool to get people working together for the common good. License terms need to be agreed upon up front. And you need to make sure you choose collaborators with same goals.

Open source is a tool, not a religion. There are a range of licenses, good for different things.

Christopher J. Mackie, Associate Program Officer, Research in Information Technology Program, the Andrew W. Mellon Foundation.

Why Open Source? (My views don't represent foundation, if I say something egregiously stupid, it doesn't represent the foundation!)

We think we are the largest NGO funder of open source software. (We'd love to see someone bigger doing that.) We have 30+ projects with users and developers on every continent. http://rit.mellon.org. All major media sites picked up some visualization tools for historical timelines we made for historians and used them in their coverage. All is open source, most is community source.

Open Source isn't always a bunch of people working in their garage in the evenings. It can be big business.

Open source is a licensing scheme (or sometimes a religious belief); it's not

a guarantee of freedom or success
a sustainability model
a software architecture (in itself)
a single organizational model
a (good) technology strategy in and of itself.

So why use it?

Open source is governed by and for users. Can't be bought, closed. No one can buy project and foreclose it or require upgrade. Ownership is key.

Some people want open source for cost savings. This can happen if you try, but not always primary motivation. Biggest reason for adoption is risk management. It diffuses risk of new development across many institutions. (As long as project is well-run and viable.)

Why not? The Mickey and Judy model. Local optimization/Competency trap. (My mom's got costumes, your dad has a barn, let's put on show!) Bad tech strategy is bad strategy --doesn't matter if open source.

OS tends to cause developer centrism instead of user centrism. Stakeholders want to know what it does for me, and many OS projects have trouble answering that.

OS software brought in badly can subvert an overall technology strategy. Must be smart about it.

OSS Business Model. Most projects don't have one. Some have dual licensing. There is good (dual commercial/noncommercial). If vendor is well intentioned, these can be valuable. Evil is platinum edition, where the OS version is just a crippled version of the good one which is proprietary. That's not really open source. There's the services model, where software is OSS but you pay someone to support it. There's the appliance model where you are sold a box. There's hosting where the vendor does everything. And then there's software as a service. Like hosting, but the vendor sets up internal infrastructure differently.

Varieties of OSS. Traditional is developer driven. Can be a terrific model if customers are developers. But may fail if developers are not final customers.

Single vendor driven - many companies allow you to download, but then try to become a monopoly vendor for it.

Most of the benefits of OSS only come if you can fire the vendor without firing the vendor.

And then there's what we support, which is community source, functionally driven. CollectionSpace is one of our projects like that.

Community source software is designed and built by and for the functional specialists for their community. We have ways of having these people get together and design a blueprint. Community owned -- when the CollectionSpace project is done, it will turn over IP to a foundation. It's community sustained - by contributions, and/or by a healthy vendor ecosystem. And it's state of the art in governance, tech, sociology... and sustainability.

We've done XX many projects, many of which are out of funding, and none of which have died.

What does it take? Not wealth. Vendors make it affordable. Ongoing support costs are 40-60% less than commercial, and vendors allow people to buy support instead of hiring developers. Mostly it takes organizational buy in and a strategic technology plan.

Strategic Plans and critical mass -- CriticalMASS (Mission, agility, sovereignty and sustainability) No strat. tech plan is a plan, but a really bad one.

Strategic Software: Gain buy in from stakeholders and executives. Contextualize CriticalMASS values for your institution. Evaluate resources holistically. If you do all this stuff, we think you'll end up with community source. But if you don't, that's fine.


Carl Goodman, Senior Deputy Director, Museum of Moving Image
Cgoodman at movingimage dot us

PI for CollectionSpace project
www.collectionspace.org

Getting comfortable with OSS. You're not on your own if you use OSS - you may have more support. The Palmolive principle - you're already swimming in OSS everywhere. WordPress, Firefox, Drupal, programming languages, shopping carts (osCommerce), course management (Moodle)

Crisis breeds will to try new approaches. People who were more conservative may be willing to try new things.

Our project is not your father's open source project. This is not people in their free time creating spaghetti code.

Next generation web-based collections information, management and access platform that happens to be open source. But open source is not the defining point of the software. A number of partners on the project, and funded by Mellon.

Grew out of our work on a homegrown collections system called OpenCollection. Won an award for that. Led to larger grant. It became clear that we needed to step back and reinvent the software and work to create culture and community around it. Based on idea that managing, acquiring, disseminating collections is a core activity, as is getting it online, but difficult.

We are developing an alternative to commercial or homegrown CMS.

We want to leverage university partners' culture of research and innovation, recognition that they have museums of constituents. Museums do have developers, but they are booked. And most museums don't.

We're also trying to put UI design up front. The Fluid Project, which we're working with, is working to create UI elements for incorporation into OSS. We in collections management deserve usable software!

Brought in 40 people representing 20 organizations to think about what a new collections system would look like. Transparency. This is hard for museums. We are doing this in a way so that the community can be involved at any stage and see all debates, discussions, and mistakes. Project website and wiki are very detailed. We're taking a coordinated and highly structured approach to distributed software development. We have 12 people working full time on it, on a 2.5 year project.

We're decoupling various aspects of the project from each other so that they can innovate and still stay in touch with each other. The functional team is working on the needs requirements for the first phase of the project. There is a design UX team looking at user interfaces, use cases, etc.

The technical team is working on underlying architecture and tools.


Timeframe for all this. Tech platform Dec. 08. Development begins Jan. 09. Tire kicker March 09.

Sustainability: we are working on this as well.

There is a lot of OSS inside the system. Java/PHP or Python or Rails, Fedora, ImageMagick, etc. And we hope parts of our project end up inside other projects. For example project OLE at Duke, or Bamboo. Omeka, Pachyderm. Other projects are further along, and have been helpful to us even though in a different domain.


Brad Westbrook, Archivists' Toolkit Project Manager, and X at UC San Diego. MLS from UCLA, and MA in English from SUNY Albany.

Overview of AT and how it became OSS, and some of our lessons learned. Mary contrasted us with CollectionSpace, and noted that we're more mature. True to some degree. But also more immature. They are already into sustainability, etc. That's something we came to belatedly.

It's an OSS relational database (RDB) collection system. Purposely a staff-side tool - not targeted at external clients. Start-up funding in 2002 by Digital Library Federation, and two development cycles funded by Mellon Foundation. Three public releases to date. Dec 2006, last January, and the last one was Wednesday night from this hotel. We are pleased with uptake. We have 40 or so institutions that have implemented it as a production tool and are available to other users as a resource.

Some institutions include Getty, Museum of Flight, Vermont Folklife Center, Bates College, Princeton, UCLA, etc.

Designed to address key problems in archives domain. Serialized processing tools. One task done by one tool, another by another. A lot of redundant data entry and siloing, as well as inefficiency and increased training cost.

Also resulted in data with low interoperability. The Online Archive of California found this out in 2001 when they looked at the EADs that had been submitted. There was tremendous variability despite EAD being a national standard.

Also thought the tool could help reduce growing archival backlogs.

Solution was to build a program that would promote standardization (DACS, ISAAR for authority records).
Supports export standards like EAD, MARCXML, METS, MODS, DC.

Also promote efficiency by integrating functions, enabling repurposing of data, automating encoding and reporting, and providing customization features. In the end, we thought it could decrease the cost of processing and improve sharing across community.

So why Open Source? We wanted it to be affordable. We wanted it to be on an enterprise quality database. Oracle and SQL Server were cost barriers for organizations. So we chose to go with MySQL. Also, we wanted flexibility and adaptability. One of the complaints is that a vendor will give you SW that does something, but you can't modify it: you take what you get. So if it meets most of your needs, you take it and adapt. We wanted a tool that orgs could modify. We liked the community volunteer model. Giving the users oversight for development priorities, feature requests, documentation, etc.

We're released under the Educational Community License 1.0. Did this belatedly, finally, after wrestling with Apache or GPL. But felt ECL was favorably received by funder (we thought) and allowed certain commercial opportunities that might not be there with GPL. Used a lot of third party OSS to build out, and we began transition to user governance. We've shifted from sharing requirements with a small group of specialists to doing it with the community at large.

We worked with SAA to establish workshops for training. There have been 5 in 2008 so far. We'll begin usability testing using AT users in NYC area. And we petitioned SAA to create an AT roundtable, which will hopefully become the seat for users to take over governance.

What we've learned? Match between product type and open source strategy. We probably at the beginning thought we'd have developers climbing out of windows hoping to contribute. That didn't happen, and we've finally come to the conclusion that this is because it's targeted at a very small group of users, and contributing requires domain knowledge. So we will probably not have many developers contributing.

Another thing is to start sustainability planning and think about license much earlier in process. We ended up with a mixture of code with some GPL and some not, which limits what we can do.


Q: How do you measure sustainability and suitability? A: There are some indexes out there like business readiness index. But these only rate big projects, are subjective, etc. So hard to use. Important to look at how many people involved, how active community is, etc. Ideally you want a community with a diversity of vendors. A project where one institution does most of the work has a single point of failure. Etc.
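A rough sketch, not from the talk: one way to put a number on the "one institution does most of the work" warning is to look at what share of recent commits comes from the single most active institution. The function name and the sample list below are made up for illustration; in practice the list would have to come from a project's own commit history, mapped to institutions by hand.

```python
from collections import Counter


def top_contributor_share(commit_orgs):
    """Return (institution, share) for the institution with the most commits.

    commit_orgs is a list with one institution name per commit.
    A share close to 1.0 suggests a single point of failure.
    """
    if not commit_orgs:
        raise ValueError("need at least one commit")
    counts = Counter(commit_orgs)
    org, n = counts.most_common(1)[0]
    return org, n / len(commit_orgs)


# Hypothetical sample: four commits, three from the same institution.
sample = ["Museum A", "Museum A", "Museum A", "University B"]
org, share = top_contributor_share(sample)
print(f"{org}: {share:.0%} of commits")
```

This only covers the "how many people involved" part of the answer. It says nothing about how active the community is on mailing lists, or about vendor diversity.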

Q: (Ari) Is there an equivalent of Code4Lib for archives/museums, and/or could this be expanded to code for cultural heritage, to help with the developer domain knowledge problem? Bill - aware of Code4Lib. We've had a presence there due to Mark McKenzie. We've started announcing releases there. Question is how we can take that further.

My comment: Importance of a good plugin architecture, documentation and example code to getting community development. Much harder if you have to understand entire project architecture just to make a small change.

Q: If Firefox dies, you lose your bookmarks. If your collections system dies, you lose everything? A: Disaster planning is important. You have to think about that. One thing people say is "with commercial I can always call someone." But that assumes that they will answer the phone at 5 am on Sunday, that they will have an answer quickly, etc. Ideally your disaster plan should be somewhat beyond and independent of just calling someone.

Q: Ari - we started using ATK, but were flummoxed when we realized it didn't have built-in integration with Fedora. What kind of planning is going into this sort of integration? A: We've put some thought into this, particularly with Fedora and DSpace. Haven't done a ton of planning beyond that - but would welcome community contributions on that. A: CollectionSpace is taking a lot of pains to make sure that we will be open to this type of integration. One of our use cases is Fedora at UC Berkeley. So instead of thinking of it as a single system we're trying to see it as granular and open to these types of combinations.

Q: You talked about single vendor projects. Can you give an example? A: One good example is Zimbra. If you want to run alongside an Exchange server (which is how most people use it) you have to buy from them. 35b went into OSS over the last few years, and much of it went into these single vendor solutions.

There's another strategy where marketing material says it's open, and what they really mean is that they have a proprietary API you can write to.

Q: Adobe Flex? Q: Like PDF is open source? A: PDF is an open standard, but not open source in that Acrobat is not open. But the standard is open. That's an important distinction.

Q: I've found that OSS software is held to higher standard than commercial - should be cheaper, offer more flexibility easily, etc. How do you manage this and help with integration and acceptance? A: There tends to be an overselling to leadership. A lot of that is our own fault. People in their zeal to make case overpromise. There isn't a substitute for educating your own leadership. They have to understand what is and isn't deliverable. People go in and say things that they won't be able to follow through on. The easy answer is don't do that. But it's hard, because you're facing the uphill struggle of adopting this new model. The way to short circuit this whole dynamic is to start with a strategic technology plan. If you do that and they're making informed decisions, I expect in most cases with rational people they will understand.

Q: People talk about free as in beer or free as in speech. Well, OSS is free as in kittens. That's a good model to follow.

Q: I come from the other side - our leadership is saying "yeah we need OSS because it's free." But that's a simplistic attitude. Need more strategic planning.

Q: Why isn't strategic technology planning more common? A: Technologists aren't always trained to do this. And at a large organization you learn how to do this. But as CTO in a small museum, you may stumble on this, but you're less likely to be trained to do it. And a lot is about opportunity. One reason OSS projects are powerful is that they allow small institutions that don't have resources internally to find resources at the community level and bootstrap themselves. Big orgs often come in for their own reasons: to demonstrate need for internal developers, etc. Smaller organizations without these resources come in later, but save more because they don't have to do everything internally.

Good end note: need for strategic technology plan!


Omeka: Bringing Collections to the Web

Sharon Leon, CHNM, George Mason University


We noticed that people (especially small museums and historical societies) were struggling to bring their collections online. We wanted to build a web publishing system that was targeted at curators, small organizations, etc.

Anyone who is familiar with WordPress will get the basic idea. The software is based on Dublin Core (unqualified), and supports themes, etc. Last week released the .10 beta with a new, redesigned site. We have a growing development community that we'd love for you to join. The API is now set, so those who want to add plugins can now do that. There's a geolocation plugin, an iPaper plugin, a contributed content plugin, etc.
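A rough sketch, not from the talk: "unqualified" Dublin Core means an item is described using only the 15 basic elements, with no refinements. The snippet below shows what such a record might look like when written out as XML. The `record` wrapper element, the function name, and the sample item are my own assumptions for illustration; this is not Omeka's API or its export format.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 elements of unqualified Dublin Core.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def item_to_dc_xml(item):
    """Serialize a dict of unqualified DC fields to an XML string.

    Values may be a single string or a list of strings, since every
    DC element is optional and repeatable.
    """
    unknown = set(item) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not unqualified Dublin Core: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")  # hypothetical wrapper element
    for name in DC_ELEMENTS:
        values = item.get(name, [])
        if isinstance(values, str):
            values = [values]
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical item from a small historical society.
quilt = {
    "title": "Friendship quilt",
    "creator": "Unknown",
    "subject": ["Quilts", "Textiles"],
    "date": "1890",
}
print(item_to_dc_xml(quilt))
```

Anything that doesn't fit one of the 15 elements is rejected here, which is the trade-off of the unqualified set: simple for small organizations to fill in, but coarse.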

We'd also love people to contribute themes. The system ships with 11 core layouts for exhibit builder. My colleague Sheila Brennan is sitting at the table and has a live demo.
