Saturday, October 29, 2011

Harvesting

It was a strange day in computerland. Although there is no logic to support it, I often find on rainy days that the Internet does not behave like one thinks it should. And south Florida has been in the midst of a very rainy weekend.

According to a long summary on the Open Archives Service Providers site, the stated purpose of Scientific Commons is to create relationships between authors and publications, and it has already done so for 23 million. Following the link, I was quickly taken to a webpage with a banner saying “Scientific Commons,” a search box, the option to view the page in either German or English, and a list of “Neue Publikationen.” I tried to switch to the English version, which did not happen. I put the search term “climate change” into the search box anyway and waited and waited and waited. Nothing happened. I tried to open some of the “neue Publikationen” listed on the homepage, including several with English abstracts. Nothing happened.

The site is deceptively friendly looking for English speakers. I’m not one who believes the entire global community needs to translate everything into English to suit me because English is the language I speak. I know that English is the common language of the scientific community, and that many times only an abstract will be written in English. But I was surprised that nothing opened even when clicking on the links to German articles. It is very difficult to review a site without being able to see any results, but the bottom line is that the site does not function well, at least on this side of the Atlantic.

The link from Open Archives Service Providers for Callima.org went to a website for addiction, and I didn’t notice anything that looked remotely like a collection of metadata. DigitAlexandria looked promising and impressive on its homepage. About 1,000,000 documents were in its system, with some of the best scientific research centers listed as harvested. The only trouble is that putting a term in the query box and searching opened a page with a logo for the Wayback Machine (Beta) and an error message which read: Page cannot be crawled or displayed due to robots.txt. Bizarre.
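For anyone curious what that error message means: a robots.txt file is a short text file in which a site tells crawlers which paths they may fetch. Below is a minimal sketch of how such rules are evaluated, using Python's standard urllib.robotparser module. The robots.txt content, the crawler name "example-crawler" and the example.org URLs are all made up for illustration; I do not know what the real file behind the archived page said.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that shuts every crawler out of the whole site.
BLOCK_EVERYTHING = """\
User-agent: *
Disallow: /
"""

# Hypothetical robots.txt that only closes off one directory.
BLOCK_ONE_DIRECTORY = """\
User-agent: *
Disallow: /private/
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    search_url = "http://example.org/search?q=climate+change"
    # "example-crawler" is an invented crawler name.
    print(may_crawl(BLOCK_EVERYTHING, "example-crawler", search_url))
    print(may_crawl(BLOCK_ONE_DIRECTORY, "example-crawler", search_url))
```

A well-behaved crawler or archive that meets the first kind of file will refuse to fetch or display anything from the site, which would be consistent with the message I saw.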

I finally decided to check Scirus, which I have used before. Mercifully it opened to its familiar search page. It bills itself as a specialty search engine which concentrates on scientific websites. Its “Preferred” websites include Digital Archives, NDLTD, RePEc, DiVa and multiple Patent Offices. It also harvests several prominent publishers like Elsevier, Royal Society Publishing, Wiley-Blackwell and Sage. These records should not be mistaken for full text; only the metadata records have been harvested, with a note linking to the publisher’s website. Overall I have found Scirus very useful when searching the web for scientific information. Because it searches scientists’ websites and they frequently have links to PowerPoints of their presentations, I have often found graphic-rich material to show students when they are looking for illustrations for their own projects. And today it worked as expected.
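To make "only the metadata has been harvested" concrete, here is a small sketch of what a harvester might pull out of an OAI-PMH ListRecords response carrying simple Dublin Core. The XML namespaces are the standard OAI-PMH and Dublin Core ones, but the sample record, its identifier and the publisher.example.org link are invented, and the function name harvested_records is my own. This is not how Scirus itself works internally, which I have no knowledge of.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample response: one record, metadata only, no full text.
SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1234</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A made-up article about climate change</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:identifier>http://publisher.example.org/article/1234</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def harvested_records(xml_text):
    """Return a list of dicts with the fields a harvester would index."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.findall(".//oai:record", NS):
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:
            continue  # skip records that carry a header but no metadata
        records.append({
            "oai_identifier": record.findtext(
                "oai:header/oai:identifier", namespaces=NS),
            "title": dc.findtext("dc:title", namespaces=NS),
            "creators": [e.text for e in dc.findall("dc:creator", NS)],
            # The link back to the publisher; the article itself is not here.
            "links": [e.text for e in dc.findall("dc:identifier", NS)],
        })
    return records


if __name__ == "__main__":
    for rec in harvested_records(SAMPLE_RESPONSE):
        print(rec["title"], rec["links"])
```

The point of the sketch is that the harvested record holds a title, a creator and a link, and nothing more; getting to the article still means following the link to the publisher.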

Monday, October 24, 2011

Unit 9

There is such a learning curve in creating metadata for my collection. I do not have any background in cataloging other than a mandatory course taken early in grad school. As I work more with the process I am discovering inconsistencies in the items that I create metadata for. I like to think that I am learning as I go, but I realize that I have such a long way to go. I have read over the element lists in Dublin Core and IEEE Learning Object Metadata. What seems simple at first glance becomes difficult when I try to apply the rules and my inexperience gets in the way.
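One way a beginner like me might catch some of those inconsistencies is with a small check of each record against the fifteen elements of the simple Dublin Core element set. The sketch below is only that, a sketch: the record layout (a dictionary mapping element names to lists of values), the function name check_record and the sample records are all my own inventions, and it does not try to cover IEEE LOM or any qualified Dublin Core terms.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def check_record(record):
    """Return a list of problems found in one record (empty if none).

    `record` is assumed to map element names to lists of string values.
    """
    problems = []
    for name, values in record.items():
        if name not in DUBLIN_CORE_ELEMENTS:
            # Catches misspellings and inconsistent capitalization.
            problems.append(f"unknown element: {name}")
        elif not isinstance(values, list):
            problems.append(f"{name} should be a list of values")
        elif any(not str(v).strip() for v in values):
            problems.append(f"{name} has an empty value")
    return problems


if __name__ == "__main__":
    # Invented sample records for illustration only.
    good = {"title": ["Photosynthesis slides"], "creator": ["Doe, Jane"]}
    sloppy = {"Title": ["Photosynthesis slides"], "subject": [""]}
    print(check_record(good))
    print(check_record(sloppy))
```

A check like this only catches mechanical slips. Deciding what belongs in a subject or description field is still the part that takes a cataloger's judgment.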


I recently took a workshop on Metadata for ContentDM; I was the only non-technical services person who attended. During discussions and breakout sessions I could sense how important precision is to the catalogers, almost to the point of obsession. I am not gifted in that way.

Working as a reference librarian I can appreciate the work that catalogers do. I often feel frustrated by the lack of precision I experience when searching proprietary databases. It seems that in a move to become more Google-like many database providers are interpreting subject terms liberally; so many times lately I will look at a list of results and wonder how a certain item could possibly be relevant.

Sunday, October 16, 2011

Unit 8 ePrints

I found installing ePrints, Drupal and DSpace about equally difficult. However, I do think configuring and customizing ePrints was more complex and needlessly difficult.


I added to the Welcome message (which wasn’t difficult), and added a logo (which took three unsuccessful tries with method 1, and two tries, ultimately successful, with method 2). I tried to change the theme, although I could not see any differences between glass and green. It also took several hours to get the changes made to the subject taxonomy, but that is probably due to my inexperience with the CLI.

It is rather disappointing that there are so few options for customization within ePrints. I credit Drupal with a much greater choice of themes and modules. Overall ePrints seemed to lack community support when compared to Drupal and DSpace, although there is plenty of documentation available on its site.

According to Wikipedia, this platform was originally created for institutional repositories and for scientific journals. Its history is tied to the development of the OAI-PMH protocol, and it was one of the first open source, free software packages. My ultimate conclusion is that it is very suitable for the purpose it was created for: an IR or journal publishing. However, it would be my second or third choice as a repository for other item types.

Sunday, October 9, 2011

Unit 7 Sticker Shock


I work for a community college, not a major research university that demands high tuition from its students, but an institution that is trying to provide an “affordable” education to students who otherwise wouldn’t be able to attend college at all. The faculty’s emphasis is on teaching rather than research, so the digital collection that makes the most sense for us is a learning object repository. I have begun to investigate what is available in software platforms for a LOR, and in the past two weeks I have spoken to two commercial providers.

The first company I contacted was Equella. It runs the state’s learning object repository, Orange Grove, and appeared very suitable for what my campus would like to do. At present, we have no learning objects, just the dream of creating them. The campus has approximately 10,000 students and, if things take off, the repository would hopefully expand to include the three other campuses in our college system. As I talked to the representative, it became clear that Equella does not scale down. It is a complete content management system, able to integrate with library management systems and student registration, provide a repository, and do multiple other things. After a one-time installation fee ($125,000) and consultation fees ($25,000), the license would be approximately $80,000 a year. That seemed exorbitant to me, and I was suspicious because Equella is part of the Pearson publishing company.

So I read some other reviews and came across Telescope from North Plains. It advertises itself as scalable and affordable, with an emphasis on digital asset management (DAM), specifically of video and audio files. I spoke with a very nice representative who admitted that funding is often a problem for educational institutions, and he was able to lower the license fee to $100,000 a year. (Gasp) Or we could purchase the software ($150,000), store all files on our own server, and pay a maintenance fee of $30,000 a year. (What a bargain).

I guess I’m naïve; I don’t have much experience with budgets. I really don’t have anything to compare these prices to but our annual book budget of $50,000. But suddenly the open source products (DSpace, Drupal and the others we will be looking at in this course) are looking very attractive from a financial perspective.
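To put the quotes side by side, here is a rough back-of-the-envelope sketch. The dollar figures are the ones the representatives gave me; the five-year horizon is my own assumption, as is treating the hosted Telescope license as having no one-time fee (none was mentioned). It also ignores staff time, hardware and price increases.

```python
def total_cost(one_time, yearly, years):
    """One-time fees plus the recurring fee over the given number of years."""
    return one_time + yearly * years


# (one-time fees, yearly fee) as quoted; hosted Telescope one-time fee assumed 0.
OPTIONS = {
    "Equella": (125_000 + 25_000, 80_000),
    "Telescope (yearly license)": (0, 100_000),
    "Telescope (purchased)": (150_000, 30_000),
}

BOOK_BUDGET = 50_000  # our annual book budget
YEARS = 5             # assumed planning horizon, not from any quote

if __name__ == "__main__":
    for name, (one_time, yearly) in OPTIONS.items():
        cost = total_cost(one_time, yearly, YEARS)
        ratio = cost / (BOOK_BUDGET * YEARS)
        print(f"{name}: ${cost:,} over {YEARS} years "
              f"({ratio:.1f} times the book budget for the same period)")
```

Under these assumptions even the cheapest commercial option costs more over five years than we would spend on books in the same period.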

Sunday, October 2, 2011

Unit 6

I was intimidated as I began the installation of DSpace. This was mainly because I had started by reading all the discussion posts in the Tech Activity section of D2L, and it looked as if many people were having installation problems.

The first problem I encountered took me a full day to conquer, and even then it was not a resolution of the problem but a workaround. I still do not understand why the second VM would not connect with the same fixed IP address as the first VM. But at least I was able to get DSpace configured using bridged mode.

During the actual configuration I began to get a sense of what I was doing, not all the time, but enough to feel that I was not just transcribing code from one form to another. I also felt this way last week while adding a new module to Drupal and reformatting previous code to install the new module. This is such a beginning step; I know that for any major undertaking I would be very dependent on a system specialist. But I do feel now that the CLI is not an enemy, and I am even beginning to enjoy working with it.

I am grateful for Bruce’s attention to detail when writing our tech assignments. Little notes like “put your host’s name in the brackets” and “there is a space after x” are what this beginner needs.

I looked at two other sets of installation instructions, from SUNScholar and DSpace. I felt that the one from SUNScholar assumed that the audience was an advanced programmer who just needed a quick review of the directions. The instructions at DSpace seemed a little friendlier. I particularly liked that those instructions called for Tomcat and PostgreSQL to be installed during the installation of Ubuntu. Every little bit helps.