Where librarians and the internet meet: internet searching, Web 2.0 resources, search engines and their development. These are my personal views and not those of CILIP or any other organisation I may be associated with.
October 13, 2011
A library is not…
a building. Sure, there are some lovely, wonderful buildings which house libraries, and we don’t have to go back too far to see when the building that housed a library was essentially a temple of worship to the book. However, while a library needs a building (although I’m not going too far down that route any longer, since a case can easily be made that it’s no longer true), the building can’t define the library. Sure, it can help with the concept of a library, and it can assist in the role of the library – they used to be quiet buildings with loud rooms, but now they’re more often than not loud buildings with quiet rooms. But a building full of books, neatly arranged, with helpful people doing things for the members/clients/etc. could quite easily be a bookshop.
A library is not a collection of books. It’s not a collection of resources either. We cannot define ourselves by the artifacts that we use. We should – hopefully – have long gone beyond that: into other media to begin with, but then, as society has started to leave physical objects behind, with the increased use of music files instead of CDs, films on demand instead of DVDs and knowledge ‘in the cloud’ instead of on CD-ROM, so have the library and the librarians. We’re not in the book business – we have *never* been in the book business. We’re in the knowledge business, helping, assisting and facilitating what our members and our communities want. However – and this will raise a few hackles, I’m sure – perhaps we’ve not done it as loudly or as obviously as we should. For many reasons: librarians are not well known for being self-publicists and for shouting what they do from the rooftops, and perhaps because in our job we seek consensus and agreement rather than discord and disagreement. If it is seen that the principal role of the library/librarian is to maintain a collection, then we become defined BY that collection. The argument then turns into one of ‘what will happen if we get rid of the collection?’, rather than ‘can our community manage without the input of a librarian?’ At that point, people will say that they can manage without, because there are bookshops (although in decreasing numbers), charity shops (God help us), or Amazon or Google for those lucky enough to have access to the net.
Problems arise when the library/librarians are not seen as part of the backbone of a community. Once this happens, it becomes logical to think of cutting it. The decisions of councils and mayors with little brain are a total puzzlement when viewed in the light of how we see libraries. They see them as a resource which isn’t part of a community. We have an insane situation where a community is apparently forced to choose between having a library and caring for its elderly and deprived. There are a few points worth making here. Firstly, it’s the role and responsibility of an elected body to run services on behalf of the community that elected them, and it’s not for them to try to abrogate responsibility back to the community, either in terms of ‘you want it, you run it’ or in terms of ‘if you don’t let us do x, y will happen’. The very idea that if we don’t close libraries we have to cut social care is patently ridiculous. I would be the first to agree that a council has to prioritise, and things like hospitals, fire stations and police are towards the top of the list. However, we don’t have hospitals, fire stations and police stations on every corner, because at some point other things come into play. In order to have a healthy community we have to have a varied community, and that includes a variety of social amenities. A better, more logical discussion might be ‘do we want a library space, or do we want a swimming pool?’, although obviously a better discussion would be along the lines of a rather grander economic discourse on what the Government is or is not doing to the country as a whole.
A second point is that a library service, which is able to provide resources, artifacts and knowledge to a community does fulfill a social need and requirement. Without getting too hysterical about it, while a hospital or a day centre can be used to keep a body going, a library service keeps a mind going. In that respect, a library service is just as important as a health service – because both services are aimed at doing the same thing – keeping a society or a community safe and healthy – they’re just dealing with different elements.
So, if a library is not a building, and it’s not books or other artifacts, what is it? I’ve already said that librarians cannot and shouldn’t be defined by what we work with (if that were the case we’d all have very dim views of greengrocers who sell vegetables!), but rather by what it is that we achieve. We should be defined by the effect that we have on our society and our communities. Because really, what we do, what we’re involved with, is the knowledge business, as I’ve said, and that actually equates to the power business. I often say that I wanted to be a librarian because I wanted the power, and while it’s fun to hear an audience laugh, it’s also quite sad, because clearly they often don’t see it the way that I do. Our role is not found on our shelves, in our computers, in our buildings or even in our history, but in what we DO. And that isn’t ‘stamp out books’. That’s defining us, once again, in terms of the artifacts that we may (or increasingly may not) use.
Every single librarian does something special, and it doesn’t matter if they’re in a school, public, academic, prison or commercial library – any of them. We help, or perhaps even inspire, people to read; we help people get jobs; we change lives. We make a community better. We make a community better, and yes, I did repeat that, because it’s important. We help protect free speech, we help provide people with hope, and I don’t make any excuse for using such hyperbole, because it needs to be heard. It needs to be shouted. A while ago I wrote a piece on ‘What Librarians Do and what Google does’. Someone suggested that Google did good things as well, and that librarians were also on the lookout for money. That’s completely missing the point, because the whole reason for librarians is to work for their members and their communities by facilitation, and by providing good, valuable, credible information to better and improve what people do. If Google does this, it’s a nice sideline from its goal of making money.
Librarians are here to help their communities, and an attack on a library is an attack on a community. It may not seem like it, and clearly to a lot of councillors it doesn’t, but that’s exactly what it is. Because it’s saying that the benefit that people get from their libraries/librarians in terms of learning to read, in getting a job, in finding social services to protect them in some way, in giving people the opportunity to learn or indeed just enjoying a good book – none of that matters. And when they say that none of that matters what they’re actually saying is ‘that community doesn’t matter’ and ‘that person isn’t important’.
Libraries and librarians are not a community ‘bolt on’ service. They are an integral part of a community, they help represent a community and they contribute to the health of a community. That’s why cuts to libraries are so dangerous – not just because they deprive people of access to resources, or jobs, or information or pleasure, but because they say ‘You don’t matter. You are not important.’ That’s not a good thing.
10/06/2011 The following article discusses SOCIAL metadata:
OCLC Report Examines Use of Social Metadata at Libraries, Archives, and Museums
OCLC Research released a new report titled “Social Metadata for Libraries, Archives, and Museums, Part 1: Site Reviews.” The report seeks to provide an overview of social metadata to enable cultural heritage institutions to better use their users’ expertise and enrich their descriptive metadata to improve their users’ experiences.
Metadata helps users locate resources that meet their specific needs. But metadata also helps us to understand the data we find and helps us to evaluate what we should spend our time on. Traditionally, staff at libraries, archives, and museums (LAMs) create metadata for the content they manage. However, social metadata—content contributed by users—is evolving as a way to both augment and recontextualize the content and metadata created by LAMs. Many cultural heritage institutions are interested in gaining a better understanding of social metadata and also learning how to best utilize their users’ expertise to enrich their descriptive metadata and improve their users’ experiences.
In order to facilitate this, a 21-member RLG Partners Social Metadata Working Group reviewed 76 sites relevant to libraries, archives, and museums that supported such social media features as tagging, comments, reviews, images, videos, ratings, recommendations, lists, links to related articles, etc. In addition, working group members surveyed site managers, analyzed the survey results and discussed the factors that contribute to successful—and not so successful—use of social metadata. They also considered issues related to assessment, content, policies, technology, and vocabularies.
This report includes an environmental scan of 76 social metadata sites and a detailed review of 24 representative sites. It is the first of three OCLC Research reports about social metadata. The second report will provide an analysis of the results from a survey of site managers, and the third report will provide recommendations on social metadata features most relevant to libraries, archives, and museums as well as the factors contributing to success.
Learn more about the OCLC Research project associated with the report, Sharing and Aggregating Social Metadata
Source: OCLC Research
08/31/2011 From the Library Door blog, these suggestions were published to assist students in making better use of Google. Do you agree? Should we try to stop the search-engine tidal wave that students have grown up with? Or is there a compromise? Librarians, what do you think?
Monday, August 22, 2011
Healing the Search-Impaired
“Regardless of the advanced-search capabilities of the database they were querying, students generally treated all search boxes as the equivalent of a Google search box, and searched ‘Google-style,’ using the ‘any word anywhere’ keyword as a default.”
Or: “Unfortunately, professors are not necessarily any more knowledgeable about library resources than their students are.” The conclusions also mentioned that students felt disconnected from the librarians. The message for teacher-librarians is two-fold:
- to bridge the librarian disconnect, and
- teach students to search Google, even if you don’t like it. (Deal with it and embrace the advanced search.)
Our local librarians do a great job of both bridging the disconnect and teaching.
If students are swimming in Google, we have to throw them a life preserver. While this article does a good job of pointing out what the issues are, it does not offer a great deal of advice on teaching proper searching techniques to students. Each librarian out there probably has a few good models – tricks of the trade, so to speak – which have worked well.
There was an excellent article published a few months ago in Multimedia & Internet@Schools which offered some basic techniques for success. It is more important to read the answers than to just read about the problem. Below is one passage from this article, entitled “How Google Works: Are Search Engines Really Dumb and Why Should Educators Care?” by Paul Barron, Jan 1, 2011:

“Using Google to Hook Students
Educators know that libraries provide access to more relevant information sources and that there are specialists in libraries who enjoy helping students with their research projects. The challenge is influencing the students to use the resources.
Students’ preference to begin their research with Google provides opportunities for educators to integrate the databases hosted in the school library into their research. After teaching a student to use the advanced search features in Google, educators can show how, with minimal modifications, Google’s advanced search syntaxes are similar to the features provided by the library’s proprietary databases. After teaching students to search using Google’s advanced search options, an effective leading question is to ask the student, “Would you like me to teach you a search method that saves you time, provides more relevant resources, and that will improve the quality of your research and earn you a higher grade?”
This approach works! Lori Donovan, a teacher-librarian at Thomas Dale High School in Chester, Va., noted: “I revised my lesson plan for teaching students how to search the Web and library databases. Students were frustrated using the Web; when we got to Gale and ABC-CLIO, their amazement at the difference in the quality of information was priceless. One student researching working women of the 1930s said, ‘Google is aggravating; I found much more in Student Resource Center.’”
Or, this piece:
Helping Google—Crafting Queries Using Advanced Search Syntaxes
The search query is the only control that a searcher wields over a search engine. However, librarians know that the predominant difficulty students experience while performing web-based research is conceptualizing the search topic and constructing effective search strings.[24] The inability to construct appropriate search statements limits a student’s success in searching for relevant information.
Unfortunately, most students have not learned that they can influence the accuracy of the search results by stating a search query at an adequate level of detail to help the search engine grasp the intent of the query.[25] The remedy is to first gain an understanding of how search engines work, and then craft queries to exploit the factors Google considers when ranking sites, such as the importance of the web page title and the top-level domain of the site. Search engine users should also heed Greg Notess’ dictum that the more words you search for, the smaller and more refined your results list will be.[26] Also, the more words used in the query, the less likely that Wikipedia will be at the top of the results, if returned at all.
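The query-crafting advice above – quote the phrase, add a title word, restrict the domain – can be sketched as a tiny helper. This is an illustration only: `build_query` is a made-up function name, and the example topic and filters are mine, not from the article; the `intitle:` and `site:` operators themselves are standard Google syntax.

```python
# A minimal sketch of composing a refined query from the advice above.
def build_query(phrase: str, title_word: str = "", domain: str = "") -> str:
    """Combine a quoted phrase with optional intitle: and site: operators."""
    parts = [f'"{phrase}"']  # exact-phrase match
    if title_word:
        parts.append(f"intitle:{title_word}")  # word must appear in the page title
    if domain:
        parts.append(f"site:{domain}")  # restrict results to a top-level domain
    return " ".join(parts)

# More words and tighter operators -> a smaller, more refined result set.
print(build_query("working women of the 1930s", title_word="labor", domain="edu"))
# -> "working women of the 1930s" intitle:labor site:edu
```

The same habit transfers almost directly to library databases, which is the bridge the article recommends.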
I dare not copy any additional text, lest I border on being information-illiterate and copy a “material” part of the article. Please read the full text. This year, as school begins, make it a goal to equip our students for success. Come up with a slogan to drive the message. This generation loves slogans. They work.
“Recipe for Success – 4 words or more will give you a good score”
“Avoid the lies, narrow your search with 5”
You can probably come up with a better one– Or, better yet, let your students create one!
8/12/2011 News about LC from Catalogablog: library cataloging, classification, metadata, subject access and related topics.
Wednesday, August 10, 2011
LC has announced that new vocabulary data has been added to the LC Authorities and Vocabularies Service.
The Library of Congress is pleased to make available additional vocabularies from its Authorities and Vocabularies web service (ID.LOC.GOV), which provides access to Library of Congress standards and vocabularies as Linked Data. The new dataset is:
- Library of Congress Name Authority File (LC/NAF)
In addition, the service has been enhanced to provide separate access to the following datasets which have been a part of the LCSH dataset access:
- Library of Congress Genre/Form Terms
- Library of Congress Children’s Headings
The LC/NAF data are published in RDF using the MADS/RDF and SKOS/RDF vocabularies, as are the other datasets. Individual concepts are accessible at the ID.LOC.GOV web service via a web browser interface or programmatically via content-negotiation. The vocabulary data are available for bulk download in MADS and SKOS RDF (the Name file and main LCSH file will be available by Friday, August 12).
Please explore it for yourself at http://id.loc.gov.
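As a rough illustration of the programmatic, content-negotiation access mentioned above, here is a minimal Python sketch. The record identifier is a hypothetical example, and the URI pattern and media type should be checked against the ID.LOC.GOV documentation before use; `build_request` is my own helper name.

```python
from urllib.request import Request

# Assumed base URI for LC/NAF concepts on ID.LOC.GOV (verify against the service docs).
BASE = "http://id.loc.gov/authorities/names"

def build_request(record_id: str, media_type: str = "application/rdf+xml") -> Request:
    """Build a content-negotiation request for one authority record."""
    return Request(f"{BASE}/{record_id}", headers={"Accept": media_type})

req = build_request("n79021164")  # hypothetical example identifier
print(req.full_url)           # the concept URI
print(req.get_header("Accept"))  # the RDF serialization we are asking for
# (urllib.request.urlopen(req) would then fetch the RDF; omitted here.)
```

Swapping the Accept header (e.g. to a SKOS or MADS serialization the service offers) is how a client chooses among the RDF vocabularies described above.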
The new datasets join the term and code lists already available through the service:
- Library of Congress Subject Headings (LCSH)
- Thesaurus of Graphic Materials
- MARC Code List for Relators
- MARC Code List for Countries (which reference their equivalent ISO 3166 codes)
- MARC Code List for Geographic Areas
- MARC Code List for Languages (which have been cross referenced with ISO 639-1, 639-2, and 639-5, where appropriate)
- PREMIS vocabularies for Cryptographic Hash Functions, Preservation Events, and Preservation Level Roles
8/12/2011 In an article from the State Bar of Wisconsin, Bev Butula discusses Google
Google searchers beware: Features come and features go
By Bev Butula, manager of library services, Davis & Kuelthau, Milwaukee
Aug. 3, 2011 – We have all heard that the only thing constant is change. Technology and website functionality, prime examples of this point, continue to evolve – particularly with Google. Many Google changes are so seamless that the researcher may not even notice new or different features when they happen. For example, many people cannot remember when Google Instant launched. However, it’s second nature now, and people expect the answer they seek before they finish typing the question.
Google continues to revolutionize the web search process by developing new tools and enhancing its current products. Google is the most-used search engine and, as a librarian, I encourage users to take advantage of its special search features. Likely, there are features available that you are not aware of, so click on the link and explore.
Recently, it seems that Google is modifying or eliminating features at a faster pace than in the past.
Find your missing features
“Advanced search” and “define” are two features that recently relocated.
Google’s advanced search option is an excellent tool to help refine and control your search results. Advanced search still exists, but now it is a bit more difficult to find. Until recently, accessing the advanced search option was a simple click of the “advanced search” button next to the main search box. Read a post by Mark Rosch of Internet for Lawyers about the disappearance of the “advanced search” option from the main search page and find out where it is.
Another recent search modification involves the Google “define” syntax. Researchers used to be able to type “define:” followed by a concept or word into the main Google search box and instantly retrieve online definitions. Users can now access a sidebar search option entitled “dictionary.” Gary Price at INFOdocket summarizes the changes in a May post.
Some features simply go away
If you are looking for “wonder wheel,” Uncle Sam, and GOOG-411, look no more. They are gone and soon to follow are Google Health and Google Labs.
It is not a new phenomenon for Google to shut down products or incorporate them into new technologies. Larry Page once said that Google’s mantra is “…to do the best things we know how for our users, for our customers, for everyone.” While we do not know what new features Google has in development right now, we can safely assume they will make a difference in how we do our work.
About the author
Bev Butula is the manager of library services at Davis & Kuelthau, Milwaukee. She is a past president of the Law Librarians Association of Wisconsin. Butula has written articles and spoken to numerous groups on issues such as effective Internet research, evaluation of websites and legal research. Prior to obtaining her master’s degree in library science from UWM, Butula was a litigation paralegal.
August 1, 2011 From a blog new to this weblog, the Mod Librarian, this info from the University of Oregon is presented here:
Check out this lovely metadata scheme for art and architecture images created by the University of Oregon Libraries Digital Images Initiative. Part crosswalk and part data dictionary, this simple scheme (UO-AAI) combines the best of VRA Core and Dublin Core. Authorities are clearly delineated with the usual suspects of ULAN, TGN and AAT at the forefront and it provides a solid framework for anyone seeking to manage a digital visual resource collection.
Ostensibly, this scheme was created to manage the rich resources available at UO’s Art and Architecture Library. Collections range from those originating from Oregon libraries, like Oregon Digital, a joint effort by OSU and UO, to the standard-issue Artstor or Oxford Art Online.
MOD LIBRARIAN: topics for the modern library: digitization, metadata, taxonomy, social media, design for seamless user experience, marketing and, sometimes, just books.
7/7/11 In this posting, Jessica Hagman discusses how Google uses our personal searching habits to determine which “hits” it is going to provide, similar to what Amazon does when it makes recommendations based on past purchases. Except in Google it is “invisible.” Decisions are being made for us. Kind of scary, isn’t it? But it is important to be aware of.
Thinking About ‘The Filter Bubble’
This month’s post in our series of guest academic librarian bloggers is by Jessica Hagman, Reference and Instruction Librarian at Ohio University. She blogs at Jess in Ohio.
Last fall, I taught a one-credit learning community seminar. During the week when we discussed research and library resources, I showed the class this video from Google, describing how the search engine works. I suspected that most students had no idea how links come to the top of a Google search results page and no basis on which to begin evaluating the results beyond page rank, a suspicion confirmed by research from the Web Use Project (previously discussed here on ACRLog).
Yet, when I asked whether the video surprised them or if the search engine process was different than they had previously thought, I heard the proverbial crickets. Finally, one student spoke up with a shrug, “I guess I’ve just never thought about it before.” While I probably shouldn’t have been surprised that few students spent time thinking about the mechanics of Google, it was startling to hear it stated so clearly.
I thought about this comment again a few weeks ago when I ran across a link to Eli Pariser’s TED Talk “Beware Online Filter Bubbles.” In the talk, and in his new book elaborating on the subject, Pariser argues that companies like Facebook and Google use the data we share online to build a personalized bubble around each person, in which they only encounter information, news and links that confirm their already established world view and assumptions. And while the bubble is pervasive, it is mostly invisible.
After watching the talk, my thoughts turned to the undergraduate researcher writing about a contentious social issue like gun control or abortion whose browser history limits the scope of the results they see on Google. I’ve discussed Google searching in many library instruction sessions, but it’s usually been to point out the poor quality of some of the search results and to encourage students to look beyond the first link. Starting in the fall, I will mention the personalization of search results as well, so that students are at least aware that their search results reflect more than just the keywords they searched.
The implications of the filter bubble may go beyond the research for a freshman composition paper, however. In the later chapters of his book, Pariser argues that the pervasiveness of filter bubbles may hinder learning, creativity, innovation, political dialogue, and even make us more susceptible to manipulative advertising. It’s difficult to discuss these consequences in a one-shot library instruction session, but to know that the bubble exists is a powerful first step to escaping it when necessary.
I will be teaching the learning community seminar again this fall, and this year I will show them Pariser’s talk. While I think it’s important that they be aware of personalized search and its potential implications, I’m also very curious to hear what students think about personalized search and a world of filtered information. While they may not have spent much time thinking about Google in the past, I hope that seeing the video will encourage them to think about how their own search history and browsing data affect what they see – or do not see – online.
This infographic on email’s history appeared on the blog Mashable
Email, you’ve come a long way, baby.
In its 40-year tenure as a form of communication, email has run its course from the domain of über nerdy computer scientists to one of the most common ways to keep in touch, both personally and professionally.
Although email as a mode of communication was around for ten years before the term “email” was actually coined, we now count on it in our daily lives. In fact, the use of email has become so pervasive that the Oxford English Dictionary recently added a slew of email acronyms to its official canon.
And finally, just this year, the AP Stylebook, a.k.a. the holy book of all (or most) journalists, amended the spelling of e-mail to email, allowing articles such as this one to save big time on hyphens.
To give you a timeline of email’s progress through the decades, here’s a commemorative 40th anniversary infographic from email delivery company Reachmail.
6/12/11 So you want to create a website? Brought to you by our favorite proverbial lone wolf librarian’s weblog. For a closer look check out the website, URL as always is below.
So what is EPUB?
According to the author, “EPUB is a standard format for ebooks. It’s used by Apple, Barnes and Noble, Kobo, Overdrive and many others not named Amazon.” This article will explain all about EPUB now and its future plans for e-books.
Epub really IS a container, by Eric Hellman
“It’s OK for libraries to put things in their EPUB books.” That’s what Bill Kasdorf, a member of the EPUB Working Group, told me last week at the IDPF Digital Book 2011 meeting. He checked with EPUB Revision Co-Editor Markus Gylling to make sure. I had been curious whether libraries could put all their cataloging information inside an EPUB file instead of siloing it in their catalog system.
It may seem an odd question if you don’t know a few things about EPUB. EPUB is a standard format for ebooks. It’s used by Apple, Barnes and Noble, Kobo, Overdrive and many others not named Amazon. EPUB is near the end of a revision process that will result in EPUB 3.0.
The EPUB specs define a lot more than just a file format. Both EPUB 2 and EPUB 3 define a container format (in EPUB 3 it’s called the EPUB Open Container Format (OCF) 3.0), and then go on to define a number of file formats for the files that go inside this container. These files are the resources – texts, graphics, etc. – that make up the ebook.
OCF uses the ubiquitous ZIP format to wrap up all a book’s resource files into a neat, transportable package. That’s pretty much standard these days. Java “.jar” and “.war” files use the same mechanism, as do MacOS’ “.app” files. As a consequence, you can use any unzip utility to look inside an EPUB file and manipulate its contents.
There’s even a reserved name for a file to contain book-level metadata in OCF, META-INF/metadata.xml, as well as another file for rights information, META-INF/rights.xml. A third file, META-INF/signatures.xml, can be used to prove who made parts of the file and to determine whether anyone has mucked with them. When Gluejar issues Creative Commons editions of newly relicensed works, we’ll use the rights.xml file to make sure the CC declaration is explicit.
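Since OCF is ordinary ZIP, the reserved META-INF entries described above can be poked at with a few lines of Python. This is a sketch only: the stub contents and the OEBPS path are made up, and a real EPUB has additional requirements (a proper mimetype entry, container.xml, etc.) that are skipped here.

```python
import io
import zipfile

# Build a minimal, hypothetical EPUB-style container in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as epub:
    epub.writestr("mimetype", "application/epub+zip")
    epub.writestr("META-INF/metadata.xml", "<metadata/>")    # book-level metadata
    epub.writestr("META-INF/rights.xml", "<rights/>")        # rights information
    epub.writestr("META-INF/signatures.xml", "<signatures/>")  # digital signatures
    epub.writestr("OEBPS/content.xhtml", "<html/>")          # a book resource

# Any unzip utility (or zipfile) can look inside and read the reserved files.
with zipfile.ZipFile(buf) as epub:
    meta_entries = [n for n in epub.namelist() if n.startswith("META-INF/")]
    rights = epub.read("META-INF/rights.xml").decode()

print(meta_entries)
print(rights)
```

This is exactly the property that lets a library (or a service like Gluejar) drop a rights declaration or cataloging record into the container without any special tooling.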
The new EPUB revision is coming fast. Last Monday, Bill McCoy, Executive Director of the International Digital Publishing Forum (IDPF) announced the release of the full EPUB 3 proposed specification. My guess is that when we look back on this event 10 years hence, we’ll recognize this as the moment EPUB began to revolutionize the world of information, and with it, the book industry.
Although Amazon still uses the aging MOBI format on its Kindle devices, it seems only a matter of time before the infrastructure accumulating behind EPUB pushes Amazon into the embrace of the IDPF. Already, most of the content flowing into the Amazon system is being produced in EPUB and converted to MOBI. Don’t expect this shift to happen soon, though; in his IDPF presentation, Joshua Tallent of eBook Architects described rumors that it would happen soon as “bunk” – but it will happen sometime.
(Left: Autography founder and author T. J. Waters)
All this capability will remain latent unless people find compelling uses for it. I’m not worried. As the BookExpo itself got started, I met two different companies who were manipulating ebook files to solve the same problem: how can an author sign a book when the book is digital? Both companies, Autography and InScribed Media, create personalized experiences that leave artifacts of an author-consumer interaction inside ebook container files. Both of these companies have compelling solutions; they differ in their business models. Autography is structured as an author-focused bookstore; InScribed is developing partnerships with existing bookstores.
(InScribed Media founder and author Alivia Tagliaferri)
To some extent, InScribed and Autography are forced to be a bit convoluted in the way they deliver their product because they need to live inside DRM green zones; users don’t have access to the files inside books without cracking the DRM (which is rather easy, by the way!). It’s unfortunate, because personalization of ebooks could be a good way to encourage responsible use. I certainly don’t want that picture of me torrenting around the world!
Libraries face a similar dilemma. The insides of an EPUB file could be greatly enriched by libraries, which have every motivation to enhance discovery both of the book and the information inside of it. But DRM gives the publisher and its delivery agents the exclusive ability to build context inside ebook containers. Libraries and readers are locked out. I think that for DRM systems to survive they will need to accommodate a more diverse set of user manipulations; author signatures are just the tip of the iceberg.
Coming soon, I’ll report on EPUB 3 metadata.
In a poignant article, Karen Coyle sums up our world today, crammed with information and ways to get at it. Read it and enjoy!
Comments on the digital age, which, as we all know, is 42
TUESDAY, MAY 31, 2011
All the ____ in the world
“Every ____ ever created”
“World’s largest ____ ”
“Repository of all knowledge in ____”
There’s something compelling about completeness, about the idea that you could gather ALL of something, anything, together into a single system or database or even, as in the ancient library of Alexandria, physical space. Perhaps it’s because we want the satisfaction of being finished. Perhaps it’s something primitive in our brain stems that has the evolutionary advantage of keeping us from declaring victory with a job half done. (Well, at least some of us.) To be sure, setting your goal to gather all of something means you don’t have to make awkward choices about what to gather/keep and what to discard. The indiscriminate everything may be the easier target.
WorldCat has 229,322,364 bibliographic records.
OpenLibrary has over 20 million records and 1.7 million fulltext books.
LibraryThing has records for 6,102,788 unique works.
If you read one book a week for 60 years, you will have read 3,120 books. If you read one book a day for that same length of time, you will have read 21,900 (not counting leap years).
The trick, obviously, is to discover the set of books, articles, etc., that will enhance your brief time on this planet. To do this, we search in these large databases. By having such large databases to search we are increasing our odds of finding everything in the world about our topic. Of course, we probably do not want everything in the world about our topic, we want the right books (articles, etc.) for us.
There are some downsides to this everything approach, not surprisingly. The first is that any search in a large database retrieves an unwieldy, if not unusable, mass of stuff. For this reason, many user interfaces give us ways to reduce the set with additional searches, often in the form of facets. Yet even then one is likely to be overwhelmed.
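The facet mechanics themselves are simple enough to sketch. Here is a toy illustration in Python; the records and facet names are invented for the example:

```python
# A toy result set; each record carries facet values.
results = [
    {"title": "Cataloging the Web", "format": "book", "year": 2001},
    {"title": "Metadata Basics", "format": "book", "year": 1999},
    {"title": "Organizing Knowledge", "format": "article", "year": 2001},
]

def facet_counts(records, facet):
    """Count how many records fall under each value of a facet."""
    counts = {}
    for r in records:
        counts[r[facet]] = counts.get(r[facet], 0) + 1
    return counts

def narrow(records, facet, value):
    """Keep only the records matching the chosen facet value."""
    return [r for r in records if r[facet] == value]

print(facet_counts(results, "format"))    # {'book': 2, 'article': 1}
print(len(narrow(results, "year", 2001)))  # 2
```

Real discovery interfaces do exactly this at scale: count the values, show the counts beside the result list, and filter when the user clicks one.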
Everything includes key works and the odd bits and pieces of dubious repute and utility. Retrieving everything places a great burden on the user to sort out the wheat from the chaff. This is especially difficult when you are investigating an area where you are not an expert. Ranking may highlight the most popular items but those may not be what you are seeking. In fact, they may be items that you have retrieved before, even multiple times, because every search begins with a tabula rasa.
Another downside is that although computers are more powerful than ever and storage space is inexpensive, these large databases tend to collapse under the demands of just a few complex queries. Because of this, what users can and cannot do is controlled by the user interface, which protects the system by steering users toward safe functions. Users can often create their own lists, add tags, and make changes to the underlying data, but they cannot reorder the retrieved set by an arbitrary data element, compare their retrieved set against items they have already saved or seen, or run analyses like topic maps on their retrieved set to better understand what is there.
I conclude, therefore, that it would be useful to treat these large databases as warehouses of raw material, and to provide software that allows users to select from them to create a personal database. This personal database software would resemble, ta da!, Vannevar Bush's Memex: a combination database and information use system. I can see it having components analogous to some systems we already have:
- automated download of data from the big warehouses, like LibraryThing
- an easy visual way to do interesting queries, like Yahoo! Pipes
- the ability to ask questions, like Wolfram Alpha
The personal database would be able to interact with the world of raw material and with other databases. I can imagine functions like: “get me all of the books and articles from this item’s bibliography.” Or: “compare my library to The Definitive Bibliography of [some topic].” Or: “Check my library and tell me if there are new editions to any of my books.” In other words, it’s not enough to search and get; in fact, searching and getting should be the least of what we are able to do.
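Functions like "compare my library to The Definitive Bibliography" are, at bottom, set operations over identifiers. A minimal sketch in Python, with invented identifiers standing in for ISBNs or OCLC numbers:

```python
# Identifiers are invented for illustration.
my_library = {"isbn:0001", "isbn:0002", "isbn:0003"}
definitive_bibliography = {"isbn:0002", "isbn:0003", "isbn:0004", "isbn:0005"}

# What the bibliography covers that my library doesn't yet hold:
missing = definitive_bibliography - my_library
# What I hold that the bibliography also lists:
overlap = my_library & definitive_bibliography

print(sorted(missing))  # ['isbn:0004', 'isbn:0005']
print(sorted(overlap))  # ['isbn:0002', 'isbn:0003']
```

The hard parts in practice are getting clean, shared identifiers out of the big warehouses in the first place; once you have them, the comparisons themselves are trivial.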
There are a whole lot of resource management functions that a student or researcher could find useful, because within a selected set there is still much to discover. These smaller, personal databases should also be able to interact with each other, doing comparisons and cross-database queries. We should be able to make notes, create relationships, and share them (a Memex feature). The personal database should be associated with a person, not a particular library or institution, and must work across institutions and services. I can't imagine what it must be like today to graduate and lose not only the privileged access that members of institutions enjoy but also the entire personal space one has created while attached to that institution.
In short, it's not about the STUFF, it's about the services. It doesn't matter how much STUFF you have; it's what people can DO with it. Verb, not noun. Quality, not quantity.
So before we can discuss exit strategies, it would be helpful to know what an API is. API stands for Application Programming Interface.
According to Webopedia, an API is "a set of routines, protocols, and tools for building software applications. A good API makes it easier to develop a program by providing all the building blocks. A programmer then puts the blocks together.
Most operating environments, such as MS-Windows, provide an API so that programmers can write applications consistent with the operating environment. Although APIs are designed for programmers, they are ultimately good for users because they guarantee that all programs using a common API will have similar interfaces. This makes it easier for users to learn new programs."
Using Free APIs, Exit Strategies
So these days there are lots of really well-done, useful, free APIs around, especially from Google. The obvious invitation is to use them in your apps.
The potential problem is that a free API, for which you have no contract or service agreement, can disappear at any time without notice. Really, any third-party service can disappear at any time (the company goes out of business, is hit by a tsunami, whatever), but completely free services are even more at risk. The company could decide to stop providing the service; it could start charging for it; it could change the ToS so that you no longer qualify (or maybe you were always violating the ToS but you or they just noticed). They could give you plenty of advance notice of any of this, or none at all: you could just find it stops working one day.
That's not a reason not to use a really useful free API, but it's probably a reason to think about, when you start using it, what you would do if it went away: whether you have a way to compensate, how long it would take you to do so, or whether you could get away with simply abandoning the service that free API provided altogether. (And part of this is considering who is going to be around to deal with this when/if it happens; if you're not still in your job, have you left enough documentation or training for your replacements, or your clients, to understand what's going on and what to do?)
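One concrete way to prepare is to put the free API behind a thin wrapper with an ordered fallback chain, so that a dead service degrades gracefully instead of breaking every caller. A minimal sketch in Python; the provider functions here are hypothetical stand-ins, not real endpoints:

```python
def lookup_with_fallback(identifier, providers):
    """Try each provider in order; return the first usable answer.

    `providers` is a list of callables that either return a result
    or raise an exception when their backing service is unavailable.
    """
    for provider in providers:
        try:
            result = provider(identifier)
            if result is not None:
                return result
        except Exception:
            continue  # service down or retired: try the next one
    return None  # every provider failed; the caller decides what to do


# Hypothetical providers, standing in for something like Google Books
# and HathiTrust lookups.
def primary(identifier):
    raise RuntimeError("service retired")  # simulate a vanished free API

def secondary(identifier):
    return {"id": identifier, "source": "secondary"}

print(lookup_with_fallback("isbn:0000000000", [primary, secondary]))
```

The point is that the rest of your code only ever calls the wrapper; when a provider disappears, you edit one list instead of hunting down every call site.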
Anyway, that’s what just happened with a bunch of useful Google APIs, which prompts us to think about the issue.
None of these Google APIs will be going away sooner than 6 months from now (some will last as long as 2 years from now). Some have replacements (although not necessarily feature-identical replacements), some don't. It could have been worse, but it's still going to be bad for some people.
The deprecated API that's getting the most attention is the Translate API, going away in 6 months with no replacement. Apparently some people had built businesses on software whose core features relied on Google Translate. If they hadn't previously considered what they'd do if the Google Translate API went away (possibly including deciding that they'd just abandon that software and move on to something else, and being okay with that)… well, hopefully a lesson has been learned, not just by them but proactively by others who now know they'd better consider such things. The elimination of certain free Google APIs some people have been depending on gives us an opportunity to reflect a bit on how we plan our use of such services in general.
Google Books, and HathiTrust
I don't use the Google Translate API in any software (and now clearly never will), but I do use the Google Books API, which has also been deprecated.
Fortunately, a replacement is available which looks like it will still support my use cases, based on some initial analysis I did previously.
There are some inconveniences, though, and potentially some bigger problems. It's inconvenient to have to rewrite my code for the new API; but that's just part of software development, and especially so when using a free API with no contract or agreement. (I've begun referencing Ranganathan's sixth law analogized for software: software is a growing organism. Always. If you want your software to remain useful for a long time, you're going to have to keep putting development into it, almost always.) The requirement for an API key is inconvenient too, and the new rate limits may be prohibitive for me, although it looks like you may be able to get a free increase to your rate limits; I will have to look into it.
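For living within a rate limit, simple client-side throttling often suffices: space your own calls out so you never exceed the quota. A sketch in Python; the interval here is an invented illustration, not the Books API's actual limit:

```python
import time

class Throttle:
    """Client-side throttle: space calls out to stay under a quota.

    The one-call-per-interval figure is an illustration only, not a
    real limit published by any particular API.
    """
    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval
        self.last_call = 0.0

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        elapsed = time.monotonic() - self.last_call
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_call = time.monotonic()

throttle = Throttle(min_interval=0.1)
for _ in range(3):
    throttle.wait()
    # ... issue one API request here ...
```

This doesn't help if the quota is simply too low for your workload, but it does keep you from burning through a daily allowance in the first minute.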
So it's going to take some elbow grease, but I plan to keep using the new Google Books API for my use cases. My use case is basically taking a known-item citation and identifying search-inside-the-book or full-text reading/downloading access to it.
But it's enormously comforting to realize that even if Google Books were to take away its API from me entirely, I've still got HathiTrust. Now, HathiTrust certainly isn't a completely identical replacement. While HT has many of the volumes in Google Books at the same access levels, there are probably some in GBS that are not in HT, and there may be some with increased access in GBS compared to HT (and definitely some where the reverse is true). Google Books gives you search-inside-the-book, with keywords-in-context in your hits, even for in-copyright books it has no special permission from the publisher for; HT more cautiously does not. HT does not allow full PDF downloads of even public-domain books if they were scanned by Google, because Google won't let them (for the general public, that is; HathiTrust members do get these downloads). Google offers EPUB downloads too; HT does not (yet, anyway).
But if the Google Books API were to become unusable for me tomorrow, I'd still be able to provide the fundamental services (links to full text, and search inside, for many volumes) by way of HathiTrust. In fact, my software already consults both HT and GBS for this, so I wouldn't even have to make any code changes; even if GBS were to disappear without notice, I'm still good.
This is really comforting, and I think it shows the extreme importance of the HathiTrust effort, and the extreme foresight of UMich in initiating it. These services are too important to the future of our libraries to rely solely on a free API from a third party with no contract. With HathiTrust, a cooperative owned and controlled by libraries has a say in it too. I bet that if GBS did go away, HathiTrust would have more motivation (perhaps pushed by its members) to make its services even better to compensate (within the limitations of its contracts with Google).
Of course, for many of us, HathiTrust too is just another free API we have no service-level agreement or contract with! But better to have two of them than to rely on just one. And if your institution joins HathiTrust as a member, it becomes a different, more reliable relationship.
(At least in theory; many library cooperative software/technology projects seem to end up as awful design-by-committee monstrosities, with very little reliability at all. HathiTrust seems to have avoided this so far; it's really well designed and implemented as a product. It's probably no coincidence that it began as an initiative developed by one institution (with good software engineers), not as a design-by-committee distributed collaboration. Hopefully they can retain their quality even now that they are a more distributed membership organization.)
Not Just APIs; Google Scholar
Over the past few years, I've seen it suggested a few times that, well, Google Scholar is so great for finding scholarly articles, and our attempts to provide such services (ones that cross vendor boundaries, as Scholar does) tend to be so weak, why should we spend resources on trying to provide such a service at all? (With broadcast search like Metalib, or with aggregated indexes like Summon, etc.)
Google Scholar isn't perfect, and there are some important things it does less well than our local solutions, but without getting into details, I'll agree that overall Google Scholar is a much, much better solution than Metalib or even Summon; despite its flaws, it works better and more easily for our users. Let's just agree to that for now, for the sake of argument. (The only thing I will point out specifically is that Scholar's lack of an API means we can't do some of the coolest things we could do with a product that has an API, but anyway.)
Okay, so let’s say we stop spending resources (license fees, staff time, etc) on Metalib or Summon or any local tool for cross-vendor search of scholarly articles, and just direct our users to Google Scholar.
What happens when/if Google Scholar goes away? It is indeed a free service, which we lack any contract or service agreement for. Google as a company probably isn’t going away any time soon, but how likely is it they might decide to eliminate Scholar, if it’s not making them money? It’s really hard to say. It seems not that likely at this point, but it’s certainly possible.
And if it were the only way we had of providing our users with a way to search scholarly articles cross-vendor, and it did go away, it would be disastrous for us academic libraries. Helping users find scholarly articles is a core part of our mission/business. (In fact, I think many academic libraries misallocate their resources, putting more into library 'catalog' discovery than journal article discovery, when it probably should be the reverse as far as our users' needs are concerned.)
I think this is far too core a function to our users' research (you know, what our job is to support) to put ourselves in a position where it could go away with no notice, and we'd have to start from scratch developing or purchasing an alternate solution. Going back to searching individual platform websites one by one isn't good enough, although some users even now will choose to do that. And certainly some users now will choose to use Google Scholar and other free resources; our job is integrating those external free resources as well as we can with our infrastructure (a difficult challenge, surely), rather than discouraging them from doing so. But our job also has to be maintaining a solution we do have control over and reasonable expectations of longevity for. Yeah, we've got to do all of it, which takes resources, but that's our job; this is a core research function.
This same analysis would apply to the similar argument: "Why do we need to provide a 'library catalog' at all? Can't people just use Google?" I'm often unsure if people saying this mean Google Books specifically, or Google Web Search, or what. I'm not sure they know themselves; I get the feeling they aren't thinking through the details, just vaguely hand-waving "Google!". There are a lot of tricky details in that plan; replacing 'the catalog' with 'Google' isn't exactly the same situation as replacing Metalib/Summon/etc. with Google Scholar. But one of the details remains: with a service you have no contract, agreement, or expectation of stability for, what happens if it goes away or stops working? What happens to your ability to meet your mission and satisfy your users if you don't have an alternative that was already running or could be put into place immediately?
When speaking about the search engine Google, there are some facts of which you may be unaware. A colleague, Jennifer Ballance, sent this on a listserv and I thought it appropriate to add here for your edification:
Searching the Internet–Filter Bubbles
One of my colleagues pointed this video clip out to a few of us in my library recently, and I was really struck by the information: http://www.ted.com/talks/eli_pariser_beware_online_filter_bubbles.htm.
Here is a copy of the transcript of the clip 6/2011:
Mark Zuckerberg, a journalist was asking him a question about the news feed. And the journalist was asking him, “Why is this so important?” And Zuckerberg said, “A squirrel dying in your front yard may be more relevant to your interests right now than people dying in Africa.” And I want to talk about what a Web based on that idea of relevance might look like.
So when I was growing up in a really rural area in Maine, the Internet meant something very different to me. It meant a connection to the world. It meant something that would connect us all together. And I was sure that it was going to be great for democracy and for our society. But there’s this shift in how information is flowing online, and it’s invisible. And if we don’t pay attention to it, it could be a real problem. So I first noticed this in a place I spend a lot of time — my Facebook page. I’m progressive, politically — big surprise — but I’ve always gone out of my way to meet conservatives. I like hearing what they’re thinking about; I like seeing what they link to; I like learning a thing or two. And so I was surprised when I noticed one day that the conservatives had disappeared from my Facebook feed. And what it turned out was going on was that Facebook was looking at which links I clicked on, and it was noticing that, actually, I was clicking more on my liberal friends’ links than on my conservative friends’ links. And without consulting me about it, it had edited them out. They disappeared.
So Facebook isn’t the only place that’s doing this kind of invisible, algorithmic editing of the Web. Google’s doing it too. If I search for something, and you search for something, even right now at the very same time, we may get very different search results. Even if you’re logged out, one engineer told me, there are 57 signals that Google looks at — everything from what kind of computer you’re on to what kind of browser you’re using to where you’re located — that it uses to personally tailor your query results. Think about it for a second: there is no standard Google anymore. And you know, the funny thing about this is that it’s hard to see. You can’t see how different your search results are from anyone else’s.
But a couple of weeks ago, I asked a bunch of friends to Google “Egypt” and to send me screen shots of what they got. So here’s my friend Scott’s screen shot. And here’s my friend Daniel’s screen shot. When you put them side-by-side, you don’t even have to read the links to see how different these two pages are. But when you do read the links, it’s really quite remarkable. Daniel didn’t get anything about the protests in Egypt at all in his first page of Google results. Scott’s results were full of them. And this was the big story of the day at that time. That’s how different these results are becoming.
So it’s not just Google and Facebook either. This is something that’s sweeping the Web. There are a whole host of companies that are doing this kind of personalization. Yahoo News, the biggest news site on the Internet, is now personalized — different people get different things. Huffington Post, the Washington Post, the New York Times — all flirting with personalization in various ways. And this moves us very quickly toward a world in which the Internet is showing us what it thinks we want to see, but not necessarily what we need to see. As Eric Schmidt said, “It will be very hard for people to watch or consume something that has not in some sense been tailored for them.”
So I do think this is a problem. And I think, if you take all of these filters together, you take all these algorithms, you get what I call a filter bubble. And your filter bubble is your own personal unique universe of information that you live in online. And what’s in your filter bubble depends on who you are, and it depends on what you do. But the thing is that you don’t decide what gets in. And more importantly, you don’t actually see what gets edited out. So one of the problems with the filter bubble was discovered by some researchers at Netflix. And they were looking at the Netflix queues, and they noticed something kind of funny that a lot of us probably have noticed, which is there are some movies that just sort of zip right up and out to our houses. They enter the queue, they just zip right out. So “Iron Man” zips right out, and “Waiting for Superman” can wait for a really long time.
What they discovered was that in our Netflix queues there’s this epic struggle going on between our future aspirational selves and our more impulsive present selves. You know we all want to be someone who has watched “Rashomon,” but right now we want to watch “Ace Ventura” for the fourth time. (Laughter) So the best editing gives us a bit of both. It gives us a little bit of Justin Bieber and a little bit of Afghanistan. It gives us some information vegetables, it gives us some information dessert. And the challenge with these kinds of algorithmic filters, these personalized filters, is that, because they’re mainly looking at what you click on first, it can throw off that balance. And instead of a balanced information diet, you can end up surrounded by information junk food.
What this suggests is actually that we may have the story about the Internet wrong. In a broadcast society — this is how the founding mythology goes — in a broadcast society, there were these gatekeepers, the editors, and they controlled the flows of information. And along came the Internet and it swept them out of the way, and it allowed all of us to connect together, and it was awesome. But that’s not actually what’s happening right now. What we’re seeing is more of a passing of the torch from human gatekeepers to algorithmic ones. And the thing is that the algorithms don’t yet have the kind of embedded ethics that the editors did. So if algorithms are going to curate the world for us, if they’re going to decide what we get to see and what we don’t get to see, then we need to make sure that they’re not just keyed to relevance. We need to make sure that they also show us things that are uncomfortable or challenging or important — this is what TED does — other points of view.
And the thing is we’ve actually been here before as a society. In 1915, it’s not like newspapers were sweating a lot about their civic responsibilities. Then people noticed that they were doing something really important. That, in fact, you couldn’t have a functioning democracy if citizens didn’t get a good flow of information. That the newspapers were critical, because they were acting as the filter, and then journalistic ethics developed. It wasn’t perfect, but it got us through the last century. And so now, we’re kind of back in 1915 on the Web. And we need the new gatekeepers to encode that kind of responsibility into the code that they’re writing.
I know that there are a lot of people here from Facebook and from Google — Larry and Sergey — people who have helped build the Web as it is, and I’m grateful for that. But we really need you to make sure that these algorithms have encoded in them a sense of the public life, a sense of civic responsibility. We need you to make sure that they’re transparent enough that we can see what the rules are that determine what gets through our filters. And we need you to give us some control, so that we can decide what gets through and what doesn’t. Because I think we really need the Internet to be that thing that we all dreamed of it being. We need it to connect us all together. We need it to introduce us to new ideas and new people and different perspectives. And it’s not going to do that if it leaves us all isolated in a Web of one.
As you can well imagine, lots of comments followed this posting. As librarians, what do you think of this?
Tracking innovation, development and experimentation in information studies and library science and spotting new technologies, trends, fun stuff and much more.
FRIDAY, JUNE 03, 2011
EBSCO Publishing Acquires H.W. Wilson Company
By Michael Kelley
From: Library Journal via Centered Librarian
“EBSCO Publishing acquired the venerable H.W. Wilson company late Wednesday. Financial details of the transaction were not available. EBSCO Publishing executives met with the Wilson team at their headquarters in the Bronx, NY, and a series of meetings will be held between management teams today and Friday, and more over the coming weeks.
Wilson operates a similar business to EBSCO offering abstract/index records and fulltext databases via its proprietary platform, Wilson Web, but it is a much smaller company with about 200 employees and sales that are less than 10 percent of EBSCO’s.
Databases from Wilson will be integrated with EBSCOhost over the coming months, and, eventually, the WilsonWeb platform will be eliminated, the companies said in a press release. EBSCO will maintain WilsonWeb until all Wilson databases are available on EBSCOhost and customers have been transitioned to EBSCOhost. The company anticipates maintaining the platform until December 2011. Customers of databases on WilsonWeb will be given concurrent access to databases on both platforms as these become available.”
This is brought to us by: INFOdocket Information Industry News + New Web Sites and Tools From Gary Price and Shirl Kennedy
New: Chart Showing How EBSCO and H.W. Wilson Databases Will Be Combined to Create “Super Databases” & Other Notes
Posted on June 3, 2011 by Gary D. Price
EBSCO has shared with us a bit more info about the merger with H.W. Wilson that was announced yesterday.
- “Bringing together technology and other operations such as working closely with editorial teams to leverage the best of both worlds across Wilson and EBSCO databases, collaboration among product management teams to enable the development of new “super” databases, and more.”
- “EBSCO Publishing’s Ipswich, Massachusetts headquarters offers room for expansion and is the logical choice for streamlining physical operations.”
- Databases From H.W. Wilson Complement Existing EBSCO Databases
- EBSCO Can Leverage Wilson Indexing
- End Users Gain Additional Content, More Access to Full-Text
Here’s a Chart that Shows How Wilson and EBSCO Databases Will Be Combined to Create “Super” Databases
(Chart Courtesy of EBSCO)
Fast Facts About Databases:
- Concurrent Access Will Be Available As the Wilson Databases are Loaded on to EBSCOhost Until End of Each Library’s Subscription Period
- The Loading of the Wilson Databases Has Begun and EBSCO’s Goal is To Have Them All Loaded By the End of 2011
- The Super Databases are Expected to Debut in Spring 2012
- Over Time Wilson Web Platform Will Be Phased Out
- An Example of How Wilson Indexing Will Be Used With EBSCOhost
WilsonWeb keyword searches match against their controlled vocabularies and return results from “use for” terms. For example, a keyword search for “Burma” also returns results on “Myanmar” (the official name of the country), because “Myanmar” is a USE FOR term for “Burma”. This functionality is being added to EBSCOhost, not only for all Wilson databases, but also for all EBSCO-owned databases. The automatic searching of Use For’s and See Also’s will also be applied to EBSCO Discovery Service.
- “New Unique Content?” Additional material that can be licensed and made accessible through one of the Super Databases.
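The USE FOR expansion described above amounts to a synonym lookup at query time: an entry term is mapped to its preferred term, and the search runs over all of them. A minimal sketch in Python; the vocabulary here is a tiny invented sample, not Wilson's actual thesaurus:

```python
# A tiny sample vocabulary: preferred term -> the terms it is "used for".
USE_FOR = {
    "Myanmar": ["Burma"],
    "Motion pictures": ["Movies", "Films"],
}

# Invert the mapping so that an entry term finds its preferred term,
# e.g. "Burma" -> "Myanmar".
ENTRY_TO_PREFERRED = {
    entry.lower(): preferred
    for preferred, entries in USE_FOR.items()
    for entry in entries
}

def expand_query(keyword):
    """Return the full set of terms to search for a user's keyword."""
    terms = {keyword}
    preferred = ENTRY_TO_PREFERRED.get(keyword.lower())
    if preferred:
        terms.add(preferred)
        terms.update(USE_FOR[preferred])
    return terms

print(expand_query("Burma"))  # includes 'Myanmar' as well
```

A real thesaurus would also expand in the other direction (preferred term to its entry terms) and handle SEE ALSO references, but the mechanism is the same table lookup.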
Ipl2 gives us the following pathfinder, which involves a lot of cataloging resources, classification, and metadata:
Organizing the Web: Resources for Librarians
posted May 5, 2011
Organizing the Web – providing users with better ways of finding Internet resources that suit their needs – is a problem of current interest to all types of libraries and information professionals. The idea of “organizing the Web” has many facets. It may involve cataloging Web sites and other Internet resources, in either traditional or non-traditional ways. It might mean improving subject access to Internet resources by applying controlled vocabularies or classification schemes. Or, it might mean embedding Web documents with metadata elements to facilitate search and retrieval, or markup language tags to provide richer information about digitized documents. This pathfinder is intended to provide starting points for locating resources on all these topics.
This guide is not meant to be exhaustive, but to provide a jumping-off point for learning about and applying methods of organizing the Web. I’ve included some suggestions for finding print resources, links to the home pages of some existing projects in the fields of Internet cataloging and classification, and some major practical guides to different cataloging methods that can be found online. I’ve also included a few links pages that can serve as guides to locating further Internet and print resources on the subjects of Internet cataloging, classification and metadata.
Print Resources
Up-to-date print resources on Internet-related topics are hard to find, since Internet issues often change and develop faster than books and articles can make it into print. Books on “libraries and the Internet” tend to give general overviews of metadata or Internet cataloging, without going into detail about specific cataloging standards and procedures.
Journal articles may provide a more timely discussion of specific issues, although the time lag between writing and publication of articles can still be a factor. The web sites listed in this pathfinder are good sources of keywords to use when searching index databases such as LISA or Library Literature. Small publications by library-related organizations, such as OCLC, are another place to look for up-to-date and detailed explanations of cataloging practice, although the most current versions of this information tend to appear online.
Here are a few starting points for seeking print resources about organizing the Web.
Library of Congress Subject Headings for searching in individual library catalogs or WorldCat:
Cataloging of computer network resources
Cataloging of computer files
Cataloging of electronic books
Computer network resources – [combined with some heading related to libraries]
Internet (Computer Network) – [combined with some heading related to libraries]
A Few Specific Print Resources:
- Anglo-American Cataloging Rules, Chapter 9
Chapter 9 of the AACR is the basis for traditional-style cataloging of Internet resources. See OCLC’s guide to MARC cataloging of Internet resources for an interpretation of AACR specific to Internet cataloging.
- Taylor, Arlene. The organization of information. Englewood, CO: Libraries Unlimited, 1999.
A general text on current issues in cataloging and metadata. Includes an introduction to the different types of metadata, with examples, and some discussion of the problems of cataloging Internet resources.
- Hudgins, Jean. Getting mileage out of metadata: Applications for the library. Chicago: American Library Association, 1999.
A guide to different types of metadata and their uses, published by the American Library Association.
- Journal of Internet Cataloging
Quarterly journal published by Haworth Press since 1997.
Internet Resources
A wide variety of resources on “organizing the Web” are available online, from practical guides to applying a specific cataloging standard to the home pages of specific projects and initiatives. The list below provides links to several well-maintained Web sites of each type. See the links pages at the end of this pathfinder for links to many more Web sites and online articles and reports.
- Dublin Core Metadata Initiative
An initiative to develop a versatile set of metadata elements to assist in describing and locating Web resources.
- EAD (Encoded Archival Description)
A document metadata standard for encoding archival finding aids in online form.
- Government Information Locator Service
A government documents search site. Online documents are cataloged, searched for and retrieved using a set of attributes developed by GILS.
- Text Encoding Initiative
Homepage for TEI, a markup language that both sets the format and appearance of online documents and provides metadata for describing and locating information resources.
- Resource Description Framework (RDF)
“The Resource Description Framework (RDF) integrates a variety of web-based metadata activities including sitemaps, content ratings, stream channel definitions, search engine data collection (web crawling), digital library collections, and distributed authoring, using XML as an interchange syntax.” This project home page offers information on the various aspects of the RDF project.
- CyberStacks
A browseable catalog of Web sites organized using Library of Congress Classification. Currently limited to certain subject areas.
- BUBL LINK
A browseable and searchable Web catalog organized according to Dewey Decimal Classification.
- CyberDewey
A browseable catalog of Web sites organized using Dewey Decimal Classification.
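The RDF entry above notes that RDF uses XML as an interchange syntax. As a minimal sketch of what that looks like in practice, the snippet below builds an RDF/XML description of a web resource using Dublin Core properties; the resource URL and metadata values are invented examples, not drawn from the pathfinder.

```python
# Minimal sketch: an RDF/XML description of a web resource using
# Dublin Core properties. The URL and values are invented examples.
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)

rdf = ET.Element(f"{{{RDF_NS}}}RDF")
desc = ET.SubElement(rdf, f"{{{RDF_NS}}}Description",
                     {f"{{{RDF_NS}}}about": "http://example.org/pathfinder"})
ET.SubElement(desc, f"{{{DC_NS}}}title").text = "Organizing the Web: A Pathfinder"
ET.SubElement(desc, f"{{{DC_NS}}}creator").text = "Carrie Preston"

print(ET.tostring(rdf, encoding="unicode"))
```

The point of the format is that any RDF-aware crawler or digital library tool can read the same description, regardless of which metadata vocabulary (here, Dublin Core) supplies the properties.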
- Cataloging Internet Resources: A Manual and Practical Guide
OCLC’s guide to creating MARC records for Internet resources, to be used in conjunction with the Anglo-American Cataloging Rules.
- Using Dublin Core
A guide to applying the Dublin Core metadata elements, a set of cataloging tags that can be applied directly to HTML documents and other Internet resources.
- Application Profile for the Government Information Locator Service (GILS)
A guide to online document attributes used by the Government Information Locator Service (GILS).
- Dublin Core/MARC/GILS Crosswalk
Library of Congress equivalency guide for converting Dublin Core metadata to MARC tagging or GILS (Government Information Locator Service) document attributes.
- Remote Access Computer File Serials
CONSER guide to cataloging electronic serials.
- TEI Guidelines for Electronic Text Encoding and Interchange
A browseable and searchable guide to the TEI (Text Encoding Initiative) metadata markup language.
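The “Using Dublin Core” guide above describes cataloging tags that can be applied directly to HTML documents. As a hedged sketch of the idea, the helper below renders Dublin Core elements as HTML meta tags using the conventional DC. name prefix; the helper function and the document values are invented for illustration.

```python
# Sketch: embedding Dublin Core elements directly in an HTML <head>
# as <meta> tags. The dublin_core_meta helper and the sample values
# are invented examples for illustration.
from html import escape

def dublin_core_meta(**elements):
    """Render Dublin Core elements (title, creator, date, ...) as
    HTML <meta> tags with the conventional "DC." name prefix."""
    return "\n".join(
        f'<meta name="DC.{escape(name)}" content="{escape(value)}">'
        for name, value in elements.items()
    )

tags = dublin_core_meta(
    title="Organizing the Web",
    creator="Carrie Preston",
    date="1999",
)
print(tags)
```

Because the tags live inside the ordinary HTML head, a crosswalk like the Library of Congress one listed above can later map each DC element to its MARC or GILS equivalent.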
Related Links Pages
- Cataloguing and Indexing of Electronic Resources
A links page maintained by the International Federation of Library Associations (IFLA). Includes links to related Web sites, online articles, IFLA conference documents, e-mail discussion groups and more.
- Beyond Bookmarks: Schemes for Organizing the Web
A large collection of links, including many links to subject-specific controlled vocabularies. Also includes some links to sites in Swedish.
- Candy Schwartz’s Metadata Resources
Links page maintained by a professor of library science at Simmons College. Includes sections of links devoted to various types of metadata.
- Traugott Koch Home Page
Page maintained by a digital library scientist at Lund University Library’s NetLab. Click “projects” and “publications and presentations” to see further links of interest, including links to several subject-specific links pages maintained by the author.
- Google Web Directory: Metadata
Large collection of metadata-related links, arranged in a hierarchy of categories. Categories include Dublin Core, MARC, Encoded Archival Description, TEI and more.
- Yahoo! Cataloging: Electronic Resources
A small collection of links from Yahoo!
This pathfinder was created by Carrie Preston.
When Not to Google: Searches You’re Better Off Making Elsewhere
Kevin Purdy— For the searching you do every day, go ahead and use the powerful, convenient, ever-improving Google. But for certain queries, other search engines are significantly better. Let’s dig into the searches you’re better off making at engines other than Google.
Google’s good at a lot of things, but it also has to serve a lot of interests. Any relatively modern search engine knows that, in order to compete and differentiate, it has to do something different, something better, or something special, aside from general “katy perry video” searches. Here are the best search engines for tackling specific types of search:
DuckDuckGo: Quick Site Searches, Programming, and Totally Anonymous Searching
That’s nice, but what does DuckDuckGo do? It “bangs.” Bang, as in the term programmers use to refer to exclamation marks. By putting an exclamation in front of a site or resource you want to search, you can quickly search on that site from DuckDuckGo, whether you know how that search works or not. Searching !lifehacker linux uses our own site’s search engine to look up Linux posts (though you can shorten it to !lh). !a triggers a product search on Amazon.com, and !yt a YouTube search. But you can loosely shoot from the hip and hit an astounding number of sites: !retailmenot green mountain coffee, and so on. With DuckDuckGo installed as a quick search option in your browser, it’s much easier to search a site this way than to type out site:economist.com libya and hunt through results.
There are lots of neat “bangs” to dig through, but take special note, programmers and general nerd practitioners: there are a lot of computer and code resources here. !github—the list goes on. In fact, DDG even includes the other search engines we’ve referenced here in its bangs. If you really were looking for a new default search engine, we could see DuckDuckGo as a viable option—if only for the sincere convenience of, say, searching the Android Market with !market angry birds.
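Under the hood, a bang is just a prefix that routes the rest of your query to a site-specific search URL. The toy dispatcher below illustrates the mechanism; the bang table is a small invented subset, and DuckDuckGo’s real service resolves thousands of bangs server-side rather than in your browser.

```python
# Toy illustration of !bang routing: a known bang prefix maps the
# remaining query onto a site-specific search URL; anything else
# falls back to a general search. The BANGS table is an invented
# subset for illustration.
from urllib.parse import quote_plus

BANGS = {
    "a": "https://www.amazon.com/s?k={}",
    "yt": "https://www.youtube.com/results?search_query={}",
    "w": "https://en.wikipedia.org/wiki/Special:Search?search={}",
}

def route(query: str) -> str:
    """Return the search URL a !bang query should land on."""
    if query.startswith("!"):
        bang, _, rest = query[1:].partition(" ")
        if bang in BANGS and rest:
            return BANGS[bang].format(quote_plus(rest))
    return "https://duckduckgo.com/?q=" + quote_plus(query)

print(route("!yt angry birds"))
# -> https://www.youtube.com/results?search_query=angry+birds
```

The practical upshot is the one described above: you get a site’s own search without having to know how that site’s search URL works.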
Blekko: Cruft-Free Results and Very Specific Things
Even after Google made a pretty big change to filter “content farms,” searching it for anything that might be remotely popular, especially in the form of a how-to or question, continues to involve sorting through varying versions of on-demand writing. Some of it is decent, even helpful; much of it looks the same, though, and you often find yourself wishing for a more authoritative voice.
Enter Blekko. On its own, Blekko narrows down your search terms and filters out a lot of the ad-filled results you might come across. Search on a “hot” topic, like travel, product reviews, or song lyrics, and Blekko automatically filters out sites that seem to exist mostly to capture traffic without providing too much new information. Search in the health field, and the results are narrowed down to a set of about 75 sites that Blekko’s editors trust.
So let’s say you’re an increasingly ridiculous home coffee enthusiast (ahem), and you want to make at home the latte foam “art” you’ll see in coffee shops. Lots of web sites are anticipating this search. The first three results from Google, from earlier this week, are shown above: the first result is a WikiHow article, the second a box of YouTube videos, and the third from RateMyRosetta.com, where baristas and other foam-art enthusiasts can, well, rate each other’s leafy designs.
Blekko’s results are at left here, and they’re oriented more toward independent sites, by way of eliminating many of the less subtle grabs for your clicks. By way of disclosure, a Lifehacker post shows up as the second result, but I picked the how to make latte art search at random, from my brain.

It’s helpful to be able to skip the search-savvy sites when you’re looking for deeper knowledge. It’s also helpful to be able to explain a bit more clearly what you’re looking for. Google has modifiers for “must have” (kennedys +kennebunkport) and “not” (kennedys -"dead kennedys"), but you have to guess at them ahead of time. Let’s imagine you just finished watching Blade Runner for the first time (really?), and you’re now keen on learning how far we’ve come in making robots that look and act like humans—androids. But any search on “android” these days is chock full of apps, reviews, and news about Google’s mobile phone OS. Blekko knows this, or at least has seen it happen, so as you type in “android,” you’re given a batch of “slashes” you can add to your search to narrow it down. “android /robotics” popped up during my Blekko test, and did a good job of (mostly) winnowing my search down to items related to human-like robots.
Wolfram Alpha: Data, Statistics, Research, and “I Wonder”
There’s no simple way to explain what Wolfram Alpha does, other than to say it tries to make the entirety of human knowledge into solvable equations—simple, huh? It’s a big task, but Wolfram Alpha quietly does some pretty amazing things with the unique data sets it can rummage through. It’s best thought of as a place to ask questions, and wonder about numbers, percentages, and other left brain ideas.
If you “asked” Google how likely the average United Airlines flight is to arrive on time, versus Southwest Airlines, the top result is likely to be a blog post that features “Southwest vs. United Airlines” in its title, but relates to television advertising and branding. Ask Wolfram Alpha, and the first result considers “United Airlines” and “Southwest Airlines” as they exist on the stock market—UAUA vs. LUV. Neat, but not exactly what we wanted. But just under the search, Wolfram asks if you’d like to see your “United Airlines” as an airline. Click it and see.
Now we’re talking. Wolfram Alpha, culling data from nearly a dozen aviation sources, puts together a handy chart showing the on-time performance of United versus Southwest—along with enough statistics and comparisons to basically write an Aviation Business 101 paper by itself. At the bottom of the box, you can click to see Wolfram’s sources, and download a PDF of the data.
You have to spend some time with Wolfram to get a sense of what it’s capable of. Pretty much every Lifehacker editor has come across something unique and helpful it can do and written about it. A short, but by no means comprehensive, list would include:
- Calculating specific calories burned for any activity
- Analyzing illness symptoms and medication information
- Step-by-step explanations of mathematics
Can you get to most of this data through good old Google? Eventually, sure. But when you’re looking for a specific piece of data, Wolfram can often provide it, and the context necessary to utilize it, in quicker fashion than you can comb through Google to eventually arrive at a PDF document.
What About Bing?
When most people with even a cursory knowledge of modern tech hear the phrase “search alternatives” or “non-Google search,” they might think Bing. Bing is growing in market share, and has some very robust search offerings. But Bing covers the same wide scope as Google, with an invitation to search for anything, everything, and sometimes get “quick answers” back with data tidbits. It handles some topics in unique ways, like its Visual Search, video thumbnails, and robust travel visualization. But it has a lot of competition in each of those areas. If you find Bing to be head and shoulders above Google and specialty search sites, we’ll gladly take the hint in the comments.
Make These Secondary Searches Easier to Get To
As stated up top, most people know Google, like Google, and will continue to use Google as their go-to search. To make the secondary but very powerful search sites demonstrated above easier to search, you should add them to your browser’s search options, and make them very quick to fire off.
If you’re using Chrome, this is very easy. Head to any of the sites, do a search or two on them, then right-click in your address bar and select “Edit Search Engines.” You should see Blekko, DuckDuckGo, and Wolfram Alpha included in your search options, listed in the left-most column. The middle column shows what you’d have to type into your Chrome address bar, and then hit space or Tab after, to search the site instantly; the default is the full site URL. Click on that middle section and give your alternative searches much shorter shortcuts: “ddg” for DuckDuckGo, perhaps, and maybe just “bk” for Blekko, as examples. With DuckDuckGo, in particular, the ability to use the “bangs” to quickly search Amazon, the New York Times, NewEgg, or wherever you’re looking from the address bar quickly becomes addictive.
In Firefox, you can add these sites to your right-hand search box, but it’s faster to activate them from the address bar. You do this by creating keyword bookmarks. One nice thing about doing this in Firefox is that the built-in bookmark syncing in Firefox 4 covers your custom keyword bookmarks, so you only need to set them up once, then use them on any Firefox installation.
Opera users, you simply need to right-click in the search field of your alternative search site, choose “Create search,” and assign a keyword, as described in Opera’s Help section. Safari fans, SafariKeywords looks like your best bet to get searching outside the standard box.
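All of these browser tricks rest on the same mechanism: a stored search URL with a placeholder that the browser fills with whatever you type after the keyword. The sketch below illustrates that substitution; the keyword shortcuts and URL templates are examples I’ve chosen, not values any particular browser ships with.

```python
# Sketch of the URL-template mechanism behind browser keyword
# searches: the browser stores a URL containing %s and substitutes
# your query into it. The SHORTCUTS table holds invented example
# keywords; "ddg" and "wa" are just the shortcuts suggested above.
from urllib.parse import quote_plus

SHORTCUTS = {
    "ddg": "https://duckduckgo.com/?q=%s",
    "wa": "https://www.wolframalpha.com/input/?i=%s",
}

def expand(address_bar_input: str) -> str:
    """If the input starts with a known keyword, substitute the rest
    of the input into that keyword's URL template."""
    keyword, _, query = address_bar_input.partition(" ")
    template = SHORTCUTS.get(keyword)
    if template and query:
        return template.replace("%s", quote_plus(query))
    return address_bar_input  # not a keyword search; leave it alone

print(expand("ddg angry birds"))
# -> https://duckduckgo.com/?q=angry+birds
```

Chrome’s “Edit Search Engines” list and Firefox’s keyword bookmarks both store exactly this kind of %s template, which is why a two-letter shortcut is all you need once it’s set up.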