So, a few comments from someone who is not a library expert, but who helped develop a FRBR RDF vocabulary.
First, I’d urge you to make everything you want people to read available as HTML. I shouldn’t have to download some file (in fact, many, many of them) and open it in another application just to understand what you’re trying to communicate.
But WRT that, let me just focus on the RDF (since I know little about library cataloging!).
I’d definitely urge you to study Ian and Rich’s work (linked above) closely. They’re both RDF experts with a ton of experience (and Ian is now CTO of Talis), so there are a lot of technical details they get right and conventions they follow that are missing in your work.
Also, there’s a general principle in RDF that favors reuse and extension over reinvention. There shouldn’t be a need for anyone to invent another frbr:Work, frbr:Expression, frbr:Person, etc. You can reuse them, including subclassing them (likewise with properties).
Moving on …
1) a class like EventAsSubject seems strange to me. Is it not enough to say there are works and there are events, and that the latter may be a subject of the former?
2) as a user of library data and catalogs, I really, really hate the convention of using natural language ids. I realize you might want to keep these for legacy reasons, but I certainly wouldn’t put it in the core of the model. Could you separate it out into a “legacy extension” space?
3) Treating simple properties like titles as full resources is modeling overkill in my view.
4) on contributor properties: my impulse would be to do simple subproperties of frbr:creator (say x:director), frbr:realizer (x:translator), frbr:producer, etc.
In the bibliographic ontology work I’ve been doing, however, we need to track more detail about contributions, so we have a Contribution class.
5) related to 4, a lot of your URI labels are needlessly complex and convoluted. If you have x:compiler as a subproperty of frbr:creator, then there’s no need for redundant information like x:creator-compiler; just do x:compiler.
6) I would reuse SKOS
7) your examples use a lot more blank nodes (resources without URIs) than I’d like, which partly goes back to some of my comments above.
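To make points 4 and 5 concrete, here is a minimal Python sketch of how subproperties behave. It is not real RDF tooling, just plain tuples. The x: property names follow the suggestions above; the ex: resources and the helper functions are invented purely for illustration.

```python
# Hypothetical subproperty declarations: sub-property -> super-property.
# Assumes the hierarchy is acyclic.
SUBPROPERTY_OF = {
    "x:compiler": "frbr:creator",
    "x:director": "frbr:creator",
    "x:translator": "frbr:realizer",
}

# Invented sample triples: (subject, property, object).
TRIPLES = [
    ("ex:anthology", "x:compiler", "ex:alice"),
    ("ex:film", "x:director", "ex:bob"),
    ("ex:frenchText", "x:translator", "ex:carol"),
]


def with_superproperties(triples, subproperty_of):
    """Return the triples plus those implied by the subproperty links."""
    inferred = set(triples)
    for s, p, o in triples:
        # Walk up the property hierarchy, adding one triple per ancestor.
        while p in subproperty_of:
            p = subproperty_of[p]
            inferred.add((s, p, o))
    return inferred


def creators(triples):
    """All objects of frbr:creator statements, sorted for stable output."""
    return sorted(o for _s, p, o in triples if p == "frbr:creator")


ALL = with_superproperties(TRIPLES, SUBPROPERTY_OF)
print(creators(ALL))  # ['ex:alice', 'ex:bob']
```

Because the hierarchy already records that a compiler is a kind of creator, a label like x:creator-compiler adds nothing that x:compiler plus the subproperty link doesn’t already say.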
I think that’s enough for now; it’s a big vocabulary!
As I said above, if you want to take this further, I’d strongly urge you to reuse the existing FRBR-in-RDF work where you can.
Hmm … I posted a long list of comments yesterday, but they don’t seem to have gone through.
In the meantime, I looked through the model notes, and saw this:
“I note that there has been at least one attempt to express FRBR in RDF and that they have not defined expression as a subclass of work. Instead, they have defined work as “disjoint with” expression, manifestation and item. Yet the FRBR primer (section 5.5), following OWL, says that when two classes are disjoint “no resource is an instance of both classes.” If an item is also a manifestation, expression and work, the FRBR-RDF people are wrong to do this, aren’t they?”
This is a strange reading of FRBR in my view. I’d say it’s absolutely correct to say that they are disjoint. A manifestation cannot be a work or an expression. It really makes no logical sense to argue they are. They are distinct conceptual levels. The book on my shelf is not the work that it manifests. Rather, they are related resources, organized conceptually into a tree. That’s what the modeling says, and it’s faithful to the FRBR report.
But practically speaking WRT the RDF, if you want to search for all works and you say that all manifestations are also works, then you get everything returned. The distinctions become rather meaningless.
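A rough Python sketch of that search problem, using invented ex: resources and a toy lookup rather than a real triple store. The "nested" subclass links stand in for the modeling under discussion and are my assumption about how it would be written down.

```python
# Invented sample data: each resource with its asserted class.
ASSERTED_TYPE = {
    "ex:work1": "frbr:Work",
    "ex:expression1": "frbr:Expression",
    "ex:manifestation1": "frbr:Manifestation",
}

# Modeling A (hypothetical): each level is a subclass of the level above it.
NESTED = {
    "frbr:Manifestation": "frbr:Expression",
    "frbr:Expression": "frbr:Work",
}

# Modeling B: the classes are disjoint, so there are no subclass links at all.
DISJOINT = {}


def instances_of(cls, asserted_type, subclass_of):
    """Resources whose asserted class is cls or a (transitive) subclass of it."""
    found = []
    for resource, current in asserted_type.items():
        # Climb the subclass chain until we hit cls or run out of parents.
        while current is not None:
            if current == cls:
                found.append(resource)
                break
            current = subclass_of.get(current)
    return sorted(found)


print(instances_of("frbr:Work", ASSERTED_TYPE, NESTED))
# ['ex:expression1', 'ex:manifestation1', 'ex:work1']
print(instances_of("frbr:Work", ASSERTED_TYPE, DISJOINT))
# ['ex:work1']
```

Under the nested modeling a search for works returns every resource; with disjoint classes it returns only the work.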
OTOH, I think it’d be reasonable to think in terms of an x:Sound being a subclass of, say, frbr:Expression, or maybe x:Music being a subclass of frbr:Work. That allows you to slice the data in useful ways.
The ontology does define some union classes, though, like frbr:Endeavor, for places where they need to define generic domains or ranges.
Thank you for your suggestions. Regarding the disjoint question: I would think it would be rather more likely that a user would search for a particular work, using its author and title, and expect to see all of its expressions and manifestations displayed so that the user could browse through them and make a selection. Also, the manifestation by itself would be rather meaningless and hard to identify for the user without the author and title of the work contained, the language of the expression contained, etc. In practice, every item is also a manifestation, an expression and a work. Whatever is true of the work is also true of the item, manifestation and expression. The fact that FRBR maps nearly all of the bibliographic description to manifestation and that the tables at the back of FRBR do not actually correspond to the entity definitions in the text makes me reluctant to simply reuse frbr:work.
On “I would think it would be rather more likely that a user would search for a particular work, using its author and title, and expect to see all of its expressions and manifestations displayed so that the user could browse through them and make a selection.”
Yes, but think of it this way: would you expect users to see a table where every work, expression and manifestation is a separate row, or would you expect that there’d be a hierarchy; that the expressions would nest beneath the works, and the manifestations beneath the expressions?
The latter would be achieved with some UI code that simply links the related resources.
“Also, the manifestation by itself would be rather meaningless and hard to identify for the user without the author and title of the work contained, the language of the expression contained, etc.”
Sure, but that’s an issue of data duplication, UIs, etc.; not model.
It seems logical enough to say that works and expressions and manifestations can all have titles, for example, and that they may all have the same title property values. But it does not follow that, as you say, “every item is also a manifestation, an expression and a work.” Rather every item is related to (e.g. is a production of) a manifestation, every manifestation is a realization of an expression, etc. They’re relations among different classes of resource.
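As a sketch of the kind of UI code I mean, here is a small Python example that builds the nested display purely by following links between separate resources. The ex: resources are invented, and the relation names are only meant to be in the spirit of the FRBR-in-RDF vocabulary, so treat them as assumptions.

```python
from collections import defaultdict

# Invented sample links: (child resource, relation, parent resource).
LINKS = [
    ("ex:expr1", "frbr:realizationOf", "ex:work1"),
    ("ex:expr2", "frbr:realizationOf", "ex:work1"),
    ("ex:manif1", "frbr:embodimentOf", "ex:expr1"),
    ("ex:manif2", "frbr:embodimentOf", "ex:expr1"),
    ("ex:manif3", "frbr:embodimentOf", "ex:expr2"),
]


def build_children(links):
    """Index the links so each parent maps to the resources that point at it."""
    children = defaultdict(list)
    for child, _relation, parent in links:
        children[parent].append(child)
    return children


def render(root, children, depth=0):
    """Return indented display lines for root and everything beneath it."""
    lines = ["  " * depth + root]
    for child in sorted(children.get(root, [])):
        lines.extend(render(child, children, depth + 1))
    return lines


TREE = render("ex:work1", build_children(LINKS))
print("\n".join(TREE))
```

The user still searches for a work and sees its expressions and manifestations nested beneath it, yet no resource ever has to be typed as more than one FRBR level.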
I’ll leave here some of the comments I have made to Martha via email.
First, to give others the chance to play with this data, the RDF “tables” and examples need to be text files, not Word documents. As Word documents, they are not actionable, no one can work with them.
Next, there needs to be a valid schema, and the examples need to be valid xml. Again, that would allow others to experiment with this data as data. That is the main point of creating such data in RDF and XML.
If there ever were proof that we need some testing of FRBR concepts, the two different FRBR views of Yee and D’Arcus are that proof. The FRBR model itself seems to assume a hierarchy and some level of inheritance from the Work to the Expression and Manifestation. This is evidenced by the application of topical subjects only to the Work — presumably, this does not mean that manifestations have no subjects associated with them, but that they inherit them from the Work. Whether or not this is a good idea or whether this is a flaw in FRBR, it is inherent in the model. Many have found this to be problematic.
I agree with D’Arcus that Yee’s model unnecessarily bundles a number of concepts (like Primary and Creator). This is an area where we can work together to find a more efficient and flexible way to define the underlying concepts that will allow them to be combined as needed to create a bibliographic description.
We can eliminate many blank nodes by recognizing that there will be an underlying definition of the data elements upon which an actual data carrier will be created. An example is that of “publisher data” — those of us in libraries are in the habit of seeing date of publication, place of publication and publisher as a unit called “publisher data.” Publisher data, as a field (eg in MARC) is a blank node — it is just a container for those data points about the publisher. There needs to be a discussion about whether a data field that can receive publisher data will exist apart from the individual data elements that we now consider part of that data. Around this we should have the discussion relating to what Dublin Core calls “dumb down,” which I understand (imperfectly, most likely) as a specific relationship between superordinate and subordinate data elements, allowing some users to “dumb down” more specifically defined data elements to their superordinate form without creating ambiguity. As currently defined, the Publisher Data field would not be valid under that rule.
I suggest that Martha remove ALL of the fixed lists from her cataloging rules and define those as independent vocabularies. SKOS is probably the correct carrier for those, right? I also suggest that we create stubs for those lists in the NSDL vocabulary registry so that we can create valid data definitions for them (but we don’t need the detail of the actual values at this point). (Martha, I’ll try to find the time to help with this.)
wow – Bruce is right on pretty much all of this and making excellent points on the difference between the modelling and the UI. He’s doing such a good job I nearly didn’t join in ;-)
I don’t agree with keeping such things as titles as literals though. By making more or less everything a resource the relationships become very much richer. Take finding all the performances of Holst’s The Planets, for example. With cataloguing as it is the only common thing is that the title is consistent (once it’s been normalised to some extent).
Karen: “The FRBR model itself seems to assume a hierarchy and some level of inheritance from the Work to the Expression and Manifestation. This is evidenced by the application of topical subjects only to the Work — presumably, this does not mean that manifestations have no subjects associated with them, but that they inherit them from the Work.”
Yeah, I wouldn’t presume inheritance in this case. Does the report actually say that explicitly, or is that just how some in the library world read it?
I’d say that the manifestation embodies some textual expression of a work that has topics x, y, z. The topics are linked at the work level, and it shouldn’t be any more complicated than that.
Keep in mind that inheritance in RDF is typically used in order to enable inferencing. So if I say X is a subclass of Y, then if I add some triples that say some URI is of type X, then the system will add another triple that says it is of type Y.
But it gets more complicated. Let’s switch back to the FRBR view and assume Martha’s modeling. Suppose I define a property (let’s say x:translationOf) whose domain is x:Expression, and then I make a statement that some URI has that property. The inferencer will then logically conclude not only that this resource is in fact an x:Expression, but also that it is an x:Work. Clearly, that would be an incorrect inference, since a translation is typically understood to be a new expression.
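Here is a toy Python sketch of that inference chain. It hand-rolls just two RDFS-style rules (property domain and subclass) over plain tuples; it is not a real reasoner, and the ex: resources are invented for illustration.

```python
RDF_TYPE = "rdf:type"

# Schema under the modeling being questioned (x: names are hypothetical).
SUBCLASS_OF = {"x:Expression": "x:Work"}
DOMAIN = {"x:translationOf": "x:Expression"}

# One invented statement: some text is a translation of another text.
DATA = {("ex:textB", "x:translationOf", "ex:textA")}


def infer(triples, domain, subclass_of):
    """Apply the domain and subclass rules repeatedly until nothing new appears."""
    closure = set(triples)
    while True:
        new = set()
        for s, p, o in closure:
            # Domain rule: using property p implies the subject has p's domain type.
            if p in domain:
                new.add((s, RDF_TYPE, domain[p]))
            # Subclass rule: an instance of a class is an instance of its superclass.
            if p == RDF_TYPE and o in subclass_of:
                new.add((s, RDF_TYPE, subclass_of[o]))
        if new <= closure:
            return closure
        closure |= new


def types_of(resource, triples):
    """All inferred or asserted types of a resource, sorted for stable output."""
    return sorted(o for s, p, o in triples if s == resource and p == RDF_TYPE)


print(types_of("ex:textB", infer(DATA, DOMAIN, SUBCLASS_OF)))
# ['x:Expression', 'x:Work']
print(types_of("ex:textB", infer(DATA, DOMAIN, {})))
# ['x:Expression']
```

With the subclass link in place the translation is typed as an x:Work even though nobody asserted that; without the link it is only an x:Expression.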
Do you follow me here on the implications of subclassing in RDF and why I think the existing FRBR-in-RDF work has it right?
Bruce: “Does the report actually say that explicitly, or is that just how some in the library world read it?”
It doesn’t say “inheritance” but since the subjects are only associated with the work level that’s the only thing you can conclude. Personally, I think this is an example of some fuzzy thinking that needs to be re-thought.
Bruce: “I’d say that the manifestation embodies some textual expression of a work that has topics x, y, z. The topics are linked at the work level, and it shouldn’t be any more complicated than that.”
Well, I can grok that, but we still need some kind of workable relationship between the manifestation and the work, and I do think that if you look at the attributes in FRBR you’ll find inconsistencies in what you can assume about the relationships between the different levels.
Bruce: “Do you follow me here on the implications of subclassing in RDF and why I think the existing FRBR-in-RDF work has it right?”
I think I get the point, but I need to go back to the FRBR-in-RDF and see how it was done there.
Intermediation tool requirements
Carrier encoding format
Materials applied to carrier
Carrier base materials
Process used to produce carrier
Configuration of playback channels
Carrier recording type
Carrier broadcast standard
Relationship among manifestations
Appendages to the expression
Nature of modification
Nature of the relationship
Mode of issuance
Original physical characteristics of work
original base material:
Content of work
Principal creator relator term
Karen: “Well, I can grok that, but we still need some kind of workable relationship between the manifestation and the work, and I do think that if you look at the attributes in FRBR you’ll find inconsistencies in what you can assume about the relationships between the different levels.”
Yeah, I can see that. I can also see that it’s a little unnerving to explode the traditional integrated record like this.
For the bibo ontology work, I had previously played around a lot with FRBR (and Fred had experience with it from work on the musical ontology), but decided instead to keep things simpler and just focus on the traditional document view (for the most part). But that doesn’t mean dismissing FRBR entirely. Indeed, we’ve defined bibo:Document as a subclass of frbr:Manifestation. This move, then, reflects a perspective that sees FRBR as allowing one to layer a more abstract view on top of legacy data; sort of like a bridge to FRBR.
Admittedly, if you were to integrate this metadata into a system oriented around FRBR it’d yield some data duplication, but I see that as a small price to pay. I personally don’t want to have to think about FRBR; would rather leave that to others ;-)
Hi, I’m Ian Davis and I’m co-author of the FRBR RDF schema mentioned in these comments. I haven’t yet read all of the cataloguing rules although some of my colleagues at Talis have.
I’d like to say that I agree with Bruce’s comment about Works, Expressions, Manifestations and Items being disjoint. Richard and I pored over the FRBR specification and we believe that making these classes disjoint best expresses the intent of the FRBR specification authors.
Subclassing has nothing to do with the way the data is presented to the end user – it’s a way of making logical separations of distinct data types. The structure that matters to the user is the relationships between the things being described. When a user’s search turns up a work then its relationships (the RDF properties) should be used to include related resources in the display. That’s the whole point of modelling this as RDF – expressing useful semantic relationships between things so people can discover things more easily.
Thank you all so much for being such good teachers! I finally get it about the subclassing (and I think your generosity in explaining these concepts will help many more people than I to start to understand data modelling generally and RDF specifically as we all try to move forward into the semantic web).
I have been working on all these good suggestions.
I’m having trouble understanding one suggestion from Bruce D’Arcus though: “Treating simple properties like titles as full resources is modeling overkill in my view.” Can someone explain this a bit to me? I still don’t get it. (Thanks!)
The examples are all displaying in RDF/XML now, but I still can’t seem to manage to debug the XML in the model well enough to get it to display, so it still appears as a Word document. I’ll keep working on this. (I’m getting a message about the rdfs lines not corresponding to the rdf document definition…)
I’ve just stumbled across this work … very interesting! But the thread went quiet over a year ago. Is the model still being developed?
After DC2008 discussion with Karen and others, I started playing with an idea for an alternative RDF/OWL modeling style. It takes FRBR more loosely as a requirements document for distinctions that should be made, rather than a set of specific classes.
Thinking was that the FRBR distinctions can also be made using RDFS/OWL’s subclassing machinery, by subsetting the universe at different levels of specificity.
This is entirely an exploratory exercise, I’m not suggesting it as a mature alternative to any of the FRBR-as-classes designs. I just suspect that some data will naturally take this shape, and so documenting the design pattern could be worthwhile.
All that aside, I’m interested to hear of any updates to the models here… especially the RDF expressions…
I have gotten side-tracked by RDA development (serving on the RDA MARC Working Group) and a paper critiquing RDF as a vehicle for bibliographic data (based on my experience developing this model) that is just about to be published in Information Technology & Libraries, but I have a number of ideas for improving the model that I am just about to go back to; one is to add a class at the expression level called surrogate to deal with reproductions of unique art works and the like which are not enough like editions of works intended to be reproduced to fit into the expression class. Your work sounds very intriguing and I will definitely take a closer look at it and use it to inform my model. Thanks so much for writing.