Friday, March 28, 2008

Setting and Getting Document Quality in MarkLogic Server

I had a request to influence the score of documents returned by searches based on the year of publication. Since the year isn't part of most searches, the best approach seemed to be setting each document's quality to its publication year. Then I could use that value in the scoring calculations. Take a look at the Developer's Guide for how to do that.
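As a rough sketch of how quality can feed into scoring, cts:search accepts an optional quality weight as its fourth argument. The search term 'marklogic' and the weight of 2.0 below are placeholders I made up for illustration, not values from my actual setup, so check the Developer's Guide before relying on them.

```xquery
(: Sketch only: the search term and the quality weight are placeholders. :)
for $hit in cts:search(
    collection('myCollection')/book,
    cts:word-query('marklogic'),
    (),     (: no extra search options :)
    2.0     (: quality weight; the default is 1.0 :)
)
return
  <hit base-uri="{base-uri($hit)}" score="{cts:score($hit)}"/>
```

A larger weight makes document quality count for more in the score, and a weight of 0 should take it out of the calculation entirely.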

Here's how I looped through all the documents in a collection and set document quality using a value stored in the XML already.

(: Set document quality :)
for $i in collection('myCollection')/book
let $myYear := $i/metadata/publication-date
let $myBaseUri := base-uri($i)
(: I don't know about you, but I don't trust my XML vendors.
   This tests casting the data to an int first. :)
let $myDocumentQuality :=
  if ($myYear castable as xs:integer) then
    $myYear cast as xs:integer
  else 1990 (: This is a default setting in case of bad data. :)
return
  xdmp:document-set-quality($myBaseUri, $myDocumentQuality)

Depending on how many documents you have stored, you may need to modify this to set the document quality in smaller batches, because updating every document in a single query is quite intensive.
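One way to batch this, as a minimal sketch: slice the collection with subsequence and rerun the query with a new starting position each time. The names $start and $size and the batch size of 500 are my own assumptions for illustration, and this assumes the order of documents in the collection doesn't change between runs.

```xquery
(: Set document quality for one batch of documents. :)
(: $start and $size are hypothetical; rerun with $start := 501, 1001, ... :)
let $start := 1
let $size := 500
for $i in subsequence(collection('myCollection')/book, $start, $size)
let $myYear := $i/metadata/publication-date
let $myDocumentQuality :=
  if ($myYear castable as xs:integer) then
    $myYear cast as xs:integer
  else 1990 (: Same default as above for bad data. :)
return
  xdmp:document-set-quality(base-uri($i), $myDocumentQuality)
```

Keep bumping $start by $size until a run touches no documents.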

Once that's run, you can go back and review all your document quality settings. I originally ran these two queries together, but I think it takes a minute or two (depending on your system) for the settings to actually be indexed, so run this one separately.

(: Get document quality :)
<results>
{
  for $i in collection('myCollection')/book
  return
    <result
      base-uri="{base-uri($i)}"
      year="{$i/metadata/publication-date}"
      set-document-quality="{xdmp:document-get-quality(base-uri($i))}"/>
}
</results>
