It is unlikely that you will use this code ‘as is’. Your data source is likely to be very different and possibly much more complex. In fact, this is the case for many of the data sources ingested here at Cornell. Consider, for example, a data source of Academic Articles, where each article typically lists several co-authors and, since many co-authors are not at Cornell, missing netids are commonplace. However, these techniques extend easily to handle such cases.

Until now no mention has been made of how to update information once it is in VIVO. Two possibilities suggest themselves: wholesale replacement and incremental update.
Both have advantages and disadvantages depending on how complex the source is. Wholesale replacement is attractive because all you have to do is retract what was once asserted, then build new RDF and assert it. If you keep (i.e. don’t retract) the new people and organization RDF, then chances are they will be available in Per0.xml or Org0.xml when needed. On the other hand, wholesale replacement can be very time consuming. To replace the many tens of thousands of Academic Articles in VIVO at Cornell would take several hours just to retract. However, for small sources wholesale replacement is a good choice.
Incremental update involves a differencing process in which the source data for one query result set is compared with the data for the next. Differences in terms of additions, changes and deletions are tabulated to determine what must be retracted (deletions and changes) and what must be asserted (changes and additions). The granularity of the differencing process depends on the complexity of the source. Deeply nested XML structures can be tough to compare. To simplify the problem, look for ‘last modified dates’ and similar elements that signal a change.
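The differencing step can be illustrated with a minimal Python sketch. It is not the code used at Cornell. It assumes each query result set has already been flattened into a dictionary keyed by a stable record identifier, and the names `diff_snapshots`, `plan_update`, and the sample article records are hypothetical. Comparing whole records stands in for whatever comparison suits the source; where a reliable ‘last modified’ field exists, comparing only that field would be a cheaper test.

```python
from typing import Dict, List, Tuple

# Hypothetical record shape: a flat mapping of field name to value.
Record = Dict[str, str]


def diff_snapshots(
    old: Dict[str, Record], new: Dict[str, Record]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (additions, changes, deletions) as sorted lists of record keys."""
    additions = sorted(k for k in new if k not in old)
    deletions = sorted(k for k in old if k not in new)
    # A record counts as changed if it exists in both sets but differs.
    changes = sorted(k for k in new if k in old and new[k] != old[k])
    return additions, changes, deletions


def plan_update(
    old: Dict[str, Record], new: Dict[str, Record]
) -> Tuple[List[Record], List[Record]]:
    """Return (to_retract, to_assert) following the rule in the text:
    retract deletions and changes, assert changes and additions."""
    additions, changes, deletions = diff_snapshots(old, new)
    to_retract = [old[k] for k in deletions + changes]  # old versions
    to_assert = [new[k] for k in changes + additions]   # new versions
    return to_retract, to_assert


# Illustrative data only; these records are made up for the example.
previous = {
    "a1": {"title": "First article", "modified": "2011-01-05"},
    "a2": {"title": "Second article", "modified": "2011-02-10"},
    "a3": {"title": "Third article", "modified": "2011-03-15"},
}
current = {
    "a2": {"title": "Second article (revised)", "modified": "2011-04-01"},
    "a3": {"title": "Third article", "modified": "2011-03-15"},
    "a4": {"title": "Fourth article", "modified": "2011-04-02"},
}

additions, changes, deletions = diff_snapshots(previous, current)
to_retract, to_assert = plan_update(previous, current)
print(additions, changes, deletions)  # ['a4'] ['a2'] ['a1']
```

Note that a changed record appears in both lists: its old version is retracted and its new version is asserted. Unchanged records (here `a3`) are left alone, which is where the time saving over wholesale replacement comes from.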