I'm running neo4j queries from node.js using the neo4j-driver. A lot of things were simplified to cut irrelevant information, but what is needed is here.
I have been trying to make a query to ingest a data set with some quirks, defined as follows:
Curriculum: A list of Publications
Publication: Contains data about a publication and a field that is a list of Authors
Author: Relevant fields are externalId and normalizedFullName.
-
externalId is an id that comes from the data's origin system. It is not guaranteed to be present, but if it is, it will uniquely identify a node
-
normalizedFullName will always be present and it's ok to assume the same author will always have the same name wherever it appears; it is also acceptable that full name may not be unique and that at some point two different persons may be stored as the same node
-
It is possible for an author to be part of a publication with only it's normalizedFullName and be part of another with normalizedFullName AND externalId. As you can see, it is not very consistent data, but this is not a problem for the ends I need it.
It will look like this: (don't mind any syntax error)
"curriculum": [
{
"data": {
"fieldA": "a",
"fieldB": "b"
},
"authors": [
{
"externalId": "",
"normalizedFullName": "namea namea"
},
{
"externalId": "123456",
"normalizedFullName": "nameb nameb"
}
]
},
{
"data": {
"fieldA": "d",
"fieldB": "e"
},
"authors": [
{
"externalId": "123321",
"normalizedFullName": "namea namea"
},
{
"externalId": "123456",
"normalizedFullName": "nameb nameb"
}
]
}
]
Merging everything
Merging the publication part is trivial, but things get complicated when it comes to the authors since I have to follow this logic (simplified here) to merge an author:
IF author don't have externalId OR isn't already a node created with his externalId THEN
merge by normalizedFullName
ELSE IF there is already a node with this externalId THEN
merge by externalId
So, acknowledging that I would need some kind of conditional merge, finding that it could be achieved by "the foreach trick", I was able to come up with this little monster (comments added to clarify):
// For each publication, merge it
UNWIND {publications} as publication
MERGE (p:Publication { fieldA: publication.data.fieldA, fieldB: publication.data.fieldB })
ON CREATE SET p = publication.data
WITH p, publication.authors AS authors
// Then, for each author in this publication
UNWIND authors AS author
// IF author don't have externalId OR isn't already a node created with his externalId THEN
MATCH (a:Author) WHERE a.externalId = author.data.externalId AND a.externalId <> '' WITH count(a) as found, author, p
// Merge by name
FOREACH(ignoreMe IN CASE WHEN found = 0 THEN [1] ELSE [] END |
MERGE (aa:Author { normalizedFullName: author.data.normalizedFullName })
ON CREATE SET aa = author.data
MERGE (aa)-[:CONTRIBUTED]->(p)
)
// Else, merge by externalId
FOREACH(ignoreMe IN CASE WHEN found > 0 THEN [1] ELSE [] END |
MERGE (aa:Author { externalId: autor.dadta.externalId })
ON CREATE SET aa = author.data
MERGE (aa)-[:CONTRIBUTED]->(p)
)
Note: This is not the real query i'm using, just shows the exact structures.
The Problem
It doesn't work. It only creates the publications (corretly) and never the authors. It seems the MATCH, FOREACH or a combination of both is messing up with the loop I expected to happen because of UNWIND.
I'm at a point where I can't find a way to do it properly. I also can't find what is wrong, even checking the documentation available.
So, what do I do?
(let me know if anymore information is needed)
Thanks in advance for any insight!
via Eduardo Both
No comments:
Post a Comment