How to get the XPath from an XmlNode instance

Question

Could someone supply some code that would get the XPath of a System.Xml.XmlNode instance?

Thanks!

Jon Skeet · Accepted Answer

Okay, I couldn't resist having a go at it. It'll only work for attributes and elements, but hey... what can you expect in 15 minutes :) Likewise there may very well be a cleaner way of doing it.

It is superfluous to include the index on every element (particularly the root one!) but it's easier than trying to work out whether there's any ambiguity otherwise.

    using System;
    using System.Text;
    using System.Xml;

    class Test
    {
        static void Main()
        {
            string xml = @"
    <root>
        <foo />
        <foo>
            <bar attr='value'/>
            <bar other='va' />
        </foo>
        <foo><bar /></foo>
    </root>";
            XmlDocument doc = new XmlDocument();
            doc.LoadXml(xml);
            XmlNode node = doc.SelectSingleNode("//@attr");
            Console.WriteLine(FindXPath(node));
            Console.WriteLine(doc.SelectSingleNode(FindXPath(node)) == node);
        }

        static string FindXPath(XmlNode node)
        {
            StringBuilder builder = new StringBuilder();
            while (node != null)
            {
                switch (node.NodeType)
                {
                    case XmlNodeType.Attribute:
                        builder.Insert(0, "/@" + node.Name);
                        node = ((XmlAttribute) node).OwnerElement;
                        break;
                    case XmlNodeType.Element:
                        int index = FindElementIndex((XmlElement) node);
                        builder.Insert(0, "/" + node.Name + "[" + index + "]");
                        node = node.ParentNode;
                        break;
                    case XmlNodeType.Document:
                        return builder.ToString();
                    default:
                        throw new ArgumentException("Only elements and attributes are supported");
                }
            }
            throw new ArgumentException("Node was not in a document");
        }

        static int FindElementIndex(XmlElement element)
        {
            XmlNode parentNode = element.ParentNode;
            if (parentNode is XmlDocument)
            {
                return 1;
            }
            XmlElement parent = (XmlElement) parentNode;
            int index = 1;
            foreach (XmlNode candidate in parent.ChildNodes)
            {
                if (candidate is XmlElement && candidate.Name == element.Name)
                {
                    if (candidate == element)
                    {
                        return index;
                    }
                    index++;
                }
            }
            throw new ArgumentException("Couldn't find element within parent");
        }
    }

Robert Rossney · Answer

Jon's correct that there are any number of XPath expressions that will yield the same node in an instance document. The simplest way to build an expression that unambiguously yields a specific node is a chain of node tests that use the node position in the predicate, e.g.:

    /node()[1]/node()[2]/node()[6]/node()[1]/node()[2]

Obviously, this expression isn't using element names, but then if all you're trying to do is locate a node within a document, you don't need its name. It also can't be used to find attributes (because attributes are not child nodes and don't have a position; you can only find them by name), but it will find all other node types.

To construct this expression, you need to write a method that returns a node's position in its parent's child nodes, because XmlNode doesn't expose that as a property:

    static int GetNodePosition(XmlNode child)
    {
        for (int i = 0; i < child.ParentNode.ChildNodes.Count; i++)
        {
            if (child.ParentNode.ChildNodes[i] == child)
            {
                // tricksy XPath, not starting its positions at 0 like a normal language
                return i + 1;
            }
        }
        throw new InvalidOperationException("Child node somehow not found in its parent's ChildNodes property.");
    }

(There's probably a more elegant way to do that using LINQ, since XmlNodeList implements IEnumerable, but I'm going with what I know here.)
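A minimal sketch of what such a LINQ version might look like. The class name NodePositionSketch is hypothetical, and the explicit Cast<XmlNode>() is there because XmlNodeList only implements the non-generic IEnumerable:

```csharp
using System;
using System.Linq;
using System.Xml;

static class NodePositionSketch
{
    // Returns the 1-based position of a node among its parent's child nodes.
    public static int GetNodePosition(XmlNode child)
    {
        if (child.ParentNode == null)
        {
            // Attributes and the document node have no ParentNode.
            throw new InvalidOperationException("Node has no parent.");
        }

        // Count the siblings that come before the node itself.
        int precedingSiblings = child.ParentNode.ChildNodes
            .Cast<XmlNode>()
            .TakeWhile(n => !ReferenceEquals(n, child))
            .Count();

        return precedingSiblings + 1; // XPath positions start at 1
    }
}
```

Whether this is more elegant than the loop is a matter of taste; it does the same linear scan.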

Then you can write a recursive method like this:

    static string GetXPathToNode(XmlNode node)
    {
        if (node.NodeType == XmlNodeType.Attribute)
        {
            // attributes have an OwnerElement, not a ParentNode; also they have
            // to be matched by name, not found by position
            return String.Format(
                "{0}/@{1}",
                GetXPathToNode(((XmlAttribute)node).OwnerElement),
                node.Name
                );
        }
        if (node.ParentNode == null)
        {
            // the only node with no parent is the root node, which has no path
            return "";
        }
        // the path to a node is the path to its parent, plus "/node()[n]", where
        // n is its position among its siblings.
        return String.Format(
            "{0}/node()[{1}]",
            GetXPathToNode(node.ParentNode),
            GetNodePosition(node)
            );
    }

As you can see, I hacked in a way for it to find attributes as well.

Jon slipped in with his version while I was writing mine. There's something about his code that's going to make me rant a bit now, and I apologize in advance if it sounds like I'm ragging on Jon. (I'm not. I'm pretty sure that the list of things Jon has to learn from me is exceedingly short.) But I think the point I'm going to make is a pretty important one for anyone who works with XML to think about.

I suspect that Jon's solution emerged from something I see a lot of developers do: thinking of XML documents as trees of elements and attributes. I think this largely comes from developers whose primary use of XML is as a serialization format, because all the XML they're used to using is structured this way. You can spot these developers because they use the terms "node" and "element" interchangeably. This leads them to come up with solutions that treat all other node types as special cases. (I was one of these guys myself for a very long time.)

This feels like a simplifying assumption while you're making it. But it's not. It makes problems harder and code more complex. It leads you to bypass the pieces of XML technology (like the node() function in XPath) that are specifically designed to treat all node types generically.

There's a red flag in Jon's code that would make me query it in a code review even if I didn't know what the requirements were, and that's GetElementsByTagName. Whenever I see that method in use, the question that leaps to mind is always "why does it have to be an element?" And the answer is very often "oh, does this code need to handle text nodes too?"
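To make the point about generic node handling concrete, here is a small sketch. The class GenericPathSketch and the sample document are assumptions made for illustration; the path-building logic is the same positional node() idea as above, limited to non-attribute nodes. It builds a path for every child of the root (an element, a text node, a comment) and checks that each path selects the node it was built from:

```csharp
using System;
using System.Xml;

static class GenericPathSketch
{
    // Positional path built from node() steps; works for any child node type.
    public static string PathTo(XmlNode node)
    {
        if (node.ParentNode == null)
        {
            return ""; // the document node has no path
        }

        int position = 1;
        for (XmlNode s = node.PreviousSibling; s != null; s = s.PreviousSibling)
        {
            position++;
        }
        return PathTo(node.ParentNode) + "/node()[" + position + "]";
    }

    public static void Demo()
    {
        var doc = new XmlDocument();
        // Hypothetical sample: element, text, comment and element children.
        doc.LoadXml("<root><a/>some text<!-- a comment --><b/></root>");

        foreach (XmlNode child in doc.DocumentElement.ChildNodes)
        {
            string path = PathTo(child);
            bool roundTrips = ReferenceEquals(doc.SelectSingleNode(path), child);
            Console.WriteLine(child.NodeType + " " + path + " " + roundTrips);
        }
    }
}
```

No node type needs a special case here, which is the argument being made. The sample deliberately has no XML declaration and no whitespace-only text, since those are places where the DOM child list and the XPath view of the document may differ.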

Roemer · Answer

I know, old post, but the version I liked the most (the one with names) was flawed: when a parent node has nodes with different names, it stopped counting the index after it found the first non-matching node name.

Here is my fixed version of it:

    /// <summary>
    /// Gets the X-Path to a given Node
    /// </summary>
    /// <param name="node">The Node to get the X-Path from</param>
    /// <returns>The X-Path of the Node</returns>
    public string GetXPathToNode(XmlNode node)
    {
        if (node.NodeType == XmlNodeType.Attribute)
        {
            // attributes have an OwnerElement, not a ParentNode; also they have
            // to be matched by name, not found by position
            return String.Format("{0}/@{1}", GetXPathToNode(((XmlAttribute)node).OwnerElement), node.Name);
        }
        if (node.ParentNode == null)
        {
            // the only node with no parent is the root node, which has no path
            return "";
        }

        // Get the Index
        int indexInParent = 1;
        XmlNode siblingNode = node.PreviousSibling;
        // Loop thru all Siblings
        while (siblingNode != null)
        {
            // Increase the Index if the Sibling has the same Name
            if (siblingNode.Name == node.Name)
            {
                indexInParent++;
            }
            siblingNode = siblingNode.PreviousSibling;
        }

        // the path to a node is the path to its parent, plus "/name[n]", where
        // n is its position among its siblings with the same name.
        return String.Format("{0}/{1}[{2}]", GetXPathToNode(node.ParentNode), node.Name, indexInParent);
    }

rugg · Answer

Here's a simple method that I've used, which worked for me.

    static string GetXpath(XmlNode node)
    {
        if (node.Name == "#document")
            return String.Empty;
        return GetXpath(node.SelectSingleNode("..")) + "/" + (node.NodeType == XmlNodeType.Attribute ? "@" : String.Empty) + node.Name;
    }

James Randle · Answer

My 10p worth is a hybrid of Robert and Corey's answers. I can only claim credit for the actual typing of the extra lines of code.

    private static string GetXPathToNode(XmlNode node)
    {
        if (node.NodeType == XmlNodeType.Attribute)
        {
            // attributes have an OwnerElement, not a ParentNode; also they have
            // to be matched by name, not found by position
            return String.Format(
                "{0}/@{1}",
                GetXPathToNode(((XmlAttribute)node).OwnerElement),
                node.Name
                );
        }
        if (node.ParentNode == null)
        {
            // the only node with no parent is the root node, which has no path
            return "";
        }

        //get the index
        int iIndex = 1;
        XmlNode xnIndex = node;
        while (xnIndex.PreviousSibling != null)
        {
            iIndex++;
            xnIndex = xnIndex.PreviousSibling;
        }

        // the path to a node is the path to its parent, plus "/node()[n]", where
        // n is its position among its siblings.
        return String.Format(
            "{0}/node()[{1}]",
            GetXPathToNode(node.ParentNode),
            iIndex
            );
    }

René Endress · Answer

If you do this, you will get a path with the names of the nodes AND the position, if you have nodes with the same name, like this: "/Service[1]/System[1]/Group[1]/Folder[2]/File[2]"

    public string GetXPathToNode(XmlNode node)
    {
        if (node.NodeType == XmlNodeType.Attribute)
        {
            // attributes have an OwnerElement, not a ParentNode; also they have
            // to be matched by name, not found by position
            return String.Format("{0}/@{1}", GetXPathToNode(((XmlAttribute)node).OwnerElement), node.Name);
        }
        if (node.ParentNode == null)
        {
            // the only node with no parent is the root node, which has no path
            return "";
        }

        //get the index
        // note: this loop stops at the first preceding sibling with a different
        // name, so it only counts the run of same-named siblings directly before the node
        int iIndex = 1;
        XmlNode xnIndex = node;
        while (xnIndex.PreviousSibling != null && xnIndex.PreviousSibling.Name == xnIndex.Name)
        {
            iIndex++;
            xnIndex = xnIndex.PreviousSibling;
        }

        // the path to a node is the path to its parent, plus "/name[n]", where
        // n is the index computed above.
        return String.Format("{0}/{1}[{2}]", GetXPathToNode(node.ParentNode), node.Name, iIndex);
    }

Jon Skeet · Answer

There's no such thing as "the" xpath for a node. For any given node there may well be many xpath expressions which will match it.
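As a quick illustration, the sketch below runs several different expressions against a small sample document and checks that they all select the same node. The class name ManyPathsSketch, the document and the expressions are made up for the example:

```csharp
using System;
using System.Xml;

static class ManyPathsSketch
{
    // Returns true when every expression selects the very same <bar> element.
    public static bool AllSelectSameNode()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<root><foo/><foo><bar attr='value'/></foo></root>");

        string[] expressions =
        {
            "/root/foo[2]/bar",               // names, index only where needed
            "/root[1]/foo[2]/bar[1]",         // names with an index on every step
            "//bar",                          // search anywhere in the document
            "//bar[@attr='value']",           // search with an attribute predicate
            "/*/*[2]/*",                      // element wildcards
            "/node()[1]/node()[2]/node()[1]"  // positional node() tests
        };

        XmlNode expected = doc.SelectSingleNode(expressions[0]);
        foreach (string expression in expressions)
        {
            if (!ReferenceEquals(doc.SelectSingleNode(expression), expected))
            {
                return false;
            }
        }
        return expected != null;
    }
}
```

Which of these counts as "the" path depends entirely on what the path will be used for.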

You can probably work up the tree to build up an expression which will match it, taking into account the index of particular elements etc, but it's not going to be terribly nice code.

Why do you need this? There may be a better solution.

Sandy · Answer

I produced VBA for Excel to do this for a work project. It outputs tuples of an XPath and the associated text from an element or attribute. The purpose was to allow business analysts to identify and map some XML. I appreciate that this is a C# forum, but I thought this might be of interest.

    Sub Parse2(oSh As Long, inode As IXMLDOMNode, Optional iXstring As String = "", Optional indexes)
        Dim chnode As IXMLDOMNode
        Dim attr As IXMLDOMAttribute
        Dim oXString As String
        Dim chld As Long
        Dim idx As Variant
        Dim addindex As Boolean
        chld = 0
        idx = 0
        addindex = False

        'determine the node type:
        Select Case inode.NodeType
            Case NODE_ELEMENT
                If inode.ParentNode.NodeType = NODE_DOCUMENT Then
                    'This gets the root node name but ignores all the namespace attributes
                    oXString = iXstring & "//" & fp(inode.nodename)
                Else
                    'Need to deal with indexing. Where an element has siblings with the same nodeName,
                    'it needs to be indexed using [index], e.g swapstreams or schedules
                    For Each chnode In inode.ParentNode.ChildNodes
                        If chnode.NodeType = NODE_ELEMENT And chnode.nodename = inode.nodename Then chld = chld + 1
                    Next chnode

                    If chld > 1 Then
                        'inode has siblings of the same nodeName, so needs to be indexed
                        'Lookup the index from the indexes array
                        idx = getIndex(inode.nodename, indexes)
                        addindex = True
                    Else
                    End If

                    'build the XString
                    oXString = iXstring & "/" & fp(inode.nodename)
                    If addindex Then oXString = oXString & "[" & idx & "]"

                    'If type is element then check for attributes
                    For Each attr In inode.Attributes
                        'If the element has attributes then extract the data pair
                        'XString + Element.Name, @Attribute.Name=Attribute.Value
                        Call oSheet(oSh, oXString & "/@" & attr.Name, attr.Value)
                    Next attr
                End If
            Case NODE_TEXT
                'build the XString
                oXString = iXstring
                Call oSheet(oSh, oXString, inode.NodeValue)
            Case NODE_ATTRIBUTE
                'Do nothing
            Case NODE_CDATA_SECTION
                'Do nothing
            Case NODE_COMMENT
                'Do nothing
            Case NODE_DOCUMENT
                'Do nothing
            Case NODE_DOCUMENT_FRAGMENT
                'Do nothing
            Case NODE_DOCUMENT_TYPE
                'Do nothing
            Case NODE_ENTITY
                'Do nothing
            Case NODE_ENTITY_REFERENCE
                'Do nothing
            Case NODE_INVALID
                'do nothing
            Case NODE_NOTATION
                'do nothing
            Case NODE_PROCESSING_INSTRUCTION
                'do nothing
        End Select

        'Now call Parse2 on each of inode's children.
        If inode.HasChildNodes Then
            For Each chnode In inode.ChildNodes
                Call Parse2(oSh, chnode, oXString, indexes)
            Next chnode
            Set chnode = Nothing
        Else
        End If
    End Sub

It manages the counting of elements using:

    Function getIndex(tag As Variant, indexes) As Variant
        'Function to get the latest index for an xml tag from the indexes array
        'indexes array is passed from one parser function to the next up and down the tree
        Dim i As Integer
        Dim n As Integer

        If IsArrayEmpty(indexes) Then
            ReDim indexes(1, 0)
            indexes(0, 0) = "Tag"
            indexes(1, 0) = "Index"
        Else
        End If

        For i = 0 To UBound(indexes, 2)
            If indexes(0, i) = tag Then
                'tag found, increment and return the index then exit
                'also destroy all recorded tag names BELOW that level
                indexes(1, i) = indexes(1, i) + 1
                getIndex = indexes(1, i)
                ReDim Preserve indexes(1, i) 'should keep all tags up to i but remove all below it
                Exit Function
            Else
            End If
        Next i

        'tag not found so add the tag with index 1 at the end of the array
        n = UBound(indexes, 2)
        ReDim Preserve indexes(1, n + 1)
        indexes(0, n + 1) = tag
        indexes(1, n + 1) = 1
        getIndex = 1
    End Function

cjbarth · Answer

I found that none of the above worked with XDocument, so I wrote my own code to support XDocument and used recursion. I think this code handles multiple identical nodes better than some of the other code here because it first tries to go as deep into the XML path as it can and then backs up to build only what is needed. So if you have /home/white/bob and /home/white/mike and you want to create /home/white/bob/garage, the code will know how to create that. However, I didn't want to mess with predicates or wildcards, so I explicitly disallowed those; but it would be easy to add support for them.

    'Needs Imports System.Xml.Linq, System.Xml.XPath and System.Text.RegularExpressions
    Private Sub NodeItterate(XDoc As XElement, XPath As String)
        'get the deepest path
        Dim nodes As IEnumerable(Of XElement)
        nodes = XDoc.XPathSelectElements(XPath)

        'if it doesn't exist, try the next shallow path
        If nodes.Count = 0 Then
            NodeItterate(XDoc, XPath.Substring(0, XPath.LastIndexOf("/")))
            'by this time all the required parent elements will have been constructed
            Dim ParentPath As String = XPath.Substring(0, XPath.LastIndexOf("/"))
            Dim ParentNode As XElement = XDoc.XPathSelectElement(ParentPath)
            Dim NewElementName As String = XPath.Substring(XPath.LastIndexOf("/") + 1, XPath.Length - XPath.LastIndexOf("/") - 1)
            ParentNode.Add(New XElement(NewElementName))
        End If

        'if we find there are more than 1 elements at the deepest path we have access to, we can't proceed
        If nodes.Count > 1 Then
            Throw New ArgumentOutOfRangeException("There are too many paths that match your expression.")
        End If

        'if there is just one element, we can proceed
        If nodes.Count = 1 Then
            'just proceed
        End If
    End Sub

    Public Sub CreateXPath(ByVal XDoc As XElement, ByVal XPath As String)
        If XPath.Contains("//") Or XPath.Contains("*") Or XPath.Contains(".") Then
            Throw New ArgumentException("Can't create a path based on searches, wildcards, or relative paths.")
        End If

        'character class: reject the path if it contains any character used in predicates
        If Regex.IsMatch(XPath, "[()@='<>|]") Then
            Throw New ArgumentException("Can't create a path based on predicates.")
        End If

        'we will process this recursively.
        NodeItterate(XDoc, XPath)
    End Sub

Plasmabubble · Answer

What about using class extension? ;) My version (building on others' work) uses the syntax name[index], with the index omitted if the element has no "brothers". The loop to get the index of the element is outside, in an independent routine (also a class extension).

Just paste the following in any static utility class (or in the main Program class, if it is static):

    static public int GetRank( this XmlNode node )
    {
        // return 0 if unique, else return position 1...n in siblings with same name
        try
        {
            if( node is XmlElement )
            {
                int rank = 1;
                bool alone = true, found = false;

                foreach( XmlNode n in node.ParentNode.ChildNodes )
                    if( n.Name == node.Name ) // sibling with same name
                    {
                        if( n.Equals(node) )
                        {
                            if( ! alone ) return rank; // no need to continue
                            found = true;
                        }
                        else
                        {
                            if( found ) return rank; // no need to continue
                            alone = false;
                            rank++;
                        }
                    }
            }
        }
        catch{}
        return 0;
    }

    static public string GetXPath( this XmlNode node )
    {
        try
        {
            if( node is XmlAttribute )
                return String.Format( "{0}/@{1}", (node as XmlAttribute).OwnerElement.GetXPath(), node.Name );

            if( node is XmlText || node is XmlCDataSection )
                return node.ParentNode.GetXPath();

            if( node.ParentNode == null ) // the only node with no parent is the root node, which has no path
                return "";

            int rank = node.GetRank();
            if( rank == 0 ) return String.Format( "{0}/{1}", node.ParentNode.GetXPath(), node.Name );
            else return String.Format( "{0}/{1}[{2}]", node.ParentNode.GetXPath(), node.Name, rank );
        }
        catch{}
        return "";
    }

Andrei · Answer

Another solution to your problem may be to "mark" the XML nodes which you will want to identify later with a custom attribute:

    var id = _currentNode.OwnerDocument.CreateAttribute("some_id");
    id.Value = Guid.NewGuid().ToString();
    _currentNode.Attributes.Append(id);

which you can store in a Dictionary, for example. You can later identify the node with an XPath query:

    // id is the XmlAttribute created above, so the query needs its Value
    newOrOldDocument.SelectSingleNode(string.Format("//*[contains(@some_id,'{0}')]", id.Value));

I know this is not a direct answer to your question, but it can help if the reason you want to know the xpath of a node is to have a way of 'reaching' the node later, after you have lost the reference to it in code.

This also overcomes problems when the document gets elements added/moved, which can mess up the xpath (or indexes, as suggested in other answers).
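Put together, the tagging idea might look like the sketch below. The class NodeTagSketch and its method names are hypothetical; "some_id" is the attribute name used above. It assumes the node to tag is an element, and it uses an exact match on the attribute value instead of contains():

```csharp
using System;
using System.Xml;

static class NodeTagSketch
{
    // Stamps the element with a fresh GUID and returns it so the caller can store it.
    public static string Tag(XmlElement element)
    {
        string id = Guid.NewGuid().ToString();
        element.SetAttribute("some_id", id);
        return id;
    }

    // Finds the tagged element again, in the same document or in a copy of it.
    public static XmlNode Find(XmlDocument document, string id)
    {
        // A GUID string holds only hex digits and hyphens, so it can be
        // placed inside the quoted XPath literal without escaping.
        return document.SelectSingleNode("//*[@some_id='" + id + "']");
    }
}
```

Because the lookup does not depend on position, it keeps working after siblings are inserted or removed, and also on a document produced by CloneNode(true) or by saving and reloading. The price is that the document itself is modified.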

Art · Answer

I had to do this recently. Only elements needed to be considered. This is what I came up with:

    // requires: using System.Collections.Generic;
    private string GetPath(XmlElement el)
    {
        List<string> pathList = new List<string>();
        XmlNode node = el;
        while (node is XmlElement)
        {
            pathList.Add(node.Name);
            node = node.ParentNode;
        }
        pathList.Reverse();
        string[] nodeNames = pathList.ToArray();
        return String.Join("/", nodeNames);
    }

Corey Fournier · Answer

This is even easier

    ''' <summary>
    ''' Gets the full XPath of a single node.
    ''' </summary>
    ''' <param name="node"></param>
    ''' <returns></returns>
    ''' <remarks></remarks>
    Private Function GetXPath(ByVal node As Xml.XmlNode) As String
        Dim temp As String
        Dim sibling As Xml.XmlNode
        Dim previousSiblings As Integer = 1

        'I dont want to know that it was a generic document
        If node.Name = "#document" Then Return ""

        'Prime it
        sibling = node.PreviousSibling
        'Perculate up getting the count of all of this node's sibling before it.
        While sibling IsNot Nothing
            'Only count if the sibling has the same name as this node
            If sibling.Name = node.Name Then
                previousSiblings += 1
            End If
            sibling = sibling.PreviousSibling
        End While

        'Mark this node's index, if it has one
        ' Also mark the index to 1 or the default if it does have a sibling just no previous.
        temp = node.Name + IIf(previousSiblings > 0 OrElse node.NextSibling IsNot Nothing, "[" + previousSiblings.ToString() + "]", "").ToString()

        If node.ParentNode IsNot Nothing Then
            Return GetXPath(node.ParentNode) + "/" + temp
        End If

        Return temp
    End Function

Mabrouk MAHDHI · Answer

    public static string GetFullPath(this XmlNode node)
    {
        if (node.ParentNode == null)
        {
            return "";
        }
        else
        {
            // path of the parent, then "/" and this node's own name
            return $"{GetFullPath(node.ParentNode)}/{node.Name}";
        }
    }