xml.parsers.expat.ExpatError: not well-formed (invalid token)

Question

When I use xmltodict to load the XML file below, I get an error: xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1, column 1

Here is my file:

    <?xml version="1.0" encoding="utf-8"?>
    <mydocument has="an attribute">
      <and>
        <many>elements</many>
        <many>more elements</many>
      </and>
      <plus a="complex">
        element as well
      </plus>
    </mydocument>

The code:

    import xmltodict

    with open('fileTEST.xml') as fd:
        xmltodict.parse(fd.read())

I am on Windows 10, using Python 3.6 and xmltodict 0.11.0.

If I use ElementTree, it works:

    import xml.etree.ElementTree as ET

    tree = ET.ElementTree(file='fileTEST.xml')
    for elem in tree.iter():
        print(elem.tag, elem.attrib)

    mydocument {'has': 'an attribute'}
    and {}
    many {}
    many {}
    plus {'a': 'complex'}

Note: I may have run into a newline problem.
Note 2: I used Beyond Compare on two different files.
Parsing fails on the file encoded as UTF-8 with BOM and works on the file encoded as plain UTF-8.
The UTF-8 BOM is a sequence of bytes (EF BB BF) that lets the reader identify a file as being encoded in UTF-8.
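
To check for the BOM without a diff tool, the first three bytes of the file can be compared against EF BB BF. The following is a minimal sketch using only the standard library; the helper name `starts_with_utf8_bom` and the demo file are hypothetical and not part of the question.

```python
import codecs
import os
import tempfile


def starts_with_utf8_bom(path: str) -> bool:
    """Return True if the file begins with the UTF-8 BOM bytes EF BB BF."""
    with open(path, "rb") as fh:  # binary mode: raw bytes, no decoding
        return fh.read(3) == codecs.BOM_UTF8


# Self-contained demo on a throwaway file (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "with_bom.xml")
    with open(demo_path, "wb") as fh:
        fh.write(codecs.BOM_UTF8 + b"<a/>")
    print(starts_with_utf8_bom(demo_path))
```

Reading in binary mode matters here: in text mode the bytes would already have been decoded with whatever encoding `open()` picked.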

jmunsch · Answer

In my case, the file had been saved with a Byte Order Mark, which is the default in Notepad++.

I re-saved the file without the BOM, as plain UTF-8.
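
The same re-save can be done from Python instead of an editor. This is only a sketch under the assumption that the file is otherwise valid UTF-8; `remove_utf8_bom` is a hypothetical helper name, and it overwrites the file in place, so try it on a copy first.

```python
import codecs


def remove_utf8_bom(path: str) -> bool:
    """Rewrite the file without a leading UTF-8 BOM.

    Returns True if a BOM was found and removed, False otherwise.
    """
    with open(path, "rb") as fh:
        raw = fh.read()
    if not raw.startswith(codecs.BOM_UTF8):
        return False  # no BOM: leave the file untouched
    with open(path, "wb") as fh:
        fh.write(raw[len(codecs.BOM_UTF8):])
    return True
```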

Renz Paul Del Rosario · Answer

I think you forgot to set the encoding type. I suggest loading the XML file into a string variable first:

    import xml.etree.ElementTree as ET
    import xmltodict
    import json

    tree = ET.parse('your_data.xml')
    xml_data = tree.getroot()
    # here you can change the encoding type to be able to set it to the one you need
    xmlstr = ET.tostring(xml_data, encoding='utf-8', method='xml')

    data_dict = dict(xmltodict.parse(xmlstr))

winklerrr · Answer

Python 3

One-liner

data: dict = xmltodict.parse(ElementTree.tostring(ElementTree.parse(path).getroot()))

Helper for `.json` and `.xml`

I wrote a small helper function to load .json and .xml files from a given path. I thought it might be useful for some people here:

    import json
    from xml.etree import ElementTree

    import xmltodict


    def load_json(path: str) -> dict:
        if path.endswith(".json"):
            print(f"> Loading JSON from '{path}'")
            with open(path, mode="r") as open_file:
                content = open_file.read()

            return json.loads(content)
        elif path.endswith(".xml"):
            print(f"> Loading XML as JSON from '{path}'")
            # Round-trip through ElementTree, then hand the serialized bytes to xmltodict
            xml = ElementTree.tostring(ElementTree.parse(path).getroot())

            return xmltodict.parse(xml, attr_prefix="@", cdata_key="#text", dict_constructor=dict)

        print(f"> Loading failed for '{path}'")
        return {}

Notes

if you want to get rid of the @ and #text markers in the JSON output, use the parameters attr_prefix="" and cdata_key=""
normally xmltodict.parse() returns an OrderedDict, but you can change that with the parameter dict_constructor=dict

Usage

    path = "my_data.xml"
    data = load_json(path)
    print(json.dumps(data, indent=2))

    # OUTPUT
    #
    # > Loading XML as JSON from 'my_data.xml'
    # {
    #   "mydocument": {
    #     "@has": "an attribute",
    #     "and": {
    #       "many": [
    #         "elements",
    #         "more elements"
    #       ]
    #     },
    #     "plus": {
    #       "@a": "complex",
    #       "#text": "element as well"
    #     }
    #   }
    # }

Prayson W. Daniel · Answer

In my case, the problem was with the first 3 characters, so removing them worked:

    import xmltodict
    from xml.parsers.expat import ExpatError

    with open('your_data.xml') as f:
        data = f.read()

    try:
        doc = xmltodict.parse(data)
    except ExpatError:
        # Retry without the first 3 characters
        doc = xmltodict.parse(data[3:])
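
A likely explanation for the "3 characters" (an assumption, since the answer does not state the encoding in use) is that `open()` fell back to a legacy Windows code page such as cp1252, which turns the three BOM bytes into three stray characters in front of `<?xml`. The sketch below illustrates that and shows an alternative to slicing: opening the file with the standard `utf-8-sig` codec, which drops a leading BOM when there is one. `read_xml_text` is a hypothetical helper name.

```python
import codecs

# The three BOM bytes decoded with cp1252 (assumed default on the affected
# machine) become three separate characters, matching the data[3:] workaround.
stray = codecs.BOM_UTF8.decode("cp1252")
print(len(stray))


def read_xml_text(path: str) -> str:
    """Read a UTF-8 file as text, dropping a leading BOM if present."""
    with open(path, encoding="utf-8-sig") as fh:
        return fh.read()
```

The returned string can then be passed to `xmltodict.parse()` without a `try`/`except` fallback, and files that have no BOM are read unchanged.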

Arount · Answer

xmltodict seems unable to parse <?xml version="1.0" encoding="utf-8"?>

If you remove that line, it works.
