XLRD/Python: Reading Excel file into dict with for-loops

Question

I'm trying to read an Excel workbook with 15 fields and about 2,000 rows, and convert each row to a dictionary in Python. I then want to append each dictionary to a list. I'd like each field in the top row of the workbook to be a key in each dictionary, with the corresponding cell value as the value in the dictionary. I've already looked at examples here and here, but I'd like to do something a bit different. The second example will work, but I feel it would be more efficient to loop over the top row to populate the dictionary keys and then iterate through all the rows to get the values. My Excel file contains data from discussion forums and looks something like this (obviously with more columns):

    id  thread_id  forum_id  post_time   votes  post_text
    4   100        3         1377000566  1      'here is some text'
    5   100        4         1289003444  0      'even more text here'

So, I'd like the fields id, thread_id and so on to be the dictionary keys. I'd like my dictionaries to look like this:

{id: 4, thread_id: 100, forum_id: 3, post_time: 1377000566, votes: 1, post_text: 'here is some text'}

Initially, I had code like this iterating through the file, but my scope is wrong for some of the for-loops and I'm generating far too many dictionaries. Here's my initial code:

    import xlrd
    from xlrd import open_workbook, cellname

    book = open('forum.xlsx', 'r')
    sheet = book.sheet_by_index(3)

    dict_list = []

    for row_index in range(sheet.nrows):
        for col_index in range(sheet.ncols):
            d = {}

            # My intuition for the below for-loop is to take each cell in the top row of the
            # Excel sheet and add it as a key to the dictionary, and then pass the value of
            # current index in the above loops as the value to the dictionary. This isn't
            # working.

            for i in sheet.row(0):
                d[str(i)] = sheet.cell(row_index, col_index).value
                dlist.append(d)

Any help would be greatly appreciated. Thanks in advance for reading.

alecxe · Accepted Answer

The idea is to first read the header row into a list. Then iterate over the sheet rows (starting with the one right after the header), build a new dictionary from the header keys and the matching cell values, and append it to the list of dictionaries:

    from xlrd import open_workbook

    book = open_workbook('forum.xlsx')
    sheet = book.sheet_by_index(3)

    # read header values into the list
    keys = [sheet.cell(0, col_index).value for col_index in range(sheet.ncols)]

    dict_list = []
    for row_index in range(1, sheet.nrows):
        d = {keys[col_index]: sheet.cell(row_index, col_index).value
             for col_index in range(sheet.ncols)}
        dict_list.append(d)

    print(dict_list)

For a sheet containing:

    A  B  C  D
    1  2  3  4
    5  6  7  8

it prints:

[{'A': 1.0, 'C': 3.0, 'B': 2.0, 'D': 4.0}, {'A': 5.0, 'C': 7.0, 'B': 6.0, 'D': 8.0}]

UPD (expanding the dictionary comprehension):

    d = {}
    for col_index in range(sheet.ncols):
        d[keys[col_index]] = sheet.cell(row_index, col_index).value
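The sample output above shows every number as a float (1.0, 2.0, ...), because xlrd returns numeric cells as Python floats, whereas the dictionaries in the question use integers such as id: 4. Here is a minimal sketch of one way to tidy that up, assuming whole-number floats should become ints. The names `clean_value` and `clean_row` are hypothetical helpers, not part of xlrd, and the sample data is made up for illustration:

```python
def clean_value(value):
    """Turn whole-number floats (e.g. 4.0) into ints; leave everything else alone."""
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value


def clean_row(row_dict):
    """Apply clean_value to every value of one row dictionary."""
    return {key: clean_value(value) for key, value in row_dict.items()}


# Sample data shaped like the dict_list built above (not read from a real file).
dict_list = [
    {'A': 1.0, 'B': 2.0, 'C': 3.0, 'D': 4.0},
    {'A': 5.0, 'B': 6.5, 'C': 'text', 'D': 8.0},
]
cleaned = [clean_row(d) for d in dict_list]
print(cleaned)
```

Values such as 6.5 and strings pass through unchanged, so the helper can be applied to every column without knowing its type in advance.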

yopiangi · Answer

Try this one. The function below returns a generator that yields a dict for each row, with the column headers as keys.

    from xlrd import open_workbook

    def parse_xlsx():
        workbook = open_workbook('excelsheet.xlsx')
        sheets = workbook.sheet_names()
        active_sheet = workbook.sheet_by_name(sheets[0])
        num_rows = active_sheet.nrows
        num_cols = active_sheet.ncols
        # lower-cased header names become the dict keys
        header = [active_sheet.cell_value(0, cell).lower() for cell in range(num_cols)]
        for row_idx in range(1, num_rows):
            row_cell = [active_sheet.cell_value(row_idx, col_idx) for col_idx in range(num_cols)]
            yield dict(zip(header, row_cell))

    # the function must be defined before it is called
    for row in parse_xlsx():
        print(row)  # {id: 4, thread_id: 100, forum_id: 3, post_time: 1377000566, votes: 1, post_text: 'here is some text'}
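The generator idea can be tried without an Excel file. The sketch below assumes the rows are already available as an iterable of lists whose first item is the header row; `iter_row_dicts` and `sample_rows` are hypothetical names used only for illustration. Because one dict is yielded at a time, the caller can stop early without converting the remaining rows:

```python
from itertools import islice


def iter_row_dicts(rows):
    """Yield one dict per data row, keyed by the lower-cased header row."""
    rows = iter(rows)
    try:
        header = [str(name).lower() for name in next(rows)]
    except StopIteration:
        return  # no header row, so there is nothing to yield
    for row in rows:
        yield dict(zip(header, row))


# Sample rows standing in for the sheet contents from the question.
sample_rows = [
    ['id', 'thread_id', 'Votes'],
    [4, 100, 1],
    [5, 100, 0],
]

# Take only the first data row; the second one is never converted.
first_only = list(islice(iter_row_dicts(sample_rows), 1))
print(first_only)
```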

Kernel · Answer

    from xlrd import open_workbook

    dict_list = []
    book = open_workbook('forum.xlsx')
    sheet = book.sheet_by_index(3)

    # read first row for keys
    keys = sheet.row_values(0)

    # read the remaining rows for values
    values = [sheet.row_values(i) for i in range(1, sheet.nrows)]

    for value in values:
        dict_list.append(dict(zip(keys, value)))

    print(dict_list)
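One thing to keep in mind with `dict(zip(keys, value))`: if two columns share the same header text, the later column silently overwrites the earlier one. A small sketch of a check that could be run on `keys` first; `duplicate_headers` is a hypothetical helper and the sample values are made up:

```python
from collections import Counter


def duplicate_headers(keys):
    """Return the header names that appear more than once, in first-seen order."""
    counts = Counter(keys)
    return [name for name in dict.fromkeys(keys) if counts[name] > 1]


keys = ['id', 'votes', 'id']
row = [4, 1, 99]

print(dict(zip(keys, row)))     # the first 'id' value (4) is overwritten by 99
print(duplicate_headers(keys))
```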

user2672938 · Answer

This answer helped me a lot! I had been looking for a way to do this for about two hours. Then I found this short, elegant answer. Thanks!

I needed a way to convert xls to JSON using keys.

So I adapted the script above to dump JSON, as follows:

    from xlrd import open_workbook
    import simplejson as json
    # http://stackoverflow.com/questions/23568409/xlrd-python-reading-Excel-file-into-dict-with-for-loops?lq=1

    book = open_workbook('makelijk-bomen-herkennen-schors.xls')
    sheet = book.sheet_by_index(0)

    # read header values into the list
    keys = [sheet.cell(0, col_index).value for col_index in range(sheet.ncols)]
    print("keys are", keys)

    dict_list = []
    for row_index in range(1, sheet.nrows):
        d = {keys[col_index]: sheet.cell(row_index, col_index).value
             for col_index in range(sheet.ncols)}
        dict_list.append(d)

    # print(dict_list)
    j = json.dumps(dict_list)

    # Write to file
    with open('data.json', 'w') as f:
        f.write(j)
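The JSON step can also be sketched with the standard-library `json` module in place of simplejson (a swapped-in choice, not what the script above uses). The sample `dict_list` and the temporary output path are assumptions for illustration, so that the example runs without a spreadsheet and does not overwrite a real data.json:

```python
import json
import os
import tempfile

# Sample rows standing in for the dict_list built from the sheet.
dict_list = [
    {'id': 4.0, 'post_text': 'here is some text'},
    {'id': 5.0, 'post_text': 'even more text here'},
]

# Write to a temporary directory rather than the working directory.
out_path = os.path.join(tempfile.mkdtemp(), 'data.json')
with open(out_path, 'w') as f:
    json.dump(dict_list, f, indent=2)

# Read the file back to check that the data round-trips.
with open(out_path) as f:
    loaded = json.load(f)
print(loaded == dict_list)
```

`json.dump` writes straight to the file object, which avoids building the whole JSON string in memory first.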

khelili miliana · Answer

This script turns Excel data into a list of dictionaries:

    import xlrd

    workbook = xlrd.open_workbook('forum.xls', on_demand=True)
    worksheet = workbook.sheet_by_index(0)

    # The row where we store the names of the columns
    first_row = []
    for col in range(worksheet.ncols):
        first_row.append(worksheet.cell_value(0, col))

    # transform the worksheet into a list of dictionaries
    data = []
    for row in range(1, worksheet.nrows):
        elm = {}
        for col in range(worksheet.ncols):
            elm[first_row[col]] = worksheet.cell_value(row, col)
        data.append(elm)

    print(data)

user3203010 · Answer

Try first setting up your keys by parsing just the first line across all columns, write another function to parse the data, and then call them in order.

    all_fields_list = []
    header_dict = {}

    def parse_data_headers(sheet):
        global header_dict
        for c in range(sheet.ncols):
            key = sheet.cell(1, c)  # here 1 is the row number where your header is
            header_dict[c] = key    # store it somewhere, here I have chosen to store in a dict

    def parse_data(sheet):
        for r in range(2, sheet.nrows):
            row_dict = {}
            for c in range(sheet.ncols):
                value = sheet.cell(r, c)
                row_dict[c] = value
            all_fields_list.append(row_dict)
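As written, the functions above store xlrd Cell objects and key each row by column index. Here is a minimal sketch of the same two-step idea that returns plain values keyed by header text and avoids globals. The `header_row` parameter and the function names are hypothetical, and `FakeSheet` is only a stand-in exposing `nrows`, `ncols` and `cell_value` so the example runs without a workbook:

```python
class FakeSheet(object):
    """Tiny stand-in for an xlrd sheet: only nrows, ncols and cell_value."""

    def __init__(self, grid):
        self._grid = grid
        self.nrows = len(grid)
        self.ncols = len(grid[0]) if grid else 0

    def cell_value(self, rowx, colx):
        return self._grid[rowx][colx]


def parse_headers(sheet, header_row=0):
    """Step 1: read the header row into a list of column names."""
    return [sheet.cell_value(header_row, c) for c in range(sheet.ncols)]


def parse_rows(sheet, headers, header_row=0):
    """Step 2: build one dict per row below the header row."""
    rows = []
    for r in range(header_row + 1, sheet.nrows):
        rows.append({headers[c]: sheet.cell_value(r, c)
                     for c in range(sheet.ncols)})
    return rows


sheet = FakeSheet([
    ['id', 'thread_id', 'votes'],
    [4, 100, 1],
    [5, 100, 0],
])
headers = parse_headers(sheet)
all_rows = parse_rows(sheet, headers)
print(all_rows)
```

With a real workbook, the object returned by `book.sheet_by_index(...)` would be passed in place of `FakeSheet`, and `header_row` set to wherever the header actually sits.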