Find all occurrences of a substring in Python

Question

Python has string.find() and string.rfind() to get the index of a substring in a string.

I'm wondering whether there is something like string.find_all() that can return all found indexes (not only the first from the beginning or the first from the end)?

For example:

string = "test test test test"

print(string.find('test'))   # 0
print(string.rfind('test'))  # 15

# that's the goal
print(string.find_all('test'))  # [0, 5, 10, 15]

marcog · Accepted Answer

There is no simple built-in string function that does what you're looking for, but you could use the more powerful regular expressions:

import re

[m.start() for m in re.finditer('test', 'test test test test')]
# [0, 5, 10, 15]
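One caveat that goes slightly beyond the answer: re.finditer treats its first argument as a regular expression, so a substring containing metacharacters such as `.` or `(` is not matched literally. A minimal sketch, assuming you want literal matching, escapes the substring with re.escape first (the helper name find_all_literal is hypothetical):

```python
import re


def find_all_literal(text, sub):
    """Return the start index of every non-overlapping literal match of sub."""
    # re.escape turns regex metacharacters into literal characters
    return [m.start() for m in re.finditer(re.escape(sub), text)]


print(find_all_literal('a.b axb a.b', 'a.b'))  # [0, 8]

# Without escaping, '.' matches any character, so 'axb' is reported too
print([m.start() for m in re.finditer('a.b', 'a.b axb a.b')])  # [0, 4, 8]
```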

If you want to find overlapping matches, a lookahead will do that:

[m.start() for m in re.finditer('(?=tt)', 'ttt')] #[0, 1]

If you want a reverse find-all without overlapping matches, you can combine a positive and a negative lookahead into an expression like this:

search = 'tt'
[m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
# [1]
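A regex-free alternative, swapped in here and not part of the answer, is a greedy right-to-left scan with str.rfind; the function name rfind_all is hypothetical. It is not guaranteed to agree with the lookahead expression in every case: for 'tttt' the expression above yields only [2], while this scan yields [0, 2].

```python
def rfind_all(text, sub):
    """Non-overlapping match positions, scanning from the right end of text."""
    if not sub:
        raise ValueError("sub must be non-empty")
    positions = []
    end = len(text)
    while True:
        # rfind only considers matches lying entirely within text[:end]
        i = text.rfind(sub, 0, end)
        if i == -1:
            break
        positions.append(i)
        end = i  # the next match must end at or before this one starts
    return positions[::-1]  # report in ascending order


print(rfind_all('ttt', 'tt'))                  # [1]
print(rfind_all('tttt', 'tt'))                 # [0, 2]
print(rfind_all('test test test test', 'test'))  # [0, 5, 10, 15]
```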

re.finditer returns an iterator, so you can change the [] of the list comprehensions above to () to get a generator instead of a list, which is more efficient if you only iterate over the results once.
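To illustrate the laziness: with a generator expression nothing is searched until results are requested, so, for example, itertools.islice can take just the first few positions without scanning the rest of a long text. The variable names below are illustrative.

```python
import re
from itertools import islice

text = 'test ' * 1_000_000  # a long string with many matches

# A generator expression: matches are produced one at a time, on demand
positions = (m.start() for m in re.finditer('test', text))

# Only the first three matches are ever computed
first_three = list(islice(positions, 3))
print(first_three)  # [0, 5, 10]
```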

Karl Knechtel · Answer

>>> help(str.find)
Help on method_descriptor:

find(...)
    S.find(sub [,start [,end]]) -> int

So, we can build it ourselves:

def find_all(a_str, sub):
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1:
            return
        yield start
        start += len(sub)  # use start += 1 to find overlapping matches

list(find_all('spam spam spam spam', 'spam'))  # [0, 5, 10, 15]
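A variant of the same generator, sketched under two assumptions of my own: an `overlapping` keyword argument replaces editing the increment by hand, and an empty `sub` is rejected, because with `start += len(sub)` the loop above would yield the same index forever.

```python
def find_all(a_str, sub, overlapping=False):
    """Yield the start index of every occurrence of sub in a_str."""
    if not sub:
        # Raised on the first next() call: an empty sub would never advance
        raise ValueError("sub must be a non-empty string")
    step = 1 if overlapping else len(sub)
    start = a_str.find(sub)
    while start != -1:
        yield start
        start = a_str.find(sub, start + step)


print(list(find_all('spam spam spam spam', 'spam')))  # [0, 5, 10, 15]
print(list(find_all('ttt', 'tt')))                    # [0]
print(list(find_all('ttt', 'tt', overlapping=True)))  # [0, 1]
```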

No temporary strings or regexes required.

thkala · Answer

Here's a (very inefficient) way to get all the matches (i.e. even overlapping ones):

>>> string = "test test test test"
>>> [i for i in range(len(string)) if string.startswith('test', i)]
[0, 5, 10, 15]

Chinmay Kanchi · Answer

You can use re.finditer() for non-overlapping matches.

>>> import re
>>> aString = 'this is a string where the substring "is" is repeated several times'
>>> print([(a.start(), a.end()) for a in re.finditer('is', aString)])
[(2, 4), (5, 7), (38, 40), (42, 44)]

but it won't work for:

In [1]: aString = "ababa"

In [2]: print([(a.start(), a.end()) for a in re.finditer('aba', aString)])
[(0, 3)]
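If you need overlapping matches together with their spans, one common workaround (not part of the answer above) is to wrap the pattern in a capturing group inside a zero-width lookahead and read the group's span:

```python
import re

aString = "ababa"

# The lookahead consumes no characters, so every start position is tried;
# the capturing group records the text the lookahead matched.
spans = [(m.start(1), m.end(1)) for m in re.finditer('(?=(aba))', aString)]
print(spans)  # [(0, 3), (2, 5)]
```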

AkiRoss · Answer

Again, an old thread, but here's my solution using a generator and plain str.find.

def findall(p, s):
    '''Yields all the positions of
    the pattern p in the string s.'''
    i = s.find(p)
    while i != -1:
        yield i
        i = s.find(p, i+1)

Example

x = 'banananassantana'
[(i, x[i:i+2]) for i in findall('na', x)]

Result

[(2, 'na'), (4, 'na'), (6, 'na'), (14, 'na')]

Cody Piersall · Answer

Come, let us recurse together.

def locations_of_substring(string, substring):
    """Return a list of locations of a substring."""

    substring_length = len(substring)

    def recurse(locations_found, start):
        location = string.find(substring, start)
        if location != -1:
            return recurse(locations_found + [location], location + substring_length)
        else:
            return locations_found

    return recurse([], 0)

print(locations_of_substring('this is a test for finding this and this', 'this'))
# prints [0, 27, 36]

No need for regular expressions this way.

jstaab · Answer

If you're just looking for a single character, this would work:

from functools import reduce  # reduce is not a builtin in Python 3

string = "dooobiedoobiedoobie"
match = 'o'
reduce(lambda count, char: count + 1 if char == match else count, string, 0)
# produces 7

Also,

string = "test test test test"
match = "test"
len(string.split(match)) - 1
# produces 4

My gut feeling is that neither of these (especially #2) is terribly performant.
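Both snippets count occurrences rather than reporting where they are. If you need the indexes of a single character, a comprehension over enumerate is one simple option, and the built-in str.count gives the same total as the reduce call:

```python
string = "dooobiedoobiedoobie"
match = 'o'

# Index of every position holding the character
positions = [i for i, char in enumerate(string) if char == match]
print(positions)            # [1, 2, 3, 8, 9, 14, 15]

# str.count returns the same total as the reduce() call
print(string.count(match))  # 7
```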

Thurines · Answer

This is an old thread, but I got interested and wanted to share my solution.

def find_all(a_string, sub):
    result = []
    k = 0
    while k < len(a_string):
        k = a_string.find(sub, k)
        if k == -1:
            return result
        else:
            result.append(k)
            k += 1  # change to k += len(sub) to not search overlapping results
    return result

It should return a list of the positions where the substring was found. Please comment if you see an error or room for improvement.

Andrew H · Answer

This thread is a bit old, but this worked for me:

numberString = "onetwothreefourfivesixseveneightninefiveten"
testString = "five"

marker = 0
while marker < len(numberString):
    try:
        found = numberString.index(testString, marker)
        print(found)
        marker = found + 1
    except ValueError:
        # no further occurrence: report it and stop the loop
        print("String not found")
        marker = len(numberString)

Bruno Vermeulen · Answer

This does the trick for me using re.finditer:

import re

text = 'This is sample text to test if this Pythonic '\
       'program can serve as an indexing platform for '\
       'finding words in a paragraph. It can give '\
       'values as to where the word is located with the '\
       'different examples as stated'

# find all occurrences of the word 'as' in the above text
find_the_word = re.finditer('as', text)

for match in find_the_word:
    print('start {}, end {}, search string \'{}\''.format(match.start(), match.end(), match.group()))
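Be aware that the plain pattern 'as' also matches inside longer words. A short illustration on a sentence of my own (not from the answer), assuming you only want the standalone word: the regex word boundary `\b` restricts the matches.

```python
import re

text = 'It was as easy as it has ever been'  # example text, not from the answer

# Plain pattern: also matches inside 'was', 'easy' and 'has'
plain = [m.start() for m in re.finditer('as', text)]

# \b is a word boundary, so only the standalone word 'as' matches
whole = [m.start() for m in re.finditer(r'\bas\b', text)]

print(plain)  # [4, 7, 11, 15, 22]
print(whole)  # [7, 15]
```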

Harsha Biyani · Answer

You can try:

>>> string = "test test test test"
>>> for index, value in enumerate(string):
...     if string[index:index + len("test")] == "test":
...         print(index)
...
0
5
10
15

naveen raja · Answer

The solutions proposed by others all rely entirely on the available find() method or on some other built-in method.

What is the basic algorithm for finding all occurrences of a substring in a string?

def find_all(string, substring):
    """
    Function: Returning all the index of substring in a string
    Arguments: String and the search string
    Return: Returning a list
    """
    length = len(substring)
    c = 0
    indexes = []
    while c < len(string):
        if string[c:c+length] == substring:
            indexes.append(c)
        c = c + 1
    return indexes
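The loop above compares up to len(substring) characters at every position, so in the worst case it takes time proportional to len(string) × len(substring). A classic linear-time alternative, swapped in here and not part of the answer, is the Knuth-Morris-Pratt algorithm; the sketch below (function name hypothetical) reports overlapping matches, just like the loop above.

```python
def kmp_find_all(text, pattern):
    """Return every (possibly overlapping) start index of pattern in text."""
    if not pattern:
        raise ValueError("pattern must be non-empty")

    # failure[i] = length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k

    # Scan text once; k is the number of pattern characters currently matched
    indexes = []
    k = 0
    for i, char in enumerate(text):
        while k > 0 and char != pattern[k]:
            k = failure[k - 1]
        if char == pattern[k]:
            k += 1
        if k == len(pattern):
            indexes.append(i - k + 1)
            k = failure[k - 1]  # fall back so overlapping matches are found
    return indexes


print(kmp_find_all('test test test test', 'test'))  # [0, 5, 10, 15]
print(kmp_find_all('ababa', 'aba'))                 # [0, 2]
```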

You can also subclass str in a new class and use the function below.

class newstr(str):
    def find_all(string, substring):
        """
        Function: Returning all the index of substring in a string
        Arguments: String and the search string
        Return: Returning a list
        """
        length = len(substring)
        c = 0
        indexes = []
        while c < len(string):
            if string[c:c+length] == substring:
                indexes.append(c)
            c = c + 1
        return indexes

Calling the method

newstr.find_all('Do you find this answer helpful? Then upvote this!', 'this')

RaySaraiva · Answer

If you only need the number of non-overlapping occurrences rather than their positions, you can simply use:

string.count('test')
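Note that str.count returns how many times the substring occurs, not where, and it counts only non-overlapping occurrences. A short comparison, using the overlapping lookahead from the accepted answer for the second count:

```python
import re

total = 'test test test test'.count('test')
non_overlapping = 'ttt'.count('tt')

# Counting overlapping occurrences with a zero-width lookahead instead
overlapping = sum(1 for _ in re.finditer('(?=tt)', 'ttt'))

print(total, non_overlapping, overlapping)  # 4 1 2
```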

https://www.programiz.com/python-programming/methods/string/count

Cheers!

Uri Goren · Answer

When searching for a large number of keywords in a document, use flashtext:

from flashtext import KeywordProcessor

words = ['test', 'exam', 'quiz']
txt = 'this is a test'

kwp = KeywordProcessor()
kwp.add_keywords_from_list(words)
result = kwp.extract_keywords(txt, span_info=True)

Flashtext runs faster than regex on a large list of search words.
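If installing flashtext is not an option, a standard-library approximation (a different technique from the one flashtext uses) is a single regex alternation of the escaped keywords with word boundaries. This sketch, with a hypothetical helper name, returns each keyword with its start and end offsets; the speed claim above is about flashtext, not about this sketch.

```python
import re


def find_keywords(words, txt):
    """Return (keyword, start, end) for each whole-word keyword match in txt."""
    # Longest keywords first: regex alternation tries alternatives in order,
    # so 'test case' must come before 'test' to win
    alternatives = sorted(words, key=len, reverse=True)
    pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, alternatives)) + r')\b')
    return [(m.group(), m.start(), m.end()) for m in pattern.finditer(txt)]


print(find_keywords(['test', 'exam', 'quiz'], 'this is a test'))
# [('test', 10, 14)]
```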

BONTHA SREEVIDHYA · Answer

By slicing, we build every possible substring, add them all to a list, and find the number of times the search string occurs using the count function:

s = input()
n = len(s)
l = []
f = input()
print(s[0])  # prints the first character of s (not needed for the count)
for i in range(0, n):
    for j in range(1, n + 1):
        l.append(s[i:j])
if f in l:
    print(l.count(f))
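The nested loops store about n² slices. For a non-empty search string the count can be obtained without that list, by checking each start position with str.startswith; a minimal sketch with a hypothetical function name (like the list approach, it counts overlapping occurrences):

```python
def count_occurrences(s, f):
    """Count (possibly overlapping) occurrences of a non-empty f in s."""
    # One startswith check per start position instead of storing every slice
    return sum(1 for i in range(len(s)) if s.startswith(f, i))


print(count_occurrences('ababa', 'aba'))                 # 2
print(count_occurrences('test test test test', 'test'))  # 4
```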