web-dev-qa-db-fra.com

What is the regex to extract all the emojis from a string?

I have a UTF-8 encoded string. For example:

Thats a Nice joke 😆😆😆 😛

I need to extract all the emoticons present in the sentence, and they could be any emoji at all.

When this sentence is viewed in the terminal with the command less text.txt, it is displayed as:

Thats a Nice joke <U+1F606><U+1F606><U+1F606> <U+1F61B>

These are the Unicode code points of the emoji. All the emoji code points can be found on emojitracker.

To find all the occurrences, I used the regex pattern (<U\+\w+?>), but it didn't work on the UTF-8 encoded string.

Here is my code:

    String s="Thats a Nice joke 😆😆😆 😛";
    Pattern pattern = Pattern.compile("(<U\\+\\w+?>)");
    Matcher matcher = pattern.matcher(s);
    List<String> matchList = new ArrayList<String>();

    while (matcher.find()) {
        matchList.add(matcher.group());
    }

    for(int i=0;i<matchList.size();i++){
        System.out.println(matchList.get(i));

    }

This PDF says Range: 1F300–1F5FF for Miscellaneous Symbols and Pictographs. So I want to capture any character within this range.

33
vishalaksh

The PDF you just mentioned says Range: 1F300–1F5FF for Miscellaneous Symbols and Pictographs. So let's say I want to capture any character within this range. What do I do now?

Okay, but I'll just note that the emoji in your question are outside that range! :-)

The fact that they're above 0xFFFF complicates things, because Java strings store UTF-16. So we can't just use one simple character class for it; we're going to have surrogate pairs. (More: http://www.unicode.org/faq/utf_bom.html)

U+1F300 in UTF-16 ends up being the pair \uD83C\uDF00; U+1F5FF ends up being \uD83D\uDDFF. Note that the first char went up, so we cross at least one boundary. So we need to know which ranges of surrogate pairs we're looking for.

Not being steeped in knowledge of the inner workings of UTF-16, I wrote a program to find out (source at the end; I'd double-check it if I were you, rather than trusting me). It tells me we're looking for \uD83C followed by anything in the range \uDF00-\uDFFF (inclusive), or \uD83D followed by anything in the range \uDC00-\uDDFF (inclusive).
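The same pairs can also be computed directly instead of being discovered by experiment. The sketch below subtracts 0x10000, then splits the remaining 20 bits into two 10-bit halves. It cross-checks the result against Character.highSurrogate/lowSurrogate (Java 7+). The class name SurrogateMath and the helper pairOf are illustrative only:

```java
// Hypothetical helper class: shows how a code point above 0xFFFF maps to a UTF-16 surrogate pair.
class SurrogateMath {

    // Formats the surrogate pair for a supplementary code point as "HIGH LOW" in hex.
    static String pairOf(int codePoint) {
        // Manual formula: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves.
        int offset = codePoint - 0x10000;
        char high = (char) (0xD800 + (offset >> 10));
        char low = (char) (0xDC00 + (offset & 0x3FF));

        // The JDK computes the same values (Java 7+); fail loudly if the formula disagrees.
        if (high != Character.highSurrogate(codePoint) || low != Character.lowSurrogate(codePoint)) {
            throw new IllegalStateException("formula mismatch");
        }
        return String.format("%04X %04X", (int) high, (int) low);
    }

    public static void main(String[] args) {
        System.out.println(pairOf(0x1F300)); // D83C DF00
        System.out.println(pairOf(0x1F5FF)); // D83D DDFF
        System.out.println(pairOf(0x1F606)); // D83D DE06
    }
}
```

The first two lines reproduce the boundaries given above. The third line shows why U+1F606 from the question falls outside \uDC00-\uDDFF.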

So, armed with that knowledge, in theory we could now write a pattern:

// This is wrong, keep reading
Pattern p = Pattern.compile("(?:\uD83C[\uDF00-\uDFFF])|(?:\uD83D[\uDC00-\uDDFF])");

That's an alternation of two non-capturing groups: the first group for the pairs starting with \uD83C, and the second group for the pairs starting with \uD83D.

But that fails (it doesn't find anything). I'm fairly sure it's because we're trying to specify half of a surrogate pair in various places:

Pattern p = Pattern.compile("(?:\uD83C[\uDF00-\uDFFF])|(?:\uD83D[\uDC00-\uDDFF])");
// Half of a pair --------------^------^------^-----------^------^------^

We can't just split up surrogate pairs like that; they're called surrogate pairs for a reason. :-)

Consequently, I don't think we can use regular expressions (or indeed any string-based approach) for this. I think we have to search through char arrays.

char arrays contain the UTF-16 values, so we can find those half-pairs in the data if we look for them the hard way:

String s = new StringBuilder()
                .append("Thats a Nice joke ")
                .appendCodePoint(0x1F606)
                .appendCodePoint(0x1F606)
                .appendCodePoint(0x1F606)
                .append(" ")
                .appendCodePoint(0x1F61B)
                .toString();
char[] chars = s.toCharArray();
int index;
char ch1;
char ch2;

index = 0;
while (index < chars.length - 1) { // -1 because we're looking for two-char-long things
    ch1 = chars[index];
    if ((int)ch1 == 0xD83C) {
        ch2 = chars[index+1];
        if ((int)ch2 >= 0xDF00 && (int)ch2 <= 0xDFFF) {
            System.out.println("Found emoji at index " + index);
            index += 2;
            continue;
        }
    }
    else if ((int)ch1 == 0xD83D) {
        ch2 = chars[index+1];
        if ((int)ch2 >= 0xDC00 && (int)ch2 <= 0xDDFF) {
            System.out.println("Found emoji at index " + index);
            index += 2;
            continue;
        }
    }
    ++index;
}

Obviously that's just debug-level code, but it does the job. (In your given string, with its emoji, it won't find anything, of course, as they're outside the range. But if you change the upper bound on the second pair to 0xDEFF instead of 0xDDFF, it will. That wider range probably also includes some non-emoji characters, though.)
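A variant of the same char-array scan may be easier to adapt. Instead of hard-coding the surrogate halves, it combines each valid pair back into a code point with Character.toCodePoint and compares it against the block boundaries directly. This is a sketch; the class name, the findInRange helper and the two example ranges are assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: scans a char array and reports supplementary characters in [from, to].
class SurrogateScan {

    // Returns the char indexes at which a code point in [from, to] starts.
    static List<Integer> findInRange(String s, int from, int to) {
        char[] chars = s.toCharArray();
        List<Integer> hits = new ArrayList<>();
        int index = 0;
        while (index < chars.length - 1) { // -1 because a pair is two chars long
            char ch1 = chars[index];
            char ch2 = chars[index + 1];
            if (Character.isSurrogatePair(ch1, ch2)) {
                int codePoint = Character.toCodePoint(ch1, ch2);
                if (codePoint >= from && codePoint <= to) {
                    hits.add(index);
                }
                index += 2; // skip both halves of the pair
                continue;
            }
            ++index;
        }
        return hits;
    }

    public static void main(String[] args) {
        String s = new StringBuilder()
                .append("Thats a Nice joke ")
                .appendCodePoint(0x1F606)
                .appendCodePoint(0x1F606)
                .appendCodePoint(0x1F606)
                .append(" ")
                .appendCodePoint(0x1F61B)
                .toString();
        // Miscellaneous Symbols and Pictographs: nothing in this string.
        System.out.println(findInRange(s, 0x1F300, 0x1F5FF)); // []
        // Emoticons block (U+1F600..U+1F64F): all four emoji.
        System.out.println(findInRange(s, 0x1F600, 0x1F64F)); // [18, 20, 22, 25]
    }
}
```

Because the comparison is on whole code points, changing the range is a matter of changing two numbers rather than recomputing surrogate bounds.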


Source of my program to find out what the surrogate ranges were:

public class FindRanges {

    public static void main(String[] args) {
        char last0 = '\0';
        char last1 = '\0';
        for (int x = 0x1F300; x <= 0x1F5FF; ++x) {
            char[] chars = new StringBuilder().appendCodePoint(x).toString().toCharArray();
            if (chars[0] != last0) {
                if (last0 != '\0') {
                    System.out.println("-\\u" + Integer.toHexString((int)last1).toUpperCase());
                }
                System.out.print("\\u" + Integer.toHexString((int)chars[0]).toUpperCase() + " \\u" + Integer.toHexString((int)chars[1]).toUpperCase());
                last0 = chars[0];
            }
            last1 = chars[1];
        }
        if (last0 != '\0') {
            System.out.println("-\\u" + Integer.toHexString((int)last1).toUpperCase());
        }
    }
}

Output:

\uD83C \uDF00-\uDFFF
\uD83D \uDC00-\uDDFF
25
T.J. Crowder

Using emoji-java, I wrote a simple method that removes all emojis, including Fitzpatrick modifiers. It requires an external library, but it's easier to maintain than those monster regexes.

Usage:

String input = "A string ????with a \uD83D\uDC66\uD83C\uDFFFfew ????emojis!";
String result = EmojiParser.removeAllEmojis(input);

Installing emoji-java with Maven:

<dependency>
  <groupId>com.vdurmont</groupId>
  <artifactId>emoji-java</artifactId>
  <version>3.1.3</version>
</dependency>

Gradle:

implementation 'com.vdurmont:emoji-java:3.1.3'

EDIT: the previously submitted answer was taken from the emoji-java source code.

40
gidim

Had a similar problem. The following worked well for me and matches surrogate pairs:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitByUnicode {
    public static void main(String[] argv) throws Exception {
        String string = "Thats a Nice joke 😆😆😆 😛";
        System.out.println("Original String:" + string);
        // Runs of supplementary code points (U+10000..U+10FFFF), i.e. the characters
        // Java stores as surrogate pairs. The \x{...} syntax needs Java 7+.
        String regexPattern = "[\\x{10000}-\\x{10FFFF}]+";

        // Java strings are already UTF-16, so no byte round-trip is needed.
        Pattern pattern = Pattern.compile(regexPattern);
        Matcher matcher = pattern.matcher(string);
        List<String> matchList = new ArrayList<String>();

        while (matcher.find()) {
            matchList.add(matcher.group());
        }

        for (int i = 0; i < matchList.size(); i++) {
            System.out.println(i + ":" + matchList.get(i));
        }
    }
}

The output is:


Original String:Thats a Nice joke 😆😆😆 😛
0:😆😆😆
1:😛

Based on the regex at https://stackoverflow.com/a/24071599/915972

16
Karan Ashar

This worked for me in Java 8:

public static String mysqlSafe(String input) {
    if (input == null) return null;
    StringBuilder sb = new StringBuilder();

    for (int i = 0; i < input.length(); i++) {
        if (i < (input.length() - 1)) { // Emojis are two characters long in Java, e.g. a rocket emoji is "\uD83D\uDE80";
            if (Character.isSurrogatePair(input.charAt(i), input.charAt(i + 1))) {
                i += 1; //also skip the second character of the emoji
                continue;
            }
        }
        sb.append(input.charAt(i));
    }

    return sb.toString();
}
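On Java 8, the same filtering can be expressed over code points instead of chars, which avoids the manual index bookkeeping. This is a minimal sketch; the class and method names are hypothetical. Like the loop above, it drops every character outside the Basic Multilingual Plane, not just emoji. That matches what MySQL's 3-byte utf8 charset can store, but it may be too aggressive elsewhere:

```java
// Hypothetical helper: removes every supplementary code point (U+10000 and above).
class MysqlSafe {

    static String stripSupplementary(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(input.length());
        input.codePoints()                                          // surrogate pairs arrive as one code point
             .filter(cp -> !Character.isSupplementaryCodePoint(cp)) // keep only BMP characters
             .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripSupplementary("rocket \uD83D\uDE80 launched")); // rocket  launched
    }
}
```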
10
Mike

You can do it like this:

    String s="Thats a Nice joke 😆😆😆 😛";
    Pattern pattern = Pattern.compile("[\ud83c\udc00-\ud83c\udfff]|[\ud83d\udc00-\ud83d\udfff]|[\u2600-\u27ff]",
                                      Pattern.UNICODE_CASE | Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(s);
    List<String> matchList = new ArrayList<String>();

    while (matcher.find()) {
        matchList.add(matcher.group());
    }

    for(int i=0;i<matchList.size();i++){
        System.out.println(matchList.get(i));
    }
6
Shi Xiangyang

Assuming you're asking about the standard Unicode emoji ranges (there are different blocks per vendor), you might consider these three ranges:

  • 0x20a0 - 0x32ff
  • 0x1f000 - 0x1ffff
  • 0xfe4e5 - 0xfe4ee
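Those three ranges can be checked per code point. The sketch below assumes exactly the ranges listed above, with inclusive bounds. Note that 0x20A0-0x32FF is broad and also covers non-emoji symbols such as currency signs and arrows, and that 0xFE4E5-0xFE4EE lies in a private-use area. Treat it as a rough filter; the class and method names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper built on the three ranges listed above.
class EmojiRanges {

    static boolean inRanges(int cp) {
        return (cp >= 0x20A0 && cp <= 0x32FF)
            || (cp >= 0x1F000 && cp <= 0x1FFFF)
            || (cp >= 0xFE4E5 && cp <= 0xFE4EE);
    }

    // Collects every code point of s that falls inside one of the ranges.
    static List<String> extract(String s) {
        List<String> found = new ArrayList<>();
        s.codePoints()
         .filter(EmojiRanges::inRanges)
         .forEach(cp -> found.add(new String(Character.toChars(cp))));
        return found;
    }

    public static void main(String[] args) {
        String s = "Thats a Nice joke \uD83D\uDE06\uD83D\uDE06\uD83D\uDE06 \uD83D\uDE1B";
        System.out.println(extract(s).size()); // 4
    }
}
```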

Aside from all the thoughtful explanations T.J. Crowder gave us, it's worth pointing out that, starting with Java 7, you can easily match UTF-16-encoded surrogate pairs.

Take a look at the documentation:

http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

A Unicode character can also be represented in a regular expression by using its hex notation (hexadecimal code point value) directly, as described in the construct \x{...}. For example, the supplementary character U+2011F can be specified as \x{2011F}, instead of as two consecutive Unicode escape sequences for the surrogate pair \uD840\uDD1F.
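With that syntax, the range from the question can be written directly, without splitting surrogate pairs. This is a minimal sketch, assuming Java 7 or later; the class name is hypothetical. As T.J. Crowder points out, the question's emoji lie outside U+1F300-U+1F5FF. They belong to the Emoticons block (U+1F600-U+1F64F), so that range is included as a second pattern:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo of the \x{...} construct (Java 7+).
class CodePointRegex {

    // Miscellaneous Symbols and Pictographs, written as code points rather than surrogate halves.
    static final Pattern MISC_SYMBOLS = Pattern.compile("[\\x{1F300}-\\x{1F5FF}]");
    // Emoticons block, where the question's emoji actually live.
    static final Pattern EMOTICONS = Pattern.compile("[\\x{1F600}-\\x{1F64F}]");

    static List<String> findAll(Pattern pattern, String s) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(s);
        while (matcher.find()) {
            matches.add(matcher.group()); // each match is one whole code point (two chars)
        }
        return matches;
    }

    public static void main(String[] args) {
        String s = "Thats a Nice joke \uD83D\uDE06\uD83D\uDE06\uD83D\uDE06 \uD83D\uDE1B";
        System.out.println(findAll(MISC_SYMBOLS, s).size()); // 0
        System.out.println(findAll(EMOTICONS, s).size());    // 4
    }
}
```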

Nevertheless, if you can't upgrade to Java 7, you can extend the handy UnicodeEscaper provided by Guava.

Here's an example implementation:

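import com.google.common.escape.UnicodeEscaper;

public class SimpleEscaper extends UnicodeEscaper
{
    @Override
    protected char[] escape(int codePoint)
    {
        // Replace code points in U+1F000..U+1FFFF with their hex value
        if (codePoint >= 0x1f000 && codePoint <= 0x1ffff)
        {
            return Integer.toHexString(codePoint).toCharArray();
        }

        // null tells UnicodeEscaper to leave this code point unchanged
        return null;
    }
}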
5
Mr.C

The best regex for extracting ALL emoji is the following:

(?:[\u2700-\u27bf]|(?:\ud83c[\udde6-\uddff]){2}|[\ud800-\udbff][\udc00-\udfff]|[\u0023-\u0039]\ufe0f?\u20e3|\u3299|\u3297|\u303d|\u3030|\u24c2|\ud83c[\udd70-\udd71]|\ud83c[\udd7e-\udd7f]|\ud83c\udd8e|\ud83c[\udd91-\udd9a]|\ud83c[\udde6-\uddff]|\ud83c[\ude01-\ude02]|\ud83c\ude1a|\ud83c\ude2f|\ud83c[\ude32-\ude3a]|\ud83c[\ude50-\ude51]|\u203c|\u2049|[\u25aa-\u25ab]|\u25b6|\u25c0|[\u25fb-\u25fe]|\u00a9|\u00ae|\u2122|\u2139|\ud83c\udc04|[\u2600-\u26FF]|\u2b05|\u2b06|\u2b07|\u2b1b|\u2b1c|\u2b50|\u2b55|\u231a|\u231b|\u2328|\u23cf|[\u23e9-\u23f3]|[\u23f8-\u23fa]|\ud83c\udccf|\u2934|\u2935|[\u2190-\u21ff])

It identifies many single-character emoji that the other answers don't account for. For more information on how this regex works, see this post: https://medium.com/@thekevinscott/emojis-in-javascript-f693d0eb79fb#.enomgcu63

Emoji regex

public static final String sEmojiRegex = "(?:[\\u2700-\\u27bf]|" +

        "(?:[\\ud83c\\udde6-\\ud83c\\uddff]){2}|" +
        "[\\ud800\\udc00-\\uDBFF\\uDFFF]|[\\u2600-\\u26FF])[\\ufe0e\\ufe0f]?(?:[\\u0300-\\u036f\\ufe20-\\ufe23\\u20d0-\\u20f0]|[\\ud83c\\udffb-\\ud83c\\udfff])?" +

        "(?:\\u200d(?:[^\\ud800-\\udfff]|" +

        "(?:[\\ud83c\\udde6-\\ud83c\\uddff]){2}|" +
        "[\\ud800\\udc00-\\uDBFF\\uDFFF]|[\\u2600-\\u26FF])[\\ufe0e\\ufe0f]?(?:[\\u0300-\\u036f\\ufe20-\\ufe23\\u20d0-\\u20f0]|[\\ud83c\\udffb-\\ud83c\\udfff])?)*|" +

        "[\\u0023-\\u0039]\\ufe0f?\\u20e3|\\u3299|\\u3297|\\u303d|\\u3030|\\u24c2|[\\ud83c\\udd70-\\ud83c\\udd71]|[\\ud83c\\udd7e-\\ud83c\\udd7f]|\\ud83c\\udd8e|[\\ud83c\\udd91-\\ud83c\\udd9a]|[\\ud83c\\udde6-\\ud83c\\uddff]|[\\ud83c\\ude01-\\ud83c\\ude02]|\\ud83c\\ude1a|\\ud83c\\ude2f|[\\ud83c\\ude32-\\ud83c\\ude3a]|[\\ud83c\\ude50-\\ud83c\\ude51]|\\u203c|\\u2049|[\\u25aa-\\u25ab]|\\u25b6|\\u25c0|[\\u25fb-\\u25fe]|\\u00a9|\\u00ae|\\u2122|\\u2139|\\ud83c\\udc04|[\\u2600-\\u26FF]|\\u2b05|\\u2b06|\\u2b07|\\u2b1b|\\u2b1c|\\u2b50|\\u2b55|\\u231a|\\u231b|\\u2328|\\u23cf|[\\u23e9-\\u23f3]|[\\u23f8-\\u23fa]|\\ud83c\\udccf|\\u2934|\\u2935|[\\u2190-\\u21ff]";

Some emojis (1627):

// count = 1627
public static final String sEmojiTest = "????????????????????????????????☺️????????????????????????????????????????????????????????????????????????????????????????????????????????????????☹️????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????☠️????????????????????????????????????????????????????????????????????????????????????✊????????????✌️????????????????????????☝️✋????????????????????????????✍️????????????????????????????????????????????????????????????????????????????????????‍♀????????????????????‍♀????????‍♀????????‍♀????????‍♀????????️‍♀️????????‍⚕????‍⚕????‍????????‍????????‍????????‍????????‍????????‍????????‍????????‍????????‍????????‍????????‍????????‍????????‍????????‍????????‍????????‍????????‍????????‍????????‍????????‍????????‍????????‍????????‍????????‍????????‍✈????‍✈????‍????????‍????????‍⚖????‍⚖????????????????????????????????????‍♀????????????‍♂????????‍♂????????‍♂????????‍♂????‍♀????‍♂????‍♀????‍♂????????‍♂????????‍♂????????‍♂????????‍♂????????????????????‍♂????‍♀????????‍♀????????????????????????‍❤️‍????????‍❤️‍????????????‍❤️‍????‍????????‍❤️‍????‍????????????‍????‍????????‍????‍????‍????????‍????‍????‍????????‍????‍????‍????????‍????‍????????‍????‍????????‍????‍????‍????????‍????‍????‍????????‍????‍????‍????????‍????‍????????‍????‍????????‍????‍????‍????????‍????‍????‍????????‍????‍????‍????????‍????????‍????????‍????‍????????‍????‍????????‍????‍????????‍????????‍????????‍????‍????????‍????‍????????‍????‍????????????????????????????????????????????????????????????????????⛑????????????????????????????????☂️???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????
?????????????????????????????????????????????????????????☘️????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????⭐️????✨⚡️????????☄☀️????⛅️????????????☁️????⛈????????☃️⛄️❄️????????????????????????????☔️????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????☕️????????????????????????????????????????????????????⚽️????????⚾️????????????????????????????????????????⛳️????????????????⛸????⛷????????️‍♀️????????????‍♀????‍♂????‍♀????‍♂⛹️‍♀️⛹????‍♀????‍♂????️‍♀️????????‍♀????????‍♀????????‍♀????‍♂????‍♀????????????‍♀????????‍♀????????????????????????????????????????????????????????‍♀????‍♂????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????✈️????????????????????????⛵️????????????⛴????⚓️????⛽️????????????????????????⛲️????????????????????????????⛱????????⛰????????????????????⛺️????????????????????????????????????????????????????????????????????????????????⛪️????????????⛩????????????????????????????????????????????????????????????⌚️????????????⌨️????????????????????????????????????????????????????????????????????????☎️????????????????????????????⏱⏲⏰????⌛️⏳????????????????????????????????????????????????????????????????⚖️????????⚒????⛏????⚙️⛓????????????????⚔️????????⚰️⚱️????????????????⚗️????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????✉️???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????
?????????????✂️????????✒️????????????✏️????????????????????????❤️????????????????????????❣️????????????????????????????????☮️✝️☪️????☸️✡️????????☯️☦️????⛎♈️♉️♊️♋️♌️♍️♎️♏️♐️♑️♒️♓️????⚛️????☢️☣️????????????????️????????????️✴️????????????㊙️㊗️????????????????????️????️????????????️????❌⭕️????⛔️????????????????♨️????????????????????????????❗️❕❓❔‼️⁉️????????〽️⚠️????????⚜️????♻️✅????️????❇️✳️❎????????Ⓜ️????????????????♿️????️????????️????????????????????????????????????????????????????ℹ️????????????????????????????????????0️⃣1️⃣2️⃣3️⃣4️⃣5️⃣6️⃣7️⃣8️⃣9️⃣????????#️⃣*️⃣▶️⏸⏯⏹⏺⏭⏮⏩⏪⏫⏬◀️????????➡️⬅️⬆️⬇️↗️↘️↙️↖️↕️↔️↪️↩️⤴️⤵️????????????????????????????➕➖➗✖️????????™️©️®️〰️➰➿????????????????????✔️☑️????⚪️⚫️????????????????????????????????????????▪️▫️◾️◽️◼️◻️⬛️⬜️????????????????????????????????????‍????????????????♠️♣️♥️♦️????????????️????????????????????????????????????????????????????????????????????????????????????????????????????️????????????????️‍????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????
????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????⚽️????????⚾️????????????????????????????????????????⛳️????????????????⛸????⛷????????️‍♀️????????‍♀️????????‍♀️????????‍♀️????????‍♀️????????‍♀️????️????????????????????????????????????????????????‍♀️????‍♂️????‍♀️????????‍♀️????????‍♀️????????‍♀️????????‍♀️????????‍♀️????‍♂️????????‍♂️????????‍♂️????????‍♂️????????‍♂️????????‍♂️⛹️‍♀️⛹????‍♀️⛹????‍♀️⛹????‍♀️⛹????‍♀️⛹????‍♀️⛹️⛹????⛹????⛹????⛹????⛹????????‍♀️????????‍♀️????????‍♀️????????‍♀️????????‍♀️????????‍♀️????‍♂️????????‍♂️????????‍♂️????????‍♂️????????‍♂️????????‍♂️????️‍♀️????????‍♀️????????‍♀️????????‍♀️????????‍♀️????????‍♀️????️????????????????????????????????????????????‍♀️????????‍♀️????????‍♀️????????‍♀️????????‍♀️????????‍♀️????????????????????????????????????????????????‍♀️????????‍♀️????????‍♀️????????‍♀️????????‍♀️????????‍♀️????????????????????????????????????????????????‍♀️????????‍♀️????????‍♀️????????‍♀️????????‍♀️????????‍♀️????‍♂️????????‍♂️????????‍♂️????????‍♂️????????‍♂️????????‍♂️????‍♀️????????‍♀️????????‍♀️????????‍♀️????????‍♀️????????‍♀️????????????????????????????????????
????????????????????????????????????????????????????????‍♀️????????‍♀️????????‍♀️????????‍♀️????????‍♀️????????‍♀️????????????????????????????????????????????????‍♀️????????‍♀️????????‍♀️????????‍♀️????????‍♀️????????‍♀️????????????????????????????????????????????????????????????????????????????????????????????????‍♀️????‍♂️????????????????????????????????????????????????????????????????????";

Function to test the emojis:

public void checkMatchingEmojis() {

    final Pattern pattern = Pattern.compile(sEmojiRegex);
    final Matcher matcher = pattern.matcher(sEmojiTest);
    int foundEmojiCount = 0;
    while (matcher.find()) {
        System.out.println("Full match: " + matcher.group(0));
        foundEmojiCount++;
    }
    System.out.println("*******************************************");
    System.out.println("Input Emoji count = 1627");
    System.out.println("Captured Emoji count = " + foundEmojiCount);
    System.out.println("*******************************************");

}

Here is the Gist, tested against all Unicode 10 emojis.

Thanks to Kevin Scott for writing a great example.

3
Sergey Chilingaryan

You can also use the emoji4j library.

String emojiText = "A ????, ???? and a ???? became friends. For ????'s birthday party, they all had ????s, ????s, ????s and ????.";

EmojiUtils.removeAllEmojis(emojiText);//returns "A ,  and a  became friends. For 's birthday party, they all had s, s, s and .
3
Chaitanya

To solve it with just a regex:

s = s.replaceAll("\\p{So}+", "");

You can find it in:

http://www.regular-expressions.info/unicode.html

https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#OTHER_SYMBOL
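\p{So} is the Unicode "Other Symbol" general category rather than an emoji-specific class. It covers most pictographic emoji, including supplementary ones such as U+1F606. It also removes non-emoji symbols like © (U+00A9), and it leaves behind emoji components outside that category, such as the variation selector U+FE0F. A quick sketch of those effects (class name hypothetical):

```java
// Hypothetical demo of stripping the Unicode "Other Symbol" (So) category.
class OtherSymbolStrip {

    static String strip(String s) {
        return s.replaceAll("\\p{So}+", "");
    }

    public static void main(String[] args) {
        String s = "Thats a Nice joke \uD83D\uDE06\uD83D\uDE06\uD83D\uDE06 \uD83D\uDE1B";
        System.out.println("[" + strip(s) + "]");           // [Thats a Nice joke  ]
        System.out.println(strip("\u00A9 2017"));           // " 2017": the copyright sign is So too
        System.out.println(strip("\u2764\uFE0F").length()); // 1: the variation selector U+FE0F remains
    }
}
```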



3
Desgard_Duan

This is what I use to remove emojis, and so far it has kept all the other alphabets intact.

private static String remove_Emojis(String name)
{  

    //we will store all the letters in this array
    ArrayList<Character> nonEmoji = new ArrayList<>();

     // and when we rebuild the name we will put it in here
    String newName = "";


    // we are going to loop through checking each character to see if its an emoji or not
    for (int i = 0; i < name.length(); i++) 
     {

        if (Character.isLetterOrDigit(name.charAt(i)))
        {
            nonEmoji.add(name.charAt(i));
        } 

         else 
          {
             // this is just a 2nd check in case the other method didn't allow some letter
            if (Build.VERSION.SDK_INT > 18)
            {
                if (Character.isAlphabetic(name.charAt(i))) 
                {
                    nonEmoji.add(name.charAt(i));
                }
            }
        }


        if (name.charAt(i) == ' ')// may want to consider adding or '-' or '\''
        {
            nonEmoji.add(' ');// just add it
        }

        if (name.charAt(i) == '@' && !name.contains(" "))// I put this in for email addresses
        {
            nonEmoji.add('@');
        }
    }

    // finally just loop through building it back out
    for (int i = 0; i < nonEmoji.size(); i++) {

        newName += nonEmoji.get(i);
    }

    return newName;
}
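The same whitelist idea can be applied per code point rather than per char. That way, letters outside the BMP (for example CJK Extension B ideographs) are kept while emoji are still dropped. This sketch omits the Android SDK check, and unlike the method above it allows '@' unconditionally. Which extra characters to allow is an assumption you would tune; the class name is hypothetical:

```java
// Hypothetical whitelist filter: keeps letters, digits, spaces and '@'.
class LetterWhitelist {

    static String removeEmojis(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        name.codePoints().forEach(cp -> {
            // Character.isLetterOrDigit(int) understands supplementary code points.
            if (Character.isLetterOrDigit(cp) || cp == ' ' || cp == '@') {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeEmojis("Zo\u00EB \uD83D\uDE00 \u4F60\u597D")); // Zoë  你好
    }
}
```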
2
Andrew Moreau

You can generate your own regex every time the specification changes.
Use this tool (screenshot here).
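The generated "expanded" output uses a free-spacing layout. In Java, that corresponds to compiling with Pattern.COMMENTS, where whitespace in the pattern is ignored and an unescaped # may be taken as the start of a comment. To be safe, the sketch escapes it as \#. It uses only two fragments of the generated pattern (the keycap sequences and the flags beginning with regional indicator F); the class name is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: two fragments of the generated expanded regex, compiled with COMMENTS.
class ExpandedEmojiRegex {

    static final Pattern FRAGMENT = Pattern.compile(
            "     [\\#*0-9] \\x{FE0F} \\x{20E3}"                                     // keycaps, '#' escaped
          + "  |  \\x{1F1EB} [\\x{1F1EE}-\\x{1F1F0}\\x{1F1F2}\\x{1F1F4}\\x{1F1F7}]", // flags FI FJ FK FM FO FR
            Pattern.COMMENTS);

    static List<String> findAll(String s) {
        List<String> matches = new ArrayList<>();
        Matcher m = FRAGMENT.matcher(s);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String s = new StringBuilder("Press 1")
                .appendCodePoint(0xFE0F).appendCodePoint(0x20E3)
                .append(" in ")
                .appendCodePoint(0x1F1EB).appendCodePoint(0x1F1F7)
                .toString();
        System.out.println(findAll(s).size()); // 2
    }
}
```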

For UTF-8/32 (string) mode, expanded form:

"     # Use the 'Mega-Conversion' tool to change into other syntaxes"
"     # -------------------------------------------------------------"
"     "
"     [#*0-9] \\x{FE0F} \\x{20E3}"
"  |  [\\x{A9}\\x{AE}\\x{203C}\\x{2049}\\x{2122}\\x{2139}\\x{2194}-\\x{2199}\\x{21A9}\\x{21AA}\\x{231A}\\x{231B}\\x{2328}\\x{23CF}\\x{23E9}-\\x{23F3}\\x{23F8}-\\x{23FA}\\x{24C2}\\x{25AA}\\x{25AB}\\x{25B6}\\x{25C0}\\x{25FB}-\\x{25FE}\\x{2600}-\\x{2604}\\x{260E}\\x{2611}\\x{2614}\\x{2615}\\x{2618}]"
"  |  \\x{261D} [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{2620}\\x{2622}\\x{2623}\\x{2626}\\x{262A}\\x{262E}\\x{262F}\\x{2638}-\\x{263A}\\x{2640}\\x{2642}\\x{2648}-\\x{2653}\\x{265F}\\x{2660}\\x{2663}\\x{2665}\\x{2666}\\x{2668}\\x{267B}\\x{267E}\\x{267F}\\x{2692}-\\x{2697}\\x{2699}\\x{269B}\\x{269C}\\x{26A0}\\x{26A1}\\x{26AA}\\x{26AB}\\x{26B0}\\x{26B1}\\x{26BD}\\x{26BE}\\x{26C4}\\x{26C5}\\x{26C8}\\x{26CE}\\x{26CF}\\x{26D1}\\x{26D3}\\x{26D4}\\x{26E9}\\x{26EA}\\x{26F0}-\\x{26F5}\\x{26F7}\\x{26F8}]"
"  |  \\x{26F9}"
"     (?:"
"          \\x{FE0F} \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  [\\x{26FA}\\x{26FD}\\x{2702}\\x{2705}\\x{2708}\\x{2709}]"
"  |  [\\x{270A}-\\x{270D}] [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{270F}\\x{2712}\\x{2714}\\x{2716}\\x{271D}\\x{2721}\\x{2728}\\x{2733}\\x{2734}\\x{2744}\\x{2747}\\x{274C}\\x{274E}\\x{2753}-\\x{2755}\\x{2757}\\x{2763}\\x{2764}\\x{2795}-\\x{2797}\\x{27A1}\\x{27B0}\\x{27BF}\\x{2934}\\x{2935}\\x{2B05}-\\x{2B07}\\x{2B1B}\\x{2B1C}\\x{2B50}\\x{2B55}\\x{3030}\\x{303D}\\x{3297}\\x{3299}\\x{1F004}\\x{1F0CF}\\x{1F170}\\x{1F171}\\x{1F17E}\\x{1F17F}\\x{1F18E}\\x{1F191}-\\x{1F19A}]"
"  |  \\x{1F1E6} [\\x{1F1E8}-\\x{1F1EC}\\x{1F1EE}\\x{1F1F1}\\x{1F1F2}\\x{1F1F4}\\x{1F1F6}-\\x{1F1FA}\\x{1F1FC}\\x{1F1FD}\\x{1F1FF}]"
"  |  \\x{1F1E7} [\\x{1F1E6}\\x{1F1E7}\\x{1F1E9}-\\x{1F1EF}\\x{1F1F1}-\\x{1F1F4}\\x{1F1F6}-\\x{1F1F9}\\x{1F1FB}\\x{1F1FC}\\x{1F1FE}\\x{1F1FF}]"
"  |  \\x{1F1E8} [\\x{1F1E6}\\x{1F1E8}\\x{1F1E9}\\x{1F1EB}-\\x{1F1EE}\\x{1F1F0}-\\x{1F1F5}\\x{1F1F7}\\x{1F1FA}-\\x{1F1FF}]"
"  |  \\x{1F1E9} [\\x{1F1EA}\\x{1F1EC}\\x{1F1EF}\\x{1F1F0}\\x{1F1F2}\\x{1F1F4}\\x{1F1FF}]"
"  |  \\x{1F1EA} [\\x{1F1E6}\\x{1F1E8}\\x{1F1EA}\\x{1F1EC}\\x{1F1ED}\\x{1F1F7}-\\x{1F1FA}]"
"  |  \\x{1F1EB} [\\x{1F1EE}-\\x{1F1F0}\\x{1F1F2}\\x{1F1F4}\\x{1F1F7}]"
"  |  \\x{1F1EC} [\\x{1F1E6}\\x{1F1E7}\\x{1F1E9}-\\x{1F1EE}\\x{1F1F1}-\\x{1F1F3}\\x{1F1F5}-\\x{1F1FA}\\x{1F1FC}\\x{1F1FE}]"
"  |  \\x{1F1ED} [\\x{1F1F0}\\x{1F1F2}\\x{1F1F3}\\x{1F1F7}\\x{1F1F9}\\x{1F1FA}]"
"  |  \\x{1F1EE} [\\x{1F1E8}-\\x{1F1EA}\\x{1F1F1}-\\x{1F1F4}\\x{1F1F6}-\\x{1F1F9}]"
"  |  \\x{1F1EF} [\\x{1F1EA}\\x{1F1F2}\\x{1F1F4}\\x{1F1F5}]"
"  |  \\x{1F1F0} [\\x{1F1EA}\\x{1F1EC}-\\x{1F1EE}\\x{1F1F2}\\x{1F1F3}\\x{1F1F5}\\x{1F1F7}\\x{1F1FC}\\x{1F1FE}\\x{1F1FF}]"
"  |  \\x{1F1F1} [\\x{1F1E6}-\\x{1F1E8}\\x{1F1EE}\\x{1F1F0}\\x{1F1F7}-\\x{1F1FB}\\x{1F1FE}]"
"  |  \\x{1F1F2} [\\x{1F1E6}\\x{1F1E8}-\\x{1F1ED}\\x{1F1F0}-\\x{1F1FF}]"
"  |  \\x{1F1F3} [\\x{1F1E6}\\x{1F1E8}\\x{1F1EA}-\\x{1F1EC}\\x{1F1EE}\\x{1F1F1}\\x{1F1F4}\\x{1F1F5}\\x{1F1F7}\\x{1F1FA}\\x{1F1FF}]"
"  |  \\x{1F1F4} \\x{1F1F2}"
"  |  \\x{1F1F5} [\\x{1F1E6}\\x{1F1EA}-\\x{1F1ED}\\x{1F1F0}-\\x{1F1F3}\\x{1F1F7}-\\x{1F1F9}\\x{1F1FC}\\x{1F1FE}]"
"  |  \\x{1F1F6} \\x{1F1E6}"
"  |  \\x{1F1F7} [\\x{1F1EA}\\x{1F1F4}\\x{1F1F8}\\x{1F1FA}\\x{1F1FC}]"
"  |  \\x{1F1F8} [\\x{1F1E6}-\\x{1F1EA}\\x{1F1EC}-\\x{1F1F4}\\x{1F1F7}-\\x{1F1F9}\\x{1F1FB}\\x{1F1FD}-\\x{1F1FF}]"
"  |  \\x{1F1F9} [\\x{1F1E6}\\x{1F1E8}\\x{1F1E9}\\x{1F1EB}-\\x{1F1ED}\\x{1F1EF}-\\x{1F1F4}\\x{1F1F7}\\x{1F1F9}\\x{1F1FB}\\x{1F1FC}\\x{1F1FF}]"
"  |  \\x{1F1FA} [\\x{1F1E6}\\x{1F1EC}\\x{1F1F2}\\x{1F1F3}\\x{1F1F8}\\x{1F1FE}\\x{1F1FF}]"
"  |  \\x{1F1FB} [\\x{1F1E6}\\x{1F1E8}\\x{1F1EA}\\x{1F1EC}\\x{1F1EE}\\x{1F1F3}\\x{1F1FA}]"
"  |  \\x{1F1FC} [\\x{1F1EB}\\x{1F1F8}]"
"  |  \\x{1F1FD} \\x{1F1F0}"
"  |  \\x{1F1FE} [\\x{1F1EA}\\x{1F1F9}]"
"  |  \\x{1F1FF} [\\x{1F1E6}\\x{1F1F2}\\x{1F1FC}]"
"  |  [\\x{1F201}\\x{1F202}\\x{1F21A}\\x{1F22F}\\x{1F232}-\\x{1F23A}\\x{1F250}\\x{1F251}\\x{1F300}-\\x{1F321}\\x{1F324}-\\x{1F384}]"
"  |  \\x{1F385} [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{1F386}-\\x{1F393}\\x{1F396}\\x{1F397}\\x{1F399}-\\x{1F39B}\\x{1F39E}-\\x{1F3C1}]"
"  |  \\x{1F3C2} [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{1F3C3}\\x{1F3C4}]"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  [\\x{1F3C5}\\x{1F3C6}]"
"  |  \\x{1F3C7} [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{1F3C8}\\x{1F3C9}]"
"  |  \\x{1F3CA}"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  [\\x{1F3CB}\\x{1F3CC}]"
"     (?:"
"          \\x{FE0F} \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  [\\x{1F3CD}-\\x{1F3F0}]"
"  |  \\x{1F3F3}"
"     (?: \\x{FE0F} \\x{200D} \\x{1F308} )?"
"  |  \\x{1F3F4}"
"     (?:"
"          \\x{200D} \\x{2620} \\x{FE0F}"
"       |  \\x{E0067} \\x{E0062}"
"          (?:"
"               \\x{E0065} \\x{E006E} \\x{E0067}"
"            |  \\x{E0073} \\x{E0063} \\x{E0074}"
"            |  \\x{E0077} \\x{E006C} \\x{E0073}"
"          )"
"          \\x{E007F}"
"     )?"
"  |  [\\x{1F3F5}\\x{1F3F7}-\\x{1F440}]"
"  |  \\x{1F441}"
"     (?: \\x{FE0F} \\x{200D} \\x{1F5E8} \\x{FE0F} )?"
"  |  [\\x{1F442}\\x{1F443}] [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{1F444}\\x{1F445}]"
"  |  [\\x{1F446}-\\x{1F450}] [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{1F451}-\\x{1F465}]"
"  |  [\\x{1F466}\\x{1F467}] [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  \\x{1F468}"
"     (?:"
"          \\x{200D}"
"          (?:"
"               [\\x{2695}\\x{2696}\\x{2708}] \\x{FE0F}"
"            |  \\x{2764} \\x{FE0F} \\x{200D}"
"               (?: \\x{1F48B} \\x{200D} )?"
"               \\x{1F468}"
"            |  [\\x{1F33E}\\x{1F373}\\x{1F393}\\x{1F3A4}\\x{1F3A8}\\x{1F3EB}\\x{1F3ED}]"
"            |  \\x{1F466}"
"               (?: \\x{200D} \\x{1F466} )?"
"            |  \\x{1F467}"
"               (?: \\x{200D} [\\x{1F466}\\x{1F467}] )?"
"            |  [\\x{1F468}\\x{1F469}] \\x{200D}"
"               (?:"
"                    \\x{1F466}"
"                    (?: \\x{200D} \\x{1F466} )?"
"                 |  \\x{1F467}"
"                    (?: \\x{200D} [\\x{1F466}\\x{1F467}] )?"
"               )"
"            |  [\\x{1F4BB}\\x{1F4BC}\\x{1F527}\\x{1F52C}\\x{1F680}\\x{1F692}\\x{1F9B0}-\\x{1F9B3}]"
"          )"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?:"
"               \\x{200D}"
"               (?:"
"                    [\\x{2695}\\x{2696}\\x{2708}] \\x{FE0F}"
"                 |  [\\x{1F33E}\\x{1F373}\\x{1F393}\\x{1F3A4}\\x{1F3A8}\\x{1F3EB}\\x{1F3ED}\\x{1F4BB}\\x{1F4BC}\\x{1F527}\\x{1F52C}\\x{1F680}\\x{1F692}\\x{1F9B0}-\\x{1F9B3}]"
"               )"
"          )?"
"     )?"
"  |  \\x{1F469}"
"     (?:"
"          \\x{200D}"
"          (?:"
"               [\\x{2695}\\x{2696}\\x{2708}] \\x{FE0F}"
"            |  \\x{2764} \\x{FE0F} \\x{200D}"
"               (?: \\x{1F48B} \\x{200D} )?"
"               [\\x{1F468}\\x{1F469}]"
"            |  [\\x{1F33E}\\x{1F373}\\x{1F393}\\x{1F3A4}\\x{1F3A8}\\x{1F3EB}\\x{1F3ED}]"
"            |  \\x{1F466}"
"               (?: \\x{200D} \\x{1F466} )?"
"            |  \\x{1F467}"
"               (?: \\x{200D} [\\x{1F466}\\x{1F467}] )?"
"            |  \\x{1F469} \\x{200D}"
"               (?:"
"                    \\x{1F466}"
"                    (?: \\x{200D} \\x{1F466} )?"
"                 |  \\x{1F467}"
"                    (?: \\x{200D} [\\x{1F466}\\x{1F467}] )?"
"               )"
"            |  [\\x{1F4BB}\\x{1F4BC}\\x{1F527}\\x{1F52C}\\x{1F680}\\x{1F692}\\x{1F9B0}-\\x{1F9B3}]"
"          )"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?:"
"               \\x{200D}"
"               (?:"
"                    [\\x{2695}\\x{2696}\\x{2708}] \\x{FE0F}"
"                 |  [\\x{1F33E}\\x{1F373}\\x{1F393}\\x{1F3A4}\\x{1F3A8}\\x{1F3EB}\\x{1F3ED}\\x{1F4BB}\\x{1F4BC}\\x{1F527}\\x{1F52C}\\x{1F680}\\x{1F692}\\x{1F9B0}-\\x{1F9B3}]"
"               )"
"          )?"
"     )?"
"  |  [\\x{1F46A}-\\x{1F46D}]"
"  |  \\x{1F46E}"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  \\x{1F46F}"
"     (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"  |  \\x{1F470} [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  \\x{1F471}"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  \\x{1F472} [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  \\x{1F473}"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  [\\x{1F474}-\\x{1F476}] [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  \\x{1F477}"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  \\x{1F478} [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{1F479}-\\x{1F47B}]"
"  |  \\x{1F47C} [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{1F47D}-\\x{1F480}]"
"  |  [\\x{1F481}\\x{1F482}]"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  \\x{1F483} [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  \\x{1F484}"
"  |  \\x{1F485} [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{1F486}\\x{1F487}]"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  [\\x{1F488}-\\x{1F4A9}]"
"  |  \\x{1F4AA} [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{1F4AB}-\\x{1F4FD}\\x{1F4FF}-\\x{1F53D}\\x{1F549}-\\x{1F54E}\\x{1F550}-\\x{1F567}\\x{1F56F}\\x{1F570}\\x{1F573}]"
"  |  \\x{1F574} [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  \\x{1F575}"
"     (?:"
"          \\x{FE0F} \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  [\\x{1F576}-\\x{1F579}]"
"  |  \\x{1F57A} [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{1F587}\\x{1F58A}-\\x{1F58D}]"
"  |  [\\x{1F590}\\x{1F595}\\x{1F596}] [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{1F5A4}\\x{1F5A5}\\x{1F5A8}\\x{1F5B1}\\x{1F5B2}\\x{1F5BC}\\x{1F5C2}-\\x{1F5C4}\\x{1F5D1}-\\x{1F5D3}\\x{1F5DC}-\\x{1F5DE}\\x{1F5E1}\\x{1F5E3}\\x{1F5E8}\\x{1F5EF}\\x{1F5F3}\\x{1F5FA}-\\x{1F644}]"
"  |  [\\x{1F645}-\\x{1F647}]"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  [\\x{1F648}-\\x{1F64A}]"
"  |  \\x{1F64B}"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  \\x{1F64C} [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{1F64D}\\x{1F64E}]"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  \\x{1F64F} [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{1F680}-\\x{1F6A2}]"
"  |  \\x{1F6A3}"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  [\\x{1F6A4}-\\x{1F6B3}]"
"  |  [\\x{1F6B4}-\\x{1F6B6}]"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  [\\x{1F6B7}-\\x{1F6BF}]"
"  |  \\x{1F6C0} [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{1F6C1}-\\x{1F6C5}\\x{1F6CB}]"
"  |  \\x{1F6CC} [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{1F6CD}-\\x{1F6D2}\\x{1F6E0}-\\x{1F6E5}\\x{1F6E9}\\x{1F6EB}\\x{1F6EC}\\x{1F6F0}\\x{1F6F3}-\\x{1F6F9}\\x{1F910}-\\x{1F917}]"
"  |  [\\x{1F918}-\\x{1F91C}] [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  \\x{1F91D}"
"  |  [\\x{1F91E}\\x{1F91F}] [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  [\\x{1F920}-\\x{1F925}]"
"  |  \\x{1F926}"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  [\\x{1F927}-\\x{1F92F}]"
"  |  [\\x{1F930}-\\x{1F936}] [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  \\x{1F937}"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  [\\x{1F938}\\x{1F939}]"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  \\x{1F93A}"
"  |  \\x{1F93C}"
"     (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"  |  [\\x{1F93D}\\x{1F93E}]"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  [\\x{1F940}-\\x{1F945}\\x{1F947}-\\x{1F970}\\x{1F973}-\\x{1F976}\\x{1F97A}\\x{1F97C}-\\x{1F9A2}\\x{1F9B0}-\\x{1F9B4}]"
"  |  [\\x{1F9B5}\\x{1F9B6}] [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  \\x{1F9B7}"
"  |  [\\x{1F9B8}\\x{1F9B9}]"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  [\\x{1F9C0}-\\x{1F9C2}\\x{1F9D0}]"
"  |  [\\x{1F9D1}-\\x{1F9D5}] [\\x{1F3FB}-\\x{1F3FF}]?"
"  |  \\x{1F9D6}"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  [\\x{1F9D7}-\\x{1F9DD}]"
"     (?:"
"          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
"       |  [\\x{1F3FB}-\\x{1F3FF}]"
"          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"     )?"
"  |  [\\x{1F9DE}\\x{1F9DF}]"
"     (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
"  |  [\\x{1F9E0}-\\x{1F9FF}]"
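
The expanded form above is laid out with spaces and line breaks for readability. In Java source, the quoted lines have to be joined with `+`, and the resulting pattern presumably needs `Pattern.COMMENTS` so that the whitespace is ignored. Java 7+ also accepts the `\x{...}` escapes it uses. The sketch below tries this on a three-alternative excerpt copied from the pattern. The excerpt is a hypothetical subset, not the full emoji list, and the class and method names are illustrative. It reuses the `find()` loop from the question on the question's sample text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmojiExcerptDemo {

    // Hypothetical excerpt: three alternatives copied from the expanded pattern.
    // It is NOT the full emoji pattern, only enough to show how the expanded form is used.
    // Pattern.COMMENTS makes java.util.regex ignore the unescaped layout whitespace.
    private static final Pattern EMOJI_EXCERPT = Pattern.compile(
              "   \\x{1F57A} [\\x{1F3FB}-\\x{1F3FF}]?"
            + "|  [\\x{1F5FA}-\\x{1F644}]"
            + "|  \\x{1F64F} [\\x{1F3FB}-\\x{1F3FF}]?",
            Pattern.COMMENTS);

    // Collects every match with the same find() loop used in the question.
    public static List<String> extract(String s) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = EMOJI_EXCERPT.matcher(s);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // U+1F606 three times and U+1F61B, written as UTF-16 escapes in the literal.
        String s = "Thats a Nice joke \uD83D\uDE06\uD83D\uDE06\uD83D\uDE06 \uD83D\uDE1B";
        for (String emoji : extract(s)) {
            // Prints U+1F606 three times, then U+1F61B
            System.out.printf("U+%X%n", emoji.codePointAt(0));
        }
    }
}
```

Each match is a whole code point (two `char`s in the Java string), so the three adjacent U+1F606 characters produce three separate matches.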

The same pattern in compressed form for UTF-16 (string) mode, where each code point above U+FFFF is written as a surrogate pair:

"[#*0-9]\\uFE0F\\u20E3|[\\u00A9\\u00AE\\u203C\\u2049\\u2122\\u2139\\u2"
"194-\\u2199\\u21A9\\u21AA\\u231A\\u231B\\u2328\\u23CF\\u23E9-\\u23F3\\"
"u23F8-\\u23FA\\u24C2\\u25AA\\u25AB\\u25B6\\u25C0\\u25FB-\\u25FE\\u260"
"0-\\u2604\\u260E\\u2611\\u2614\\u2615\\u2618]|\\u261D(?:\\uD83C[\\uDF"
"FB-\\uDFFF])?|[\\u2620\\u2622\\u2623\\u2626\\u262A\\u262E\\u262F\\u26"
"38-\\u263A\\u2640\\u2642\\u2648-\\u2653\\u265F\\u2660\\u2663\\u2665\\u"
"2666\\u2668\\u267B\\u267E\\u267F\\u2692-\\u2697\\u2699\\u269B\\u269C\\"
"u26A0\\u26A1\\u26AA\\u26AB\\u26B0\\u26B1\\u26BD\\u26BE\\u26C4\\u26C5\\"
"u26C8\\u26CE\\u26CF\\u26D1\\u26D3\\u26D4\\u26E9\\u26EA\\u26F0-\\u26F5"
"\\u26F7\\u26F8]|\\u26F9(?:\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640"
"\\u2642]\\uFE0F)?|\\uFE0F\\u200D[\\u2640\\u2642]\\uFE0F)?|[\\u26FA\\u"
"26FD\\u2702\\u2705\\u2708\\u2709]|[\\u270A-\\u270D](?:\\uD83C[\\uDFF"
"B-\\uDFFF])?|[\\u270F\\u2712\\u2714\\u2716\\u271D\\u2721\\u2728\\u273"
"3\\u2734\\u2744\\u2747\\u274C\\u274E\\u2753-\\u2755\\u2757\\u2763\\u27"
"64\\u2795-\\u2797\\u27A1\\u27B0\\u27BF\\u2934\\u2935\\u2B05-\\u2B07\\u"
"2B1B\\u2B1C\\u2B50\\u2B55\\u3030\\u303D\\u3297\\u3299]|\\uD83C(?:[\\u"
"DC04\\uDCCF\\uDD70\\uDD71\\uDD7E\\uDD7F\\uDD8E\\uDD91-\\uDD9A]|\\uDDE"
"6\\uD83C[\\uDDE8-\\uDDEC\\uDDEE\\uDDF1\\uDDF2\\uDDF4\\uDDF6-\\uDDFA\\u"
"DDFC\\uDDFD\\uDDFF]|\\uDDE7\\uD83C[\\uDDE6\\uDDE7\\uDDE9-\\uDDEF\\uDD"
"F1-\\uDDF4\\uDDF6-\\uDDF9\\uDDFB\\uDDFC\\uDDFE\\uDDFF]|\\uDDE8\\uD83C"
"[\\uDDE6\\uDDE8\\uDDE9\\uDDEB-\\uDDEE\\uDDF0-\\uDDF5\\uDDF7\\uDDFA-\\u"
"DDFF]|\\uDDE9\\uD83C[\\uDDEA\\uDDEC\\uDDEF\\uDDF0\\uDDF2\\uDDF4\\uDDF"
"F]|\\uDDEA\\uD83C[\\uDDE6\\uDDE8\\uDDEA\\uDDEC\\uDDED\\uDDF7-\\uDDFA]"
"|\\uDDEB\\uD83C[\\uDDEE-\\uDDF0\\uDDF2\\uDDF4\\uDDF7]|\\uDDEC\\uD83C["
"\\uDDE6\\uDDE7\\uDDE9-\\uDDEE\\uDDF1-\\uDDF3\\uDDF5-\\uDDFA\\uDDFC\\uD"
"DFE]|\\uDDED\\uD83C[\\uDDF0\\uDDF2\\uDDF3\\uDDF7\\uDDF9\\uDDFA]|\\uDD"
"EE\\uD83C[\\uDDE8-\\uDDEA\\uDDF1-\\uDDF4\\uDDF6-\\uDDF9]|\\uDDEF\\uD8"
"3C[\\uDDEA\\uDDF2\\uDDF4\\uDDF5]|\\uDDF0\\uD83C[\\uDDEA\\uDDEC-\\uDDE"
"E\\uDDF2\\uDDF3\\uDDF5\\uDDF7\\uDDFC\\uDDFE\\uDDFF]|\\uDDF1\\uD83C[\\u"
"DDE6-\\uDDE8\\uDDEE\\uDDF0\\uDDF7-\\uDDFB\\uDDFE]|\\uDDF2\\uD83C[\\uD"
"DE6\\uDDE8-\\uDDED\\uDDF0-\\uDDFF]|\\uDDF3\\uD83C[\\uDDE6\\uDDE8\\uDD"
"EA-\\uDDEC\\uDDEE\\uDDF1\\uDDF4\\uDDF5\\uDDF7\\uDDFA\\uDDFF]|\\uDDF4\\"
"uD83C\\uDDF2|\\uDDF5\\uD83C[\\uDDE6\\uDDEA-\\uDDED\\uDDF0-\\uDDF3\\uD"
"DF7-\\uDDF9\\uDDFC\\uDDFE]|\\uDDF6\\uD83C\\uDDE6|\\uDDF7\\uD83C[\\uDD"
"EA\\uDDF4\\uDDF8\\uDDFA\\uDDFC]|\\uDDF8\\uD83C[\\uDDE6-\\uDDEA\\uDDEC"
"-\\uDDF4\\uDDF7-\\uDDF9\\uDDFB\\uDDFD-\\uDDFF]|\\uDDF9\\uD83C[\\uDDE6"
"\\uDDE8\\uDDE9\\uDDEB-\\uDDED\\uDDEF-\\uDDF4\\uDDF7\\uDDF9\\uDDFB\\uDD"
"FC\\uDDFF]|\\uDDFA\\uD83C[\\uDDE6\\uDDEC\\uDDF2\\uDDF3\\uDDF8\\uDDFE\\"
"uDDFF]|\\uDDFB\\uD83C[\\uDDE6\\uDDE8\\uDDEA\\uDDEC\\uDDEE\\uDDF3\\uDD"
"FA]|\\uDDFC\\uD83C[\\uDDEB\\uDDF8]|\\uDDFD\\uD83C\\uDDF0|\\uDDFE\\uD8"
"3C[\\uDDEA\\uDDF9]|\\uDDFF\\uD83C[\\uDDE6\\uDDF2\\uDDFC]|[\\uDE01\\uD"
"E02\\uDE1A\\uDE2F\\uDE32-\\uDE3A\\uDE50\\uDE51\\uDF00-\\uDF21\\uDF24-"
"\\uDF84]|\\uDF85(?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDF86-\\uDF93\\uDF9"
"6\\uDF97\\uDF99-\\uDF9B\\uDF9E-\\uDFC1]|\\uDFC2(?:\\uD83C[\\uDFFB-\\u"
"DFFF])?|[\\uDFC3\\uDFC4](?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\"
"uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDFC5\\uDFC6"
"]|\\uDFC7(?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDFC8\\uDFC9]|\\uDFCA(?:\\"
"u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2"
"640\\u2642]\\uFE0F)?)?|[\\uDFCB\\uDFCC](?:\\uD83C[\\uDFFB-\\uDFFF]("
"?:\\u200D[\\u2640\\u2642]\\uFE0F)?|\\uFE0F\\u200D[\\u2640\\u2642]\\uF"
"E0F)?|[\\uDFCD-\\uDFF0]|\\uDFF3(?:\\uFE0F\\u200D\\uD83C\\uDF08)?|\\u"
"DFF4(?:\\u200D\\u2620\\uFE0F|\\uDB40\\uDC67\\uDB40\\uDC62\\uDB40(?:\\"
"uDC65\\uDB40\\uDC6E\\uDB40\\uDC67|\\uDC73\\uDB40\\uDC63\\uDB40\\uDC74"
"|\\uDC77\\uDB40\\uDC6C\\uDB40\\uDC73)\\uDB40\\uDC7F)?|[\\uDFF5\\uDFF7"
"-\\uDFFF])|\\uD83D(?:[\\uDC00-\\uDC40]|\\uDC41(?:\\uFE0F\\u200D\\uD8"
"3D\\uDDE8\\uFE0F)?|[\\uDC42\\uDC43](?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\"
"uDC44\\uDC45]|[\\uDC46-\\uDC50](?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDC"
"51-\\uDC65]|[\\uDC66\\uDC67](?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDC68(?"
":\\u200D(?:[\\u2695\\u2696\\u2708]\\uFE0F|\\u2764\\uFE0F\\u200D\\uD83"
"D(?:\\uDC8B\\u200D\\uD83D)?\\uDC68|\\uD83C[\\uDF3E\\uDF73\\uDF93\\uDF"
"A4\\uDFA8\\uDFEB\\uDFED]|\\uD83D(?:\\uDC66(?:\\u200D\\uD83D\\uDC66)?"
"|\\uDC67(?:\\u200D\\uD83D[\\uDC66\\uDC67])?|[\\uDC68\\uDC69]\\u200D\\"
"uD83D(?:\\uDC66(?:\\u200D\\uD83D\\uDC66)?|\\uDC67(?:\\u200D\\uD83D["
"\\uDC66\\uDC67])?)|[\\uDCBB\\uDCBC\\uDD27\\uDD2C\\uDE80\\uDE92])|\\uD"
"83E[\\uDDB0-\\uDDB3])|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D(?:[\\u2695"
"\\u2696\\u2708]\\uFE0F|\\uD83C[\\uDF3E\\uDF73\\uDF93\\uDFA4\\uDFA8\\uD"
"FEB\\uDFED]|\\uD83D[\\uDCBB\\uDCBC\\uDD27\\uDD2C\\uDE80\\uDE92]|\\uD8"
"3E[\\uDDB0-\\uDDB3]))?)?|\\uDC69(?:\\u200D(?:[\\u2695\\u2696\\u2708"
"]\\uFE0F|\\u2764\\uFE0F\\u200D\\uD83D(?:\\uDC8B\\u200D\\uD83D)?[\\uDC"
"68\\uDC69]|\\uD83C[\\uDF3E\\uDF73\\uDF93\\uDFA4\\uDFA8\\uDFEB\\uDFED]"
"|\\uD83D(?:\\uDC66(?:\\u200D\\uD83D\\uDC66)?|\\uDC67(?:\\u200D\\uD83"
"D[\\uDC66\\uDC67])?|\\uDC69\\u200D\\uD83D(?:\\uDC66(?:\\u200D\\uD83D"
"\\uDC66)?|\\uDC67(?:\\u200D\\uD83D[\\uDC66\\uDC67])?)|[\\uDCBB\\uDCB"
"C\\uDD27\\uDD2C\\uDE80\\uDE92])|\\uD83E[\\uDDB0-\\uDDB3])|\\uD83C[\\u"
"DFFB-\\uDFFF](?:\\u200D(?:[\\u2695\\u2696\\u2708]\\uFE0F|\\uD83C[\\u"
"DF3E\\uDF73\\uDF93\\uDFA4\\uDFA8\\uDFEB\\uDFED]|\\uD83D[\\uDCBB\\uDCB"
"C\\uDD27\\uDD2C\\uDE80\\uDE92]|\\uD83E[\\uDDB0-\\uDDB3]))?)?|[\\uDC6"
"A-\\uDC6D]|\\uDC6E(?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-"
"\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|\\uDC6F(?:\\u200D[\\u2"
"640\\u2642]\\uFE0F)?|\\uDC70(?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDC71(?"
":\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\"
"u2640\\u2642]\\uFE0F)?)?|\\uDC72(?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDC"
"73(?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u20"
"0D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDC74-\\uDC76](?:\\uD83C[\\uDFFB-\\"
"uDFFF])?|\\uDC77(?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\"
"uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|\\uDC78(?:\\uD83C[\\uDF"
"FB-\\uDFFF])?|[\\uDC79-\\uDC7B]|\\uDC7C(?:\\uD83C[\\uDFFB-\\uDFFF])"
"?|[\\uDC7D-\\uDC80]|[\\uDC81\\uDC82](?:\\u200D[\\u2640\\u2642]\\uFE0"
"F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|\\uD"
"C83(?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDC84|\\uDC85(?:\\uD83C[\\uDFFB-"
"\\uDFFF])?|[\\uDC86\\uDC87](?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C"
"[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDC88-\\uD"
"CA9]|\\uDCAA(?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDCAB-\\uDCFD\\uDCFF-\\"
"uDD3D\\uDD49-\\uDD4E\\uDD50-\\uDD67\\uDD6F\\uDD70\\uDD73]|\\uDD74(?:"
"\\uD83C[\\uDFFB-\\uDFFF])?|\\uDD75(?:\\uD83C[\\uDFFB-\\uDFFF](?:\\u2"
"00D[\\u2640\\u2642]\\uFE0F)?|\\uFE0F\\u200D[\\u2640\\u2642]\\uFE0F)?"
"|[\\uDD76-\\uDD79]|\\uDD7A(?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDD87\\uD"
"D8A-\\uDD8D]|[\\uDD90\\uDD95\\uDD96](?:\\uD83C[\\uDFFB-\\uDFFF])?|["
"\\uDDA4\\uDDA5\\uDDA8\\uDDB1\\uDDB2\\uDDBC\\uDDC2-\\uDDC4\\uDDD1-\\uDD"
"D3\\uDDDC-\\uDDDE\\uDDE1\\uDDE3\\uDDE8\\uDDEF\\uDDF3\\uDDFA-\\uDE44]|"
"[\\uDE45-\\uDE47](?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\"
"uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDE48-\\uDE4A]|\\uDE"
"4B(?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u20"
"0D[\\u2640\\u2642]\\uFE0F)?)?|\\uDE4C(?:\\uD83C[\\uDFFB-\\uDFFF])?|"
"[\\uDE4D\\uDE4E](?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\u"
"DFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|\\uDE4F(?:\\uD83C[\\uDFF"
"B-\\uDFFF])?|[\\uDE80-\\uDEA2]|\\uDEA3(?:\\u200D[\\u2640\\u2642]\\uF"
"E0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|["
"\\uDEA4-\\uDEB3]|[\\uDEB4-\\uDEB6](?:\\u200D[\\u2640\\u2642]\\uFE0F|"
"\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDE"
"B7-\\uDEBF]|\\uDEC0(?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDEC1-\\uDEC5\\u"
"DECB]|\\uDECC(?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDECD-\\uDED2\\uDEE0-"
"\\uDEE5\\uDEE9\\uDEEB\\uDEEC\\uDEF0\\uDEF3-\\uDEF9])|\\uD83E(?:[\\uDD"
"10-\\uDD17]|[\\uDD18-\\uDD1C](?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDD1D|"
"[\\uDD1E\\uDD1F](?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDD20-\\uDD25]|\\uD"
"D26(?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u2"
"00D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDD27-\\uDD2F]|[\\uDD30-\\uDD36]("
"?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDD37(?:\\u200D[\\u2640\\u2642]\\uFE0"
"F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\u"
"DD38\\uDD39](?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFF"
"F](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|\\uDD3A|\\uDD3C(?:\\u200D[\\"
"u2640\\u2642]\\uFE0F)?|[\\uDD3D\\uDD3E](?:\\u200D[\\u2640\\u2642]\\u"
"FE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|"
"[\\uDD40-\\uDD45\\uDD47-\\uDD70\\uDD73-\\uDD76\\uDD7A\\uDD7C-\\uDDA2\\"
"uDDB0-\\uDDB4]|[\\uDDB5\\uDDB6](?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDDB"
"7|[\\uDDB8\\uDDB9](?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-"
"\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDDC0-\\uDDC2\\uDDD"
"0]|[\\uDDD1-\\uDDD5](?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDDD6(?:\\u200D"
"[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u"
"2642]\\uFE0F)?)?|[\\uDDD7-\\uDDDD](?:\\u200D[\\u2640\\u2642]\\uFE0F"
"|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\uD"
"DDE\\uDDDF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?|[\\uDDE0-\\uDDFF])"
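
In the compressed form, emoji above U+FFFF appear as a high surrogate (such as `\uD83C` or `\uD83D`) followed by a low surrogate. To see which pair a given emoji becomes, and therefore which branch of the pattern it falls under, a small JDK-only helper can be sketched. The class and method names here are hypothetical; the conversion itself uses the standard `Character.highSurrogate` and `Character.lowSurrogate` methods (Java 7+).

```java
public class SurrogatePairs {

    // Spells a code point the way the compressed UTF-16 pattern does:
    // code points up to U+FFFF as one escape, higher ones as a high/low surrogate pair.
    public static String toUtf16Escapes(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            return String.format("\\u%04X", codePoint);
        }
        char high = Character.highSurrogate(codePoint);
        char low = Character.lowSurrogate(codePoint);
        return String.format("\\u%04X\\u%04X", (int) high, (int) low);
    }

    public static void main(String[] args) {
        // Emoji from the question, both ends of the 1F300-1F5FF range, and a BMP symbol.
        int[] codePoints = {0x1F606, 0x1F61B, 0x1F300, 0x1F5FF, 0x2764};
        for (int cp : codePoints) {
            System.out.printf("U+%X -> %s%n", cp, toUtf16Escapes(cp));
        }
    }
}
```

For example, U+1F606 comes out as `\uD83D\uDE06` and U+1F300 as `\uD83C\uDF00`, while U+2764 stays a single `\u2764` escape.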
sln

A regex like this is too slow, and new emoji are added very frequently, so a hand-maintained pattern quickly goes out of date.

Try the simple-emoji-4j project instead.

It is compatible with Emoji 12.0 (2018.10.15).

Usage is as simple as:

    EmojiUtils.containsEmoji(str)
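
If adding a dependency is not an option, a regex-free scan over code points with the JDK alone can handle a fixed range such as the 1F300–1F5FF block mentioned in the question. This is a minimal sketch, not how simple-emoji-4j works; the class and method names are hypothetical. A single block misses most emoji, including the ones in the question, which are in the Emoticons block (U+1F600–U+1F64F).

```java
import java.util.List;
import java.util.stream.Collectors;

public class EmojiRangeScan {

    // Bounds of Miscellaneous Symbols and Pictographs, the range quoted in the question.
    private static final int FIRST = 0x1F300;
    private static final int LAST = 0x1F5FF;

    // Hypothetical helper: keeps every code point in FIRST..LAST (inclusive).
    // String.codePoints() already joins surrogate pairs, so no regex is needed.
    public static List<String> extractMiscSymbols(String s) {
        return s.codePoints()
                .filter(cp -> cp >= FIRST && cp <= LAST)
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .collect(Collectors.toList());
    }

    // Hypothetical containsEmoji-style check limited to the same single range.
    public static boolean containsMiscSymbol(String s) {
        return s.codePoints().anyMatch(cp -> cp >= FIRST && cp <= LAST);
    }

    public static void main(String[] args) {
        // U+1F300 is in the range; U+1F606 (from the question) is not.
        String s = "Look \uD83C\uDF00 and \uD83D\uDE06";
        System.out.println(extractMiscSymbols(s).size()); // 1
        System.out.println(containsMiscSymbol("plain text")); // false
    }
}
```

Widening this to "all emoji" means maintaining a list of ranges and sequences, which is exactly the upkeep the library is meant to take over.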
liheyuan