Persisting emoji to MySQL in Java
I haven't updated my blog for a long time. Today I want to share a problem about persisting emoji. I believe most web developers have run into it, because MySQL's utf8 character set cannot store emoji characters. Why? MySQL's utf8 stores at most 3 bytes per character, which is enough for ordinary text, but the emoji sent by mobile clients take 4 bytes in UTF-8, so utf8 is not enough. To cope with the opportunities and challenges of the mobile Internet and avoid the problems caused by emoji, MySQL databases that serve mobile clients should adopt the utf8mb4 character set from the start. This should be a key point of technology selection in the mobile Internet industry.
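To make the 3-byte versus 4-byte difference concrete, here is a small sketch that measures the UTF-8 length of a few characters. The class name Utf8Width and the sample characters are my own choices for illustration.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Width {

    // Number of bytes the text occupies once encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String cjk = "\u4e2d";                                   // U+4E2D, a common Chinese character
        String emoji = new String(Character.toChars(0x1F600));   // U+1F600, grinning face

        System.out.println(utf8Length("a"));    // 1
        System.out.println(utf8Length(cjk));    // 3
        System.out.println(utf8Length(emoji));  // 4
        System.out.println(emoji.length());     // 2: Java stores it as a surrogate pair
    }
}
```

The last line matters later: inside a Java String, one emoji is two char values, both in the surrogate range 0xD800 to 0xDFFF.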
So, have you changed your database character set? For a personal project or a small project, switching the character set is a fine solution, but for a system currently serving 50 million users, it is not so appropriate. For that situation, I have summarized three ways to deal with the problem.
1. Since emoji from mobile clients occupy 4 bytes, we can simply encode the text before saving it and decode it after reading it back.
URLEncoder.encode(String s, String enc)
Converts a string to application/x-www-form-urlencoded format using the specified encoding scheme.
URLDecoder.decode(String s, String enc)
Decodes an application/x-www-form-urlencoded string using the specified encoding scheme.
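A minimal sketch of this first method might look like the following. The class and method names (UrlEncodedStorage, toStorable, fromStorable) are hypothetical, and I assume UTF-8 as the encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

final class UrlEncodedStorage {

    // Encode before writing to the database; the result is plain ASCII.
    static String toStorable(String raw) {
        try {
            return URLEncoder.encode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Decode after reading from the database.
    static String fromStorable(String stored) {
        try {
            return URLDecoder.decode(stored, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String emoji = new String(Character.toChars(0x1F600)); // grinning face
        String stored = toStorable("hi " + emoji);
        System.out.println(stored);                             // hi+%F0%9F%98%80
        System.out.println(fromStorable(stored).equals("hi " + emoji)); // true
    }
}
```

Note that every non-ASCII character is percent-encoded, not only emoji, so Chinese text in the column becomes unreadable as well.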
2. The first method is rather crude. Is there a better solution? We can use the lightweight library emoji-java to handle emoji characters.
GitHub address: https://github.com/vdurmont/emoji-java
For specific usage, see the repository for yourself.
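To show the general idea behind a table-based conversion, here is a rough sketch that does not use emoji-java itself. The two-entry ALIASES table, the alias strings, and the class name AliasTableSketch are all hypothetical; a real library ships a far larger table and its own API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AliasTableSketch {

    // Hypothetical lookup table: emoji -> text alias.
    private static final Map<String, String> ALIASES = new LinkedHashMap<>();

    static {
        ALIASES.put(cp(0x1F604), ":smile:");
        ALIASES.put(cp(0x1F44D), ":thumbsup:");
    }

    // Build a String from a single Unicode code point.
    static String cp(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Replace every emoji found in the table with its alias.
    static String toAliases(String text) {
        String result = text;
        for (Map.Entry<String, String> e : ALIASES.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    // Replace every known alias with its emoji.
    static String toEmoji(String text) {
        String result = text;
        for (Map.Entry<String, String> e : ALIASES.entrySet()) {
            result = result.replace(e.getValue(), e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(toAliases("nice " + cp(0x1F44D)));      // nice :thumbsup:
        // U+1F9A9 is not in the table, so it passes through unchanged.
        String unknown = cp(0x1F9A9);
        System.out.println(toAliases(unknown).equals(unknown));    // true
    }
}
```

In this sketch an emoji that is missing from the table is left as a 4-byte character, and the alias ":thumbsup:" takes 10 bytes where the emoji took 4.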
3. Are you satisfied with the two methods above? Now for the key point, the emoji handling method I recommend most. First, the problems with the two methods above. With the first method, the stored data is encoded, which is almost like encrypting it: we cannot read the original content directly in the database, so business scenarios that need search become very difficult. The second method avoids that problem, but it matches and converts emoji based on a lookup table, so it has two problems of its own. First, newly released emoji that are not in the table are not converted, and inserting them keeps failing. Second, it converts each emoji to its matching alias, in plain terms an English description, so an emoji that originally took 4 bytes may take 10 bytes or more. Enough talk. Let's take a look at my final solution.
/**
 * @Author: gaoshang
 * @Description: Escapes characters that a 3-byte utf8 column cannot store
 *               (the surrogate halves of an emoji) as bracketed hex tokens,
 *               and restores them on the way back.
 * @Date: 2019/7/19
 */
public class EmojiUtil {

    /**
     * Convert the emoji in the text to hexadecimal tokens.
     *
     * @param input raw text, possibly containing emoji
     * @return escaped text, or null if the input is null
     */
    public static String parseFromAliases(String input) {
        if (input == null) {
            return input;
        }
        return stringToUnicode(input);
    }

    /**
     * Convert the hexadecimal tokens in the text back to emoji.
     *
     * @param input escaped text
     * @return the original text, or null if the input is null
     */
    public static String parseToAliases(String input) {
        if (input == null) {
            return input;
        }
        return unicodeToString(input);
    }

    /**
     * Convert a string to its escaped form.
     *
     * @param str raw text
     * @return segments joined by a backslash-u separator
     */
    public static String stringToUnicode(String str) {
        StringBuilder sb = new StringBuilder();
        StringBuilder cacheSB = new StringBuilder(); // pending run of ordinary characters
        char[] c = str.toCharArray();
        for (int i = 0; i < c.length; i++) {
            if (!isEmojiCharacter(c[i])) {
                // Surrogate half (or other rejected char): flush the pending run,
                // then emit the char as a bracketed hex token.
                if (cacheSB.length() > 0) {
                    sb.append("\\u").append(cacheSB);
                    cacheSB.delete(0, cacheSB.length());
                }
                sb.append("\\u").append("[").append(Integer.toHexString(c[i])).append("]");
            } else {
                if (c[i] == '[' || c[i] == '\\' || c[i] == ']') {
                    // Delimiters get a segment of their own, so literal text such as
                    // "[41]" is not mistaken for a hex token when decoding.
                    if (cacheSB.length() > 0) {
                        sb.append("\\u").append(cacheSB);
                        cacheSB.delete(0, cacheSB.length());
                    }
                    sb.append("\\u").append(c[i]);
                } else {
                    cacheSB.append(c[i]);
                }
            }
        }
        // Flush the trailing run; text without special characters is returned unchanged.
        if (cacheSB.length() > 0) {
            if (sb.length() > 0) {
                sb.append("\\u");
            }
            sb.append(cacheSB);
        }
        return sb.toString();
    }

    /**
     * Convert the escaped form back to a string.
     *
     * @param unicode escaped text produced by stringToUnicode
     * @return the original text
     */
    public static String unicodeToString(String unicode) {
        StringBuilder sb = new StringBuilder();
        String[] hex = unicode.split("\\\\u");
        for (int i = 0; i < hex.length; i++) {
            // A segment of the form [hex] is a token; anything else is literal text.
            if (hex[i].indexOf("[") == 0 && hex[i].indexOf("]") == hex[i].length() - 1) {
                try {
                    int index = Integer.parseInt(hex[i].substring(1, hex[i].length() - 1), 16);
                    sb.append((char) index);
                } catch (NumberFormatException e) {
                    sb.append(hex[i]);
                }
            } else {
                sb.append(hex[i]);
            }
        }
        return sb.toString();
    }

    // Despite its name, this returns true for ORDINARY characters and false for
    // surrogate halves. The last range can never match a 16-bit char.
    private static boolean isEmojiCharacter(char codePoint) {
        return (codePoint == 0x0) || (codePoint == 0x9) || (codePoint == 0xA) || (codePoint == 0xD)
                || ((codePoint >= 0x20) && (codePoint <= 0xD7FF))
                || ((codePoint >= 0xE000) && (codePoint <= 0xFFFD))
                || ((codePoint >= 0x10000) && (codePoint <= 0x10FFFF));
    }
}
Well, that's all for now. Different opinions and better solutions are welcome.
Original address https://www.cnblogs.com/AndroidJotting/p/11253202.html