PHP Programs for Converting Chinese Characters to Unicode Code Points

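This article collects several common PHP implementations for converting Chinese characters to Unicode code points. To follow them, you only need to understand how text stored as GBK or UTF-8 maps to Unicode.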

Method 1 for converting Chinese characters to Unicode. The code is as follows:

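    <?php
    // Convert one UTF-8 encoded Chinese character (3 bytes) to its Unicode code point
    function htou($c) {
        $n  = (ord($c[0]) & 0x0f) << 12; // 4 payload bits of the lead byte (1110xxxx)
        $n |= (ord($c[1]) & 0x3f) << 6;  // 6 payload bits of the first continuation byte
        $n |=  ord($c[2]) & 0x3f;        // 6 payload bits of the second continuation byte
        return $n;
    }
    // Encode a UTF-8 string as numeric character references, which hides its text in source code
    function my_utf8_unicode($str) {
        $encode = '';
        for ($i = 0; $i < strlen($str); $i++) {
            // Any byte above 0xA0 is treated as the lead byte of a 3-byte sequence;
            // 2-byte and 4-byte UTF-8 sequences are not handled.
            if (ord($str[$i]) > 0xa0) {
                $encode .= '&#' . htou(substr($str, $i, 3)) . ';';
                $i += 2; // skip the two continuation bytes
            } else {
                $encode .= '&#' . ord($str[$i]) . ';';
            }
        }
        return $encode;
    }
    echo my_utf8_unicode("哈哈ABC");
    ?>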
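
The htou() function above covers only 3-byte UTF-8 sequences. The same bit-masking idea can be extended to sequences of 1 to 4 bytes, roughly as sketched below. The helper names utf8_char_to_codepoint() and utf8_to_entities() are hypothetical, and the sketch assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper: decode one UTF-8 encoded character into its code point.
// Assumes $c holds exactly one valid UTF-8 character.
function utf8_char_to_codepoint(string $c): int
{
    $b0 = ord($c[0]);
    if ($b0 < 0x80) {                   // 0xxxxxxx: 1 byte (ASCII)
        return $b0;
    }
    if (($b0 & 0xE0) === 0xC0) {        // 110xxxxx: 2 bytes
        $n = $b0 & 0x1F;
        $len = 2;
    } elseif (($b0 & 0xF0) === 0xE0) {  // 1110xxxx: 3 bytes
        $n = $b0 & 0x0F;
        $len = 3;
    } else {                            // 11110xxx: 4 bytes
        $n = $b0 & 0x07;
        $len = 4;
    }
    // Each continuation byte (10xxxxxx) contributes 6 payload bits.
    for ($i = 1; $i < $len; $i++) {
        $n = ($n << 6) | (ord($c[$i]) & 0x3F);
    }
    return $n;
}

// Hypothetical helper: encode every character as a decimal numeric reference.
function utf8_to_entities(string $str): string
{
    // With the /u flag, PCRE splits the string into whole UTF-8 characters.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    $out = '';
    foreach ($chars as $ch) {
        $out .= '&#' . utf8_char_to_codepoint($ch) . ';';
    }
    return $out;
}

echo utf8_to_entities("一个ABC"), "\n"; // &#19968;&#20010;&#65;&#66;&#67;
```

Splitting with preg_split() first keeps the decoder free of index arithmetic, at the cost of one extra pass over the string.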

Method 2 for converting Chinese characters to Unicode. The code is as follows:

    function getUnicode($word)
    {
        // Convert to UTF-8 unless the input is already valid UTF-8 (GBK is assumed otherwise).
        // mb_check_encoding() requires the mbstring extension.
        if (!mb_check_encoding($word, 'UTF-8')) {
            $word = iconv('GBK', 'UTF-8', $word);
        }
        // Split into characters: one ASCII byte, or a lead byte plus its continuation bytes
        preg_match_all('#(?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+)#s', $word, $array, PREG_PATTERN_ORDER);
        $return = array();
        // Convert each character
        foreach ($array[0] as $cc)
        {
            $arr = str_split($cc);
            $bin_str = '';
            foreach ($arr as $value)
            {
                $bin_str .= decbin(ord($value));
            }
            // For a 3-byte character (24 bits), drop the marker bits and keep the 16 payload bits.
            // ASCII characters do not match the pattern and pass through unchanged.
            $bin_str = preg_replace('/^.{4}(.{4}).{2}(.{6}).{2}(.{6})$/', '$1$2$3', $bin_str);
            $return[] = '&#' . bindec($bin_str) . ';';
        }
        return implode('', $return);
    }

Usage of the function:

    // Note: "Ｕｎｉｃｏｄｅ" in the string below is written with fullwidth letters
    $word = '一个汉字转换成Ｕｎｉｃｏｄｅ四字节编码的PHP函数。';
    echo getUnicode($word);
    /*
    The above prints the following (wrapped here for readability):
    &#19968;&#20010;&#27721;&#23383;&#36716;&#25442;&#25104;&#65333;&#65358;
    &#65353;&#65347;&#65359;&#65348;&#65349;&#22235;&#23383;&#33410;&#32534;
    &#30721;&#30340;&#80;&#72;&#80;&#20989;&#25968;&#12290;
    */
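
The bit-string step inside getUnicode() is easier to follow on a single character. The sketch below walks through it for 一 (U+4E00, UTF-8 bytes E4 B8 80). The variable names are illustrative only.

```php
<?php
// Worked example for one 3-byte character: 一 (U+4E00), UTF-8 bytes E4 B8 80.
$char = "\xE4\xB8\x80";

// Concatenate the binary form of each byte. Every byte of a multi-byte
// sequence has its high bit set, so decbin() yields exactly 8 digits here.
$bits = '';
foreach (str_split($char) as $byte) {
    $bits .= decbin(ord($byte));
}
echo $bits, "\n"; // 111001001011100010000000

// Drop the marker bits: "1110" from the lead byte and "10" from each
// continuation byte, keeping 4 + 6 + 6 = 16 payload bits.
$payload = preg_replace('/^.{4}(.{4}).{2}(.{6}).{2}(.{6})$/', '$1$2$3', $bits);
echo $payload, "\n";         // 0100111000000000
echo bindec($payload), "\n"; // 19968
```

Because the pattern requires exactly 24 digits, it only rewrites 3-byte characters. 2-byte and 4-byte sequences would pass through with their marker bits still in place and yield wrong numbers.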

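The following pair of functions converts Chinese characters to Unicode numeric references and decodes those references back into Chinese characters.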

The function that converts Chinese characters to Unicode is as follows:

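    function uni_encode ($word)
    {
        // Convert to UTF-8 unless the input is already valid UTF-8 (GBK is assumed otherwise)
        if (!mb_check_encoding($word, 'UTF-8')) {
            $word = iconv('GBK', 'UTF-8', $word);
        }
        // json_encode() escapes non-ASCII characters as \uXXXX
        $json = json_encode($word);
        // Strip the surrounding quotes, then replace each \uXXXX with &#decimal;
        // (create_function() was removed in PHP 8, so a closure is used instead)
        return preg_replace_callback(
            '/\\\\u([0-9a-f]{4})/i',
            function ($hex) { return '&#' . hexdec($hex[1]) . ';'; },
            substr($json, 1, -1)
        );
    }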
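
The function above relies on json_encode() escaping non-ASCII characters as \uXXXX, which is why its regular expression looks for that pattern. Here is a small sketch of that intermediate step, using sample strings chosen for illustration:

```php
<?php
// By default, json_encode() escapes non-ASCII characters as \uXXXX (lowercase hex)
// and leaves ASCII letters untouched.
$json = json_encode("一个A");
echo $json, "\n"; // "\u4e00\u4e2aA"

// Each escape holds the code point in hexadecimal; hexdec() gives the decimal value.
echo hexdec('4e00'), "\n"; // 19968

// Characters outside the Basic Multilingual Plane are escaped as a UTF-16
// surrogate pair, so a \uXXXX-based encoder emits two numbers for them.
$emoji = json_encode("\u{1F600}");
echo $emoji, "\n"; // "\ud83d\ude00"
```

One consequence is that uni_encode() turns a character such as U+1F600 into two references (55357 and 56832). These are surrogate halves rather than the character's real code point, so the output suits a round trip through uni_decode() better than direct use as HTML.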

The function that decodes the Unicode references is as follows:

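    function uni_decode ($uncode)
    {
        // Turn each &#NNNNN; back into a \uXXXX escape (values up to 65535, as produced by uni_encode)
        $json = preg_replace_callback(
            '/&#(\d+);/',
            function ($dec) { return sprintf('\\u%04x', (int) $dec[1]); },
            '"' . $uncode . '"'
        );
        return json_decode($json); // json_decode() rebuilds the UTF-8 string
    }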
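
As an alternative to the JSON-based decoder, PHP's built-in html_entity_decode() can decode decimal numeric character references directly. A minimal sketch, assuming the input uses real code points (not surrogate halves) and that UTF-8 output is wanted:

```php
<?php
// Decode decimal numeric character references with a built-in function.
$encoded = '&#19968;&#20010;&#27721;&#23383;PHP';
$decoded = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo $decoded, "\n"; // 一个汉字PHP
```

Note that html_entity_decode() also decodes named entities such as &amp;amp;, so it changes more than the numeric references when the input contains other HTML markup.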