VB UTF-8 to GB2312

Updated: 2021-05-28 22:15:21    Author: admin2

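The following was reposted from my Baidu Space; it is material I collected. If the layout looks poor here, you can read the original post in my space: http://hi.baidu.com/newkedison/blog/item/1c7d2c392cc192f63b87ce12.html

Notes on UTF-8 (Friday, June 13, 2008, 08:17)

I. Most important: converting between UTF-8 and Unicode

UTF-8 is a widely used encoding of Unicode. Unicode is the character set that aims to bring the writing systems of the whole world, Asian scripts included, into one unified code. UTF stands for UCS Transformation Format.

UTF-8 represents a character with a variable number of bytes. The original design allowed up to 6 bytes, although RFC 3629 later restricted UTF-8 to at most 4 bytes (code points up to U+10FFFF). UTF-8 is compatible with ASCII (0-127): an ASCII character is encoded in UTF-8 as the very same byte.

Characters that need more than one byte follow this rule. The number of leading 1 bits in the first byte equals the number of bytes in the sequence, and each following byte begins with 10:
- A two-byte character has the pattern 110xxxxx 10xxxxxx.
- A three-byte character has the pattern 1110xxxx 10xxxxxx 10xxxxxx.
- The pattern continues in the same way up to six bytes: 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx.

The x positions are filled with the bits of the character's code point. Only the shortest sequence that can hold the code point may be used. For example:

Unicode character U+00A9 (copyright sign) = 1010 1001. It needs 8 bits, more than the 7 that fit in 0xxxxxxx, so the two-byte form is used. Its 11 payload bits 00010 101001 give 11000010 10101001 = 0xC2 0xA9.
Character U+2260 (not-equal sign) = 0010 0010 0110 0000. Its 16 bits need the three-byte form. The bits 0010 001001 100000 give 11100010 10001001 10100000 = 0xE2 0x89 0xA0.

Both conversions have been checked and are correct. If they are not clear at first, work through the bit correspondence again.

The table below maps Unicode code points to UTF-8. It is taken from RFC 3629: "The table below summarizes the format of these different octet types. The letter x indicates bits available for encoding bits of the character number."

Char. number range  |        UTF-8 octet sequence
   (hexadecimal)    |              (binary)
--------------------+---------------------------------------------
0000 0000-0000 007F | 0xxxxxxx
0000 0080-0000 07FF | 110xxxxx 10xxxxxx
0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

Most Chinese characters (the CJK Unified Ideographs, U+4E00-U+9FFF) lie in the range 0000 0800-0000 FFFF, so each takes 3 bytes in UTF-8. Rarer ideographs in the extension blocks lie above U+FFFF and take 4 bytes.

II. About the BOM

UTF-8 uses the byte as its code unit, so it has no byte-order problem. UTF-16 uses a two-byte code unit, so before interpreting UTF-16 text you must know the byte order of each code unit. For example, the Unicode code point of "奎" is 594E and that of "乙" is 4E59. If we receive the UTF-16 byte stream 59 4E, is it "奎" or "乙"?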
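The ambiguity is easy to reproduce. This short Python sketch decodes the same two bytes, 59 4E, once as big-endian and once as little-endian UTF-16. It uses Python's standard `utf-16-be` and `utf-16-le` codecs.

```python
# The two bytes from the example: 0x59 followed by 0x4E, with no BOM in front.
data = bytes([0x59, 0x4E])

# Big-endian: the first byte is the high byte, so the code unit is 0x594E.
as_big_endian = data.decode("utf-16-be")
# Little-endian: the first byte is the low byte, so the code unit is 0x4E59.
as_little_endian = data.decode("utf-16-le")

# Print code points only, to keep the output plain ASCII.
print(f"big-endian:    U+{ord(as_big_endian):04X}")
print(f"little-endian: U+{ord(as_little_endian):04X}")
```

The big-endian reading gives U+594E (奎) and the little-endian reading gives U+4E59 (乙). Nothing in the two bytes themselves says which one was meant.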
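The Unicode standard recommends the BOM as the way to mark byte order. Here BOM does not mean "Bill Of Materials" but Byte Order Mark. The idea is rather clever:
- UCS contains a character named "ZERO WIDTH NO-BREAK SPACE" whose code is FEFF.
- FFFE is not a character in UCS, so it should never appear in real data.
- The UCS specification suggests sending "ZERO WIDTH NO-BREAK SPACE" before the rest of the byte stream.
- If the receiver then reads FE FF, the stream is big-endian. If it reads FF FE, the stream is little-endian.

This is why "ZERO WIDTH NO-BREAK SPACE" is also called the BOM.

UTF-8 does not need a BOM to indicate byte order, but a BOM can be used to indicate the encoding. The UTF-8 encoding of "ZERO WIDTH NO-BREAK SPACE" is EF BB BF; you can check this with the encoding method described earlier. A receiver that sees a byte stream starting with EF BB BF can therefore treat it as UTF-8.

III. VB functions for converting UTF-8 to Unicode

1. Without the API

Function Utf8ToUnicode(ByRef Utf() As Byte) As String
    Dim utfLen As Long
    ' UBound raises an error on an uninitialized array; treat that as empty input
    utfLen = -1
    On Error Resume Next
    utfLen = UBound(Utf)
    If utfLen = -1 Then Exit Function
    On Error GoTo 0

    Dim i As Long, j As Long, k As Long, N As Long
    Dim B As Byte, cnt As Byte
    Dim Buf() As String
    ReDim Buf(utfLen)        ' at most one output piece per input byte

    i = 0
    j = 0
    Do While i <= utfLen
        B = Utf(i)
        ' The number of leading 1 bits in the lead byte gives the sequence length
        If (B And &HFC) = &HFC Then
            cnt = 6
        ElseIf (B And &HF8) = &HF8 Then
            cnt = 5
        ElseIf (B And &HF0) = &HF0 Then
            cnt = 4
        ElseIf (B And &HE0) = &HE0 Then
            cnt = 3
        ElseIf (B And &HC0) = &HC0 Then
            cnt = 2
        Else
            cnt = 1
        End If

        ' Sequence cut off at the end of the input
        If i + cnt - 1 > utfLen Then
            Buf(j) = "?"
            Exit Do
        End If

        ' Keep the payload bits of the lead byte
        Select Case cnt
        Case 2
            N = B And &H1F
        Case 3
            N = B And &HF
        Case 4
            N = B And &H7
        Case 5
            N = B And &H3
        Case 6
            N = B And &H1
        Case Else
            Buf(j) = Chr(B)  ' single byte (ASCII); a stray continuation byte also lands here
            GoTo Continued
        End Select

        ' Append 6 payload bits from each continuation byte (10xxxxxx)
        For k = 1 To cnt - 1
            B = Utf(i + k)
            N = N * &H40 + (B And &H3F)
        Next

        If N > &H10FFFF Then
            Buf(j) = "?"     ' outside the Unicode range
        ElseIf N > &HFFFF& Then
            ' ChrW only accepts values up to &HFFFF, so code points above U+FFFF
            ' are stored as a UTF-16 surrogate pair
            N = N - &H10000
            Buf(j) = ChrW(&HD800& + (N \ &H400&)) & ChrW(&HDC00& + (N And &H3FF&))
        Else
            Buf(j) = ChrW(N)
        End If
Continued:
        i = i + cnt
        j = j + 1
    Loop
    Utf8ToUnicode = Join(Buf, "")
End Function

2. Using the API (including Unicode to UTF-8)

Private Declare Function WideCharToMultiByte Lib "kernel32" (ByVal CodePage As Long, ByVal dwFlags As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long, ByRef lpMultiByteStr As Any, ByVal cchMultiByte As Long, ByVal lpDefaultChar As String, ByVal lpUsedDefaultChar As Long) As Long
Private Declare Function MultiByteToWideChar Lib "kernel32" (ByVal CodePage As Long, ByVal dwFlags As Long, ByVal lpMultiByteStr As Long, ByVal cchMultiByte As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long
Private Const CP_UTF8 = 65001

Function Utf8ToUnicode(ByRef Utf() As Byte) As String
    Dim lRet As Long
    Dim lLength As Long
    Dim lBufferSize As Long
    lLength = UBound(Utf) - LBound(Utf) + 1
    If lLength <= 0 Then Exit Function
    ' Each UTF-8 byte yields at most one UTF-16 code unit, so this buffer is large enough
    lBufferSize = lLength * 2
    Utf8ToUnicode = String$(lBufferSize, Chr(0))
    lRet = MultiByteToWideChar(CP_UTF8, 0, VarPtr(Utf(LBound(Utf))), lLength, StrPtr(Utf8ToUnicode), lBufferSize)
    If lRet <> 0 Then
        ' lRet is the number of UTF-16 characters actually written
        Utf8ToUnicode = Left(Utf8ToUnicode, lRet)
    Else
        ' Conversion failed: return an empty string rather than the null-filled buffer
        Utf8ToUnicode = vbNullString
    End If
End Function

Function UnicodeToUtf8(ByVal UCS As String) As Byte()
    Dim lLength As Long
    Dim lBufferSize As Long
    Dim lResult As Long
    Dim abUTF8() As Byte
    lLength = Len(UCS)
    If lLength = 0 Then Exit Function
    ' One UTF-16 code unit needs at most 3 UTF-8 bytes
    lBufferSize = lLength * 3 + 1
    ReDim abUTF8(lBufferSize - 1)
    lResult = WideCharToMultiByte(CP_UTF8, 0, StrPtr(UCS), lLength, abUTF8(0), lBufferSize, vbNullString, 0)
    If lResult <> 0 Then
        ' lResult is the byte count; ReDim takes the upper bound, hence the - 1
        lResult = lResult - 1
        ReDim Preserve abUTF8(lResult)
        UnicodeToUtf8 = abUTF8
    End If
End Function

Private Sub Command1_Click()
    Dim byt() As Byte
    byt = UnicodeToUtf8("测试")          ' "测试" means "test"
    ' Print the first three bytes, i.e. the UTF-8 encoding of 测
    Debug.Print Hex(byt(0)) & Hex(byt(1)) & Hex(byt(2))
    Debug.Print Utf8ToUnicode(byt())
End Sub

Both methods define a function named Utf8ToUnicode, so put only one of them in a given module.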
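The API versions above leave the bit manipulation to Windows. To make the rule from the UTF-8 table concrete, here is a minimal Python sketch of both directions.
- `encode_utf8` and `decode_utf8` are illustrative names, not part of any library.
- The decoder is deliberately simplified. It checks lead and continuation bytes, but it does not reject overlong forms or surrogate code points.

```python
def encode_utf8(code_point: int) -> bytes:
    """Encode one code point with the bit patterns from the UTF-8 table."""
    if not (0 <= code_point <= 0x10FFFF) or (0xD800 <= code_point <= 0xDFFF):
        raise ValueError(f"not a Unicode scalar value: {code_point:#x}")
    if code_point <= 0x7F:                        # 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:                       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0xFFFF:                      # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


def decode_utf8(data: bytes) -> str:
    """Decode by reading the sequence length from the leading 1 bits of each lead byte.

    Simplified sketch: overlong forms and surrogate code points are not rejected.
    """
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            count, cp = 1, lead
        elif 0xC0 <= lead < 0xE0:
            count, cp = 2, lead & 0x1F
        elif 0xE0 <= lead < 0xF0:
            count, cp = 3, lead & 0x0F
        elif 0xF0 <= lead < 0xF8:
            count, cp = 4, lead & 0x07
        else:
            raise ValueError(f"invalid lead byte {lead:#04x} at offset {i}")
        if i + count > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for k in range(1, count):
            cont = data[i + k]
            if (cont & 0xC0) != 0x80:
                raise ValueError(f"bad continuation byte at offset {i + k}")
            cp = (cp << 6) | (cont & 0x3F)   # append 6 payload bits
        chars.append(chr(cp))                # chr() raises ValueError above U+10FFFF
        i += count
    return "".join(chars)


encoded = b"".join(encode_utf8(ord(ch)) for ch in "测试")
print(encoded.hex())
print(decode_utf8(encoded) == "测试")
```

The first three bytes produced for "测试" are E6 B5 8B, the UTF-8 form of 测 (U+6D4B). This matches what Command1_Click prints as E6B58B.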

'Copy the code below into a standard module.
'Usage: Text1.Text = UTF8_Decode(UTF8Zfc)
'Note: convert the downloaded data directly; do not apply any other conversion (such as StrConv) first.
'*************** Module code ***************
'Declaration used to convert UTF-8 text to Unicode
Public Declare Function MultiByteToWideChar Lib "kernel32" (ByVal CodePage As Long, ByVal dwFlags As Long, ByRef lpMultiByteStr As Any, ByVal cchMultiByte As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long
Public Const CP_UTF8 = 65001

'Declarations used to detect the operating system type
Private Type OSVERSIONINFO
    dwOSVersionInfoSize As Long
    dwMajorVersion As Long
    dwMinorVersion As Long
    dwBuildNumber As Long
    dwPlatformId As Long
    szCSDVersion As String * 128
End Type
Private Declare Function GetVersionExA Lib "kernel32" (lpVersionInformation As OSVERSIONINFO) As Long

'Get the operating system type.
'The first character of the result is "1" for the Windows 9x family and "2" for the NT family.
Public Function GetVersion() As String
    Dim osinfo As OSVERSIONINFO
    Dim retvalue As Long
    osinfo.dwOSVersionInfoSize = 148     '5 Longs (20 bytes) + 128-byte szCSDVersion
    osinfo.szCSDVersion = Space$(128)
    retvalue = GetVersionExA(osinfo)
    With osinfo
        Select Case .dwPlatformId
        Case 1
            Select Case .dwMinorVersion
            Case 0
                GetVersion = "1Windows 95"
            Case 10
                GetVersion = "1Windows 98"
            Case 90
                GetVersion = "1Windows Millennium"
            End Select
        Case 2
            Select Case .dwMajorVersion
            Case 3
                GetVersion = "2Windows NT 3.51"
            Case 4
                GetVersion = "2Windows NT 4.0"
            Case 5
                If .dwMinorVersion = 0 Then
                    GetVersion = "2Windows 2000"
                Else
                    GetVersion = "2Windows XP"
                End If
            Case Else
                'Vista and later (major version 6 and up) are also NT-family systems
                GetVersion = "2Windows NT " & .dwMajorVersion & "." & .dwMinorVersion
            End Select
        Case Else
            GetVersion = "Failed"
        End Select
    End With
End Function

'Purpose: convert UTF-8 text to Unicode.
'sUTF8 is expected to carry the raw downloaded UTF-8 bytes (see the note above).
Public Function UTF8_Decode(ByVal sUTF8 As String) As String
    Dim lngUtf8Size As Long
    Dim strBuffer As String
    Dim lngBufferSize As Long
    Dim lngResult As Long
    Dim bytUtf8() As Byte
    Dim n As Long

    If LenB(sUTF8) = 0 Then Exit Function

    If Left(GetVersion(), 1) = "2" Then
        'NT family: let Windows do the conversion
        On Error GoTo EndFunction
        'bytUtf8 = StrConv(sUTF8, vbFromUnicode)
        bytUtf8 = sUTF8
        lngUtf8Size = UBound(bytUtf8) + 1
        On Error GoTo 0
        'Set the buffer for the longest possible string: if every byte is ANSI,
        'each UTF-8 byte becomes one Unicode character (2 bytes).
        lngBufferSize = lngUtf8Size * 2
        strBuffer = String$(lngBufferSize, vbNullChar)
        'Translate using code page 65001 (UTF-8)
        lngResult = MultiByteToWideChar(CP_UTF8, 0, bytUtf8(0), _
            lngUtf8Size, StrPtr(strBuffer), lngBufferSize)
        'Trim the result to its actual length
        If lngResult Then
            UTF8_Decode = Left(strBuffer, lngResult)
        End If
    Else
        'Windows 9x family: decode by hand (1-, 2- and 3-byte sequences only)
        Dim i As Long
        Dim TopIndex As Long
        Dim TwoBytes(1) As Byte
        Dim ThreeBytes(2) As Byte
        Dim AByte As Byte
        Dim TStr As String
        Dim BArray() As Byte

        'Resume on error in case someone inputs text with accents
        'that should have been encoded as UTF-8
        On Error Resume Next
        TopIndex = LenB(sUTF8)              'number of bytes
        If TopIndex = 0 Then Exit Function  'nothing to convert
        'BArray = StrConv(sUTF8, vbFromUnicode)
        BArray = sUTF8
        i = 0                               'initialise pointer
        TopIndex = TopIndex - 1             'index of the last byte
        'Iterate through the byte array
        Do While i <= TopIndex
            AByte = BArray(i)
            If AByte < &H80 Then
                'Normal ANSI character - use it as is
                TStr = TStr & Chr$(AByte): i = i + 1
            ElseIf AByte >= &HE0 Then       'was = &HE1
                'Start of a 3-byte UTF-8 sequence (4-byte sequences are not supported here)
                ThreeBytes(0) = BArray(i): i = i + 1
                ThreeBytes(1) = BArray(i): i = i + 1
                ThreeBytes(2) = BArray(i): i = i + 1
                'Combine the payload bits into one UTF-16 character. The & suffix keeps the
                'arithmetic in Long; in Integer arithmetic values of &H8000 and up overflow.
                TStr = TStr & ChrW$((ThreeBytes(0) And &HF) * &H1000& + (ThreeBytes(1) And &H3F) * &H40& + (ThreeBytes(2) And &H3F))
            ElseIf (AByte >= &HC2) And (AByte <= &HDF) Then
                'Start of a 2-byte UTF-8 sequence (lead bytes C2-DF)
                TwoBytes(0) = BArray(i): i = i + 1
                TwoBytes(1) = BArray(i): i = i + 1
                'Combine the payload bits into one UTF-16 character
                TStr = TStr & ChrW$((TwoBytes(0) And &H1F) * &H40 + (TwoBytes(1) And &H3F))
            Else
                'Normal ANSI character - use it as is
                TStr = TStr & Chr$(AByte): i = i + 1
            End If
        Loop
        UTF8_Decode = TStr                  'return the resulting string
        Erase BArray
    End If
EndFunction:
End Function
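None of the VB functions above looks for a byte order mark. The following is a minimal Python sketch of the BOM check described in the BOM section: pick the codec from the leading bytes, drop the mark, then decode. Two choices here are assumptions made for illustration:
- `decode_by_bom` is a hypothetical helper name.
- Falling back to UTF-8 when no BOM is present is not something the bytes themselves guarantee.

UTF-32 marks are not handled.

```python
# Byte order marks from the BOM section, checked in this order.
# UTF-32 marks (for example FF FE 00 00) are not handled in this sketch.
BOMS = (
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
)


def decode_by_bom(data: bytes, default: str = "utf-8") -> str:
    """Choose the codec from a leading BOM, drop the BOM, and decode the rest.

    Falls back to `default` when no BOM is present (an assumption: without a
    BOM the bytes alone do not always identify the encoding).
    """
    for bom, codec in BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(codec)
    return data.decode(default)


raw = b"\xef\xbb\xbf" + "测试".encode("utf-8")
print(repr(raw.decode("utf-8")[0]))   # plain UTF-8 decoding keeps the BOM as U+FEFF
print(ascii(decode_by_bom(raw)))      # with the check, the BOM is removed
```

The same two bytes 59 4E from the BOM section decode to 奎 when preceded by FE FF, and to 乙 when preceded by FF FE. In Python specifically, the standard `utf-8-sig` codec also strips a leading UTF-8 BOM if one is present.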
