oracle的字符集检查工具CSSCAN(一)

使用CSSCAN 工具在源库检查数据。
指定扫描的schame,只能指定一个schame
$ csscan \"sys/@ as sysdba\" LOG=/tmp/expcheck USER=scott CAPTURE=Y TOCHAR=AL32UTF8 ARRAY=1024000 PROCESS=3
指定扫描的表,在unix环境下当指定多个表时,' 是必须的,否则会报错:0403-057 Syntax error at line 21 : `(' is not expected. ):
$ csscan \"sys/@ as sysdba\" LOG=/tmp/expcheck TABLE='(SCOTT.DEPT,SCOTT.EMP)' CAPTURE=Y TOCHAR=AL32UTF8 ARRAY=1024000 PROCESS=2
在windows环境下,' 可以省略!
c:\>csscan \"sys/@ as sysdba\" LOG=c:\temp\expcheck TABLE=(SCOTT.DEPT,SCOTT.EMP) CAPTURE=Y TOCHAR=AL32UTF8 ARRAY=1024000 PROCESS=2
如果你有很多表要扫描,或许要避免不同os环境下语法的麻烦。就可以使用参数文件。
Example contents of csscan.par:
LOG=/tmp/expcheck
TABLE=(SCOTT.DEPT,SCOTT.EMP)
FULL=N
CAPTURE=Y
TOCHAR=AL32UTF8
  PROCESS=6
ARRAY=1024000

使用参数文件执行:
$ csscan \"sys/@ as sysdba\" PARFILE=/tmp/csscan.par
or
C:\>csscan \"sys/@ as sysdba\" PARFILE=c:\temp\csscan.par

Csscan will create 3 files :
expcheck.out csscan 输出的日志 a log of the output of csscan
expcheck.txt 数据库扫描概要the Database Scan Summary Report
expcheck.err 扫描时出错的记录,包含了出错的表,字段 (contains the rowid's of the Convertible , Truncation and Lossy rows reported in expcheck.txt)
下面是一些注意点:
You can only specify on user at the time, if you are exporting the majority of a database then use FULL=Y
$ csscan \"sys/@ as sysdba\" FULL=Y TOCHAR=AL32UTF8 LOG=expcheck CAPTURE=Y ARRAY=1000000 PROCESS=2
* Always run Csscan connecting with a 'sysdba' connection/user, do not use the "system" or "csmig" user.
* The PROCESS= parameter influences the load on your system, the higher this is (6 or 8 for example) the faster Csscan will be done, the lower this is the less impact it will have on your system. Adapt if needed.
* Do not specify the TONCHAR or FROMNCHAR csscan parameters , those are to change the NLS_NCHAR_CHARACTERSET. Again they are not needed and should not be specified.
* The csscan SUPPRESS parameter limits the size of the .err file by limiting the amount of information logged / table. Using SUPPRESS=1000 will log max 1000 rows for each table in the .err file. It will not affect the information in the .txt file. It WILL affect the data logged in the .err file. This is mainly useful for the first scan of big databases, if you have no idea how much "Convertible" or "Lossy" there is in a database then this will avoid that the .err file becomes 100's of MB big and it limits also the space used by the csscan tables under the Csmig schema.
note that to have correct result of the following select you should NOT use the Csscan  SUPPRESS option.

Once Csscan has been run you then need to check the .txt file

If there is any "Lossy" data - this is data that CANNOT be converted to the new NLS_CHARACTERSET
- you need to check further and see why this data is "Lossy", discussing "Lossy" is outside the scope of this note。
This select will give all the lossy objects found in the last Cssan run:
conn / AS sysdba
SELECT DISTINCT z.owner_name
|| '.'
|| z.table_name
|| '('
|| z.column_name
|| ') - '
|| z.column_type
|| ' ' LossyColumns
FROM csmig.csmv$errors z
WHERE z.error_type ='DATA_LOSS'
ORDER BY LossyColumns
/

For solving the ORA-01401 / ORA-12899 error you need to know which columns are logged as "Truncation" and what the new size of the data will be after import.

You can find that in the expcheck.err file as "Max Post Conversion Data Size"
For example, check in the expcheck.txt file wich table has "Truncation", let's assume you have there a row that say's:

-- snip from expcheck.txt
[Distribution of Convertible, Truncated and Lossy Data by Table]

USER.TABLE Convertible Truncation Lossy
--------------------- ---------------- ---------------- ----------------
...
SCOTT.TESTUTF8 69 6 0
...

then look in the expcheck.err file for "TESTUTF8" until the "Max Post Conversion Data Size" is bigger then the column size for that table.

-- snip from expcheck.err
User : SCOTT
Table : TESTUTF8
Column: ITEM_NAME
Type : VARCHAR2(80)
Number of Exceptions : 6
Max Post Conversion Data Size: 81
the max size after going to AL32UTF8 will be 81 bytes for this column.

Or you can use this select to have a list of the columns that have "Truncation" and the new size in bytes that is minimally needed:

conn / AS sysdba
SET serveroutput ON
DECLARE
newmaxsz NUMBER;
BEGIN
FOR rec IN
( SELECT DISTINCT u.owner_name,
u.table_name,
u.column_name ,
u.column_type,
u.owner_id,
u.table_id,
u.column_id,
u.column_intid
FROM csmv$errors u
WHERE u.error_type='EXCEED_SIZE'
ORDER BY u.owner_name,
u.table_name,
u.column_name
)
LOOP
SELECT MAX(cnvsize)
INTO newmaxsz
FROM csm$errors
WHERE usr# =rec.owner_id
AND obj# =rec.table_id
AND col# =rec.column_id
AND intcol#=rec.column_intid;

DBMS_OUTPUT.PUT_LINE(rec.owner_name ||'.'|| rec.table_name||' ('|| rec.column_name ||') - '|| rec.column_type ||' - '|| newmaxsz || ' Bytes');
END LOOP;
END;
/       

时间: 2016-04-15

oracle的字符集检查工具CSSCAN(一)的相关文章

oracle的字符集检查工具CSSCAN(三)

oracle 提供了两个关于字符集检查的工具,一个csscan,一个是lcsscan(Language and Character Set File Scanner) 一 Database Character Set Scanner(csscan) 刚开始使用csscan时会遇到 CSS-00107错误,CSS-00107: Character set migration utility schema not installed 原因可以从提示得到:未安装CSSCAN. 所以先安装csscan

oracle的字符集检查工具CSSCAN(二)

SCAN.TXT 文件的内容: Database Scan Summary Report Time Started  : 2011-03-14 12:22:16 Time Completed: 2011-03-14 12:22:59 Process ID         Time Started       Time Completed ---------- -------------------- --------------------          1  2011-03-14 12:2

oracle字符集转换工具DMU使用

关于oracle字符集的介绍参考:http://blog.csdn.net/cymm_liu/article/details/7605081 (此博客也包含了一种修改字符集的方法,但是不建议使用) 今天我们为了解决这个问题:http://blog.csdn.net/cymm_liu/article/details/12013243 采用oracle 的字符集转换工具DMU来进行操作.. 1.DMU简介 参考:The Database Migration Assistant for Unicode

全面认识Oracle数据库字符集

    什么是Oracle字符集    Oracle字符集是一个字节数据解释的符号集合,有大小之分,有相互的包容关系.    Oracle支持国家语言的体系结构允许你使用本地化语言来存储,处理,检索数据.它使数据库工具,错误消息,排序次序,日期,时间,货币,数字和日历自动适应本地化语言和平台.    影响oracle数据库字符集最重要的参数是NLS_LANG参数.它的格式如下:    NLS_LANG = language_territory.charset    它有三个组成部分(语言.地域和

Oracle数据库字符集问题总结

在不同数据库做数据迁移.同其它系统交换数据等,常常因为字符集不同而导致迁移失败或数据库内数据变成乱码.现在我将oracle字符集相关的一些知识做个简单总结 一.什么是oracle字符集 Oracle字符集是一个字节数据的解释的符号集合,有大小之分,有相互的包容关系.ORACLE 支持国家语言的体系结构允许你使用本地化语言来存储,处理,检索数据.它使数据库工具,错误消息,排序次序,日期,时间,货币,数字,和日历自动适应本地化语言和平台. 影响oracle数据库字符集最重要的参数是NLS_LANG参

oracle 数据库字符集研究 中篇

四.EXP/IMP 与 字符集 4.1 EXP/IMP    Export 和 Import 是一对读写Oracle数据的工具.Export 将 Oracle 数据库中的数据输出到操作系统文件中, Import 把这些文件中的数据读到Oracle 数据库中,由于使用exp/imp进行数据迁移时,数据从源数据库到目标数据库的过程中有四个环节涉及到字符集,如果这四个环节的字符集不一致,将会发生字符集转换. EXP     ____________ _________________ ________

Oracle 8i字符集乱码问题析及其解决办法_oracle

正在看的ORACLE教程是:Oracle 8i字符集乱码问题析及其解决办法.一.问题描述 SQL Plus WorkSheet是一个窗口图形界面的SQL语句编辑器,对于那些喜欢窗口界面而不喜欢字符界面的用户,该工具相对SQL/PLUS受到了很大的欢迎.但从Oracle 8i以后,如果安装Oracle 8i时选取的是别于英语的字符集,对于我们中国,通常会选取简体中文字符集(ZHS16GBK),安装成功后,运行SQL Plus WorkSheet程序,会出现所有的中文显示以及查询结果均为乱码的情况.

Oracle数据库字符集转换规律全面剖析

作为一个Oracle数据库的用户,对于Export和Import两个命令绝对不会感到陌生,因为这二者正是我们经常用于数据备份和恢复的工具.但在使用这两个命令过程中所发生的Oracle字符集问题,常给一些Oracle使用者带来不必要的麻烦和不必要的数据损失.本文将就Export和Import过程中Oracle字符集的转换规律及使用这两个命令的注意事项做一总结. 字符集转换的原因 Export.Import过程如上图所示,从这个示意图中可以看到有四处关系到字符集,而这四处字符集的不一致恰恰是导致Or

在Oracle中使用DBVERIFY工具

Oracle 数据库运行过程中由于硬件故障或操作系统故障导致导致Oracle无法以Oracle格式来识别或所包含的内容即为出现数据块损坏 故障,这个坏块可以分为介质损坏以及逻辑损坏.下面给出了块的检查,以及使用DBVERIFY 工具实施块检查. 一.块检查 1.何时检查块 当一个数据块被读或写的时候,将对块的进行一致性检查,检查的内容包括 块的版本 比较块在cache与block buffer中的数据块地址 根据要求进行校验(checksum) 2.损坏的数据块的错误提示 可以从告警日志文件中找