Database Character Set Conversion with the Oracle CSSCAN Utility

Steps to change the character set of an Oracle database

1. Take a full backup of the database.

2. Connect to the database as SYSDBA.

3. Check the current character set of the database.

SELECT value FROM NLS_DATABASE_PARAMETERS WHERE parameter='NLS_CHARACTERSET';

4. Set up the CSSCAN utility by running csminst.sql from the $ORACLE_HOME/rdbms/admin directory.

SQL> @$ORACLE_HOME/rdbms/admin/csminst.sql

5. Set the password of the CSMIG user: csmig
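If the CSMIG account ever needs to be reset or unlocked afterwards, a minimal sketch (the password csmig is just the example value from step 5):

```sql
-- reset and unlock the CSMIG schema owner created by csminst.sql
ALTER USER csmig IDENTIFIED BY csmig ACCOUNT UNLOCK;
```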

6. The installation is now complete.

7. Set the LD_LIBRARY_PATH environment variable so that csscan can find the Oracle libraries.

export LD_LIBRARY_PATH=$ORACLE_HOME/lib

8. Run the CSCAN utility:

csscan FROMCHAR=WE8ISO8859P1 TOCHAR=AL32UTF8 LOG=csscan CAPTURE=N PROCESS=1 ARRAY=1024000
-- csscan will prompt for the SYS username and password

-- list the columns that would suffer data loss:
SELECT DISTINCT z.owner_name || '.' || z.table_name || '(' || z.column_name || ') - ' || z.column_type || ' ' LossyColumns
FROM csmig.csmv$errors z
WHERE z.error_type = 'DATA_LOSS'
ORDER BY LossyColumns;

9. Three files are generated by this command: csscan.out (the screen log), csscan.txt (the database scan summary report), and csscan.err (individual exception details).

10. Check the csscan.txt file; it provides the list of affected tables.

General Steps:
– If the scan reports "lossy" data: convert to the character set's "strict" superset using CSALTER
– If the scan reports "truncation": change length semantics from BYTE to CHAR at the database level or column level
– If the scan reports "convertible" data: do a full export/import, or a partial export/import combined with CSALTER

Note: csscan.txt must contain the following three lines in order to run CSALTER:
– All character type data in the data dictionary remain the same in the new character set
– All character type application data remain the same in the new character set
– The data dictionary can be safely migrated using the CSALTER script

Dealing with Lossy Data:

Strict superset – a character set is a strict superset of another if and only if each and every character in the source character set is available in the target character set, with the same corresponding character value.

Run CSSCAN with FROMCHAR and TOCHAR both set to the "strict" superset:
csscan full=Y fromchar=WE8MSWIN1252 tochar=WE8MSWIN1252

Handling lossy data:
Convert WE8ISO8859P1 to WE8MSWIN1252 (its strict superset) using CSALTER.

1. Shut down the listener and any connected applications.
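The listener can be stopped with the standard listener control utility; shutting down connected applications is environment-specific:

```shell
lsnrctl stop
```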

2. Shut down the database and start it up in restricted mode.

shutdown immediate
startup restrict

3. Run the CSALTER script:

SQL>@$ORACLE_HOME/rdbms/admin/csalter.plb

4. Query NLS_DATABASE_PARAMETERS to verify the new character set.
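The same query used in step 3 of the setup confirms the change:

```sql
SELECT value FROM NLS_DATABASE_PARAMETERS WHERE parameter = 'NLS_CHARACTERSET';
```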

Dealing with Truncation Data:
Truncation means that data resulting from the conversion does not fit within the column's maximum length.
Truncation data is always also convertible data, which means that, whichever way you do the change, these rows have to be exported before the character set is changed and re-imported afterwards. If you proceed without dealing with the truncation issue first, the import will fail on these columns with "ORA-01401: inserted value too large for column" (or, from 10g onwards, "ORA-12899: value too large for column…"), because the size of the converted data exceeds the maximum size of the column.
Two ways to deal with it:
1. Before the export, shorten the data so it fits in the column
2. Adapt the columns to fit the expansion of the data, e.g. change a CHAR column to VARCHAR2 before the export on the source
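A sketch of the second option, using the MONEY/SYMBOL example from the next section (the table, column, and size are illustrative):

```sql
-- widen the column type on the source so the expanded data fits on import
ALTER TABLE MONEY MODIFY (SYMBOL VARCHAR2(3));
```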

How to handle “truncation”?

1. Enlarge the column using more bytes:

e.g. alter table MONEY modify (SYMBOL CHAR(3));

2. Change length semantics to CHAR

--at database level
alter system SET nls_length_semantics=CHAR;
--at session level
alter session SET nls_length_semantics=CHAR;
--change column from BYTE to CHAR
alter table MONEY modify (SYMBOL CHAR(1 CHAR));
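To check whether a column now uses CHAR or BYTE semantics, the CHAR_USED column of the dictionary views can be queried (sketch; MONEY is the example table above):

```sql
-- 'C' = CHAR semantics, 'B' = BYTE semantics
SELECT column_name, data_type, char_used
FROM user_tab_columns
WHERE table_name = 'MONEY';
```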

 
Dealing with “Convertible” data:
When using export/import with the “old” Exp/Imp tools, the NLS_LANG setting is simply AMERICAN_AMERICA.<source character set>. “Convertible” data needs to be exported and then truncated/deleted before the character set is changed.

How to handle it?

1. Full export/import

2. Partial export/import with CSALTER

Dealing with LOSSY Data

If there is any “lossy” data, then those rows contain code points that are not currently defined correctly, and they should be cleaned up before you can continue. If this “lossy” data is not checked and corrected, it WILL BE LOST.

conn / AS sysdba

-- the csmig.csmv$errors query from step 8 lists the lossy columns:
SELECT DISTINCT z.owner_name || '.' || z.table_name || '(' || z.column_name || ') - ' || z.column_type || ' ' LossyColumns
FROM csmig.csmv$errors z
WHERE z.error_type = 'DATA_LOSS'
ORDER BY LossyColumns;

11. Set NLS_LANG for the export, then export the table data.
NOTE: Check the relationships between the tables and record the row counts of the tables to be exported.

export NLS_LANG=AMERICAN_AMERICA.AL32UTF8
exp \"sys/sys123@orcl1 as sysdba\" tables=OE.INVENTORIES,OE.WAREHOUSES file=table_data_1.dmp log=table_data_1.log
exp system/sys123 parfile=exppar.dat
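To record the row counts mentioned in the NOTE above, a sketch using the example tables from the exp command (OE.INVENTORIES and OE.WAREHOUSES):

```sql
-- record pre-export row counts so the import can be verified later
SELECT COUNT(*) FROM OE.INVENTORIES;
SELECT COUNT(*) FROM OE.WAREHOUSES;
```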

12. Delete the data from the listed tables.
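A sketch using the example tables exported in step 11 (truncate child tables first; TRUNCATE fails with ORA-02266 on a table referenced by enabled foreign keys):

```sql
-- remove the convertible rows so CSALTER can run; they are re-imported in step 20
TRUNCATE TABLE OE.INVENTORIES;
TRUNCATE TABLE OE.WAREHOUSES;
```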

13. Run CSSCAN again to verify that the remaining data is clean.

csscan  FROMCHAR=WE8ISO8859P1 TOCHAR=AL32UTF8 LOG=csscan CAPTURE=N PROCESS=1 ARRAY=1024000

14. Check the csscan.txt file; it should now contain the three lines required for CSALTER (see the note above).

15. Shut down the database.

shutdown immediate

16. Start up the database in restricted mode.

startup restrict

17. Run the CSALTER script to convert the database:
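As in the lossy-data section earlier, the conversion script is csalter.plb:

```sql
SQL> @$ORACLE_HOME/rdbms/admin/csalter.plb
```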

18. Shut down and start up the database normally.

shutdown immediate
startup

19. Check the database character set again; it has now changed.

SELECT value FROM NLS_DATABASE_PARAMETERS WHERE parameter='NLS_CHARACTERSET';

20. Import the data with the imp command.

export NLS_LANG=AMERICAN_AMERICA.AL32UTF8
imp system/sys123  full=y file=exportnew1.dmp log=impdata.log rows=y ignore=y

 

Example through EXP/IMP:

1. Create a new Unicode database instance (AL32UTF8)

2. Set NLS_LANG to the source character set and run a full export

NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252
exp FULL=Y…

3. Set NLS_LANG to the source character set and run a full import

NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252
imp FULL=Y …
 
