Monday, August 24, 2009

Characterset Woes

Ever create a database with a characterset only to find out later the application requires something different. OK, so now what, recreate the database? Change the character set?
Changing the characterset is definitely an option but there are some hoops to go through to make this happen. Depending on when it is discovered that a different characterset is needed recreating the database is a valid option.
So, since there are issues and things to work through with charactersets, lets go through some basic discussions to have to decide what characterset to use first. With international databases and several platforms offering national characterset datatypes, there are several combinations and charactersets to choose from. I was of the mindset to just use the current UTF8 version and then set the varchars big enoug to handle any language that comes its way. Now this might work for an application where there is discussion about the datatypes and control over the code with the developers, but for reporting and other applications sitting on top of the database this might not be the best approach. Make sure to double check and maybe even ask again with the vendor to know which database characterset and which national characterset is needed. Also, when looking at what characterset to use the Oracle Globalization documentation does provide some helpful hints as well as thinking about supersets in planning if you have to change.
With great planning or possibly needing to use an existing database, a characterset change might still be needed. There are several good notes out there and tricks on how to do this, but I thought I would add my quick checklist to here to help out where possible, since I just went through this pain. In my case I have existing databases that now the NCHARs and NVARCHARs will be used and the vendor has a specific national characterset that is needed.
I decided that I didn't want to recreate the database and do an export and import to switch over, but checked to make sure that NCHARs, NVARCHARs and NBLOBs (NCLOBs) etc. are not being used currently. So there are no values here from a user perspective but might be some in the system tables. If there were any N-values then export these tables and truncate them. It is not a problem to have them in the database but if there are values populating these columns are the problem. The characterset that was needed is a strict superset of the current characterset and again the Oracle documentation will provide a list of which charactersets can change to others.
Now it appears that it is just a quick alter database national character set NEW_CHARACTERSET, right? Probably not. Additional checks are needed. Also, assumptions here are being made, that a spfile is being used, RAC clusters are altered to single instance mode to change the characterset and the checks of the data types being supported in the new characterset has been completed.
XBD tables use N-data, and this can be truncated if under 7 rows are in the tables xdb.XDB$QNAME_ID and xdb.XDB$NMSPC_ID (open a case with Oracle with more than 7). These are the tables that caused me a lot of headache because I kept getting the ORA-12717: Cannot issue ALTER DATABASE NATIONAL CHARACTER SET when NCLOB, NCHAR
or NVARCHAR2 data exists, and wasn't sure where it was coming from.
After dealing with this data, run the csscan FULL=Y TONCHAR=UTF8 LOGcheck CAPTURE=Y ARRAY=1000000 PROCESS=2 as sysdba.
Shutdown the database and startup in restrict mode. Other parameters that need to be set job_queue_processes=0 and aq_tm_processes=0, then ALTER DATABASE NATIONAL CHARACTER SET NEW_CHARACTERSET, run $ORACLE_HOME/rdbms/admin/csalter.plb.
New character should now be set, and then the next steps are just to put things back the things that were changed to make this happen.
Set job_queue_processes and aq_tm_processes back to the original values, and then shutdown and startup. Don't forget about the data in the XDB tables which can be inserted from $ORACLE_HOME/rdbms/admin/catxdbtm.sql.
Are you now understanding why I started this off with choose your characterset wisely? There are several steps that are needed for the change as well as knowing that the database is able to change over and data is either not there or able to export to make it happen. These are just some of the highlights that I ran into going through these steps which will hopefully help someone out with their next characterset change.

2 comments: