How are oracle character data stored? in bytes or characters? complete reference for the new oracle dba





you are a new oracle dba . you know that the columns in a table can be number or character or date or some other data types.
you want to know how the character data is stored in the database. you want to know how does oracle handle the column storage when you define a column A for example as varchar2(20) .. does the database store it as 20 bytes or does the database assign storage equivalent to 20 characters?

The quick answer is that you have a parameter called nls_length_semantics which many DBA’s define in the initialization parameter file and this parameter has a default value as ‘BYTE’..so when you define a column as varchar2(20) then oracle allocates 20 bytes.
you should know that there are different types of characters
ascii characters (occupies 1 byte for each character)
latin characters(occupies 2 bytes for each character)
asian characters(occupies 3 bytes for each character)

you should also know that a database can have a single byte character set or a multi byte character set . search on this blog for character set and you will find a detailed explanation about everything related to character sets.

for now read the below article which explains in detail about the nls_length_semantics parameter as explained briefly above.

Length Semantics

In single-byte character sets, the number of bytes and the number of characters in a string are the same. In multibyte character sets, a character or code point consists of one or more bytes. Calculating the number of characters based on byte lengths can be difficult in a variable-width character set. Calculating column lengths in bytes is called byte semantics, while measuring column lengths in characters is called character semantics.

Character semantics were introduced in Oracle9i. Character semantics is useful for defining the storage requirements for multibyte strings of varying widths. For example, in a Unicode database (AL32UTF8), suppose that you need to define a VARCHAR2 column that can store up to five Chinese characters together with five English characters. Using byte semantics, this column requires 15 bytes for the Chinese characters, which are three bytes long, and 5 bytes for the English characters, which are one byte long, for a total of 20 bytes. Using character semantics, the column requires 10 characters.

The following expressions use byte semantics:

  • VARCHAR2(20 BYTE)
  • SUBSTRB(string, 1, 20)

Note the BYTE qualifier in the VARCHAR2 expression and the B suffix in the SQL function name.

The following expressions use character semantics:

  • VARCHAR2(10 CHAR)
  • SUBSTR(string, 1, 10)

Note the CHAR qualifier in the VARCHAR2 expression.

The NLS_LENGTH_SEMANTICS initialization parameter determines whether a new column of character datatype uses byte or character semantics. The default value of the parameter is BYTE. The BYTE and CHAR qualifiers shown in the VARCHAR2 definitions should be avoided when possible because they lead to mixed-semantics databases. Instead, set NLS_LENGTH_SEMANTICS in the initialization parameter file and define column datatypes to use the default semantics based on the value of NLS_LENGTH_SEMANTICS.

Byte semantics is the default for the database character set. Character length semantics is the default and the only allowable kind of length semantics for NCHAR datatypes. The user cannot specify the CHAR or BYTE qualifier for NCHAR definitions.

Consider the following example:

CREATE TABLE employees
( employee_id NUMBER(4)
, last_name NVARCHAR2(10)
, job_id NVARCHAR2(9)
, manager_id NUMBER(4)
, hire_date DATE
, salary NUMBER(7,2)
, department_id NUMBER(2)
) ;

When the NCHAR character set is AL16UTF16, last_name can hold up to 10 Unicode code points. When the NCHAR character set is AL16UTF16, last_name can hold up to 20 bytes.

Figure 2-2 shows the number of bytes needed to store different kinds of characters in the UTF-8 character set. The ASCII characters requires one byte, the Latin and Greek characters require two bytes, the Asian character requires three bytes, and the supplementary character requires four bytes of storage.

Figure 2-2 Bytes of Storage for Different Kinds of Characters
Description of the illustration nlspg032.gif

CHAR Column Length Semantics

If you plan to use CHAR column length semantics in a replication database after you upgrade it to Oracle Database 10g, then all of the databases participating with that database in the replication environment must also use CHAR column length semantics. In this case, Oracle recommends that you upgrade all of the databases participating in the replication environment at the same time. This applies to both master sites and materialized view sites in your replication environment.

If you cannot upgrade all of the databases in your replication environment at the same time, then you can only use CHAR column length semantics in your Oracle Database if all of the databases prior to Oracle9i are using a single-byte character set. Otherwise, do not switch to CHAR column length semantics in the Oracle Database until all of the other databases in the replication environment are upgraded to Oracle Database 10g.

When using a multibyte database character encoding scheme, consider carefully the space required for tables with character columns. If the database character encoding scheme is single-byte, then the number of bytes and the number of characters in a column is the same. If it is multibyte, then there generally is no such correspondence. A character might consist of one or more bytes, depending upon the specific multibyte encoding scheme and whether shift-in/shift-out control codes are present. To avoid overflowing buffers, specify data as NCHAR or NVARCHAR2 if it might use a Unicode encoding that is different from the database character set.

NLS_LENGTH_SEMANTICS

Property Description
Parameter type String
Parameter scope Environment variable, initialization parameter, ALTER SESSION, and ALTER SYSTEM
Default value BYTE
Range of values BYTE or CHAR

By default, the character datatypes CHAR and VARCHAR2 are specified in bytes, not characters. Hence, the specification CHAR(20) in a table definition allows 20 bytes for storing character data.

This works well if the database character set uses a single-byte character encoding scheme because the number of characters is the same as the number of bytes. If the database character set uses a multibyte character encoding scheme, then the number of bytes no longer equals the number of characters because a character can consist of one or more bytes. Thus, column widths must be chosen with care to allow for the maximum possible number of bytes for a given number of characters. You can overcome this problem by switching to character semantics when defining the column size.

NLS_LENGTH_SEMANTICS enables you to create CHAR, VARCHAR2, and LONG columns using either byte or character length semantics. NCHAR, NVARCHAR2, CLOB, and NCLOB columns are always character-based. Existing columns are not affected.

You may be required to use byte semantics in order to maintain compatibility with existing applications.

NLS_LENGTH_SEMANTICS does not apply to tables in SYS and SYSTEM. The data dictionary always uses byte semantics.

Note that if the NLS_LENGTH_SEMANTICS environment variable is not set on the client, then the client session defaults to the value for NLS_LENGTH_SEMANTICS on the database server. This enables all client sessions on the network to have the same NLS_LENGTH_SEMANTICS behavior. Setting the environment variable on an individual client enables the server initialization parameter to be overridden for that client.

Maximum Column Number and Row Length for an IOT without Overflow Segment ORA-01429 [ID 393914.1]  

 
  Modified 03-OCT-2006     Type HOWTO     Status MODERATED  
       

In this Document
  Goal
  Solution
  References


This document is being delivered to you via Oracle Support’s Rapid Visibility (RaV) process, and therefore has not been subject to an independent technical review.

Applies to:

Oracle Server – Enterprise Edition – Version: 9.2.0.1 to 10.2.0.2
Information in this document applies to any platform.

Goal

What are the designed limits for an IOT table with regards to the number of columns and maximum
row length?

Solution

An IOT row is made of an index rowpiece directly followed by a table rowpiece for the non key
columns. Chaining from the non key colums requires an overflow segment.
Oracle can handle 255 columns in a single row piece, as such, an IOT can have in the index segment
a maximum number of columns equal to the primary key columns + 255 others.
Any extra column requires chaining into an overflow segment. This is because Oracle uses rowid
like constructs for chaining and as such a table like structure to chain into is needed.

The maximum length of an IOT record, in order to fit into the index block, is limited to half the
block size.

Author: admin