Notes on converting scalar values (codepoints) to surrogates
Calculators
[From the unicode mailing list. Email message by John Cowan]
If you have access to any Windows box, you can use the Windows Calculator (Start/Programs/Accessories/Calculator). Choose View/Scientific and click on the Hex radio button. Then enter your 5-digit Unicode scalar value. (You must type hex digits in lower case.) To get the high surrogate, type:
- 1 0 0 0 0 = / 4 0 0 + d 8 0 0 =
To get the low surrogate, enter the scalar value again and type:
- 1 0 0 0 0 = % 4 0 0 + d 8 0 0 =
You can also use the mouse, in which case "%" above represents the MOD key.
On *nix systems, use the "bc" command; type "obase=16" and then "ibase=16", in that order (once ibase is 16, a later "obase=16" would be read as hexadecimal 16, i.e. decimal 22). For this program, you must use capital letters for the hex digits. To get the high surrogate, type "(xxxxx-10000)/400+D800" ("xxxxx" is the scalar value); to get the low surrogate, type "(xxxxx-10000)%400+DC00".
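As a worked instance of the same arithmetic (subtract 10000 hex, then take the quotient and the remainder by 400 hex), here is a minimal Python sketch. The function name to_surrogates is made up for illustration and does not come from the original messages.

```python
def to_surrogates(scalar):
    """Split a scalar value in U+10000..U+10FFFF into (high, low) surrogates."""
    if not 0x10000 <= scalar <= 0x10FFFF:
        raise ValueError(f"U+{scalar:04X} is outside U+10000..U+10FFFF")
    offset = scalar - 0x10000          # 20-bit offset into the supplementary planes
    high = 0xD800 + offset // 0x400    # top 10 bits of the offset
    low = 0xDC00 + offset % 0x400      # bottom 10 bits of the offset
    return high, low

high, low = to_surrogates(0x1D11E)     # U+1D11E MUSICAL SYMBOL G CLEF
print(f"{high:04X} {low:04X}")         # prints "D834 DD1E"
```

The same keystrokes on either calculator, entered with 1D11E as the scalar value, should give D834 and DD1E.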
Perl script
[From the unicode mailing list. Email message by Jungshik Shin]
It seems to me a waste of bandwidth (however abundant it may have become recently; I have heard several times on this list that it is not in a certain country in Europe ;-) ) to go all the way across the Atlantic or the continent to convert between UCVs and surrogate pairs. There are several ways to do it locally, including the two suggested above. On *nix, including Mac OS X (http://developer.apple.com/internet/macosx/perl.html), one can open up a small terminal window (yes, Mac OS X has a terminal window!) and run a script like the following, assuming Perl is installed. (If a GUI is desired, make one up in Perl/Tk, Tcl/Tk, pdksh, Python+Tk?...) This should also work in a Windows command prompt. Alternatively, I guess a local HTML file with ECMAScript should also work.
------------Cut--------here----------------
#!/usr/bin/perl -w
# use the full path of your perl binary in place of /usr/bin/perl
$| = 1;    # unbuffered output, so the prompt appears before we read input
while ( 1 ) {
    print "** Enter Unicode code point in hexadecimal \n" .
          "   (to end, press [enter]) : ";
    $ucs = <STDIN>;               # read one line from standard input
    last unless defined $ucs;     # stop at end of input
    chomp $ucs;
    last if $ucs eq "";
    if ( $ucs =~ /[^a-f0-9A-F]/ ) {
        printf " Error: %s is invalid. Try again\n", $ucs;
        next;
    }
    $usv = hex $ucs;
    if ( 0xffff < $usv && $usv < 0x110000 ) {
        # supplementary planes: split the 20-bit offset into two 10-bit halves
        printf "UTF-16: %04x %04x\n",
            int( ($usv-0x10000) / 0x400 ) + 0xd800,
            ($usv-0x10000) % 0x400 + 0xdc00;
    }
    elsif ( $usv < 0xd800 || 0xdfff < $usv && $usv < 0x10000 ) {
        # BMP outside the surrogate range: a single 16-bit code unit
        printf "UTF-16: %04x\n", $usv;
    }
    else {
        # surrogate code points (d800-dfff) and values above 10ffff
        printf "Your input %s is not valid. Try again\n", $ucs;
    }
}
print "Bye !!\n";
--------------------Cut---------here--------------
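One way to sanity-check output from a script like this is to compare it with what an existing UTF-16 encoder produces. The following is a small Python sketch along those lines, not part of the original message; the helper name utf16_units is hypothetical, and it relies only on Python's built-in "utf-16-be" codec and the standard struct module.

```python
import struct

def utf16_units(scalar):
    """Return the UTF-16 code units for one scalar value, using the built-in codec."""
    data = chr(scalar).encode("utf-16-be")              # 2 or 4 bytes, big-endian
    return list(struct.unpack(f">{len(data) // 2}H", data))

print([f"{u:04X}" for u in utf16_units(0x1D11E)])       # prints ['D834', 'DD1E']
print([f"{u:04X}" for u in utf16_units(0x20AC)])        # prints ['20AC']
```

The codec refuses surrogate code points such as D800 (it raises UnicodeEncodeError), which corresponds to the range the script reports as not valid.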
