Notes on converting scalar values (codepoints) to surrogates


Calculators

[From the unicode mailing list. Email message by John Cowan]

If you have access to any Windows box, you can use the Windows Calculator (Start/Programs/Accessories/Calculator). Choose View/Scientific and click on the Hex radio button. Then enter your 5-digit Unicode scalar value. (You must type hex digits in lower case.) To get the high surrogate, type:

  - 1 0 0 0 0 = / 4 0 0 + d 8 0 0 =

To get the low surrogate, enter the scalar value again and type:

  - 1 0 0 0 0 = % 4 0 0 + d 8 0 0 =

You can also use the mouse, in which case "%" above represents the MOD key.

On *ix systems, use the "bc" command; type "obase=16" and then "ibase=16", in that order (once ibase is 16, a later "obase=16" would be read as hexadecimal 16, i.e. decimal 22). For this program, you must use capital letters for the hex digits. To get the high surrogate, type "(xxxxx-10000)/400+D800" ("xxxxx" is the scalar value); to get the low surrogate, type "(xxxxx-10000)%400+DC00".
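
The same arithmetic can be sketched in a few lines of Python, which may help when checking a calculator or bc result. This is only an illustrative sketch, not part of the original message: the function name to_surrogates is made up here, and U+1D11E (MUSICAL SYMBOL G CLEF) is just a sample input.

```python
def to_surrogates(scalar):
    """Split a supplementary-plane scalar value into (high, low) surrogates.

    Hypothetical helper for illustration; mirrors the calculator recipe:
    subtract 10000 (hex), then divide / take the remainder by 400 (hex).
    """
    if not 0x10000 <= scalar <= 0x10FFFF:
        raise ValueError("U+%04X has no surrogate pair" % scalar)
    offset = scalar - 0x10000            # 20-bit offset into the supplementary planes
    high = 0xD800 + offset // 0x400      # top 10 bits go into the high surrogate
    low = 0xDC00 + offset % 0x400        # bottom 10 bits go into the low surrogate
    return high, low


high, low = to_surrogates(0x1D11E)
print("%04X %04X" % (high, low))         # prints "D834 DD1E"
```

Integer division (//) is used on purpose so that the quotient is truncated the same way the hex-mode calculator and bc (with its default scale of 0) truncate it.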


Perl script

[From the unicode mailing list. Email message by Jungshik Shin]

It seems to me a waste of bandwidth (however abundant it may have become recently; I have heard several times on this list that it's not in a certain country in Europe ;-) ) to go all the way across the Atlantic or the continent to convert between UCVs and surrogate pairs. There are several ways to do it locally, including the two suggested above. On *nix, including Mac OS X (http://developer.apple.com/internet/macosx/perl.html), one can open up a small terminal window (yes, Mac OS X has a terminal window!) and run a script like the following (assuming Perl is installed; if a GUI is desired, make one up in Perl/Tk, Tcl/Tk, pdksh, Python+Tk?...). This should also work in a Windows command prompt. Alternatively, I guess a local HTML file with ECMAScript should also work.

------------Cut--------here----------------
#!/usr/bin/perl -w
# use the full path of your perl binary in place of /usr/bin/perl

while ( 1 ) {
  print "** Enter Unicode code point in hexadecimal \n" .
        "  (to end, press [enter]) : ";
  $| = 1;               # force a flush after our print
  $ucs = <STDIN>;       # read one line from standard input
  chomp $ucs;

  last if $ucs eq "";

  if ( $ucs =~ /[^a-f0-9A-F]/ ) {
    printf "  Error: %s is invalid. Try again\n", $ucs;
    next;
  }

  $usv = hex $ucs;
  if ( 0xffff < $usv && $usv < 0x110000 ) {
    printf "UTF-16: %04x %04x\n", int(($usv-0x10000) / 0x400) + 0xd800,
                                  ($usv-0x10000) % 0x400 + 0xdc00;
  }
  elsif ( $usv < 0xd800 || 0xdfff < $usv && $usv < 0x10000 ) {
    printf "UTF-16: %04x\n", $usv;
  }
  else {
    printf "Your input %s is not valid. Try again\n", $ucs;
  }
}

print "Bye !!\n";
--------------------Cut---------here--------------
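
Since the message talks about converting between code points and surrogate pairs in both directions, the reverse step can be sketched as well. The following Python fragment is an assumption-laden illustration rather than anything from the original email: from_surrogates is a made-up name, and the final lines merely cross-check the result against Python's built-in UTF-16 encoder.

```python
def from_surrogates(high, low):
    """Combine a (high, low) surrogate pair into one scalar value.

    Hypothetical helper for illustration; it undoes the split by
    multiplying the high part by 400 (hex) and adding back 10000 (hex).
    """
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair: %04X %04X" % (high, low))
    return 0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00)


scalar = from_surrogates(0xD834, 0xDD1E)
print("U+%X" % scalar)                        # prints "U+1D11E"

# Cross-check: Python's own big-endian UTF-16 encoder yields the same two units.
print(chr(scalar).encode("utf-16-be").hex())  # prints "d834dd1e"
```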

