Nick Hodges

The Unicode Shift

24 Mar

We’ve been hard at work on our Unicode product called "Tiburon".  Things are going really, really well.

Internally, the process has been quite smooth.  The process went like this:

  1. The first step was to get the compiler working. First, we established a new type “UnicodeString” which works pretty much like AnsiString does. It knows how to handle Unicode string types, and carries along with it information about the type of string "payload" that it is carrying. By default, its payload is UTF-16. Note that this new type is an addition to the set of string types. All existing string types are still there and still work like they always did.
  2. Next, we updated the RTL to use the new UnicodeString as the default string type. That is, we set string = UnicodeString, Char = WideChar and PChar = PWideChar. We then added classes and functions that will allow you to have greater control over the encoding of text for I/O operations, and other data storage and conversion needs. The result was a complete RTL that knows how to deal with Unicode data without you having to do much of anything to your existing code.
  3. Once that was complete, we applied the new RTL to the VCL. This proved to be a very easy step, as the VCL was already basically "Unicode-ready". The VCL is single-sourced, and since VCL.NET is already "Unicode-ified", it was a pretty straightforward process to get the VCL up and running with the new RTL and compiler. The result was a VCL that is completely Unicode enabled.
  4. Finally, once the VCL was updated, we started compiling the IDE with the new VCL, RTL, and compiler. Naturally, the IDE itself needs to be built with the same VCL that you all will use, thus enabling you to install fully Unicode-enabled components and experts into the IDE. We run and use the same VCL you do, so we’ve got a fully Unicode-ified IDE up and running. This process was also remarkably easy: the new RTL and VCL work so similarly to the previous version that getting the IDE working was a much smoother process than even we had anticipated. Because we’ve done all the heavy lifting in the RTL, large swaths of our existing code are working just fine as is, and yours will too.
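
As a rough sketch of what the mapping in step 2 means for ordinary code, the console program below prints the size of a Char and converts between the default string and AnsiString. It assumes a compiler where string = UnicodeString and Char = WideChar, as described above; the program and variable names are made up for illustration and are not taken from the Tiburon sources.

```pascal
program UnicodeAliasSketch;

{$APPTYPE CONSOLE}

var
  S: string;      // assumed to be UnicodeString, per step 2
  A: AnsiString;  // the existing ANSI type, still available
  C: Char;        // assumed to be WideChar, per step 2
begin
  S := 'Tiburon';
  A := AnsiString(S);   // explicit conversion from UTF-16 to ANSI
  C := S[1];
  Writeln(SizeOf(C));   // 2 if Char = WideChar; 1 on an ANSI compiler
  Writeln(Length(S), ' ', Length(A));
end.
```

On a pre-Unicode compiler the same program would report 1 for SizeOf(C), which makes it a quick way to check which string model a given compiler uses.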

We are only in the early pre-Alpha stages, but as a result, we already have:

  1. A new compiler with a new inherent UnicodeString type
  2. A new RTL using new string type routines (i.e. many new compiler helper functions in System.pas).
  3. An updated VCL using the new RTL that allows the string type to easily "float" to the UnicodeString type
  4. A fully functional IDE compiling with the new Unicode Compiler, RTL, and VCL.

As part of the process, we had a partner summit here this past week. We invited some of our key technology partners out to look at early versions of Tiburon, and they ran their code against it. They all were able to make great progress towards converting their code in just the one day they were here. All of them reported that the porting process was quite easy, and they were very pleased with how it all went. A number of them had their entire code base transitioned to the new compiler and RTL and had their components up and running within a few hours. All in all, it was quite painless for the guys who build components and products for Delphi. Everyone was quite pleased at the end of the day. We were happy because we got to see some of the fruits of our labor, and the partners were happy because they rather easily had much of their code up and running very quickly. As a result, we are more confident than ever that we are on the right track and that the vast majority of your existing code will make the leap to a Unicode product just fine.

Our Chief Scientist Allen Bauer has revealed a little bit of the specifics of what we are up to, so if you need to know more about the bits and bytes side of things, give him a read. But from the Product Manager side of things, we are ahead of schedule and moving along nicely.  We’ll continue to roll out information as it becomes available, but you can rest assured that your existing code base, to a very, very large degree, is safe, and will compile with Tiburon.  There will be certain code idioms that will need to be checked and reviewed, and we’ll be letting you know about those things in the coming weeks. The cool part is that many of the changes you can do now with the existing compiler, and such changes, minor as they are, will give you a more robust code base.  We are working hard to ensure that the migration to Tiburon is as smooth as possible, and our process (which, believe me, gives the compiler, the RTL, and the VCL a really good workout; there are advantages to building Delphi in Delphi) has shown us that our work will reap rewards for your code base.

76 Responses to “The Unicode Shift”

  1. 1
    Ferruh Koroglu Says:

    Good Work,

    With Best Regards,
    Ferruh Koroglu.
    http://www.teksdata.com/fkoroglu

  2. 2
    Paul Says:

    Good news Nick. I’m assuming you’ll be tight lipped, but can I tempt you into commenting on how this smooth "Tiburon" work may affect the release schedule for "Commodore"? 64bit is our holy grail I’m afraid.

  3. 3
    Nick Hodges Says:

    Paul –

    Commodore will be "Middle of 2009"

    ;-)

    Nick

  4. 4
    C Johnson Says:

    I look forward to seeing the process unfold.

  5. 5
    Roddy Says:

    Great to hear it, but we’re desperately short of news on how this will work for C++Builder users, where things may be a little more fraught. Allen hinted at a blog post, and Alastair seems to have been as silent as the grave. Any chance you could gently cattle-prod the relevant people?

  6. 6
    Troy Wolbrink Says:

    Very cool! Can’t wait to get my hands on it! Any news on the .NET side of things?

  7. 7
    Jolyon Smith Says:

    So you took a codebase already prepared for Unicode and found it "pretty straightforward". It should surely have been VERY straightforward.

    I suspect that many of your partners at that summit are also working on essentially Unicode-ready codebases, given that CodeGear is famously "behind the curve" in this particular area.

    More interesting would be to learn how things will go for extensive codebases that are NOT prep’ed for Unicode at all and which contain assumptions about "String".

    e.g. Here’s something I ran into just last week. How will the new compiler deal with:

    TFoo = record
      Code: String[5];
    end;

    This cannot be "ANSIfied" since the compiler rejects:

    ANSIString[5];

    it also rejects:

    ShortString[5];

    So to "prepare this code for Unicode" today requires something like:

    TFooCode = array[1..5] of ANSIChar;

    TFoo = record
      Code: TFooCode;
    end;

    Except that now the Code member of TFoo is incompatible with other String type variables, so any code that does something similar to:

    var s: String;

    s := someFoo.Code;

    Now breaks and things quickly become very messy indeed.

    So one question is, will "String[]" be a "Unicode short string" in Tiburon or will "String[]" still mean "ANSI short string" in this one specific usage?

    If so, this is the kind of inconsistency (unexpectedly?) falling out of an implementation that ime points to deeper problems perhaps yet to be uncovered.

    Jon: "Hey Bob, is the Code of a TFoo Unicode?"

    Bob: "Um, I guess, it’s a String and String is Unicode in Delphi, right?"

    Jon: "I don’t get it then - I’ve got a user entered CJK Code here that is screwing things up"

    Unless of course in adding support for the new Unicode strings the compiler is also "fixed" to allow:

    ShortString[5];
    or ANSIString[5];

    Which would make things clearer at least.

    But even if so, that will be a compiler "feature" not available until the Unicode support is also present, i.e. it won’t be possible to "prepare" this sort of code ahead of time.

    "you can rest assured that your existing code base, to a very, very large degree, is safe, and will compile with Tiburon."

    Being able to compile isn’t the whole problem, as you *should* know, and not even the bigger problem.

    Being sure that the compiled code will behave correctly (i.e. w.r.t as designed for an ANSI application) is the much bigger concern.

    That is the bigger problem because code that doesn’t compile is certain to NOT be released as executable production binaries. Code that compiles but that doesn’t behave as required on the other hand is a contract re-negotiation just waiting to happen.

    And *your* code (or indeed that of your partners) is not necessarily representative of the type of code "out there" in application land.

    Fudging things so that they compile is perhaps the very WORST approach in that respect. Much better to break compilation in areas that will require attention in order to preserve existing behaviour, so that those areas are certain to GET some attention.

    Enabling a "Compile it and hope for the best" approach will make for great demos and some very angry customers when THEIR customers start reporting problems and unexpected behaviours that CodeGear assured them wouldn’t occur.

    "There will be certain code idioms that will need to be checked and reviewed"

    You aint kidding.

  8. 8
    Nick Hodges Says:

    Jolyon –

    TFoo = record
      Code: String[5];
    end;

    will continue to work as it always has. Not to worry.

    Given that, I assume the rest of your concerns melt away like an ice cube on hot pavement…..

    Nick

  9. 9
    Ralf Says:

    Does it mean that String[5] is seen as a Short-**Ansi**-String?

  10. 10
    ahmoy Says:

    unicode? yey!!!

    btw is there any news on skinning VCL?

  11. 11
    Mick Says:

    Is UnicodeString exactly the same as WideString? There are many 3rd party components that are already Unicode enabled and use WideString. Will the Delphi compiler handle automatic translation between the two types (assuming they are different)?

  12. 12
    Paul Says:

    Nick, "Middle of 2009" for Commodore. I was hoping the "Winter 2008" on the roadmap might be improved on with the "Tiburon" good work. Are you close to publishing a revised roadmap after the latest Delphi survey?

    Without 64bit on the near horizon we’re likely to have to make very significant architectural changes to cope with a 3GB limit, probably AWE related. We’re already compromising with customers and I’m not looking forward to a significant development that gives us nothing in the long run. As I said, 64bit is our holy grail.

    Paul.

  13. 13
    Vitaliy Says:

    From what I can read, it looks like Tiburon will be the last ever Delphi version.
    It is a breaking release, many existing components won’t be compatible with it, and the 64bit option is still not available.
    Almost all my code won’t be easily portable to Unicode (and very many guys don’t want to move to Unicode at all, do you know that?).
    CodeGear’s financial status is extremely dangerous. And it looks like no one at CodeGear has ever read In Search of Stupidity. Otherwise you would never do this.

    I am pretty upset with CodeGear and with you personally, Nick. You are not a developers’ company. You are a company of developers who don’t listen to the moans of your shrinking user base. It is not the same thing.

  14. 14
    Jolyon Smith Says:

    @Mick: No UnicodeString is NOT (read: will not be) exactly the same as WideString.

    WideString remains as a distinct type, and has to, as its implementation is very specifically aimed at the COM support in Delphi.

    UnicodeString is a wholly new string type. It will however consist of WideChars - just as a WideString does.

    @Nick: No, quite predictably my concerns do not melt on the pavement. "Predictably" because this potential answer was one that I actually pointed out some problems with in the initial response.

    That, and of course "the rest of my concerns" dealt largely with the absurdly naive notion that "as long as it compiles, everything will be fine" that seems to underpin CG’s approach to this change.

    But as far as String[] goes, if you really mean what you say, and that what this compiles to is a ShortString - as it exists today, i.e. ANSI, then as I pointed out, this points to a code smell that I referred to as itself perhaps indicative of deeper problems yet to be realised/uncovered.

    In Delphi today, "String" is always an ANSI string. It may be a long string or a short string, but it’s always ANSI.

    It had previously been offered that this consistency was being preserved (and very deliberately so), that ANSIString still existed for where it was needed, but that "String" would now be consistently "UnicodeString".

    But now you seem to be saying that "String" is UnicodeString except when it’s a ShortString, in which case "String" is an ANSI String.

    So

    Foo.Code := s

    where Code is "String[]" and s is just "String" now involves an implicit UTF16/ANSI transcode. Now presumably this will emit a warning.

    A warning that isn’t going to make much sense upon inspection of the code.

    "Fixing" the code to make the declaration tally (i.e. self documenting) with the compiled result (changing "String" to "ANSIString") won’t work if - as you seem to be saying - the compiler will not be changed to make such a self documenting declaration possible.

    So the smell from the Delphi compiler leaks into the application code.

    Nice.

    In fact, this was one of the reasons that Allen gave to NOT pursue any implementation that allowed for situations where a "String" declared in one context might not be the same fundamental type as a "String" declared in some other context.

    i.e. the exact situation this creates.

    So if the CG approach was deliberately and consciously chosen in order to eliminate such problems, one has to wonder at the fact that it quite clearly doesn’t, which then leads to the obvious question… do CG actually have *any* idea what they are doing and the impact it is going to have?

    It pains me to say but this hasn’t given me - at least - any greater confidence that this has been properly thought through (from YOUR customer’s p.o.v).
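
    For illustration only, here is a minimal sketch of the assignment discussed above, with the conversion spelled out through explicit casts. It assumes that string = UnicodeString and that String[5] stays a one-byte-per-character short string, as Nick’s reply suggests; whether and how the compiler warns about the implicit form is not stated anywhere in this thread.

    ```pascal
    program ShortStringSketch;

    {$APPTYPE CONSOLE}

    type
      TFoo = record
        Code: string[5];   // short string: one byte per character
      end;

    var
      Foo: TFoo;
      S: string;           // UnicodeString, assuming string = UnicodeString
    begin
      S := 'ABC';
      // Spelling out the cast documents the UTF-16 to ANSI step
      Foo.Code := ShortString(S);
      // The way back widens the bytes to UTF-16 again
      S := string(Foo.Code);
      Writeln(S);
    end.
    ```

    With plain ASCII content such as 'ABC' nothing is lost in either direction; the concern raised above only bites once S holds characters that the active ANSI code page cannot represent.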

  15. 15
    C Johnson Says:

    Jolyon -> Since string[5] isn’t an ansi string now, but rather a short string, I suspect you will find that anyone still doing this ALREADY knows better than to assume string[5] is an ansistring. (why would anyone still be doing this again? Right, memory mapped IO on old files, nearly forgot, since I’d long since changed any loading/writing code that did that to something saner YEARS ago…)

    Actually, since you have to know a fair bit about delphi & pascal to even use a length limited string these days (you will find almost zero references to it anywhere), and everyone seemed to survive the transition from 255 byte strings with a length byte at the 0th position back in the D2 days, I’m going to guess that this change isn’t going to be a huge problem for most.

    The only problems will be when you are doing special memory mapping operations, assuming that buffer of memory is compatible with a string of some sort. If you know enough to do that, chances are you’ll know enough to adapt.

    So, ya, chill dude - It’s coming whether you want it or not. If you figure you have a bunch of special case code, do the sane thing: start a private dialog with Nick and see if he will add you to the eventual beta cycle (when it’s ready). You get your advanced notice, you can advise them on your challenges and maybe help shape things.

    I gotta say, on a personal note, you seem to be stuck in a somewhat confrontational frame of mind. One that I shared in the Borland days. I have found that since the CodeGear era started that things have been a little mellower, you might want to take a deep breath and give it a fresh try. You might be surprised at the results.

  16. 16
    Vitaliy Says:

    I also do not understand your reasons why it is impossible to add a checkbox in the options to turn everything back to ANSI strings.
    Ship a whole copy of the RTL and VCL compiled libraries in both formats. You have about 1.5Gb of pure crap in the distribution already; it won’t add much more, but it will be very useful in many cases.

  17. 17
    Jolyon Smith Says:

    @ C Johnson:

    And what is a ShortString? CLUE: If it isn’t Unicode then by definition it’s ANSI. Sure, it’s not "ANSIString" but it is _an_ ANSI string.

    "Actually, since you have to know a fair bit about delphi & pascal to even use a length limited string these days (you will find almost zero references to it anywhere),"

    I love it when people say these sorts of things.

    You don’t need to know that much about Delphi and Pascal to use length limited strings. All you need to know is that they are supported.

    Knowing that length limited strings are supported, it makes perfect sense that if you have strings in your application that are limited in length, then a length limited string type is a pretty obvious choice for such data.

    But let’s deal with the idea that there will be zero references.

    Let’s remember that Nick himself held up CodeGear’s experience with the VCL as an example of how easy all this will be.

    One would have thought that if the use of such length limited strings were so obsolete in "the real world", that the highly rarefied atmosphere of the VCL might be the very last place you would expect to find them, right?

    (Let’s set aside for a moment that whatever you might like to dream of in an ideal world, in the REAL world they absolutely are in real and active use in real and active applications, so any arguments suggesting we just ignore them as any source of potential problems/confusion is pretty ridiculous).

    But, back to the VCL.

    I suggest you do a Find In Files of the VCL source for "String[" then get back to me with how "obsolete" you think these things are.

  18. 18
    Jolyon Smith Says:

    Also @ C Johnson:

    Actually, it’s the very fact that I *DON’T* think I have special case code that makes me so frustrated and worried by the CG approach.

    I think in fact it’s the other way around. I think that anyone who has an ANSI app that can be happily and reliably made Unicode in the absurdly naive fashion being promoted by CG’s approach will be the special cases.

    And by definition there are far more EXISTING applications than there are new ones, and will be for a VERY long time.

    I’m concerned that creating obscure problems for TYPICAL Delphi applications will be the final straw that could see Delphi consigned to the history books for once and for all.

    I DON’T want that to happen.

    In the last two years, the focus has (allegedly) been on "restoring the faith". Bringing Delphi users "stuck" on older versions back into the fold and giving them reasons to get - and stay - current.

    The approach to introducing Unicode in Tiburon flies in the face of that objective.

    (And please remember, I am in that group that NEEDS Unicode support. It is not the fact that Unicode is coming that is the problem, it is the WAY it is being done, and there ARE - or perhaps increasingly "were" - alternatives).

  19. 19
    Esteban Pacheco Says:

    Good job guys, keep it up.

    Oh, and break stuff! It’s about time. We have a strong Delphi 2007 to keep working while we move to the new Tiburon. :)

  20. 20
    C Johnson Says:

    Jolyon -> Actually, if you REALLY want to be picky, a short string is an ascii string.

    "All you need to know is that it is supported" Sure, true, but since it’s not referenced anywhere but the language guide which most people don’t use, and most examples use strings without lengths, chances are you aren’t going to know.

    As for your suggestion of searching for String[ in the VCL, I did find a few. And based on Nick’s discussion, I would expect them to continue acting exactly as they currently do now. Since that would not change how the code behaves, it would not cause migration problems. In fact, since it seems to mostly be used in structures and APIs that are not Unicode, this is pretty much exactly what anyone would want. Since the VCL interfaces APIs for us, I hardly find it surprising. Since they would not be changing, interfacing those APIs will continue to work correctly. If you hand it unicode data outside of its range, that data will continue to be modified in the way in which it would currently be modified. Hardly a surprise.

    As for in the real world, I find very few actual instances where a shortstring is required - most of the instances where I found them in third party component code, I found could be replaced with AnsiStrings with zero consequence, and only minor screwing around to accomplish most of the rest. The only exceptions appear to involve outside APIs.

    And I never suggested that shortstrings were obsolete - I said they were obscure, and I stand by that.

  21. 21
    Kryvich Says:

    @Jolyon Smith:

    "I suggest you do a Find In Files of the VCL source for "String[""

    I’ve found:
    - in RTL - 5 declarations of ShortString,
    - in VCL - 12 declarations of ShortString.

    Is it too much?

  22. 22
    Olivier Beltrami Says:

    Some quick, but heartfelt, comments on the above postings. Unicode support is belated, but necessary. We just have to live with it. However, I agree with the philosophy that code that breaks at compile-time is better than code that breaks at run-time (at a client site). I’d suggest a lot (and I mean a lot) of additional warnings (optionally made into errors) to alert those who want to be alerted to Unicode porting potential issues.

  23. 23
    Michael Skachkov Says:

    Nick,

    I’m a bit confused with the mess about type naming. As long as we will have new type UnicodeString, wouldn’t it be better to have also UnicodeChar having string=UnicodeString, Char=UnicodeChar and PChar=PUnicodeChar?

    I think this would look much more consistent

  24. 24
    Olaf Monien Says:

    @Michael:

    Having UnicodeChar instead of WideChar etc, is being discussed internally afaik, so it might happen …

  25. 25
    Simon Says:

    Great work! Now focus on delivering a FAST AND BUG FREE IDE and I will buy this Delphi.

  26. 26
    Hans-Peter Says:

    >As part of the process, we had a partner summit here this past week. We invited some of our key technology partners out to look at early versions of Tiburon, and they ran their code against it.

    I hope that DevExpress is among these key technology partners!

    Hans-Peter

  27. 27
    Magnus Flysjö Says:

    I just want to add my two cents to the discussion to balance all the negative stuff written above.
    I think Tiburon will be just right and have no doubt that CodeGear will perform changes just as they need to be done for the best of the majority of users. That said, I am really looking forward to finally get rid of all the pain we have when localizing applications for far-east markets.

  28. 28
    DelphiUser Says:

    Trying to read between the lines here:

    Is compiling to Win95/98/ME going to be retained after all?

  29. 29
    JanisT Says:

    Please add native support for PNG in VCL.
    Please!Please!…Please!Please!

  30. 30
    Peter Says:

    I would welcome a Delphi IDE without .NET and the DocExplorer help. Waiting since D7….

  31. 31
    Fred Says:

    speaking of 3rd party component companies like DevExpress….I’m more concerned that if I upgrade to D2008 from Turbo Pro 2006 - are they all going to charge us for updating the current versions of their components that are now compatible with Delphi 2006/2007…to be compatible with D2008!??

    PLEASE….any of you 3rd party companies reading this….appreciate if you could clarify this.

    If you guys WILL charge us (and call them "new versions")….I’ll go ahead and purchase D2007 now, but it’ll more than likely be the last Delphi purchase I ever make since I wouldn’t want to pay for upgrades to be compatible with D2008 (just because of Unicode, which I have ZERO need for).

    Therefore, I wonder just "how easy" it will be for them to bring everything up to spec and whether or not they will have to spend a lot of time to make everything work right and feel the need to charge (for their time), which I of course would completely understand. I guess it also depends how well D2008 works "out of the box" as well!

    Since I rely heavily on 3rd party components….this is a serious concern of mine and the future of me using Delphi.

  32. 32
    Joeri Says:

    @Fred: 3rd party developers would charge you for new versions with new features. Unicode support is a new feature, so they’re pretty likely to charge you for it, just like CodeGear will be charging you for a new version of Delphi with Unicode support.

  33. 33
    Fred Says:

    @Joeri - I understand what you’re saying, but for a lot of us…Unicode seems to have been "a should have been" rather than a "new feature" (for me actually it’s a non-starter as I have no need for Unicode). At the same time as well…most of the component vendors had 2006 components and had the same versions of those components updated to support 2007 as well with no charge, which seemed to also require work as most of them didn’t come out "supporting D2007" as soon as it was released. ;)

  34. 34
    Lance Says:

    @Fred - I think you need to wake up and smell the coffee burning. While I would love to see the component companies provide full support of D2008 and the Unicode features at zero extra cost, that is just not a realistic expectation. That would be like me expecting you to spend 200+ development hours to convert your program to a newer version of Windows that your product wasn’t compatible with and give it to me for free. That’s also why many products release new versions of their software to sell to customers because a new OS, like Vista, came out and use it as an excuse to sell an upgrade. With Vista, I can definitely see it because for many developers, it took a lot of time to get it right to play with "uncle Bill".

  35. 35
    Fred Says:

    LOL…I guess you didn’t read Nick’s original post then? According to them…"it was quite painless", so where do you pull 200+ hours from?? I didn’t say it….Nick did! Forget about "new features" for a minute….let’s just talk "unicode enabled" here.

    So here’s a big question, then: Will D2007 compatible components run fine on D2008 if we do NOT need "unicode enabled components"?? Or if I want to run my apps "as is" without worrying about Unicode, am I forced to make it Unicode??

    "As part of the process, we had a partner summit here this past week. We invited some of our key technology partners out to look at early versions of Tiburon, and they ran their code against it. They all were able to make great progress towards converting their code in just the one day they were here. All of them reported that the porting process was quite easy, and they were very pleased with how it all went. A number of them had their entire code based transitioned to the new compiler and RTL and had their components up and running within a few hours. All in all, it was quite painless for the guys who build components and products for Delphi. Everyone was quite pleased at the end of the day."

  36. 36
    Jolyon Smith Says:

    @C Johnson: Sorry mate, if you think ShortStrings are some thing called an "ASCII string" then you are even more behind the times than I thought and really don’t have a clue when it comes to string types.

    If YOU really wanted to be picky you should have said they were 8-bit ANSI strings, although of course they could equally hold DBCS or MBCS ANSI strings, so any attempt to try and distinguish these from any other than an ANSI string smacks of desperation.

    And though you never used the word obsolete, that certainly WAS the gist of your comments.

    And sure, String[] will continue to work as before, EXCEPT that when you assign a "String" to a "String[]" there will now be a lossy transcode from Unicode to ANSI, which is NOT what used to happen.

    Unlikely?

    I don’t think so given that (ime) a very common use of String[] is in declaring program data types for application defined fixed length strings:

    type
    TFooCode = String[10];

    Bear in mind also that anyone that ever tried to declare a string in a variant part of a record will have found themselves being made aware of ShortStrings.

    Also, I hope you didn’t use the help system of any Delphi IDE later than 7.x as reference for the visibility of String types - as for visibility of ANYTHING, those later/most recent help systems are of course useless.

    In the D7 help, enter "string" in the help index and you will be offered "About String Types" and "Short Strings".

    In "About String Types", Short String is the first listed and described string type.

    Now for sure they are described as supported for backwards compatibility. But for most users the key word in that phrase is "supported". "Why" doesn’t come in to it.

    Such things are quite often then input by a user via an Edit box and naively assigned:

    sCode: TFooCode;

    sCode := Edit1.Text;

    Edit1.Text is String - same as it ever was. Oh, except that it isn’t - it’s now UnicodeString when it used to be ANSIString, but String[] is still an ANSI string.

    So actually no, things will NOT work as they always have.

    They will continue to compile and they will continue to "work", but they will work in a quite significantly different way (NOTE that Nick did not go so far as to say that things would continue to work AS THEY ALWAYS HAVE, only that they would continue to work).

    And the unavoidable issue is that it creates a situation where "String" is NOT universally and consistently Unicode.

    That is bound to cause confusion.

    I would actually prefer it if the String[] syntax were finally killed off, so that such code would not even compile, forcing such code to be properly addressed and the Unicode implications for that code being properly considered.

    After all, what possible impact could that have if it’s so rarely used (respecting your aversion to the word "obsolete").
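
    A minimal sketch of one way to guard such an assignment: round-trip the text through AnsiString and reject it if anything changed. FitsInAnsi is a hypothetical helper written for this example, not an RTL function, and its result depends on the ANSI code page of the machine running it. The sketch assumes string = UnicodeString, and the literal stands in for Edit1.Text.

    ```pascal
    program LossCheckSketch;

    {$APPTYPE CONSOLE}

    type
      TFooCode = string[10];

    // Hypothetical helper: True when S survives a round trip through ANSI.
    function FitsInAnsi(const S: string): Boolean;
    begin
      Result := string(AnsiString(S)) = S;
    end;

    var
      Input: string;
      Code: TFooCode;
    begin
      Input := 'AB-123';   // stands in for Edit1.Text
      if FitsInAnsi(Input) and (Length(Input) <= 10) then
      begin
        Code := ShortString(Input);
        Writeln('stored: ', string(Code));
      end
      else
        Writeln('rejected: not representable as TFooCode');
    end.
    ```

    The check makes the lossy case an explicit branch in the application rather than a silent substitution, which is the behaviour this comment is worried about.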

  37. 37
    aam aam Says:

    That is great news for those who need unicode support, but we, the ones who will prolly never need it, will have the following problems:

    - Difficulties in migrating code to Tiburon.
    - Bigger executables
    - Executables will need more memory.
    - Loss of speed for unicode support.

    So, this ain’t good news for us, and I really expect Tiburon to bring some good news for those who don’t need unicode. In other words, it is nice to see the unicode support for those who need it, but how about blogging about something the rest of us care about?

  38. 38
    Nick Hodges Says:

    aam aam –

    We are finding that string processing is improved with the new UnicodeString type, mainly because it is more efficient in handling the data that it manages.

    Nick

  39. 39
    C Johnson Says:

    aam aam -> I got 8gb of ram for this machine for 200$ -total-. I suppose I could have splurged on the fancy lower latency stuff for 240$ (ok, yes, I use a 64 bit os, and delphi apps are 32 bit, but each of them can get at at least 2gb of ram each without stepping on each others toes, so even without a 64bit compiler, you can take advantage of it)

    I just don’t see ram as the huge barrier it used to be.

    True, it is no excuse to see things run rampantly out of control, but 100k here or there, and maybe 4 or 5 megs of extra data for the extra functionality and possibilities it provides doesn’t seem that extreme to me.

    Hmmm.. I’ve definitely mellowed out lately. Maybe time to cut the dosage [grin]

  40. 40
    Kryvich Says:

    @aam aam:
    "for us, the ones who will prolly never need it…"

    never say never. (C) ;)

    "- Lost of speed for unicode support."

    Actually, even a gain in speed is possible for GUI applications. That’s because Delphi applications currently use the ANSI Win API, and the OS internally converts every call to an ANSI API function into a corresponding call to the Unicode API function. In Unicode Delphi these internal conversions will be eliminated.

  41. 41
    Luigi Sandon Says:

    @Vitaliy and others: do you know that all the "NT" versions of Windows are Unicode (only the dead 9x line isn’t) and that the ANSI API is there only for compatibility? Have you ever tried to sell and support applications all around the world in several different languages, handling data in several different languages? Unicode is not optional, it’s a must. CodeGear is already late IMHO; it should have been introduced earlier, and everybody else is already Unicode. Better late than never.
    I know it will have an impact on the current code base, and I know many old, unmaintained libraries will become obsolete. But there are many Delphi developers who need tools to develop applications of today and tomorrow, not yesterday.

  42. 42
    Rafael Costa Says:

    Hello Nick,

    Unicode is one of the new features in Tiburon. It seems to be a very smooth process, but what about parameterized types, the updated and improved VCL, and some other things that were in the roadmap? Will they be OK in the first half of 2008?

    thanks

  43. 43
    Nick Hodges Says:

    Rafael –

    Yes, there are plenty of new features besides Unicode coming in Tiburon.

    Nick

  44. 44
    Vitaliy Says:

    @Luigi Sandon
    I disagree with you. NT has(!) API functions without a Unicode requirement, yes, for compatibility.
    And we need compatibility from the next Delphi version (this is why many guys still sit in the Delphi wagon; if you throw them out, you’ll be alone, CodeGear). Read the latest Joel article on a similar issue.

    And concerning the necessity of Unicode: I make some software programs that have interface versions for China and Korea. And they work without Unicode. You just must understand what you are doing.
    Unicode visual components have also been available for quite a long time, and I don’t see the point of making Unicode the main feature of the next release. A 64-bit compiler with improved optimizations and native parallel programming support are the real targets.
    But it looks like we’ll never see this. Pity.

  45. 45
    Jolyon Smith Says:

    @Nick: Details of the benchmarks used to identify "improved string processing" (not "faster" I note, although presumably this is what you hope to imply)?

    If you mean interactions between VCL and the API, great. Not actually very meaningful though, since "performance problems" in applications rarely stem from this boundary and are more typically associated (when associated with string handling at all) with string processing within the application context itself.

    And if you mean to say that typical application string operations (concatenation, substring identification/replacement, etc.) are faster, then frankly I don’t believe you.

    Unless you have made other performance improvements not directly connected with Unicode.

    e.g. incorporating the FastStrings library.

    (in which case of course the performance "improvements" are only an improvement relative to "vanilla" RTL routines. Anyone already using FastStrings is of course already in an improved situation in this respect, just as anyone using FastMM in Delphi 7 won’t see much - if anything - in the way of memory manager related improvements in D2007)

    Incidentally I hope you *have* incorporated the FastStrings library or are at least working (alone or with someone) to make a Tiburon-safe version of that library available.

  46. 46
    David Ninnes Says:

    Nick,

    That sounds excellent, Unicode and generics, woo hoo. Could I suggest that, as part of your Unicode enhancement, you describe somewhere (in the docs, blogs, etc.) what steps are required to internationalize apps with Unicode? You guys have already done this with Delphi, so any pointers that could be provided around release would be of great help.

    thanks,
    Dave

  47. 47
    DanB Says:

    Nick, I’m glad to hear the 3rd party vendors are being involved. I’m really keeping my fingers crossed that this transition goes as smoothly as you anticipate (though I still think it will cause great pain for many developers out there).

    For those worried about paying again for 3rd party components: I’ll happily pay for Unicode component upgrades. It’s the components that are no longer supported that I worry about. So open those wallets; it’s good to support the Delphi ecosystem and besides, software really is just about the cheapest industry you could be in. For the cost of a computer and all the software you need to be a profitable company, you couldn’t even get started in most other industries or even open a coffee shop for that matter.

  48. 48
    Brian A Says:

    Thanks Nick for the update - that is great news! Unlike for others here, Unicode support is by far our #1 priority feature request for Delphi (after all, 64-bit adoption rates are pretty meager, at least until Microsoft releases a 64-bit only OS like they should have done with Vista).

    BTW, priority #2 for us would be reliability and stability in the IDE (CodeInsight, etc)!

    Last point - has anyone at CodeGear ever kicked around a C#/Java type language Win32 product (modern syntax, best of breed Win32 support)? The Pascal language is by far the #1 thing that prevents widespread Delphi adoption amongst _new_ users. New, young developers cringe at writing Pascal versus C# or Java code. But those developers also cringe every time they compare their memory/CPU hogging .NET/JRE application to a native Win32 competitor.

  49. 49
    Luigi D. Sandon Says:

    @Vitaliy: I disagree with you too :) About the Windows API (from MSDN): "New Windows applications should use Unicode to avoid the inconsistencies of varied code pages and for ease of localization." "Some newer functions support only Unicode versions."
    Some Unicode controls are available, but then you are forced to use only them, and can’t take advantage of other, more advanced ones (DevExpress, for example). And without a full Unicode RTL, you cannot be sure your string isn’t passed to a function that will trash it. Translating the interface is not enough; you may need an interface in one language to display data in another language outside its ANSI character set - easily.
    Compatibility is important, but it can’t hinder every improvement. Unicode *is* the future (and most of the present, already). Or should improvements be just new UI controls?
    I am awaiting 64-bit eagerly too, but do you think "compatibility" won’t be an issue there too?

    @Brian A: there is already a C#/Java type language for Win32. It’s called C++. Both Java and its clone C# are based upon it, and use the same syntax, more or less. Any reason to create another one?

  50. 50
    Giel Says:

    I don’t see major problems with moving code to Unicode. The only drawback for me is that the code would no longer run on Win95/98/Me. In theory this could be solved using MSLU (http://www.microsoft.com.nsatc.net/globaldev/handson/dev/mslu_announce.mspx).

    MSLU consists of Unicows.dll, which contains Unicode wrappers around many ANSI Windows routines. With VC++ you have to compile your app with Unicows.lib included. This takes care of linking Unicode routines to Unicows.dll instead of user32.dll etc. I haven’t looked at this file, but I think it would require some work by CodeGear to get this to work with Delphi (but maybe anyone in the know can do it).

  51. 51
    DelphiUser Says:

    That looks interesting. Will look into it.

  52. 52
    C Johnson Says:

    @Vitaliy -> Just because there are non-Unicode APIs doesn’t mean they aren’t just translating stubs for the Unicode API, leading to an inherent slowdown. In fact, NT OSes are layers of translating layers all over the core native API most of us never see or touch (which is why there are Win32 & POSIX APIs for NT) - fewer of those layers has to be for the better.

    Besides, chars on 32-bit processors are inherently bad. Anything that gets it closer to 4-byte entities would reduce the hardware latencies involved with data that is not quad-byte aligned. Strange but true.

    @Giel -> The problem as I see it in supporting people who use OSs that are now coming up on 10 years old is that they probably aren’t spending much money on software or software development.

    You could play the "stability" card, except that W2K and XP are both significantly more stable (even Vista), so I can only see someone staying with Win98 either to support a flawed legacy app they don’t want to pay to bring forward, or because they are too cheap to pay for a few OS licenses.

    (ok, maybe they have a machine with legacy hardware they can’t or won’t update either, but to hold back all the machines in the environment for that sake… again, back to wondering how much money they are actually going to spend on new software)

    Either way, I’m not sure there is a ROI case to make it worth wasting much effort chasing that crowd.

    Besides, if you REALLY need to target old platforms, you can always use the tools for those platforms (thinking D3, D5)…

  53. 53
    David Howes Says:

    Thanks for the info Nick. A couple of questions. You say that the string type is UTF-16 by default; does that mean it can be set to UTF-32 to avoid needing to use surrogate pairs? What else is supported? Presumably UTF-8 is, for compatibility reasons? Is the new string type reference counted? Is it always multi-byte? For example, if a string of 8-bit ASCII chars is assigned to it, will it still always use a 16-bit WideChar for each char (assuming UTF-16)?
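
    A rough sketch of what the UTF-16 part of this question comes down to, assuming a compiler that provides the UTF-16 UnicodeString type the post describes. The program and variable names are made up for illustration, and the sketch says nothing about whether other payloads such as UTF-32 will be selectable:

    ```delphi
    program Utf16Units;
    {$APPTYPE CONSOLE}

    var
      Ascii, Clef: UnicodeString;
    begin
      // Plain ASCII text still occupies one 16-bit code unit per character.
      Ascii := 'ABC';
      Writeln(Length(Ascii));                    // number of code units
      Writeln(Length(Ascii) * SizeOf(WideChar)); // bytes of character data

      // U+1D11E (musical symbol G clef) lies outside the Basic Multilingual
      // Plane, so UTF-16 stores it as the surrogate pair D834 DD1E.
      Clef := #$D834#$DD1E;
      Writeln(Length(Clef));                     // code units, not characters
    end.
    ```

    On such a compiler this should print 3, 6 and 2: ASCII text costs two bytes per character, and a single character outside the BMP counts as two code units in Length.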

  54. 54
    Sean Says:

    @Luigi
    C++ is not a C#/Java type language. While it has some broad similarities in syntax, it is fundamentally different in several areas (header files, memory management, generics, namespaces…).

  55. 55
    Luigi Sandon Says:

    @Sean: define a Java/C# type language, please. Garbage collected? No headers? Single inheritance? What’s actually the difference between templates and generics? IMHO a Java/C# type language is one designed around its VM - while C++ is designed to run on several different architectures. Probably what’s really missing in C++ are properties.
    What market could there be for a C++-like language for Win32/64 that is not C++ (and can’t use the tons of code available for C++)?
    People are learning and using languages like Python or Ruby that have very different syntaxes from Java/C#. What’s the problem in learning ObjectPascal? Just that it’s not fashionable?

  56. 56
    Fred Says:

    You guys have it all wrong… C++??? What’s the market here? People who can understand cryptic programming languages… OR… people who can understand READABLE programming languages? With all due respect to C++ and the curly-brace languages, what it takes to get the "masses" programming (mind you… it’s STILL, after all, a business) is something most people will be able to GRASP in a lot less time. Good luck learning C++ or the like, as opposed to an "easy to read" language. Yes… "if then else" is A LOT easier to understand than { } { }… after all, most of us DO speak English! Pardon moi… I have yet to see a language written in Spanish… French… or German… or any foreign language other than assembly. LOL!

    anyway…back to topic - nobody’s answered MY question!:

    So here’s a big question, then: will D2007-compatible components run fine on D2008 if we do NOT need "Unicode-enabled components"? Can I run my apps "as is" without worrying about Unicode, or am I forced to make them Unicode?

  57. 57
    Vitaliy Says:

    @Fred - main problem is that you are forced to make them Unicode.
    This’ll kill many legacy apps written in Delphi.
    And all this to "do it right". Joel is fully right here.
    MS and Codegear guys went out of their minds.
    reality will stop them, of course.
    But for CG it’ll be too late and too hard.

  58. 58
    Ron Grove Says:

    Glad I don’t sell anything to programmers. Good grief.

  59. 59
    Luigi D. Sandon Says:

    @Vitaliy: I wonder if you read Bauer’s articles about the Unicode switch, especially this: http://blogs.codegear.com/abauer/2008/01/09/38845, read the paragraph "OMG!! All my code is going to break! I can’t handle this!!", please.

  60. 60
    Giel Says:

    @Ron: :-)

  61. 61
    DelphiUser Says:

    @C Johnson: "The problem as I see it in supporting people who use OSs that are now coming up on 10 years old is that they probably aren’t spending much money on software or software development."

    Not all software is sold as stand-alone solutions. Much of what I do supports other hardware. Supporting older OSs will sell more hardware boxes.

    I also provide in-house solutions, small single-purpose utilities. (I venture to say that Delphi’s main market IS small-to-mid applications). I bill hours of work. Supporting two code bases is going to be hard to explain to the client, esp. since it wasn’t necessary so far.

    I was looking forward to the time-saving aspects of generics (C# does them well), but not at this cost.

    Giel’s tip on UnicoWS.dll is a good one, but at first glance it looks like a Delphi interface will be needed first.

    Isn’t there an open project that covers this?

  62. 62
    Vitaliy Says:

    @Luigi
    "I wonder if you read Bauer’s articles about the Unicode switch"

    I clearly read all about this.
    And as I see, most commenters agree about the importance of such a switch. According to other CodeGear blog posts we won’t see any such switch because of some difficulties they try to explain to us (this is the bad side of blogs; it would be better to try to solve these difficult problems rather than write poems here about how you can’t solve them).
    I’ll be Cassandra here, but if we don’t see this switch in the next Delphi version, it’ll be the last Delphi version ever.
    Another issue is immediate revival of Turbo line as key products to gain new users (as I understand, Nick still strongly believes in students buying Delphi 2007 :-) )

  63. 63
    DelphiUser Says:

    Talking of two different things?

    I understood Luigi to mean "Unicode switch" as in "changing to the Unicode world". Not "Switch" as in a compiler unicode on/off switch.

    From Bauer’s articles it’s clearly an either/or thing, not something you add, like wide strings.

    I also read Microsoft’s UnicoWS.dll/MSLU description (see link in Giel’s post.) Microsoft evidently faced the same issue.

    I am hopeful that someone smarter than me looks into the UnicoWS dll and suggests some way of integrating it/wrapping it around Delphi Tiburon.

  64. 64
    Luigi D. Sandon Says:

    Yes, I wrote "switch" meaning Delphi going Unicode only, not a compiler switch.
    I see and understand some of the issues it will bring, especially in some markets, but IMHO Delphi would be doomed if it became a rearguard development tool only.
    It could be very difficult to attract new developers (and even to keep some of the current ones) by saying "we support Windows 95!" but not Unicode fully, or the like.
    I maintained some 16-bit applications for years (up to the year D5 or D6 was released, IIRC) because of some customers’ requests, and I did it using D1 - I never thought I should have asked Borland that Delphi 5 or 6 support 16-bit applications or it would be the "last Delphi ever".
    If RAD 2007 is the last version supporting the older OSes, keep using it if you need to develop for them. CodeGear could even ship it with newer releases as they did with Delphi 1.
    But if new releases are exactly like the old ones, what future is there for CodeGear?

    PS: I read Joel’s article about IE8: IMHO he’s plainly wrong. Other sectors underwent huge standardization efforts (what are ANSI or ISO for?), and it’s time IT followed the same path. Why should we be different?
    Would people like appliances that each need a different voltage and plug to work? Or cars needing custom gasoline and tyres to run? Or each country using a different metric system? The EU changed its currency in 2002; can’t people update their web pages?
    Sometimes standards need to be enforced, and kudos to MS if they are going to do it the "right way" even if it would break pages working in previous IE releases.
    And I won’t complain if Delphi breaks badly written applications.

  65. 65
    TK Says:

    What to do if I use WideString properties in my controls? How to update for Tiburon? Make a new conditional directive and use the old-new "string" in Tiburon only?

  66. 66
    S@nne Says:

    Hi Nick, thanks a lot for this great news. Looking forward very much to a beta release so we can all start rewriting code. I would rather have to rewrite more and have a stable and transparent Delphi release/code than endless backward compatibility and huge unclear code.

    Is there any estimate of when a beta release or final release will be shipped? Or does the roadmap still fit?

    Awesome that you write a blog, and a great idea to have the component writers test it.

    1. Unicode
    2. Ease of use multi # core processing
    3. 64 bit support

    Regards

  67. 67
    pcunite Says:

    I have been a user since Delphi 6. I look forward to UCS-2 (Windows Unicode) support. Anyone thinking that they won’t need it is only writing software for backyard garages…

    To help some people out with Unicode: an ANSI string that contains only ASCII characters is already valid UTF-8. Windows and now Tiburon will use UCS-2 (strictly speaking its successor, UTF-16) encoding. It will be easy for the Delphi compiler team to see your ANSI string and make it work correctly.
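
    One caveat on the encoding names is worth checking in code: ANSI code pages and UTF-8 agree only on the ASCII range. The sketch below assumes the TEncoding class from SysUtils as found in Unicode-enabled Delphi releases, and takes Windows code page 1252 as the ANSI code page. The Dump helper and the sample text are invented for the example:

    ```delphi
    program AnsiVsUtf8;
    {$APPTYPE CONSOLE}

    uses
      SysUtils;

    // Hypothetical helper: prints a caption followed by the bytes in hex.
    procedure Dump(const Caption: string; const Bytes: TBytes);
    var
      I: Integer;
    begin
      Write(Caption, ':');
      for I := 0 to High(Bytes) do
        Write(' ', IntToHex(Bytes[I], 2));
      Writeln;
    end;

    var
      Ansi1252: TEncoding;
      S: string;
    begin
      // 'caf' followed by U+00E9 (e with acute accent), written as a
      // four-digit character code so the source file encoding does not matter.
      S := 'caf'#$00E9;

      // Encodings obtained from GetEncoding are owned by the caller.
      Ansi1252 := TEncoding.GetEncoding(1252);
      try
        Dump('Windows-1252', Ansi1252.GetBytes(S));
        Dump('UTF-8', TEncoding.UTF8.GetBytes(S));
        Dump('UTF-16LE', TEncoding.Unicode.GetBytes(S));
      finally
        Ansi1252.Free;
      end;
    end.
    ```

    The accented character is the single byte E9 in Windows-1252 but the two bytes C3 A9 in UTF-8, so ANSI text containing it is not valid UTF-8 as it stands. UCS-2 and UTF-16 likewise coincide only for characters inside the Basic Multilingual Plane.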

  68. 68
    Nick Hodges Says:

    TK –

    If you are using WideString properties on your components, then you can keep on doing that. WideString isn’t going to change.

    But you can explore using UnicodeString as desired/needed.

    Nick

  69. 69
    Mike P Says:

    what happens with string constants used in a construct like this:

    const
      scDLLCallName_ModuleLED = 'ModuleLED';

    function MyModuleLED(): Integer; stdcall; external 'MyDll.dll' name scDLLCallName_ModuleLED;

    do i then need to use typed constants and type it as AnsiString? or is this code broken in d2008?

  70. 70
    Mike P Says:

    hopefully this isn’t double-posted (the e-mail address was old).

    what happens with string constants used in a construct like this:

    const
      scDLLCallName_ModuleLED = 'ModuleLED';

    function MyModuleLED(): Integer; stdcall; external 'MyDll.dll' name scDLLCallName_ModuleLED;

    do i then need to use typed constants and type it as AnsiString? or is this code broken in d2008?

  71. 71
    Charles Vinal Says:

    Nice job Nick - you sure get a tough audience. On a side note, have you seen how REST and Web Oriented Architecture are taking off? Good old web broker is awesome for this stuff and we have created a complete WOA based on it - powering over 100 commercial websites today. Make sure you keep web broker in later versions - heck, you might want to even enhance it a bit.

    Keep up the good work.

    Best Regards,

    Charlie

  72. 72
    Nick Hodges Says:

    Mike P –

    That code will work just fine — depending on what your DLL is expecting. In that particular case, you’d likely have to recompile your DLL.

    Nick

  73. 73
    Mike P Says:

    thank you, Nick, for your response.

    contrary to the way i put my example, it’s someone else’s dll (a VC++ DLL). (i should’ve mentioned that.)

    thank you!
    mp

  74. 74
    Nick Hodges Says:

    Mike –

    In that case, you’ll likely want to explicitly declare the type you want for passing into the DLL.

    However, as shown above, your code will compile. The constant will be fine for use as you are using it.

    Nick
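
    A minimal sketch of that advice, using the real kernel32 export lstrlenA as a stand-in for a third-party ANSI DLL. The AnsiLen name, the constant and the variable are made up for the example:

    ```delphi
    program ExplicitAnsiParam;
    {$APPTYPE CONSOLE}

    const
      // An untyped string constant still works as the import name.
      scImportName = 'lstrlenA';

    // The parameter is declared as PAnsiChar rather than PChar, so its
    // meaning stays the same when PChar becomes PWideChar.
    function AnsiLen(lpString: PAnsiChar): Integer; stdcall;
      external 'kernel32.dll' name scImportName;

    var
      Value: AnsiString;
    begin
      Value := 'ModuleLED';
      // Passes 8-bit characters to the DLL regardless of the default string type.
      Writeln(AnsiLen(PAnsiChar(Value)));
    end.
    ```

    Declaring the parameter and the variable with the explicit Ansi types keeps the call correct for a non-Delphi DLL that expects 8-bit strings, while the constant used for the import name needs no change.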

  75. 75
    Mike P Says:

    thanks Nick!

  76. 76
    Steven T. Cramer Says:

    Nice! If I could make a request: while rewriting the compiler for 64-bit, could you make it a dual-pass compiler, so we could simplify classes into units of their own instead of huge units with tons of .inc files? That would be awesome.

    Delphi 1 was single pass for speed reasons, and it has just stayed that way ever since. But now is a good time to fix that.

    Thanks!!! I look forward to unicode support.

© 2008 Nick Hodges