Lexical Analysis With Flex, for Flex 2.6.2: Can I fake multi-byte character support?

To: Heeman_Lee@hp.com
Subject: Re: flex - multi-byte support?
In-reply-to: Your message of Thu, 03 Oct 1996 17:24:04 PDT.
Date: Fri, 04 Oct 1996 11:42:18 PDT
From: Vern Paxson <vern>

>      I assume as long as my *.l file defines the
>      range of expected character code values (in octal format), flex will
>      scan the file and read multi-byte characters correctly. But I have no
>      confidence in this assumption.

Your lack of confidence is justified - this won't work.

Flex has in it a widespread assumption that the input is processed
one byte at a time.  Fixing this is on the to-do list, but is involved,
so it won't happen any time soon.  In the interim, the best I can suggest
(unless you want to try fixing it yourself) is to write your rules in
terms of pairs of bytes, using definitions in the first section:

	X	\xfe\xc2
	...
	%%
	foo{X}bar	found_foo_fe_c2_bar();

etc.  Definitely a pain - sorry about that.
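
As a fuller illustration of the byte-pair workaround described above,
here is a minimal sketch of a complete scanner. It is an assumption-laden
example, not part of the original message: the LEAD and TRAIL byte ranges
are hypothetical placeholders for some two-byte encoding and must be
replaced with the ranges of the encoding actually in use, and the printf
actions stand in for real handlers such as found_foo_fe_c2_bar().

```lex
%option noyywrap 8bit
%{
#include <stdio.h>
%}
/* X names one specific two-byte character: bytes 0xFE 0xC2. */
X	\xfe\xc2
/* LEAD and TRAIL are hypothetical ranges; adjust for your encoding. */
LEAD	[\x81-\xfe]
TRAIL	[\x40-\xfe]
%%
foo{X}bar	{ printf("found foo <fe c2> bar\n"); }
{LEAD}{TRAIL}	{
		/* yytext holds raw bytes; cast to avoid sign extension. */
		printf("two-byte character: %02x %02x\n",
		       (unsigned char) yytext[0],
		       (unsigned char) yytext[1]);
		}
.|\n		{ /* any other single byte: ignore */ }
%%
int main(void)
{
	return yylex();
}
```

The scanner still consumes its input one byte at a time; the rules simply
spell out each multi-byte character as a sequence of byte patterns. The
8bit option is the default for most table settings, but it is stated
explicitly here because a 7-bit scanner cannot match bytes above 0x7f.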

By the way, the email address you used for me is ancient, indicating you
have a very old version of flex.  You can get the most recent, 2.5.4, from
ftp.ee.lbl.gov.

		Vern