[wg-camlp4] Raw representation of literals in the parsetree

Hongbo Zhang hongboz at seas.upenn.edu
Thu May 23 17:15:11 BST 2013


Hi Jeremie, I think it's reasonable to change the lexer.
FYI, camlp4's lexer works this way.
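
To make the idea concrete, here is a minimal sketch of a lexer token that keeps the literal's source text and leaves evaluation to a later stage. The names `token`, `eval_int` and `describe` are hypothetical, chosen for illustration only; they are not taken from the compiler or from camlp4.

```ocaml
(* Hypothetical token type: the lexer stores the literal's source text
   unevaluated. These constructors are illustrative, not the compiler's. *)
type token =
  | INT of string    (* raw text as written, e.g. "1_000" or "0xFF" *)
  | IDENT of string

(* Evaluation is deferred to the parser (or to a rewriter), which can
   report a literal that does not fit in an int instead of wrapping. *)
let eval_int (raw : string) : (int, string) result =
  match int_of_string_opt raw with
  | Some n -> Ok n
  | None -> Error ("invalid or out-of-range integer literal: " ^ raw)

(* A consumer sees both the value and the form the programmer wrote. *)
let describe (tok : token) : string =
  match tok with
  | INT raw ->
    (match eval_int raw with
     | Ok n -> Printf.sprintf "int %d (written %s)" n raw
     | Error msg -> msg)
  | IDENT name -> "ident " ^ name
```

With this shape, a literal that does not fit in an `int` becomes an explicit error at evaluation time, and the raw text stays available to anything that runs before evaluation.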


On Thu, May 23, 2013 at 11:26 AM, Jeremie Dimino <jdimino at janestreet.com> wrote:

> We discussed it further with Alain but it turns out it doesn't work
> very well.  A constant may be composed of several tokens so it is not
> possible to have the exact raw representation.  And for strings one
> can use the new quoted strings which are already treated as raw
> strings.
>
> Another possibility would be to modify the lexer so that it doesn't
> 'evaluate' tokens immediately and let the parser do it.  At least it
> would avoid this kind of thing:
>
>   # max_int;;
>   - : int = 4611686018427387903
>   # 4611686018427387904;;
>   - : int = -4611686018427387904
>
> On Thu, May 2, 2013 at 2:00 PM, Gabriel Scherer
> <gabriel.scherer at gmail.com> wrote:
> > I think it is very reasonable, and would be a good fit for Alain's
> > branch.
> >
> > On Thu, May 2, 2013 at 1:35 PM, Jeremie Dimino <jdimino at janestreet.com>
> > wrote:
> >> Hi,
> >>
> >> We recently felt the need to write a new extension that would check
> >> integer literals.  The goal is to verify that ones containing
> >> underscores match a specific regular expression.  Namely digits are
> >> grouped by 3.
> >>
> >> We can do that for instance with a camlp4 token filter, but we wanted
> >> to try with ppx.  It is currently not possible to do it by looking at
> >> the parsetree since constants are already evaluated (integers are
> >> represented by an OCaml int) and the raw form is lost.
> >>
> >> One solution would be to add the raw representation of constants in
> >> the parsetree. For instance by changing Asttypes.constant to:
> >>
> >> ,----
> >> | type constant_value =
> >> |     Const_int of int
> >> |   | Const_char of char
> >> |   | Const_string of string
> >> |   | Const_float of string
> >> |   | Const_int32 of int32
> >> |   | Const_int64 of int64
> >> |   | Const_nativeint of nativeint
> >> |
> >> | type constant = constant_value * string
> >> `----
> >>
> >> I believe it could also be useful for other rewriters, especially ones
> >> dealing with strings since they would be able to compute the correct
> >> location inside a string. And maybe also for printing: Pprintast could
> >> use the representation chosen by the programmer instead of a
> >> standardized one.
> >>
> >> Do people think that this is a reasonable thing to add to the OCaml
> >> parsetree? If yes I can do the modification.
> >>
> >> Jeremie
> >> _______________________________________________
> >> wg-camlp4 mailing list
> >> wg-camlp4 at lists.ocaml.org
> >> http://lists.ocaml.org/listinfo/wg-camlp4
>
> --
> Jeremie
>
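For reference, the digit-grouping check from the original message could look roughly like this once the raw text of a literal is available. This is only a sketch under assumptions: `well_grouped` is a hypothetical name, it handles plain decimal literals only (no sign, no `0x` prefix, no `l`/`L`/`n` suffix), and it accepts literals without underscores unchanged.

```ocaml
let is_digit (c : char) : bool = '0' <= c && c <= '9'

(* True when [s] is non-empty and made of decimal digits only. *)
let all_digits (s : string) : bool =
  let ok = ref (s <> "") in
  String.iter (fun c -> if not (is_digit c) then ok := false) s;
  !ok

(* Hypothetical check on the raw text of a decimal integer literal:
   if it contains underscores, the first group must have 1 to 3 digits
   and every following group exactly 3, i.e. [0-9]{1,3}(_[0-9]{3})*. *)
let well_grouped (raw : string) : bool =
  match String.split_on_char '_' raw with
  | [] -> false
  | [ _ ] -> true  (* no underscore: nothing to check *)
  | first :: rest ->
    all_digits first
    && String.length first <= 3
    && List.for_all (fun g -> all_digits g && String.length g = 3) rest
```

With the proposed `constant_value * string` pair, a rewriter would apply such a check to the string component; on the evaluated `int` alone it cannot be done.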



-- 
-- Regards, Hongbo