Primitive assignment and very slow decimal transformation

Hello again,

As previously mentioned, I'm trying to build an extensible, data-type-agnostic extension to the CSV functionality. This is a follow-up post about the issues I've encountered with primitive assignment and very slow decimal transformation, so the examples are combined in a single post.

1) The following line causes dnc to block or to be very slow. Code:

    dec64 newColumn[] = new dec64[](1.2, 2.3, 3.4)
    out.print("newColumn[0] = $(newColumn[0]), newColumn[1] = $(newColumn[1]), newColumn[2] = $(newColumn[2])\n")

2) The following lines cause a seg fault. Why? If you substitute the 10 for a 3 it blocks, or is very slow. I suspect Dana uses the array size in the RHS of the assignment, but why does this not throw a meaningful error? Code:

    int16 newColumn[] = new int16[](22239572, 2123561343, 13513451334)
    byte serialData[] = clone dana.serial(newColumn)

3) The idea for the CSV extension is to be able to pass any data type to the function to be written to a CSV file. This means that I have to follow this pattern for primitive data types (i.e. int and dec):

    // Why does this work for integers but not for decimals?
    cue.addColumn("test.csv", true, ",", cue.wrapIntArray(new int512[](newIntColumn), 8), 
                  columnName = "NewColumnInt8"/* , columnNumber = 3 */, limitData = false)
    cue.addColumn("test.csv", true, ",", cue.wrapDecArray(new dec512[](newDecColumn), 8),
                  columnName = "NewColumnDec8"/* , columnNumber = 3 */, limitData = false)

The interesting part here is the wrapIntArray and wrapDecArray functions, which transform an int or dec of any size to int512 and dec512.

However, this only works for integers and not decimals as demonstrated by this. Code:

    int16 int16Array[] = new int16[](1, 2, 3, 12, 43, 4433)
    int512 int512Array[] = new int512[](int16Array)
    for (int i = 0; i < int512Array.arrayLength; i++) {
        out.print("int512Array[$(i)] = $(int512Array[i])\n")
        }
    dec decArray[] = new dec[](1.0, 43.2, -1.2)
    dec512 dec512Array[] = new dec512[](decArray)
    for (int i = 0; i < dec512Array.arrayLength; i++) {
        out.print("dec512Array[$(i)] = $(dec512Array[i])\n")
        }

The output from the above code is:

    int512Array[0] = 1
    int512Array[1] = 2
    int512Array[2] = 3
    int512Array[3] = 12
    int512Array[4] = 43
    int512Array[5] = 4433
    dec512Array[0] = 11242005596642755118.6925521544392995069
    dec512Array[1] = -11912999616504406411.5172415816139032473
    dec512Array[2] = -3739427922542488922.3053478470991316394

4) The final problem could be demonstrated with this example:

    int512 int512Array[] = new int512[](1023462456354735467567968894568467938, 1234, 5234562)
    out.print("int512Array[0] - $(int512Array[0])\n")
    int intArray[] = new int[](int512Array)
    out.print("int512Array[0] - $(int512Array[0])\n")
    out.print("intArray[0] - $(intArray[0])\n")
    out.print("$(typeof(int512Array[0]).size) - size\n")
    out.print("$(typeof(intArray[0]).size) - size\n")

Here the output is:

    int512Array[0] - 15700956033302498786
    int512Array[0] - 15700956033302498786
    intArray[0] - 15700956033302498786
    512 - size
    8 - size

Is the issue a printing one, or is it something to do with the int512Util? Also, is there a dec512Util? From my experience the decimal results are somewhat random and take a long time to produce and print.
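
One quick check that can be done outside Dana is to see whether the printed value is simply the low 64 bits of the literal. The Python sketch below does that. `low_bits` is a hypothetical helper, and the 64-bit width is an assumption based on the host-sized int being reported as 8 bytes above.

```python
# Literal assigned to int512Array[0] and the value Dana printed for it,
# both copied from the example above.
LITERAL = 1023462456354735467567968894568467938
PRINTED = 15700956033302498786


def low_bits(value: int, bits: int = 64) -> int:
    """Return the lowest `bits` bits of a non-negative integer (hypothetical helper)."""
    return value & ((1 << bits) - 1)


if __name__ == "__main__":
    truncated = low_bits(LITERAL)
    print(truncated)             # literal reduced modulo 2**64
    print(truncated == PRINTED)  # does it reproduce what Dana printed?
```

The two values match, which suggests the number is being narrowed to 64 bits somewhere, either when the literal is compiled or when the value is converted to text. This check alone cannot tell which.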

5) Finally, how to convert dec to int and vice-versa? I.e., what is the general procedure?

Apologies for the long post, but this really obstructs the decimal feature of the CSV extension.

Many thanks,

Sava

Hi,

I'm not totally sure what you're aiming at - it might be good to post the API you intend to build as some context.

Others might have a different answer here, but in general I'd say that the large primitive types in Dana are really for special cases, and not for general data interchange. If you look at the Dana APIs, almost all of the standard library deals with host-sized values (i.e. int8 / dec8 on a 64-bit machine, which are mapped to int / dec).

I would build your utility functions, therefore, to deal only with int and dec. As a bit more justification, let's imagine you want to deal with an int512. The maximum value of a variable of such type is:

1044388881413152506691752710716624382579964249047383780384233483283953907971557456848826811934997558340890106714439262837987573438185793607263236087851365277945956976543709998340361590134383718314428070011855946226376318839397712745672334684344586617496807908705803704071284048740118609114467977783598029006686938976881787785946905630190260940599579453432823469303026696443059025015972399867714215541693835559885291486318237914434496734087811872639496475100189041349008417061675093668333850551032972088269550769983616369411933015213796825837188091833656751221318492846368125550225998300412344784862595674492194617023806505913245610825731835380087608622102834270197698202313169017678006675195485079921636419370285375124784014907159135459982790513399611551794271106831134090584272884279791554849782954323534517065223269061394905987693002122963395687782878948440616007412945674919823050571642377154816321380631045902916136926708342856440730447899971901781465763473223850267253059899795996090799469201774624817718449867455659250178329070473119433165550807568221846571746373296884912819520317457002440926616910874148385078411929804522981857338977648103126085903001302413467189726673216491511131602920781738033436090243804708340403154190335


In my view there are very few programs which could ingest a CSV file containing values of this magnitude. Values of these types are also extremely expensive to compute, so again, I think they're just for special cases.
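
To put that figure in context, the following Python sketch recomputes the scale of it. It assumes (going by the sizes printed earlier in the thread) that the number in an integer type name is its width in bytes and that the type is unsigned. `max_unsigned` is a hypothetical helper name, not part of Dana.

```python
def max_unsigned(num_bytes: int) -> int:
    """Largest value of an unsigned integer that is `num_bytes` bytes wide."""
    return (1 << (8 * num_bytes)) - 1


if __name__ == "__main__":
    # Widths are in bytes, following the assumption stated above.
    for name, width in (("int (int8)", 8), ("int16", 16), ("int512", 512)):
        digits = len(str(max_unsigned(width)))
        print(f"{name}: {8 * width} bits, up to {digits} decimal digits")
```

Under those assumptions a host-sized int tops out at 20 decimal digits, while an int512 can need 1234, which is the scale of the value quoted above.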

I'm not sure I got the other parts of your post (I couldn't recreate the issues without the full program, I think), but maybe this helps a bit for general direction of what you're trying to do.

Hi Sava,

A couple of additions from me. This line:

    dec64 newColumn[] = new dec64[](1.2, 2.3, 3.4)

is very slow to compile because (i) these numbers (dec64) are huge, at 128 bytes each, and (ii) the compiler has to do some non-trivial encoding work to convert from text into the base-10 decimal format that Dana uses. Once they're encoded, computing over these values is quicker, but encoding/decoding to/from text is fairly slow at the moment.

I couldn't reproduce your runtime crash; I think I need a more complete example. If you still have a full example showing that crash, it'd be great to see it.

Barry

Hello Barry,

The code I provided was incomplete, so the example as posted wasn't relevant. Here's the full example:

    int16 newColumn[3] = new int16[](22239572, 2123561343, 13513451334)
    byte serialData[] = clone dana.serial(newColumn)

The problem is relatively obvious and it has to do with the fact that the constant "3" has to be on the RHS of the assignment and isn't actually needed when one provides the initialization. However, it throws a seg fault and it took me some time to realise my mistake.

I'll edit the original post to avoid confusion.

Edit: Funnily enough, the missing line wasn't missing, but I guess the Dana code highlighting feature was just as confused about it :)

Many thanks,

Sava

Hello anita,

Here is the latest API: https://github.com/savakazakov/CSVUtilExtension

Disclaimer: Strap in before you read as the solution is quite unique :)

If you take a look at wrapIntArray and wrapDecArray, you can see that I went for an approach enabled by the fact that one can pass an int array of any size into the initializer of an int512 array, i.e.

    int512 int512Array[] = new int512[](int16Array)

I was hoping this could resolve the issue of not being able to write data that is not strictly "int" or "dec".

However, if you run the example provided:

    int16 int16Array[] = new int16[](1, 2, 3, 12, 43, 4433)
    int512 int512Array[] = new int512[](int16Array)
    for (int i = 0; i < int512Array.arrayLength; i++) {
        out.print("int512Array[$(i)] = $(int512Array[i])\n")
        }
    dec decArray[] = new dec[](1.0, 43.2, -1.2)
    dec512 dec512Array[] = new dec512[](decArray)
    for (int i = 0; i < dec512Array.arrayLength; i++) {
        out.print("dec512Array[$(i)] = $(dec512Array[i])\n")
        }

You can see that this doesn't work for decimal arrays, and the output is different every time you run it. Note: It is quite slow, as mentioned by Barry!

Many thanks,

Sava

Thanks for the extra info on the bug report, I can replicate that now :-)

Barry

Also, the decimal array inter-assignment from different dec sizes is a bug, thanks for pointing that one out.

Barry