applying plus to sublevels of sparsearray in Wolfram Mathematica converts it to a normal list. How to avoid it?

109 Views Asked by At

I have a sparse array of tuples of positive real numbers (in fact a probability distribution) say

p=SparseArray[{{100,1}->0.25,{100,2}->0.25,{101,1}->0.3,{305,2}->0.2},{641601,2}]

Now I would like to sum every list on the first level to get a sparse array of form

psummed=SparseArray[{{100}->0.5,{101}->0.3,{305}->0.2},{641601}]

naturally I could do that by changing the heads of the sublists

psummed=Plus @@ # &/@ p;

However this changes the SparseArray to normal list and therefore takes too much time

AbsoluteTiming[res = Plus @@ # & /@ p;]

{1.0770617, Null}

Is there a nice and fast way how to do this, while the result will be still a SparseArray?

2

There are 2 best solutions below

2
On BEST ANSWER

Using Part on a SparseArray object extracts a sub-array in Sparse form, and adding Sparse arrays uses Sparse addition, therefore you can write:

p[[All, 1]] + p[[All, 2]]
SparseArray[<3>,{641601}]

But wait, there's more! :o)

Because of the internal format of SparseArray your original array will be much more efficiently stored if it is transposed. Compare:

p2 = Transpose[p];

ByteCount /@ {p, p2};
{2567180, 768}

The tall skinny array takes up thousands of times as much space as the wide short one.
Further, with the data in this form we can do the sum more directly:

Plus @@ p2
SparseArray[<3>,{641601}]

We can use ArrayRules to inspect the data:

Plus @@ p2 // ArrayRules
{{100} -> 0.5, {101} -> 0.3, {305} -> 0.2, {_} -> 0.}

For other Mathematica questions I encourage you to join us on the dedicated site:

enter image description here

1
On

Given your SparseArray with dimension 641601 x 2:

p = SparseArray[{{100,1}->0.25, {100,2}->0.25, {101,1}->0.3, {305,2}->0.2}, {641601,2}]

... there is a function called ArrayRules which extracts back the rules specifying the sparse array. For example:

 ee = Most[ArrayRules[p]] /. {aa_, bb_} -> aa

returns output:

> {100 -> 0.25, 100 -> 0.25, 101 -> 0.3, 305 -> 0.2}

The next step is to do the summing of similar terms (adding the probabilities). The correct way to do this might depend on the structure of your problem ... however, if you first do a Sort to get similar terms (like the 100) to group together, ... you can then use SplitBy to bunch the similar terms (e.g. the 100) into grouped lists ... and add them up with a Map and the finished result will be:

qq = Map[(#[[1, 1]] -> Total[#[[All, 2]]]) &, SplitBy[Sort[ee], First]]

> {100 -> 0.5, 101 -> 0.3, 305 -> 0.2} 

You can then create a new SparseArray:

SparseArray[qq, Dimensions[p,1]]

This is much faster, as you required ... the whole operation has an AbsoluteTiming of about 0.03 seconds on my Mac, and is about 50 times faster.