Ticket #1047 (closed defect: wontfix)

Opened 7 years ago

Last modified 7 years ago

Can safe module avoid building an AST?

Reported by: justinc Owned by: justinc
Priority: minor Milestone:
Component: repyV2 Version: 0.1t
Severity: High Keywords:
Cc: vijay, monzum, justinc Blocking:
Blocked By:

Description

The AST built by the compiler module is quite large. We really don't need it because our focus is on information we can derive without examining other tree nodes. Can we avoid this memory overhead?

Change History

  Changed 7 years ago by mkaplan

  • owner changed from mkaplan to justinc
  • status changed from new to assigned

  Changed 7 years ago by justinc

I'm starting to look at the memory issues described here. I'm using Zenodotus as a test application that may use a fair amount of memory...

Here is what we see in memory size given our current code (using the fork-then-safety-check model):

justincappos   10328 python repy.py restrictions.default zenodotus.mix -p 12345

By changing safe.safe_check(code) to call safe._check_ast(code) instead we see:

justincappos   21648 python repy.py restrictions.default zenodotus.mix -p 12345

  Changed 7 years ago by justinc

Some more notes:

Disabling the check altogether has essentially the same memory use as our current code...

Calling parser.suite(code) has little impact on memory use...

The real memory use comes from this step: parser.st2tuple( parser.suite(code)). I've looked and this is a pure C module that hooks into the compiler. Unfortunately, there doesn't seem to be an easy way to read the elements from the parser one at a time (which is what we want).

For completeness, I've used strace to look at what the parser module does at this point and it seems normal. I'm going to look at the code now.

  Changed 7 years ago by justinc

It isn't really possible to interpose with a legacy version of parser.so.

I'm almost out of ideas. About the only thing I think might work is to break the code up into separate pieces and check each separately. I need to think if this is sane, but it's clearly suboptimal. I would think that even just breaking it into a few pieces would have a huge impact. However, this wouldn't always be possible and would certainly be a pain. Is it worth the bother?

  Changed 7 years ago by justinc

  • status changed from assigned to closed
  • resolution set to wontfix

I've thought about this and the use of virtual namespaces to do separate imports will essentially do the same thing as I was saying in the last posting. We really should be able to remove this in Repy V2 once this is done. Let's retest then!

follow-up: ↓ 7   Changed 7 years ago by mkaplan

A major problem with safe_check (from a memory standpoint) was that it was using dictionary methods that create a copy of the dictionary it is iterating through. These include dict.keys() and dict.items(). This was especially problematic given the amount of recursive calls to _check_node().

I have only done minor benchmarking, but there seems to be a non-negligible difference in the memory used. Further benchmarking might be worthwhile, as it is possible that this was the cause of the large memory consumption.

These changes were implemented in r5700.

in reply to: ↑ 6   Changed 7 years ago by justinc

Replying to mkaplan:

I have only done minor benchmarking, but there seems to be a non-negligible difference in the memory used.

Can you do a quick test that quantifies this? My testing before (see comments above) showed that almost all of the memory cost comes from the parser.st2tuple call.

  Changed 7 years ago by mkaplan

I was unfortunately unable to replicate the benefit that the initial testing had given me. It appears that I was incorrect. I apologize for the inconvenience.

My second testing was with the following code that I ran in the 'preparetest' directory:

import safe2 as safe #safe2 is the new version of safe. Replace with 'safe' to test the older version
import time
fh = open('librepysocket.repy','rb')
data = fh.read()
fh.close()

safe.safe_check(data)
time.sleep(5)
Note: See TracTickets for help on using tickets.