Rosetta memory problems - disable specific projects?

Jon Etkins ID: 6571906 Posts: 6
09 Mar 2020 04:26 PM

I recently added CE to my personal systems which are both running the World Community Grid BOINC manager.  However, since doing so, the stability of my primary workstation has been impacted, and I've now seen a couple of instances of Rosetta tasks crashing due to memory problems.

I've tried manually setting Rosetta to No New Tasks, but it appears that CE overrides that whenever it syncs.

Much as I like the idea of Charity Engine, if I cannot prevent it from running tasks that adversely affect my systems' stability, then I'll have to remove it from my primary workstation - sad, because it's my best contributor.

I saw in another old thread (from 2012?) that Rosetta was temporarily removed from CE due to memory problems at the time, so apparently this isn't the first time that it's misbehaved like this.

Any suggestions on how I can continue to work with Charity Engine while avoiding these problems?

Thanks!

Tristan Olive ID: 22 Posts: 383
09 Mar 2020 09:26 PM

Hi Jon, thanks for the report. We'll set Rosetta to No New Tasks while we look into this issue.

Can you describe how you're running the World Community Grid BOINC manager currently? It sounds like you had that installed, but then added the Charity Engine Desktop software. I haven't tried this, but would expect a conflict between the two, so wanted to clarify. 

Jon Etkins ID: 6571906 Posts: 6
09 Mar 2020 10:03 PM

Hi, Tristan.  Sorry, I worded that badly - I simply used the Tools-Use Account Manager dialog in the WCG client to add Charity Engine as my account manager; I did not install any new software.

In case it's useful, here's one of the log entries - there were two in fairly quick succession, each for a separate instance of the same executable.

Faulting application name: rosetta_4.07_windows_intelx86.exe, version: 0.0.0.0, time stamp: 0x5a949309
Faulting module name: rosetta_4.07_windows_intelx86.exe, version: 0.0.0.0, time stamp: 0x5a949309
Exception code: 0xc0000005
Fault offset: 0x014e7142
Faulting process id: 0x3630
Faulting application start time: 0x01d5f5da1135a602
Faulting application path: C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta\rosetta_4.07_windows_intelx86.exe
Faulting module path: C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta\rosetta_4.07_windows_intelx86.exe
Report Id: fd329105-1210-409b-81fb-dc9ad3eac162
Faulting package full name:
Faulting package-relative application ID:
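
For anyone triaging similar entries, here is a minimal sketch of pulling the fields out of an entry like the one above and naming the exception code. The helper names (`parse_crash_entry`, `describe_exception`, `KNOWN_EXCEPTION_CODES`) are hypothetical and not part of BOINC or Rosetta; the sample text is an excerpt of the log above. Note that 0xc0000005 is the Windows code for an access violation (the process touched an invalid address), which is consistent with a memory-related fault but does not by itself show that the system ran out of RAM.

```python
# Minimal sketch: parse a Windows "Application Error" event body into fields.
# All names here are hypothetical helpers for illustration only.

KNOWN_EXCEPTION_CODES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",  # read/write of an invalid address
    0xC0000017: "STATUS_NO_MEMORY",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
}


def parse_crash_entry(text):
    """Split 'Key: value' lines into a dict, keyed on text before the first colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon at all
            fields[key.strip()] = value.strip()
    return fields


def describe_exception(fields):
    """Return a symbolic name for the 'Exception code' field, if we know it."""
    code = int(fields["Exception code"], 16)  # int() accepts the 0x prefix in base 16
    return KNOWN_EXCEPTION_CODES.get(code, "unrecognized code")


# Excerpt of the log entry quoted in this post.
SAMPLE = r"""Faulting application name: rosetta_4.07_windows_intelx86.exe, version: 0.0.0.0, time stamp: 0x5a949309
Exception code: 0xc0000005
Fault offset: 0x014e7142
Faulting application path: C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta\rosetta_4.07_windows_intelx86.exe"""

fields = parse_crash_entry(SAMPLE)
# The first line packs name, version and time stamp together; keep only the name.
exe_name = fields["Faulting application name"].split(",")[0]
print(exe_name, describe_exception(fields))
```

Splitting on the first colon only keeps the drive letter in the path value intact.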

Cheers,
  Jon.

Jon Etkins ID: 6571906 Posts: 6
11 Mar 2020 09:03 PM

Follow-up - it's not just you guys, there seems to be something happening with Rosetta in general lately. I have a work laptop that is also running the WCG client (and only WCG projects), and it just encountered a 0xc0000005 error on a Rosetta work unit that it was given as part of the Microbiome Immunity Project.

Tristan Olive ID: 22 Posts: 383
13 Mar 2020 03:13 PM

Thanks for sharing this info, Jon. We're working with Rosetta to see about getting this resolved, so will pass along the error messages.

Matt ID: 44 Posts: 293
15 Mar 2020 01:35 AM

To confirm, Jon: you have two hosts running Rosetta, and both had this issue?

(And can you confirm how much RAM you have on the host(s) that have issues with Rosetta?)

Matt ID: 44 Posts: 293
15 Mar 2020 08:05 PM

Also, Jon, was this something that impacted your use of your device, or was it something you observed in logs, i.e. something that impacted CE/BOINC computations?  If the former, can you describe any symptoms?

(For instance: Windows calculates a stability metric, which goes down if apps crash.  This metric includes crashes of background computations of the sort we run, even though these don't impact user experience or overall system performance in most cases.  So I wonder if your report is based on a metric like that, or on a change in observed system behavior?)

Matt ID: 44 Posts: 293
16 Mar 2020 09:27 PM

And one more question, Jon: are both your devices getting the application "rosetta_4.07_windows_intelx86.exe", per your log excerpt above? (And not, for instance, "rosetta_4.07_windows_intelx86_64.exe"?)

Jon Etkins ID: 6571906 Posts: 6
16 Mar 2020 11:48 PM

Hi, Matt.  Sorry for the delay - been away for a few days.  To address your questions in order...

1) I have two hosts.  My personal PC has 32GB RAM, while my work laptop has 16GB. Both are running the WCG BOINC client, but only WCG is blessed by my employer, so my work laptop is running WCG work only and is not connected to CE.

2) I had been noticing general weirdness and instability for the previous week or so, particularly when running other memory-hungry apps like Lightroom.  Symptoms included the Windows GUI restarting several times, Lightroom locking up, and IIRC even a complete reboot on one occasion.  I originally thought I might have had a video or driver issue, but I have not touched anything in that area.  It wasn't until I noticed a pop-up error message about the 0xc0000005 error that I put two and two together.  After I stopped running Rosetta work units, the system has been rock solid once more.

3) As mentioned, the laptop is running WCG projects only.  However, it appears that the Microbiome Immunity Project - one of the WCG projects - uses the Rosetta engine.  The process that failed on that system was wcgrid_mip1_rosetta_intel86.exe. Both systems are 64-bit Win10, but they both appear to be running x86 work units.

After the WCG unit crashed, I also disabled the MIP project in my WCG account and raised the issue in the WCG forum; it would appear from the ensuing discussion that Rosetta is no stranger to odd behavior.  That thread can be found here.

Jon Etkins ID: 6571906 Posts: 6
17 Mar 2020 12:34 AM

Oh, I should point out that I have a third system (actually a VM) running the WCG client and connected to CE.  I have not seen any crashes on it, but WCG is pretty much the only thing it processes all day.  That's the second system I was referring to in my original post - the work laptop is a third system that I omitted from the original post.

Tristan Olive ID: 22 Posts: 383
30 Mar 2020 04:29 PM

Rosetta made an adjustment to memory settings that may resolve this problem. Please respond right away if you see it happening again and we'll continue to work on it. Thanks again --

Jon Etkins ID: 6571906 Posts: 6
31 Mar 2020 05:46 AM

Thanks, Tristan.  I don't see any Rosetta WUs yet, but I'll keep an eye out.