Reboot/Post hang #64
Comments
Did you try this procedure? Specifically the part about MSI. |
@dacmot I do not see the relevance here? Perhaps I wasn't precise enough before. The issue is that the apu2c4 does hang during POST when it is warm rebooted from a running OS. Does not matter if that OS is tinycore, debian or opnsense. It also does not matter if the boot media is a usbstick or a sata drive. When the system hangs only a hard reset helps to get the unit in a working condition. sp5100_tco watchdog does also not help - even when J2 5-6 is shorted. |
4.0.7, 4.0.12 SD card not recognized after reboot... |
Yes, I am using/rebooting from just SD card, Grub (slackware) hangs without message PCEngine's apu2-tinycore6.4.img.gz:
with 4.0.12 bios:
syslinux (tinycore):
|
Successfully compiled v4.6.8, after flashing, the issue reported by @fhloston is still present. A warm reboot causes the apu2c4 to hang just after displaying the BIOS version and before displaying the available RAM. Edit: Seems AHCI is related to this issue.. I'm running OPNsense 18.1.6 on a Kingston mSATA.
Suddenly, a warm reboot worked instantly. Edit 2: Interestingly warm reboot continues to work even if I re-enable those two entries.. |
@TheEvilCoder42 the two entries in loader.conf You are referring to were necessary to prevent pfSense from hanging, while running or installing, due to a failed write command on the HDD. They affect the SATA controller, which has little to do with reboots and power cycles. Look at the second @fhloston comment. No matter which OS/quirks/tweaks You use, the problem will exist. We are aware of the issue and are working on it. |
@fhloston I automated testing of this issue and was able to run over 35 reboot cycles without a problem. I ran the test on:
Overnight I want to run 150 iterations. If I do not hit the bug, we will have to narrow down the differences between our configurations. If you can check 4.6.9 on your side it would be great. |
@pietrushnic cool, will test! Can you create the v4.6.9 tag in the release manifest repo? |
@fhloston we are not using manifests anymore for newest releases. If You want to compile firmware by yourself, please use the new pce-fw-builder. Refer to README for usage and build binary using pce-fw-builder v1.0.0 tag. Binaries are also published on PC Engines github page: https://pcengines.github.io/ |
@miczyg1 thanks for the headsup, need to create new jenkins job then. |
@pietrushnic it hangs after 5 reboots here :/ Booting tinycore from lower front usb... `root@box:/media/TINYCORE# 5 /media/TINYCORE/reboot.log Syncing all filesystems. The system is going down NOW! |
@fhloston last time we survived 48 reboots and then I hit an issue with the test framework (specifically Robot Framework and a terminal emulator called pyte), so it was not a failure of the system. I think we have to try to recreate your configuration. What storage are you using to boot? Also, can you link us to the TinyCore image you use? |
@pietrushnic it's TinyCore 6.4 downloaded from the pcengines website especially for apu2 some time ago to reproduce this bug. I have only added the reboot script. |
If you managed to get 50 reboots without a hang via iPXE boot, I would follow your assumption that iPXE does not trigger the problem. My test case is an external USB stick with tinycore 6.4. Initially I also reproduced it via FBSD/Debian on SATA. I have never attempted to reproduce it with iPXE boot. I started to try with tinycore and an SD card but stopped due to the different device naming and the autostart.sh that then stopped working. TL;DR: USB/SATA and Debian/FreeBSD/Tinycore to reproduce |
@pietrushnic unfortunately, booting Debian over iPXE and rebooting hung the platform after 5 repeats. So I would assume it is not a storage problem, but an overall platform issue. |
If you boot via iPXE do you disconnect local storage first - if there is any in the first place? |
@fhloston I do not disconnect the storage. Actually, during the test I had two USB sticks plugged into the front connector. I will try a run without any media. |
@miczyg1 yeah, maybe it matters that we have some media attached. I think I was on a setup without any storage connected. |
@pietrushnic @fhloston platform survived 50 reboots from Debian booted via iPXE without any media connected. Storage media might be a good shot in this issue. |
Hi, I have the same issue from mSATA with v4.8.0.2 on reboots from pfSense. hint.ahci.0.msi="0" but it still hangs after the BIOS version message. Cold boots are OK and repeated reboots within a few minutes are OK, but a reboot after the system has been on for a while still hangs. |
@miczyg1 is this the same like this @miczyg1 anyone can validate if this was the case on legacy? |
FYI, I also have this problem on the newer firmwares. Currently on 4.8.0.2. |
@Veldkornet thanks. I really don't like that legacy firmware didn't have that issue. @miczyg1 @krystian-hebel we should take a look at it as soon as we get free cycles. |
@Firefishy as stated in the issue: pcengines/pcengines.github.io#22 |
@krystian-hebel thanks for your reply, but this seems to be a bit too risky (apart from the fact that I don't know how to do it). I guess if nobody else comes up with an idea, I will plan for a manual reset. |
@b-bittner every firmware update should be finished with full power cycle. Reboot does not cut off the power from memory and processor, thus some leftovers from previous boot may be in place. It is strongly advised to turn off platform after flashing and replug the DC power supply. |
@miczyg1 is this highlighted in the documentation? I think we always need a power cycle after a firmware update. I understand that users expect things to work after a reboot, but this is not under our control. Doing a warm boot or just a reboot may cause unexpected behavior (considering all the weird states we can enter with AGESA and old memory content). It should be clear to all users doing an update that a cold boot is needed after a firmware update (most normal PCs and laptops will force the cold boot path after a firmware update). I'm not sure if we can force the cold boot path programmatically after flashing? @krystian-hebel ? |
@ALL: I don't think the firmware update instructions are clear enough to properly stress the need for a full power cycle after a successful firmware update. |
Dear community, I have another request. As @b-bittner noticed it is sometimes necessary to update firmware remotely. We might have found a way to do this, but we can't test it ourselves due to lack of reproducibility of this issue. If someone affected by this issue would be so kind to test:
Platform should print For the curious: Images were built from branch coldboot with the SVI2 fix in the same place it was in legacy and 4.9* releases. In this branch the fix was moved before the call to AmdInitReset so that it is applied before the point of the previous hangs, but for research I reverted that change. The included change tests whether bit ColdRstDet in D18F0x6C (one of the registers that hold their state even after reset) was set, and performs a full reset (going into S5 for 3-5 seconds) if it was. It also clears some registers that AGESA checks to distinguish cold boot from warm boot. This bit can be set with Note that some registers are nonetheless not reset to their cold-boot values, and some can't be changed manually even though they are documented as read-write, not lockable. Because of that it is impossible to force a state that looks 100% like a cold boot. The state of peripheral devices can make it even more difficult. We therefore strongly suggest performing a full power cycle after flashing if possible. |
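To make the ColdRstDet check above a little more concrete, here is a minimal sketch of testing a single bit in a register value. The command used to read or set D18F0x6C is not reproduced in this thread, so the value is passed in by hand. The bit position in `COLD_RST_DET_BIT` is a placeholder assumption, not something stated in the comment above; confirm it against the BKDG for the CPU family before relying on it.

```shell
#!/bin/sh
# Sketch only: test one bit of a 32-bit register value.
# COLD_RST_DET_BIT is an ASSUMED placeholder position; check the BKDG.
COLD_RST_DET_BIT="${COLD_RST_DET_BIT:-4}"

# bit_is_set VALUE BIT
# Succeeds (exit status 0) if bit BIT of VALUE is 1.
# VALUE may be decimal or hexadecimal (0x...).
bit_is_set() {
    [ $(( ($1 >> $2) & 1 )) -eq 1 ]
}
```

For example, `bit_is_set 0x10 "$COLD_RST_DET_BIT" && echo "bit set"` reports whether the bit is set in a hand-supplied value; it says nothing about how the firmware itself reads the register.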
I am not sure if my apu2d4 suffers from the same problem, or if it is something completely different... Symptoms:
Already tried:
--> no effect on the above mentioned symptoms Any ideas what could cause this behaviour? |
@schweizp the symptoms match the problem described in this GitHub issue. Starting with the v4.9.0.1 release, the problem should be gone. Try updating the firmware to v4.9.0.1 or newer. The problem was caused by a communication timeout between the processor and the power management controller (PMIC). The processor requested a voltage change from the PMIC because of a frequency change, but the PMIC did not report that the voltage change was complete. This caused the frequency to get stuck and reboots to hang. As mentioned before, this is fixed in BIOS v4.9.0.1. I would also appreciate it if You could try the procedure that @krystian-hebel described above. We would like to make the firmware update more robust and ensure users can do the update remotely from versions < v4.9.0.1 to v4.9.0.1 or newer. If You decide to help us, please attach the boot log from coreboot. |
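As a rough illustration of the "versions < v4.9.0.1" condition, the sketch below compares two version strings. The function names are made up for this example, it relies on `sort -V` (available in GNU coreutils, which may not be present on every system), and how the running version is obtained is left as an assumption in a comment.

```shell
#!/bin/sh
# Sketch only: decide whether a firmware version predates the fix.
FIXED_VERSION="v4.9.0.1"

# version_lt A B
# Succeeds if A is strictly older than B. Relies on `sort -V`.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# is_affected_version VERSION (hypothetical helper name)
# Succeeds if VERSION is older than the release that carries the fix.
is_affected_version() {
    version_lt "$1" "$FIXED_VERSION"
}

# Assumption: on Linux the running version could be read with dmidecode:
# current="$(dmidecode -s bios-version)"
```

For example, `is_affected_version v4.8.0.7 && echo "update and power cycle"` would print the hint, while v4.9.0.1 itself would not.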
@miczyg1 |
@miczyg1 Ok, here is what I did: First file is the dmesg.boot from my box with v4.9.0.2 Second file is the dmesg.boot from my box with v4.8.0.7 Third file is the dmesg.boot from my box with the "special" coldboot rom from @krystian-hebel. I hope my description is clear enough, and it can help troubleshooting the issue. |
@schweizp interesting. The binaries that I shared are somewhat modified on the top of v4.9.0.1, so they have SVI2 fix included, and because of that they shouldn't hang on later warm boots (after initial power cycle). Do you say that your platform hangs even though it booted correctly the first time, without re-flashing in the meantime? Also, dmesg isn't very useful, it is the coreboot boot logs that we are interested in, but they can be obtained only through serial. Platform hangs way before OS starts. |
@krystian-hebel @miczyg1 I assume it would be useful to have OS image or binary (if package not available in distro) to perform @miczyg1 I wonder if we can have logging to cbmem and not throw things on serial? |
@pietrushnic we have native support for the cbmem log in mainline releases. It is just a matter of compiling and running the util. But if the platform hangs we lose the log, because we never reach the OS. |
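A minimal sketch of saving the cbmem console log from a running OS, assuming the `cbmem` utility from coreboot's `util/cbmem` has been built and is run with sufficient privileges. `CBMEM_BIN` and the function name are illustrative. As noted above, this only captures boots that actually reach the OS.

```shell
#!/bin/sh
# Sketch only: keep a copy of the coreboot console log of the current boot.
# CBMEM_BIN is an illustrative override; by default `cbmem` is looked up on PATH.
CBMEM_BIN="${CBMEM_BIN:-cbmem}"

# save_cbmem_log OUTPUT_FILE
# Writes the coreboot console (`cbmem -c`) to OUTPUT_FILE.
# Fails with a message if the utility cannot be found.
save_cbmem_log() {
    out="$1"
    if ! command -v "$CBMEM_BIN" >/dev/null 2>&1; then
        echo "cbmem utility not found" >&2
        return 1
    fi
    "$CBMEM_BIN" -c > "$out"
}
```

Calling `save_cbmem_log /var/log/coreboot-console.log` early after each successful boot would at least leave a log of the last good boot to compare against serial output from a hanging one.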
@miczyg1 we need a guide for bug reporting for users. Information about |
Yes, the behaviour is correct as you stated it. In the meantime I have decided that I will return the box to the seller, since I also had problems after longer uptime. The box stopped working, and a reboot was not possible. I suspect some kind of thermal problem: the room where the box was running is not heated, and gets rather cold during the winter nights. As stated, a reboot in the cold environment was not possible after the "hang". After taking the box back into a heated room and waiting for the temperature to acclimate, the box started normally again, so the box is going back!
I agree with the comments above. Better troubleshooting documentation for the "non-expert" user would be useful. |
I think that high humidity resulting from temperature fluctuations can be the issue too. |
We are also facing warm reboot hangs (using PCEngines apu2 boards and BIOS versions <4.9.0.1). After updating to v4.9.0.1 we have not been able to reproduce the issue yet, which is promising. We did an immediate cold start after the BIOS update. @miczyg1: You write that a cold start (power cycle) is needed after updating (flashing) to v4.9.0.1 in order for the issue to be resolved. But we need to do the update remotely. Is it OK to do the update, continue running without a cold start, and postpone the cold start until later? We understand that in this state the device can still hang on a warm reboot. But if a cold start is done then, the issue should be resolved, right? @krystian-hebel: You also need to do the update remotely. Is there anything that speaks against this solution? Of course the important point here is that continuing to run after updating the BIOS, without an immediate cold start or warm reboot, should not lead to any other problems. |
@smuellener flashing new firmware without rebooting is as good as not updating at all. Firmware is read from ROM initially, but not after it starts, and especially it isn't executed. The platform must be booted so that new firmware is loaded and new code executed. The reason why this particular fix doesn't work after warm boot is that the fix is applied after the platform is already FUBAR. Cold reset is required to reset voltage regulator - it is a hardware issue that can't be fixed. The images that I linked a couple of comments earlier could allow for a safe remote reboot, but apart from @schweizp (who was having another issue, probably not connected with this one) nobody tested it yet. If You have some time for testing it we would be much obliged. |
I've flashed v4.8.0.6, did a poweroff and unplugged/replugged from the mains. The CPU frequency got stuck after just about a minute, so I flashed the coldboot firmware and then, since I'm on pfSense 2.5.0-dev (FreeBSD 12), I ran the pciconf command and rebooted. This is what I got:
The system did reboot fine, but as usual the boot order is always changed after flashing... After I fixed the boot order, a reboot gave me the same output as before. The CPU frequency doesn't get stuck anymore and the system reboots fine. Note that every reboot gives me the output above. Hope it helps. |
@NanoCaiordo are you sure that this is the output from the first reboot? There should be Still, it is indeed good news that it rebooted fine. |
@krystian-hebel you're correct. Long story short, the first time I had a couple of mishaps: I upgraded, made sure the frequency stuck issue happened again, and this time I flashed the coldboot rom and then did pciconf again. I could do a fresh 2.4.4-p2 install, flash v4.8.0.6 back on, power cycle and try again... I just need to be sure whether this is the correct path to start fresh again or not. |
@NanoCaiordo the order of pciconf and flashrom doesn't matter, as long as they are both run on the same boot. In fact, now that we know that it is able to reboot, the only thing left to test is whether |
There you go:
|
Great, that's exactly what I needed to see. Thank You. |
@krystian-hebel anything we still need here? |
@miczyg1 I know it wasn't me you asked, but at least, guys, please add this link (if one of you has already taken the time to write it) to pcengines.github.io as "Firmware instructions and reboot": https://github.com/pcengines/apu2-documentation/blob/master/docs/firmware_flashing.md |
@soder10 of course, can do. Note that everything is open and anyone can contribute (including You). Feel free to open pull requests (requires repository fork) if any part of the documentation is not linked or not user-friendly or not well formatted/designed. |
@miczyg1 as there are no further reports of the problem I think it is safe to close this issue. |
apu2c4 with BIOS 4.6.1, 4.6.5, 4.6.6 hangs occasionally during soft reboot/post.
Hard reset clears fault.
4.0.x and 4.5.5 do not show this behaviour.
Reproduce with a simple shell script in tinycore:
date >>reboot.log;sleep 60;reboot
Update: still present in 4.6.8
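A slightly more defensive variant of the one-liner above, as a sketch: it stops after a fixed number of cycles so the box does not reboot forever. `LOG`, `MAX_CYCLES`, `RUN_REBOOT_LOOP` and the function names are illustrative and not part of the original report; the script only reboots when `RUN_REBOOT_LOOP=1` is set, so it can be sourced safely.

```shell
#!/bin/sh
# Sketch only: bounded reboot loop, one log line per boot.
LOG="${LOG:-./reboot.log}"
MAX_CYCLES="${MAX_CYCLES:-150}"

# cycles_done: print the number of boots recorded in $LOG so far.
cycles_done() {
    if [ -f "$LOG" ]; then
        wc -l < "$LOG" | tr -d ' '
    else
        echo 0
    fi
}

# record_cycle: log this boot; succeed only if another reboot is still wanted.
record_cycle() {
    date >> "$LOG"
    [ "$(cycles_done)" -lt "$MAX_CYCLES" ]
}

# Reboot only when explicitly requested.
if [ "${RUN_REBOOT_LOOP:-0}" = "1" ]; then
    if record_cycle; then
        sleep 60
        reboot
    fi
fi
```

If the platform hangs during POST, the number of lines in the log shows how many warm reboots it survived before the hang.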