This might be hyper-specific, but does anyone know where I can find any HTML page of natural Japanese text that is of decent length and has ALL kanji fully annotated with HTML <ruby><rb><rt><rp> tags? I want to use it to test NLP tokenizer accuracy for a project I am building. For those who are familiar, IPADIC is okay, but it gets really basic stuff wrong. For example, type "american" into Google Translate (Google supposedly uses an old IPADIC MeCab setup, as do Apple and most big tech companies). You will get "アメリカ人" with the transliteration "Amerikahito" when it should be "AmerikaJIN". It's really basic stuff like that, it's wrong, and no one seems to care…
I know Aozora Bunko is pretty well annotated, but I want some HTML where even the very basic kanji and compounds that any native speaker would know are fully annotated, again so I can test accuracy. Thank you!
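For anyone wondering what the test would look like once such a page turns up, here is a rough sketch using only the Python standard library. It pulls (base text, reading) pairs out of ruby markup and compares them with whatever reading a tokenizer produces. The names `RubyExtractor`, `extract_ruby_pairs`, `kata_to_hira`, and `score` are made up for illustration, and `predict_reading` is a hypothetical callable you would wrap around your own tokenizer (for example a MeCab binding); none of this is tied to a specific library. It also looks up each annotated word in isolation, so it ignores sentence context and should only be treated as a first approximation.

```python
from html.parser import HTMLParser


class RubyExtractor(HTMLParser):
    """Collect (base_text, reading) pairs from <ruby> elements."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._in_ruby = self._in_rt = self._in_rp = False
        self._base, self._reading = [], []

    def handle_starttag(self, tag, attrs):
        if tag == "ruby":
            self._in_ruby = True
            self._base, self._reading = [], []
        elif tag in ("rb", "rt", "rp"):
            # End tags for these elements are optional in HTML,
            # so a new start tag implicitly closes the previous one.
            self._in_rt = tag == "rt"
            self._in_rp = tag == "rp"

    def handle_endtag(self, tag):
        if tag == "rt":
            self._in_rt = False
        elif tag == "rp":
            self._in_rp = False
        elif tag == "ruby" and self._in_ruby:
            self.pairs.append(("".join(self._base), "".join(self._reading)))
            self._in_ruby = self._in_rt = self._in_rp = False

    def handle_data(self, data):
        if not self._in_ruby or self._in_rp:
            return  # skip text outside ruby and the fallback parentheses
        (self._reading if self._in_rt else self._base).append(data)


def extract_ruby_pairs(html):
    parser = RubyExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs


def kata_to_hira(text):
    """Map katakana to hiragana so readings compare in one script."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch for ch in text
    )


def score(pairs, predict_reading):
    """Return (accuracy, misses); predict_reading is a user-supplied callable."""
    misses = []
    for base, reading in pairs:
        predicted = kata_to_hira(predict_reading(base) or "")
        if predicted != kata_to_hira(reading):
            misses.append((base, reading, predicted))
    accuracy = 1 - len(misses) / len(pairs) if pairs else 0.0
    return accuracy, misses


sample = (
    "<p><ruby><rb>漢字</rb><rp>（</rp><rt>かんじ</rt><rp>）</rp></ruby>を"
    "<ruby><rb>読</rb><rp>（</rp><rt>よ</rt><rp>）</rp></ruby>む</p>"
)
print(extract_ruby_pairs(sample))
```

To try it against a real tokenizer, pass a function that takes the base string and returns the reading your tokenizer assigns (katakana or hiragana both work, since both sides are normalized to hiragana before comparing).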
by giomsan