- I want to create realistic photos using image generation AI
- Generating a Photo of a “Japanese Woman Eating Ramen” with ImageFX
- The first generated image
- Changed location to Chinese restaurant
- Adding a subtle touch of realism
- Adding a “Yuru-Kawaii” touch
- Adjusting Facial Features
- Changing the Hairstyle
- What Happens When Generating Multiple People?
- Can AI Generate a Photo of Twins?
- Changing the Outfit
- Letting ChatGPT Take Full Control
- Final Thoughts & Image Collection
I want to create realistic photos using image generation AI
If you think of “AI-generated images,” you might imagine anime-style illustrations. However, I personally enjoy creating highly realistic, almost lifelike images.
For generating such realistic photos, ImageFX is a great option. This advanced image-generation app is powered by Imagen 3, a cutting-edge AI model developed by Google.
Features of ImageFX
ImageFX offers the following key features:
- High quality: It can generate highly detailed and realistic images.
- Easy to use: No special knowledge is required, making it intuitive for anyone.
- Free to try: Though a paid plan may be introduced in the future.
Currently, English prompts yield the best results. If you’re not confident in English, you can use a translation app or ChatGPT to create accurate prompts.
How to Use ImageFX
For example, entering “Landscape with blue sky and white clouds” will generate an image that matches the description. You can generate up to four images at a time, and the process is remarkably fast—taking only about 10 seconds.

Previously, users hit the daily limit quickly, but the service now allows up to 30 generations per day. Once you reach the limit, you have to wait until the next day. For casual use, that is generally enough.
Additionally, all past generated images are stored in “My Library”, where you can view and download them anytime. The built-in search function is helpful, but finding past images can be challenging if you forget the prompt or have too many images to scroll through. To avoid this, it’s best to download any images you like immediately.
(This information is based on usage within Japan and may vary depending on the country.)
Generating a Photo of a “Japanese Woman Eating Ramen” with ImageFX
Creating landscape photos or surreal, never-before-seen images can be fascinating, but generating realistic images of people is also a lot of fun.
This time, I’ll casually share my experience of generating an ideal image of a “Japanese woman eating ramen.” Ramen is one of Japan’s national dishes, famous for being quick, affordable, and delicious—a combination beloved across the country. Specifically, I’ve chosen shio ramen, a clear, salty broth ramen that’s known for its simplicity and delicate flavor, and happens to be my personal favorite.
There’s nothing too deep here—just a lighthearted walkthrough, so feel free to enjoy it!
(These AI-generated images do not depict real people. They are purely fictional and created from scratch.)
The first generated image

She appears to be trying to eat two bowls of ramen. Moreover, although I was expecting a realistic photo-like image, it turned out as an illustration instead.
This is a common mistake—forgetting to specify in the prompt that I want a photo rather than an illustration.
Next, I’ll try adding the phrase “35mm film” to the prompt.

The result looks quite realistic, almost like a real photo. If it weren’t cold ramen, her left hand might have gotten burned, but other than that, nothing feels particularly off.
If I had to point out one issue, the characters written on the bowl are unreadable. It seems that reproducing Japanese text and kanji is still a challenge.
That said, this image clearly depicts a woman eating cold ramen at a food stall.
However, her hairstyle and facial expression feel a bit off, so let’s make some adjustments.

Once again, it ended up as cold ramen, but this time it’s a delicious-looking shio (salt) ramen. The soft-boiled egg is especially well-executed. Also, with the woman smiling and looking directly at the camera, the image feels much more engaging and friendly.
However, upon closer inspection, it seems like her right hand has six fingers. Since fingers are quite complex in shape, AI doesn’t always adhere to the common rule of having exactly five.
To address this, you can try adding phrases like “holding chopsticks with five fingers” or giving the hands a specific role in the prompt. That said, there still seems to be an element of luck involved.
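The tweaks so far (adding “35mm film”, adding a hint about the hands) all amount to appending short phrases to a base description. Here is a minimal sketch of that idea in Python. The function name build_prompt and the exact wording of the subject and setting are assumptions for illustration, not the prompts actually used for the images above, and the resulting text is simply meant to be pasted into ImageFX by hand.

```python
def build_prompt(subject, setting, style_hints=(), hand_hints=()):
    """Join non-empty prompt fragments into one comma-separated prompt string.

    All argument names are hypothetical; this only formats text.
    """
    parts = [subject, setting, *style_hints, *hand_hints]
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = build_prompt(
    subject="a Japanese woman eating shio ramen",   # assumed wording
    setting="at an outdoor food stall",             # assumed wording
    style_hints=["35mm film"],                      # pushes the result toward a photo
    hand_hints=["holding chopsticks with five fingers"],
)
print(prompt)
```

Keeping the style and hand hints as separate lists makes it easy to drop or swap one of them between attempts and see what changes.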
Changed location to Chinese restaurant
I changed the setting this time. Previously, it was an outdoor food stall, but now I’ve placed the scene in a typical Chinese restaurant. I set the lighting to fluorescent to enhance the shadows and create a slightly worn-out atmosphere.

It’s coming together nicely. The text on the menu inside the restaurant might look strange to someone who can read Japanese, but for others, it probably wouldn’t be noticeable.
Looking closely, there are still some unnatural details, like how the fingers on her right hand hold the chopsticks and how the noodles seem overly tangled around them. However, at first glance, it looks like a great photo. Maybe the ramen is incredibly delicious, or perhaps she’s good friends with the staff—her beaming smile really stands out.
This time, she’s properly placing the bowl on the table while eating, and the rising steam clearly shows that the ramen is piping hot.
That being said, details like the folds in her clothing and the way the counter’s shadow falls on the chairs look impressively realistic. By simply cropping out the awkward menu text, the overall image feels quite natural.

Adding a subtle touch of realism
Now that the image is coming together, I feel like adding some finer details.
One common gesture for people with long hair while eating ramen is holding their hair back with one hand. So, I decided to include this action in the prompt to make the scene feel even more natural.
I also adjusted the hair color, opting for a more natural brown tone to enhance the overall realism of the image.

Aside from the fact that the bowl and the spoon have fused together and the noodles seem unusually abundant, the image has turned out quite realistic.
One particularly impressive detail is the pilling on the sweater sleeves, which adds an unexpected touch of realism. But most of all, the hair texture is incredible—each strand is meticulously rendered.
Next, I’ll try adding the gesture of tying up her hair before eating ramen to make the scene even more natural.

It’s really coming together nicely. Since the fingers are naturally hidden, the image feels quite polished… or so I thought, until I noticed that the ramen sits right at the edge of the counter, where it looks ready to tip over. On top of that, the woman herself is perched on the very edge of her chair, which looks just as unstable.
Still, the gesture of tying her hair back has a strangely timeless appeal. It’s such a natural movement, yet for some reason, it effortlessly draws the eye.
Adding a “Yuru-Kawaii” touch
Since this gesture naturally hides fingers, which are tricky to generate perfectly, it’s definitely a useful trick! So, keeping the same atmosphere, I decided to change her outfit.
With ChatGPT’s suggestion, I’ll incorporate some “Yuru-Kawaii & Relaxed” elements to create an even more friendly and approachable vibe.
“Yuru-Kawaii” is a Japanese term that combines “yurui” (loose, relaxed, or laid-back) and “kawaii” (cute). It refers to a casual, effortless kind of cuteness—think soft fabrics, oversized sweaters, pastel colors, and a cozy, relaxed vibe.

Once again, both the ramen and the seating position are a bit precarious, but not to the point of being overly distracting.
Next, I’ll try generating an image from a different angle. The outfit and hairstyle will remain mostly the same, but I’m curious to see how the overall atmosphere changes.

At this point, a question suddenly comes to mind.
This person looks like the same individual, but is that really the case? Of course, since she doesn’t actually exist, calling her the “same person” might not be entirely accurate.
However, if we didn’t mention that these images were AI-generated, wouldn’t most people naturally recognize her as the same person?

Even with a different outfit, she still looks like the same person.
The fact that the spoon is placed directly on the counter bothers me a little, but overall, the image looks great.
By the way, according to ChatGPT, if you don’t specify facial features, the AI will automatically generate an attractive face by default.
Here’s what ChatGPT had to say:
If no specific instructions are given, AI-generated faces tend to be androgynous and well-balanced in appearance. In particular, AI-generated images often follow these trends:
Face shape
The default tends to be an oval face, as it is considered the most balanced and universally appealing. If you want a round or long face, you need to specify it clearly.
Eye shape
The eyes are usually slightly large and almond-shaped. This might be because AI has learned that this is an ideal eye shape for Japanese women. If you want to emphasize cuteness or a cooler look, it’s better to specify “round eyes” or “narrow, slanted eyes.”
Eyebrow shape
AI often generates naturally thick, arched eyebrows, as this gives the impression of a well-proportioned, attractive face. If you prefer a cooler expression, specifying “straight eyebrows” can help.
Nose shape
The AI tends to create a slim, well-defined nose, influenced by Western beauty standards. This often results in a higher and narrower nose. If you want a more natural Japanese look, you should specify “a small nose” or “a slightly rounded nose.”
Lip shape
AI defaults to moderately full, well-shaped lips. If you want a cuter appearance, specify “small lips”, while for a more sensual look, “full lips” is a better choice.
Skin texture and tone
The AI usually generates fair, natural-looking skin, influenced by the Western preference for “clear, porcelain skin,” but with a slight blend of Asian skin tones. If you prefer a healthy tan or olive complexion, you’ll need to specify it explicitly.
AI-generated faces often follow common beauty trends, likely because they are trained on a large dataset of images that reflect existing preferences. However, beauty is subjective, and different cultures have different ideals.
There also seem to be technical limitations at play.
Since the latent space (the compressed representation of features learned by AI) has a limited number of dimensions, it’s difficult to perfectly replicate subtle facial differences—for example, a slight variation in eye angle or shape. In other words, AI struggles to capture fine details in facial features.
Additionally, the denoising algorithm used in the image generation process tends to favor “safe averages”, making extreme facial features less likely to appear. As a result, the AI naturally produces conventionally attractive faces. This aligns with the well-known idea that “average faces are often perceived as more beautiful.”
So, to what extent can AI actually replicate the specific facial features you request?
Adjusting Facial Features
As an experiment, I’ll try specifying detailed facial characteristics. This time, I’ll add the following conditions: round face, monolids, a small and low nose, thin lips, and tanned skin.
Let’s see how well the AI can reflect these traits in the generated image.
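A possible way to tell which of these traits the model actually picks up is to also try them one at a time, so each result can be compared against the unmodified image. The sketch below only prepares the prompt texts. The base prompt wording and the function name single_feature_prompts are assumptions for illustration, and this one-at-a-time approach is a suggestion rather than the procedure followed in this article.

```python
def single_feature_prompts(base_prompt, features):
    """Return one prompt per facial feature, each adding just that feature."""
    return {feature: f"{base_prompt}, {feature}" for feature in features}


base = "photo of a Japanese woman eating ramen, 35mm film"  # assumed base prompt
features = [
    "round face",
    "monolids",
    "a small and low nose",
    "thin lips",
    "tanned skin",
]

variants = single_feature_prompts(base, features)
for feature, text in variants.items():
    print(f"{feature}: {text}")
```

If a trait shows up when requested alone but disappears in the combined prompt, that hints the model is trading it off against the other instructions.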

Hmm… it doesn’t feel like the specified features were fully reflected, but the character does look somewhat different from the previous one.

And now, for the first time, the ramen bowl has turned black. Am I the only one who thinks a black bowl somehow makes the ramen look even more delicious?
Perhaps the phrase “tanned, sun-kissed skin” influenced not just the character’s complexion but also the color of the bowl.
Changing the Hairstyle
This time, I’ll specify a round face, narrow almond-shaped eyes, and a small, slightly rounded nose. On top of that, I’ll change the hairstyle to a short bob with black hair and see how the AI interprets it.

She clearly looks like a different person from the previous one.
Since I specified a round face, her features appear slightly fuller this time. The height of her nose is also a bit lower than before.
At a glance, the overall facial structure might be similar, but just changing the hair color and hairstyle makes a huge difference in impression. In this image, she seems absolutely starving—gripping her chopsticks with enough force to almost snap them in half. And, as expected, some fingers are still a bit off… her pinky seems to have disappeared midway. It looks like fine details like this are still a challenge.
By the way, I instructed the AI to place her left palm on her left cheek in a “delicious!” gesture, but no matter how many times I tried, it didn’t come out right. Instead, she ended up in a mysterious pose. Well, let’s just say it works in its own way.
What Happens When Generating Multiple People?
Now, what if I try generating multiple people in the same image? Will they all end up with identical faces, or will the AI successfully create distinct individuals? Let’s find out.

All four individuals have distinct facial features.
In reality, it’s extremely rare for multiple people to have identical faces, so it’s only natural that increasing the number of people leads to more variation in their appearances.

This image depicts four sisters, inspired by the Netflix original drama Asura. The show was amazing, but this… doesn’t look all that interesting.
Can AI Generate a Photo of Twins?
Now, can AI create an image of twins standing side by side? Let’s see if it can accurately reproduce two identical faces.

They look quite similar.
If someone told me, “They’re twins,” I would probably believe it.

These two genuinely look like twins.
However, once again, they’re holding the bowls. I keep forgetting to include the instruction “place the bowl on the table.” In Japan, it’s common to eat ramen without lifting the bowl, but the AI doesn’t seem to recognize this cultural nuance.
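Since instructions like this are easy to forget, a small checklist can catch them before a generation is spent. The following is a minimal sketch, assuming a personal list of must-have phrases. The list contents, the draft prompt, and the function name missing_phrases are all hypothetical.

```python
# Phrases that should appear in every prompt (an assumed personal checklist).
REQUIRED_PHRASES = ["35mm film", "bowl on the table"]


def missing_phrases(prompt, required=REQUIRED_PHRASES):
    """Return the required phrases that do not appear in the prompt (case-insensitive)."""
    lowered = prompt.lower()
    return [phrase for phrase in required if phrase.lower() not in lowered]


draft = "Twin Japanese sisters eating ramen side by side, 35mm film"  # assumed wording
print(missing_phrases(draft))
```

This is only a substring check, so it confirms the phrase is present in the text, not that the generated image will follow it.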
Changing the Outfit
Generating images like this is really fun. Plus, with ImageFX’s incredibly fast processing speed, there’s little to no frustration, which makes the experience even better.
Now, I’ll have ChatGPT suggest different outfits and continue generating ramen-loving girls in various styles.
First up: Y2K fashion.
Y2K (short for “Year 2000”) refers to the fashion and culture trends from the late 1990s to early 2000s. It has recently made a comeback and is gaining popularity again.

This resulted in quite a unique-looking image. It definitely gives off a social media-friendly, attention-grabbing vibe.
Next, I’ll try generating an image with a “Mild Yankee” theme.
In Japan, “Yankee” refers to delinquent youth, troublemakers, or gang-like subcultures. However, “Mild Yankee” describes a group of young people who, while sharing some stylistic elements with traditional Yankees, are far less aggressive and don’t engage in illegal activities. They tend to have a strong attachment to their local community and a more grounded lifestyle.
According to ChatGPT, Mild Yankee fashion is characterized by “a mix of slight rebelliousness, practicality, and hometown pride.” I’m excited to see how the AI interprets this style!

The character ended up looking like someone straight out of a live-action adaptation of a manga, with a strong and distinct personality. As for the “hometown pride” aspect… well, I’m not entirely sure where that comes in, but maybe she’s wearing a tracksuit from her old high school?
Next, I’ll try generating an image with the theme: “A cyber girl eating cyber ramen at a cyber ramen shop in a cyber city.” Let’s see how futuristic this one turns out!

When the setting is too over-the-top, the image tends to look more like an illustration than a realistic photo. This might be due to a lack of real-world reference data.
So, I’ll tone down the setting a bit and give it another try.

The atmosphere of a cyber city is conveyed through the lighting effects and holographic displays.
Most likely, these glasses are also a digital device, automatically analyzing the nutritional content and calories of the ramen while sending signals directly to her brain.
Letting ChatGPT Take Full Control
I let ChatGPT take the lead by simply asking, “Create a prompt for generating the most adorable woman eating ramen.” Let’s see how it turned out!

The quality is impressive. The background is properly blurred, and there are no awkward Japanese characters. The addition of lanterns enhances the atmosphere, and the two slices of chashu are a nice touch. Well done.
Lighting makes a huge difference. Previous images had a “run-down Chinese restaurant” setting, with dim fluorescent lighting, creating a slightly gloomy feel. In contrast, this one is much brighter and more inviting. It seems that lighting and depth-of-field adjustments play a crucial role in enhancing image quality.
Next, I had ChatGPT write another prompt without any specific instructions, telling it only to “Try a different approach.” Let’s see what it came up with!

This looks just like a cover from Tokyo Calendar magazine. Could this be Ginza? The black-themed interior with ambient lighting creates the perfect atmosphere of a modern, hidden gem for ramen lovers.
When there aren’t many specific requests, it might actually be better to let ChatGPT write the prompt freely rather than over-directing. After that, making small adjustments to refine details seems like the best approach.

And here we have a mysterious scene—someone slurping ramen while holding an umbrella. She even keeps her bag slung over her shoulder, making her an impressively skilled multitasker.
ChatGPT’s imagination is truly fascinating. Normally, no one would think to have someone eating ramen while holding an umbrella, yet it manages to present the scene so naturally. That unexpected creativity is what makes this so fun to experiment with.

When I pointed out to ChatGPT that “the hairstyles have been the same for a while,” it immediately introduced a wide variety of changes. Not only did the hairstyles change, but small details were also adjusted.
With tweaks to makeup and accessories, the overall impression shifts significantly. Also, the maneki-neko (a traditional cat figurine believed to bring good luck) placed alongside the condiments is a nice touch—it adds a bit of character to the scene.

It seemed like ChatGPT was starting to go a little overboard, as more prompts were failing to generate properly. So, I told it to “keep it simple.”
As a result, it generated a completely ordinary photo of a woman eating ramen—just a typical, everyday scene. And perhaps, this might actually be the most realistic image of the entire session.
Final Thoughts & Image Collection
With that, I think it’s time to wrap things up—after all, even I’m starting to feel a bit of fatigue.
Image generation is surprisingly fun as a hobby. If you haven’t tried it yet, I highly recommend giving it a shot. You might end up with unexpectedly high-quality images, or even hilarious, completely unintended results, making for an unpredictable and entertaining experience. Plus, for now, it’s free to use.
Lastly, here’s a collection of the images featured in this article, along with some that didn’t make the cut.