Python programming newbie question

edx.org offers courses from time to time that are based on the intro courses at MIT, which use Python: 6.00.1x and 6.00.2x.

Oh, I think you might be right. It’s just an RSS feed right? Thanks for the tip.

Thanks for the link. 6.002x looks really interesting and a step up from the kind of coursework I did at Coursera.

You’re absolutely right. There is a plug-in for an RSS feed. It’s actually part of the family of plug-ins that Wordpress distributes for importing from various 3rd party platforms.

A frustrating few days on this project. I learned how to make an XML file, load it into memory and manipulate it, then print it out to a file. But … it doesn’t seem to import into Wordpress! I get a success message, but the posts aren’t showing up in the dashboard.

I can export a file (from the Wordpress site I want to migrate my posts to) and I can import that file into a sandbox Wordpress site. But when I try to build a file similar to the one I exported, the posts don’t show up on the sandbox.

Besides cussing, I’m not sure what to try next.

I suppose I could try adding the data I want to import to the file I exported (as opposed to creating one from scratch). There were some elements I couldn’t figure out how to make. One of the nodes has to have a colon in its name, but Python treats that colon as an error, and it’s not clear to me how to escape it so it can be used in the node name. Also, it appears that I may need CDATA blocks, but ElementTree doesn’t support CDATA blocks.

At least, I don’t think CDATA has native support in ElementTree.

Ideas?
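On the colon: in XML a name like wp:author is a namespace prefix plus a tag name, and ElementTree can write those if the prefix is registered first. Here is a minimal sketch, assuming Python 3 and the wp namespace URI from the export file. The variable names are just for illustration, and I haven't checked this against the WordPress importer.

```python
import xml.etree.ElementTree as ET

# Namespace URI as it appears in the WordPress export (assumed to be WXR 1.2).
WP_NS = "http://wordpress.org/export/1.2/"

# Tell ElementTree to write this namespace with the "wp" prefix
# instead of an auto-generated one such as "ns0".
ET.register_namespace("wp", WP_NS)

rss = ET.Element("rss", version="2.0")
channel = ET.SubElement(rss, "channel")

# "{uri}name" is ElementTree's notation for a namespaced tag;
# it is serialized as <wp:wxr_version>.
ET.SubElement(channel, "{%s}wxr_version" % WP_NS).text = "1.2"

xml_text = ET.tostring(rss, encoding="unicode")
print(xml_text)
```

The xmlns:wp declaration is added to the root element automatically, so it does not have to be set by hand as an attribute.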

I think the next step is figuring out the differences between the XML file that imports correctly and the one that doesn’t. Then we go from there.

Also, if ElementTree doesn’t support CDATA, maybe some other XML library does?
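For what it's worth, the standard library's xml.dom.minidom does have a CDATA node type. A small sketch, assuming Python 3. The element names and the text are placeholders, not the real import file.

```python
from xml.dom import minidom

# Build a tiny document by hand; names and text are placeholders.
doc = minidom.Document()
item = doc.createElement("item")
doc.appendChild(item)

title = doc.createElement("title")
item.appendChild(title)

# createCDATASection wraps the text in <![CDATA[ ... ]]> when serialized.
title.appendChild(doc.createCDATASection("This is a new title"))

xml_text = doc.toxml()
print(xml_text)
```

Whether the importer actually requires CDATA, rather than ordinary escaped text, is a separate question that this doesn't answer.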

That’s the spirit!

I’ve had to make so many decisions on my own that I’m starting to lose confidence in my choices. I appreciate your interest in my project. Hopefully I can get on a more productive path.

So here is the file I created, the file I exported from Brevity, and an export from the sandbox site after a “successful” import.

I just did the import this morning and while the post didn’t import, a category did import. So it’s going in the right direction.

I found a way to do CDATA with ElementTree, but the CDATA blocks in my file look a little different from the ones in the reference file. I’m willing to switch to a different XML library, but it wasn’t clear to me whether that would solve my problem. I like ElementTree because it has a tutorial, which I leaned on a lot for this.

Ah, I see a big difference between the file I created and the ones exported from the target/sandbox sites. Mine is all crushed together with no line breaks. I have no idea why that is. Is it encoding?

Maybe it would be helpful to post the code.

import xml.etree.ElementTree as ET

# start CDATA hack


def CDATA(text=None):
    element = ET.Element('![CDATA[')
    element.text = text
    return element

ET._original_serialize_xml = ET._serialize_xml


def _serialize_xml(write, elem, encoding, qnames, namespaces):
    if elem.tag == '![CDATA[':
        # Fall back to empty strings so a missing text/tail is not written as "None"
        write("<%s%s]]>%s" % (elem.tag, elem.text or "", elem.tail or ""))
        return
    return ET._original_serialize_xml(
        write, elem, encoding, qnames, namespaces)

ET._serialize_xml = ET._serialize['xml'] = _serialize_xml
# end CDATA hack

rss = ET.Element("rss")
rss.set('version', '2.0')
rss.set('xmlns:excerpt', 'http://wordpress.org/export/1.2/excerpt/')
rss.set('xmlns:content', 'http://purl.org/rss/1.0/modules/content/')
rss.set('xmlns:wfw', 'http://wellformedweb.org/CommentAPI/')
rss.set('xmlns:dc', 'http://purl.org/dc/elements/1.1/')
rss.set('xmlns:wp', 'http://wordpress.org/export/1.2/')

channel = ET.SubElement(rss, "channel")
ET.SubElement(channel, 'title').text = 'Brevity: A Journal of Concise Literary Nonfiction'
ET.SubElement(channel, 'link').text = 'http://brevitymag.com'
ET.SubElement(channel, 'description').text = 'Brevity: The journal devoted exclusively to the concise literary nonfiction.'
ET.SubElement(channel, 'pubDate').text = 'Fri, 28 Apr 2017 21:01:20 +0000'
ET.SubElement(channel, 'language').text = 'en-US'
ET.SubElement(channel, 'wp:wxr_version').text = '1.2'
ET.SubElement(channel, 'wp:base_site_url').text = 'http://brevitymag.com'
ET.SubElement(channel, 'wp:base_blog_url').text = 'http://brevitymag.com'
author = ET.SubElement(channel, 'wp:author')
ET.SubElement(author, 'wp:author_id').text = '3'
ET.SubElement(author, 'wp:author_login').text = 'elhajj'
ET.SubElement(author, 'wp:author_email').text = '[email protected]'
ET.SubElement(author, 'wp:author_display_name').text = 'Tim ELhajj'
ET.SubElement(author, 'wp:author_first_name').text = 'Tim'
ET.SubElement(author, 'wp:author_last_name').text = 'Elhajj'
category = ET.SubElement(channel, 'wp:category')
ET.SubElement(category, 'wp:term_id').text = '207'
ET.SubElement(category, 'wp:category-nicename').text = 'issue-29-2009'
ET.SubElement(category, 'wp:category-parent').text = None
ET.SubElement(category, 'wp:cat_name').text = 'Issue 29 / January 2009'

item = ET.SubElement(channel, "item")
ET.SubElement(item, 'title').text = None
ET.SubElement(item, 'link').text = None
ET.SubElement(item, 'pubDate').text = None
ET.SubElement(item, 'dc:creator').text = None
ET.SubElement(item, 'guid', isPermaLink='false' ).text = None
ET.SubElement(item, 'description').text = None
ET.SubElement(item, 'content:encoded').text = None
ET.SubElement(item, 'excerpt:encoded').text = None
ET.SubElement(item, 'wp:post_id').text = None
ET.SubElement(item, 'wp:post_date').text = '2000-01-01 19:20:00'
ET.SubElement(item, 'wp:post_date_gmt').text = None
ET.SubElement(item, 'wp:comment_status').text = 'open'
ET.SubElement(item, 'wp:ping_status').text = 'open'
ET.SubElement(item, 'wp:post_name').text = None
ET.SubElement(item, 'wp:status').text = 'publish'
ET.SubElement(item, 'wp:post_parent').text = '0'
ET.SubElement(item, 'wp:menu_order').text = '0'
ET.SubElement(item, 'wp:type').text = 'post'
ET.SubElement(item, 'wp:post_password').text = None
ET.SubElement(item, 'wp:is_sticky').text = '0'
ET.SubElement(item, 'category', nicename='characterization', domain='post_tag' ).text = 'characterization'

postmeta = ET.SubElement(item, 'wp:postmeta')
ET.SubElement(postmeta, 'wp:meta_key').text = '_edit_last'
ET.SubElement(postmeta, 'wp:meta_value').text = '2'
postmeta = ET.SubElement(item, 'wp:postmeta')
ET.SubElement(postmeta, 'wp:meta_key').text = 'Author'
ET.SubElement(postmeta, 'wp:meta_value').text = 'JIM BEAM'


tree = ET.ElementTree(rss)
root = tree.getroot()

for child in root[0][10]:
    print child.text

title2 = 'This is a new title'
creator = 'elhajj'
story2 = 'This is a new story'

title2 = CDATA(title2)
creator = CDATA(creator)
story2 = CDATA(story2)

root[0][10][0].append(title2)
root[0][10][3].append(creator)
root[0][10][6].append(story2)

for child in root[0][10]:
    print child.text, 'after'


tree.write("filename.xml",
           xml_declaration=True, encoding='utf-8',
           method="xml"
           )

Linebreaks have no meaning in XML itself, so there shouldn’t be any differences in how the XML is parsed. It’s probably something else.
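If you want to convince yourself, here is a quick sketch (assuming Python 3, with a made-up fragment) that pretty-prints a crushed-together string and checks that both versions parse to the same tags and text:

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# A made-up fragment with no line breaks at all.
compact = "<rss><channel><title>Example</title></channel></rss>"

# Re-serialize it with line breaks and two-space indentation.
pretty = minidom.parseString(compact).toprettyxml(indent="  ")


def tags_and_text(xml_text):
    """Return (tag, stripped text) for every element, in document order."""
    root = ET.fromstring(xml_text)
    return [(el.tag, (el.text or "").strip()) for el in root.iter()]


# The layout differs, but the elements and their text are the same.
same = tags_and_text(compact) == tags_and_text(pretty)
print(pretty)
print(same)
```

The indented version is also much easier to compare by eye against the exported file.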

Rhamorim gave you the best advice - find out the differences between the file that will import and the one you made that won’t. Just mark it up in your text editor, adding and/or deleting items until you figure it out.

Once you do that then you can move on to the next step - how to fix it and if the libraries you chose are a liability or not. Hint - probably not but still possible.

Oh, that’s a good idea. So just use a text editor until I get the file I created to a state where it will import. And then figure out how to do it with code.

I can do that! Thanks guys.

I can import! I had one of the nodes named incorrectly. Last night I did a line by line analysis.

Thanks guys.

So I guess the next thing is to parse the story I want to import and then put it into the import file. After that, I’ll tackle the Excel file stuff. It’s exciting to have a project and, even better, some success!

Congratulations! But I hope you know about diff utilities and didn’t do this manually. For example: Beyond Compare (best, but costs), Meld (decent, free), WinMerge (decent, free), etc. Your text editors and IDEs also have diff tools built in.
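Since this is a Python thread: the standard library can do a basic diff too, via difflib. A rough sketch, assuming Python 3. The two fragments and the file names are invented for illustration and are not the actual files from this thread.

```python
import difflib

# Two made-up fragments that differ in a single node name.
reference = """<item>
<title>Example</title>
<wp:post_type>post</wp:post_type>
</item>""".splitlines()

generated = """<item>
<title>Example</title>
<wp:type>post</wp:type>
</item>""".splitlines()

# unified_diff yields header lines followed by -/+ lines for each change.
diff = list(difflib.unified_diff(
    reference, generated,
    fromfile="exported.xml", tofile="generated.xml",
    lineterm=""))

print("\n".join(diff))
```

Lines starting with - exist only in the reference file and lines starting with + only in the generated one, so a misnamed node shows up as a -/+ pair.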

I could have used one of those last night! :)

WinMerge hasn’t been updated for years; you should switch over to the forked WinMerge2011.

Visual Studio Code has nice diffing (at least when it has a git repo open).

Minor hijack! My daughter’s a junior studying Honors College Chemistry at a university and has been advised by a professor to learn programming. It’s not required for her degree, but I think the idea behind the suggestion was that it’d be a very good thing to have on her toolbelt. She’s very strong in math and science, but she’s never been super computer savvy. I personally think it’d be right up her alley, but she’s pretty nervous about the idea. She also has no idea which language(s) she should be thinking about. This being (probably) research/lab related, I’m thinking Python might be a good choice? Ideas, opinions?

Python is an excellent choice for a first language, especially in science/research fields. It’s easy to learn and read, and you get to be productive in it pretty quickly too. So yes, it would be a great choice.

Python is the best general purpose scientific programming language at the moment, thanks to fantastic library support for most things (in particular numpy for matrix math, pandas for data analysis, matplotlib for graphing, scipy for solving equations, and many other scientific libraries), and it is also a great way to learn programming. If your daughter is planning to stay in science, be it in industry or academia, she will have a big advantage in analysing data, making figures and all kinds of other things if she learns programming, so it’s a great thing to do.

I learnt Python after learning C and Matlab so I don’t have any great recommendations for complete beginner guides but this page has quite a few suggestions that are at beginner level:
http://python-guide-pt-br.readthedocs.io/en/latest/intro/learning/

I also recommend getting the Anaconda distribution of Python. It’s basically a distribution of Python that includes all the major scientific libraries preinstalled so if you get the Python 3.6 version of it from here:


You should be good to go without too much faffing around with library installation.

I use Python every day for sciencey things, so if your daughter has any questions feel free to send them over.

I’ll second the “be productive” part. Last night I used a library for parsing HTML to figure out how to grab the title, the author name, and the author’s story from an HTML page. Yay me!
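For anyone following along, here is roughly what that kind of extraction can look like with only the standard library's html.parser (assuming Python 3). This is not necessarily the library I used, and the page layout (an h1 title, a p with class "author", a div with class "story") is made up for the example.

```python
from html.parser import HTMLParser


class StoryParser(HTMLParser):
    """Collect title, author and story text from a hypothetical page layout."""

    def __init__(self):
        super().__init__()
        self._current = None  # field currently being read, if any
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h1":
            self._current = "title"
        elif tag == "p" and attrs.get("class") == "author":
            self._current = "author"
        elif tag == "div" and attrs.get("class") == "story":
            self._current = "story"

    def handle_endtag(self, tag):
        # Any closing h1/p/div ends the field being read.
        if tag in ("h1", "p", "div"):
            self._current = None

    def handle_data(self, data):
        text = data.strip()
        if self._current and text:
            self.fields[self._current] = self.fields.get(self._current, "") + text


# Made-up sample page.
page = """<html><body>
<h1>This is a new title</h1>
<p class="author">JIM BEAM</p>
<div class="story">This is a new story</div>
</body></html>"""

parser = StoryParser()
parser.feed(page)
fields = parser.fields
print(fields)
```

A real page will have messier markup (nested paragraphs inside the story, for one), so the tag checks would need adjusting to match it.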